Read 10,000 ebooks for free

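Learning Apache Cassandra - Second Edition (ebook)

Price: ¥

1 person currently reading | 0 reviews | Rating: 9.8

Author: Sandeep Yarabarla

Publisher: Packt Publishing

Publication date: 2017-04-25

Word count: 470,000

Category: Imported books > Foreign-language originals > Computers/Networking

Note: Digital products cannot be returned or exchanged. Source files are not provided, and exporting and printing are not supported.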

Book introduction
Cassandra is a distributed database that stands out thanks to its robust feature set and intuitive interface, while providing the high availability and scalability of a distributed data store. This book introduces you to the rich feature set offered by Cassandra and empowers you to create and manage a highly scalable, performant, and fault-tolerant database layer. The book starts by explaining the new features implemented in Cassandra 3.x and gets you set up with Cassandra. Then you'll walk through data modeling in Cassandra and the rich feature set available for designing a flexible schema. Next, you'll learn to create tables with composite partition keys, collections, and user-defined types, and get to know different methods for avoiding denormalization of data. You will then proceed to create user-defined functions and aggregates in Cassandra. After that, you will set up a multi-node cluster and see how the dynamics of Cassandra change with it. Finally, you will implement some application-level optimizations using a Java client. By the end of this book, you'll be fully equipped to build powerful, scalable Cassandra database layers for your applications.

What you will learn:
  • Install Cassandra
  • Create keyspaces and tables with multiple clustering columns to organize related data
  • Use secondary indexes and materialized views to avoid denormalization of data
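The blurb mentions composite partition keys and tables with multiple clustering columns. The Python sketch below is only a rough mental model under stated assumptions. The ToyTable class and the username, year, seq, body, and likes columns are hypothetical and are not part of Cassandra, its drivers, or the book. It shows how a primary key such as PRIMARY KEY ((username), year, seq) groups rows into partitions by partition key and orders rows within each partition by clustering columns. It also shows how writes behave as upserts that overwrite only the columns supplied. It does not model on-disk storage or distribution across nodes.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of a CQL table keyed by
    PRIMARY KEY ((partition columns...), clustering columns...).

    Illustration only: it mimics query semantics (rows grouped by
    partition key, ordered by clustering columns), not Cassandra's
    storage engine or replication.
    """

    def __init__(self, partition_cols, clustering_cols):
        self.partition_cols = tuple(partition_cols)
        self.clustering_cols = tuple(clustering_cols)
        # partition key tuple -> {clustering key tuple: row dict}
        self.partitions = defaultdict(dict)

    def upsert(self, row):
        """Write a row. If a row with the same full primary key exists,
        only the supplied columns are overwritten; others are kept."""
        pk = tuple(row[c] for c in self.partition_cols)
        ck = tuple(row[c] for c in self.clustering_cols)
        self.partitions[pk].setdefault(ck, {}).update(row)

    def select(self, partition_key, clustering_prefix=()):
        """Return the rows of one partition in ascending clustering order.

        The full partition key is required (the efficient lookup path in
        CQL). This toy supports only equality on a left-to-right prefix
        of the clustering columns.
        """
        part = self.partitions.get(tuple(partition_key), {})
        prefix = tuple(clustering_prefix)
        return [
            row
            for ck, row in sorted(part.items(), key=lambda item: item[0])
            if ck[: len(prefix)] == prefix
        ]


# Usage with hypothetical status-update rows.
table = ToyTable(["username"], ["year", "seq"])
table.upsert({"username": "alice", "year": 2017, "seq": 2, "body": "b"})
table.upsert({"username": "alice", "year": 2016, "seq": 9, "body": "a"})
table.upsert({"username": "alice", "year": 2017, "seq": 1, "body": "c"})
table.upsert({"username": "bob", "year": 2017, "seq": 1, "body": "x"})
# Same full key as an earlier row: overwrites body in place.
table.upsert({"username": "alice", "year": 2017, "seq": 2, "body": "b2"})
# Partial write: adds a column and keeps the existing body.
table.upsert({"username": "bob", "year": 2017, "seq": 1, "likes": 3})

alice_bodies = [r["body"] for r in table.select(["alice"])]
alice_2017 = [r["body"] for r in table.select(["alice"], [2017])]
```

In this model, as in Cassandra, the partition key determines which rows can be read together in one lookup. The clustering columns then determine the order in which those rows come back.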
Table of contents

Title Page

Credits

About the Author

About the Reviewer

www.PacktPub.com

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Getting Up and Running with Cassandra

What is big data?

Challenges of modern applications

Why not relational databases?

How to handle big data

What is Cassandra and why Cassandra?

Horizontal scalability

High availability

Write optimization

Structured records

Secondary indexes

Materialized views

Efficient result ordering

Immediate consistency

Discretely writable collections

Relational joins

MapReduce and Spark

Rich and flexible data model

Lightweight transactions

Multi-data center replication

Comparing Cassandra to the alternatives

Installing Cassandra

Installing the JDK

Installing on Debian-based systems (Ubuntu)

Installing on RHEL-based systems

Installing on Windows

Installing on Mac OS X

Installing the binary tarball

Bootstrapping the project

CQL—the Cassandra Query Language

Interacting with Cassandra

Getting started with CQL

Creating a keyspace

Selecting a keyspace

Creating a table

Inserting and reading data

New features in Cassandra 2.2, 3.0, and 3.x

Summary

The First Table

How to configure keyspaces

Creating the users table

Structuring of tables

Table and column options

The type system

Strings

Integers

Floating point and decimal numbers

Timestamp

UUIDs

Booleans

Blobs

Collections

Other data types

The purpose of types

Inserting data

Writing data does not yield feedback

Partial inserts

Selecting data

Missing rows

Selecting more than one row

Retrieving all the rows

Paginating through results

Inserts are always upserts

Developing a mental model for Cassandra

Summary

Organizing Related Data

A table for status updates

Creating a table with a compound primary key

The structure of the status updates table

UUIDs and timestamps

Working with status updates

Extracting timestamps

Looking up a specific status update

Automatically generating UUIDs

Anatomy of a compound primary key

Anatomy of a single-column primary key

Beyond two columns

Multiple clustering columns

Composite partition keys

Composite partition key table

Structure of composite partition key tables

Composite partition key with multiple clustering columns

Compound keys represent parent-child relationships

Coupling parents and children using static columns

Defining static columns

Working with static columns

Interacting only with the static columns

Static-only inserts

Static columns act like predefined joins

When to use static columns

Refining our mental model

Summary

Beyond Key-Value Lookup

Looking up rows by partition

The limits of the WHERE keyword

Restricting by clustering column

Restricting by part of a partition key

Retrieving status updates for a specific time range

Creating time UUID ranges

Selecting a slice of a partition

Paginating over rows in a partition

Counting rows

Reversing the order of rows

Reversing clustering order at query time

Reversing clustering order in the schema

Limitations of ORDER BY

ORDER BY summary

Paginating over multiple partitions

JSON support

INSERT JSON

SELECT JSON

Building an autocomplete function

Summary

Establishing Relationships

Modeling follow relationships

Outbound follows

Inbound follows

Storing follow relationships

Cassandra data modelling

Conceptual data model (entity relationship model)

Logical data model (query-driven design)

Physical data model

Denormalization

Looking up follow relationships

Unfollowing users

Using secondary indexes to avoid denormalization

The form of the single table

Adding a secondary index

Other uses of secondary indexes

Limitations of secondary indexes

Secondary indexes can only have one column

Secondary indexes can only be tested for equality

Secondary index lookup is not as efficient as primary key lookup

Materialized views

Adding a view

Summary

Denormalizing Data for Maximum Performance

A normalized approach

Generating the timeline

Ordering and pagination

Multiple partitions and read efficiency

Partial denormalization

Displaying the home timeline

Read performance and write complexity

Fully denormalizing the home timeline

Creating a status update

Displaying the home timeline

Write complexity and data integrity

Batching in Cassandra

Logged batches

Unlogged batches

When to use unlogged batches

Misuse of BATCH statements

Summary

Expanding Your Data Model

Viewing a keyspace schema

Viewing a table schema in cqlsh

Adding columns to tables

Deleting columns

Updating the existing rows

Updating multiple columns

Updating multiple rows

Removing a value from a column

Missing columns in Cassandra

Deleting specific columns

Syntactic sugar for deletion

Deleting table data (TRUNCATE)

Deleting table/keyspace with schema (DROP)

Inserts, updates, and upserts

Inserts can overwrite existing data

Checking before inserting isn't enough

Another advantage of UUIDs

Conditional inserts and lightweight transactions

Updates can create new rows

Optimistic locking with conditional updates

Optimistic locking in action

Optimistic locking and accidental updates

Lightweight transactions and their cost

When lightweight transactions aren't necessary

Summary

Collections, Tuples, and User-Defined Types

The problem with concurrent updates

Serializing the collection

Introducing concurrency

Collection columns and concurrent updates

Defining collection columns

Reading and writing sets

Advanced set manipulation

Removing values from a set

Sets and uniqueness

Collections and upserts

Using lists for ordered, non-unique values

Defining a list column

Writing a list

Discrete list manipulation

Writing data at a specific index

Removing elements from the list

Using maps to store key-value pairs

Writing a map

Updating discrete values in a map

Removing values from maps

Collections in inserts

Collections and secondary indexes

Secondary indexes on map columns

The limitations of collections

Reading discrete values from collections

Collection size limit

Reading a collection column from multiple rows

Unable to reuse collection names

Performance of collection operations

Working with tuples

Creating a tuple column

Writing to tuples

Indexing tuples

User-defined types

Creating a user-defined type

Assigning a user-defined type to a column

Adding data to a user-defined column

Indexing and querying user-defined types

Partial selection of user-defined types

Choosing between tuples and user-defined types

Nested collections

Nested tuples/UDTs

Comparing data structures

Summary

Aggregating Time-Series Data

Recording discrete analytics observations

Using discrete analytics observations

Slicing and dicing our data

Recording aggregate analytics observations

Answering the right question

Precomputation versus read-time aggregation

The many possibilities for aggregation

The role of discrete observations

Recording analytics observations

Updating a counter column

Counters and upserts

Setting and resetting counter columns

Counter columns and deletion

Counter columns need their own table

Cassandra configuration

Configuration location

Modifying configuration

Restarting Cassandra

User-defined functions

User-defined aggregate functions

Standard aggregate functions

Summary

How Cassandra Distributes Data

Data distribution in Cassandra

Cassandra's partitioning strategy - partition key tokens

Distributing partition tokens

Partitioners

Partition keys group data on the same node

Virtual nodes

Virtual nodes facilitate redistribution

Data replication in Cassandra

Masterless replication

Replication without a master

Gossip protocol

Multi-data center cluster

Snitch

Replication strategy

Durable writes

Consistency

Immediate and eventual consistency

Consistency in Cassandra

The anatomy of a successful request

Tuning consistency

Eventual consistency with ONE

Immediate consistency with ALL

Fault-tolerant immediate consistency with QUORUM

Local consistency levels

Comparing consistency levels

Choosing the right consistency level

The CAP theorem

Handling conflicting data

Last-write-wins conflict resolution

Introspecting write timestamps

Overriding write timestamps

Distributed deletion

Stumbling on tombstones

Expiring columns with TTL

Table configuration options

Summary

Cassandra Multi-Node Cluster

3-node cluster

Prerequisites

Tuning configuration options for setting up a 3-node cluster

Tuning configuration

cassandra.yaml

cassandra-env.sh

Starting the 3-node cluster

Consistency in action

Write consistency

Consistency QUORUM

Consistency ANY

Cassandra internals

The write path

Compaction

The read path

Cassandra repair mechanisms

Hinted handoff

Read repair

Anti-entropy repair

Summary

Application Development Using the Java Driver

A simple query

Cluster API

Getting metadata

Querying

Prepared statements

QueryBuilder API

Building an INSERT statement

Building an UPDATE statement

Building a SELECT statement

Asynchronous querying

Execute asynchronously

Processing future results

Driver policies

Load-balancing policy

RoundRobinPolicy

DCAwareRoundRobinPolicy

TokenAwarePolicy

Retry policy

Summary

Peeking under the Hood

Using cassandra-cli

The structure of a simple primary key table

Exploring cells

A model of column families: RowKey and cells

Compound primary keys in column families

A complete mapping

The wide row data structure

The empty cell

Collection columns in column families

Set columns in column families

Map columns in column families

List columns in column families

Appending and prepending values to lists

Other list operations

Summary

Authentication and Authorization

Enabling authentication and authorization

Authentication, authorization, and fault-tolerance

Authentication with cqlsh

Authentication in your application

Setting up a user

Changing a user's password

Viewing user accounts

Controlling access

Viewing permissions

Revoking access

Authorization in action

Authorization as a hedge against mistakes

Security beyond authentication and authorization

Security protects against vulnerabilities

Summary

Wrapping up
