Fast Data Processing Systems with SMACK Stack
Credits
About the Author
About the Reviewers
www.PacktPub.com
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. An Introduction to SMACK
Modern data-processing challenges
The data-processing pipeline architecture
The NoETL manifesto
Lambda architecture
Hadoop
SMACK technologies
Apache Spark
Akka
Apache Cassandra
Apache Kafka
Apache Mesos
Changing the data center operations
From scale-up to scale-out
The open-source predominance
Data store diversification
Data gravity and data locality
DevOps rules
Data expert profiles
Data architects
Data engineers
Data analysts
Data scientists
Is SMACK for me?
Summary
2. The Model - Scala and Akka
The language - Scala
Kata 1 - The collections hierarchy
Sequence
Map
Set
Kata 2 - Choosing the right collection
Sequence
Map
Set
Kata 3 - Iterating with foreach
Kata 4 - Iterating with for
Kata 5 - Iterators
Kata 6 - Transforming with map
Kata 7 - Flattening
Kata 8 - Filtering
Kata 9 - Subsequences
Kata 10 - Splitting
Kata 11 - Extracting unique elements
Kata 12 - Merging
Kata 13 - Lazy views
Kata 14 - Sorting
Kata 15 - Streams
Kata 16 - Arrays
Kata 17 - ArrayBuffer
Kata 18 - Queues
Kata 19 - Stacks
Kata 20 - Ranges
The model - Akka
The Actor Model in a nutshell
Kata 21 - Actors
The actor system
Actor reference
Kata 22 - Actor communication
Kata 23 - Actor life cycle
Kata 24 - Starting actors
Kata 25 - Stopping actors
Kata 26 - Killing actors
Kata 27 - Shutting down the actor system
Kata 28 - Actor monitoring
Kata 29 - Looking up actors
Summary
3. The Engine - Apache Spark
Spark in single mode
Downloading Apache Spark
Testing Apache Spark
Spark core concepts
Resilient distributed datasets
Running Spark applications
Initializing the Spark context
Spark applications
Running programs
RDD operation
Transformations
Actions
Persistence (caching)
Spark in cluster mode
Runtime architecture
Driver
Dividing a program into tasks
Scheduling tasks on executors
Executor
Cluster manager
Program execution
Application deployment
Standalone cluster manager
Launching the standalone manager
Submitting our application
Configuring resources
Working in the cluster
Spark Streaming
Spark Streaming architecture
Transformations
Stateless transformations
Stateful transformations
Windowed operations
Update state by key
Output operations
Fault-tolerant Spark Streaming
Checkpointing
Spark Streaming performance
Parallelism level
Window size and batch size
Garbage collector
Summary
4. The Storage - Apache Cassandra
A bit of history
NoSQL
NoSQL or SQL?
CAP Brewer's theorem
Apache Cassandra installation
Data model
Data storage
Installation
DataStax OpsCenter
Creating a key space
Authentication and authorization (roles)
Setting up a simple authentication and authorization
Backup
Compression
Recovery
Restart node
Printing schema
Logs
Configuring log4j
Log file rotation
User activity log
Transaction log
SQL dump
CQL
CQL commands
DBMS Cluster
Deleting the database
CLI delete commands
CQL shell delete commands
DB and DBMS optimization
Bloom filter
Data cache
Java heap tune up
Java garbage collection tune up
Views, triggers, and stored procedures
Client-server architecture
Drivers
Spark-Cassandra connector
Installing the connector
Establishing the connection
Using the connector
Summary
5. The Broker - Apache Kafka
Introducing Kafka
Features of Apache Kafka
Born to be fast data
Use cases
Installation
Installing Java
Installing Kafka
Importing Kafka
Cluster
Single node - single broker cluster
Starting Zookeeper
Starting the broker
Creating a topic
Starting a producer
Starting a consumer
Single node - Multiple broker cluster
Starting the brokers
Creating a topic
Starting a producer
Starting a consumer
Multiple node - multiple broker cluster
Broker properties
Architecture
Segment files
Offset
Leaders
Groups
Log compaction
Kafka design
Message compression
Replication
Asynchronous replication
Synchronous replication
Producers
Producer API
Scala producers
Step 1: Import classes
Step 2: Define properties
Step 3: Build and send the message
Step 4: Create the topic
Step 5: Compile the producer
Step 6: Run the producer
Step 7: Run a consumer
Producers with custom partitioning
Step 1: Import classes
Step 2: Define properties
Step 3: Implement the partitioner class
Step 4: Build and send the message
Step 5: Create the topic
Step 6: Compile the programs
Step 7: Run the producer
Step 8: Run a consumer
Producer properties
Consumers
Consumer API
Simple Scala consumers
Step 1: Import classes
Step 2: Define properties
Step 3: Code the SimpleConsumer
Step 4: Create the topic
Step 5: Compile the program
Step 6: Run the producer
Step 7: Run the consumer
Multithread Scala consumers
Step 1: Import classes
Step 2: Define properties
Step 3: Code the MultiThreadConsumer
Step 4: Create the topic
Step 5: Compile the program
Step 6: Run the producer
Step 7: Run the consumer
Consumer properties
Integration
Integration with Apache Spark
Administration
Cluster tools
Adding servers
Kafka topic tools
Cluster mirroring
Summary
6. The Manager - Apache Mesos
The Apache Mesos architecture
Frameworks
Existing Mesos frameworks
Frameworks for long running applications
Frameworks for scheduling
Frameworks for storage
Attributes and resources
Attributes
Resources
The Apache Mesos API
Messages
The Executor API
Executor Driver API
The Scheduler API
The Scheduler Driver API
Resource allocation
The DRF algorithm
Weighted DRF algorithm
Resource configuration
Resource reservation
Static reservation
Defining roles
Assigning frameworks to roles
Setting policies
Dynamic reservation
The reserve operation
The unreserve operation
HTTP reserve
HTTP unreserve
Running a Mesos cluster on AWS
AWS instance types
AWS instances launching
Installing Mesos on AWS
Downloading Mesos
Building Mesos
Launching several instances
Running a Mesos cluster on a private data center
Mesos installation
Setting up the environment
Start the master
Start the slaves
Process automation
Common Mesos issues
Missing library dependencies
Directory permissions
Missing library
Debugging
Directory structure
Slaves not connecting with masters
Multiple slaves on the same machine
Scheduling and management frameworks
Marathon
Marathon installation
Installing Apache Zookeeper
Running Marathon in local mode
Multi-node Marathon installation
Running a test application from the web UI
Application scaling
Terminating the application
Chronos
Chronos installation
Job scheduling
Chronos and Marathon
Chronos REST API
Listing running jobs
Starting a job manually
Adding a job
Deleting a job
Deleting all the job tasks
Marathon REST API
Listing the running applications
Adding an application
Changing the application configuration
Deleting the application
Apache Aurora
Installing Aurora
Singularity
Singularity installation
The Singularity configuration file
Apache Spark on Apache Mesos
Submitting jobs in client mode
Submitting jobs in cluster mode
Advanced configuration
Apache Cassandra on Apache Mesos
Advanced configuration
Apache Kafka on Apache Mesos
Kafka log management
Summary
7. Study Case 1 - Spark and Cassandra
Spark Cassandra connector
Requisites
Preparing Cassandra
SparkContext setup
Cassandra and Spark Streaming
Spark Streaming setup
Cassandra setup
Streaming context creation
Stream creation
Kafka Streams
Akka Streams
Enabling Cassandra
Write the Stream to Cassandra
Read the Stream from Cassandra
Saving datasets to Cassandra
Saving a collection of tuples to Cassandra
Saving collections to Cassandra
Modifying collections
Saving objects of Cassandra (user defined types)
Scala options to Cassandra options conversion
Saving RDDs as new tables
Cluster deployment
Spark Cassandra use cases
Study case: The Calliope project
Installing Calliope
CQL3
Read from Cassandra with CQL3
Write to Cassandra with CQL3
Thrift
Read from Cassandra with Thrift
Write to Cassandra with Thrift
Calliope SQL context creation
Calliope SQL Configuration
Loading Cassandra tables programmatically
Summary
8. Study Case 2 - Connectors
Akka and Cassandra
Writing to Cassandra
Reading from Cassandra
Connecting to Cassandra
Scanning tweets
Testing the scanner
Akka and Spark
Kafka and Akka
Kafka and Cassandra
Summary
9. Study Case 3 - Mesos and Docker
Mesos frameworks API
Authentication, authorization, and access control
Framework authentication
Authentication configuration
Framework authorization
Access control lists
Spark Mesos run modes
Coarse-grained
Fine-grained
Apache Mesos API
Scheduler HTTP API
Requests
SUBSCRIBE
TEARDOWN
ACCEPT
DECLINE
REVIVE
KILL
SHUTDOWN
ACKNOWLEDGE
RECONCILE
MESSAGE
REQUEST
Responses
SUBSCRIBED
OFFERS
RESCIND
UPDATE
MESSAGE
FAILURE
ERROR
HEARTBEAT
Mesos containerizers
Containers
Docker containerizers
Containers and containerizers
Types of containerizers
Creating containerizers
Mesos containerizer
Launching Mesos containerizer
Architecture of Mesos containerizer
Shared filesystem
PID namespace
Posix disk
Docker containerizers
Docker containerizer setup
Launching the Docker containerizers
Composing containerizers
Summary