Storm Blueprints: Patterns for Distributed Real-time Computation
Table of Contents
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Distributed Word Count
Introducing elements of a Storm topology – streams, spouts, and bolts
Streams
Spouts
Bolts
Introducing the word count topology data flow
Sentence spout
Introducing the split sentence bolt
Introducing the word count bolt
Introducing the report bolt
Implementing the word count topology
Setting up a development environment
Implementing the sentence spout
Implementing the split sentence bolt
Implementing the word count bolt
Implementing the report bolt
Implementing the word count topology
Introducing parallelism in Storm
WordCountTopology parallelism
Adding workers to a topology
Configuring executors and tasks
Understanding stream groupings
Guaranteed processing
Reliability in spouts
Reliability in bolts
Reliable word count
Summary
2. Configuring Storm Clusters
Introducing the anatomy of a Storm cluster
Understanding the nimbus daemon
Working with the supervisor daemon
Introducing Apache ZooKeeper
Working with Storm's DRPC server
Introducing the Storm UI
Introducing the Storm technology stack
Java and Clojure
Python
Installing Storm on Linux
Installing the base operating system
Installing Java
ZooKeeper installation
Storm installation
Running the Storm daemons
Configuring Storm
Mandatory settings
Optional settings
The Storm executable
Setting up the Storm executable on a workstation
The daemon commands
Nimbus
Supervisor
UI
DRPC
The management commands
Jar
Kill
Deactivate
Activate
Rebalance
Remoteconfvalue
Local debug/development commands
REPL
Classpath
Localconfvalue
Submitting topologies to a Storm cluster
Automating the cluster configuration
A rapid introduction to Puppet
Puppet manifests
Puppet classes and modules
Puppet templates
Managing environments with Puppet Hiera
Introducing Hiera
Summary
3. Trident Topologies and Sensor Data
Examining our use case
Introducing Trident topologies
Introducing Trident spouts
Introducing Trident operations – filters and functions
Introducing Trident filters
Introducing Trident functions
Introducing Trident aggregators – Combiners and Reducers
CombinerAggregator
ReducerAggregator
Aggregator
Introducing the Trident state
The Repeat Transactional state
The Opaque state
Executing the topology
Summary
4. Real-time Trend Analysis
Use case
Architecture
The source application
The logback Kafka appender
Apache Kafka
Kafka spout
The XMPP server
Installing the required software
Installing Kafka
Installing OpenFire
Introducing the sample application
Sending log messages to Kafka
Introducing the log analysis topology
Kafka spout
The JSON project function
Calculating a moving average
Adding a sliding window
Implementing the moving average function
Filtering on thresholds
Sending notifications with XMPP
The final topology
Running the log analysis topology
Summary
5. Real-time Graph Analysis
Use case
Architecture
The Twitter client
Kafka spout
Titan, a distributed graph database
A brief introduction to graph databases
Accessing the graph – the TinkerPop stack
Manipulating the graph with the Blueprints API
Manipulating the graph with the Gremlin shell
Software installation
Titan installation
Setting up Titan to use the Cassandra storage backend
Installing Cassandra
Starting Titan with the Cassandra backend
Graph data model
Connecting to the Twitter stream
Setting up the Twitter4J client
The OAuth configuration
The TwitterStreamConsumer class
The TwitterStatusListener class
Twitter graph topology
The JSONProjectFunction class
Implementing GraphState
GraphFactory
GraphTupleProcessor
GraphStateFactory
GraphState
GraphUpdater
Implementing GraphFactory
Implementing GraphTupleProcessor
Putting it all together – the TwitterGraphTopology class
The TwitterGraphTopology class
Querying the graph with Gremlin
Summary
6. Artificial Intelligence
Designing for our use case
Establishing the architecture
Examining the design challenges
Implementing the recursion
Accessing the function's return values
Immutable tuple field values
Upfront field declaration
Tuple acknowledgement in recursion
Output to multiple streams
Read-before-write
Solving the challenges
Implementing the architecture
The data model
Examining the recursive topology
The queue interaction
Functions and filters
Examining the Scoring Topology
Addressing read-before-write
Distributed locking
Retry when stale
Executing the topology
Enumerating the game tree
Distributed Remote Procedure Call (DRPC)
Remote deployment
Summary
7. Integrating Druid for Financial Analytics
Use case
Integrating a non-transactional system
The topology
The spout
The filter
The state design
Implementing the architecture
DruidState
Implementing the StormFirehose object
Implementing the partition status in ZooKeeper
Executing the implementation
Examining the analytics
Summary
8. Natural Language Processing
Motivating a Lambda architecture
Examining our use case
Realizing a Lambda architecture
Designing the topology for our use case
Implementing the design
TwitterSpout/TweetEmitter
Functions
TweetSplitterFunction
WordFrequencyFunction
PersistenceFunction
Examining the analytics
Batch processing / historical analysis
Hadoop
An overview of MapReduce
The Druid setup
HadoopDruidIndexer
Summary
9. Deploying Storm on Hadoop for Advertising Analysis
Examining the use case
Establishing the architecture
Examining HDFS
Examining YARN
Configuring the infrastructure
The Hadoop infrastructure
Configuring HDFS
Configuring the NameNode
Configuring the DataNode
Configuring YARN
Configuring the ResourceManager
Configuring the NodeManager
Deploying the analytics
Performing a batch analysis with the Pig infrastructure
Performing a real-time analysis with the Storm-YARN infrastructure
Performing the analytics
Executing the batch analysis
Executing real-time analysis
Deploying the topology
Executing the topology
Summary
10. Storm in the Cloud
Introducing Amazon Elastic Compute Cloud (EC2)
Setting up an AWS account
The AWS Management Console
Creating an SSH key pair
Launching an EC2 instance manually
Logging in to the EC2 instance
Introducing Apache Whirr
Installing Whirr
Configuring a Storm cluster with Whirr
Launching the cluster
Introducing Whirr Storm
Setting up Whirr Storm
Cluster configuration
Customizing Storm's configuration
Customizing firewall rules
Introducing Vagrant
Installing Vagrant
Launching your first virtual machine
The Vagrantfile and shared filesystem
Vagrant provisioning
Configuring multimachine clusters with Vagrant
Creating Storm-provisioning scripts
ZooKeeper
Storm
Supervisord
The Storm Vagrantfile
Launching the Storm cluster
Summary
Index