
Storm: Distributed Real-time Computation Blueprints (eBook)


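Author: P. Taylor Goetz

Publisher: Packt Publishing

Publication date: 2014-03-26

Word count: 2.27 million

Category: Imported books > Foreign-language original editions > Computers/Networking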


Description

A blueprints book with 10 projects, one per chapter, that demonstrate the various use cases of Storm for both beginner and intermediate users, grounded in real-world example applications. Although the book focuses primarily on Java development with Storm, the patterns are more broadly applicable, and the tips, techniques, and approaches it describes apply to architects, developers, and operations staff. The book should also provoke and inspire applications of distributed computing in other industries and domains. Hadoop enthusiasts will find it a good introduction to Storm that offers a potential migration path from batch processing to real-time analytics.


Storm Blueprints: Patterns for Distributed Real-time Computation

Table of Contents


Credits

About the Authors

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Distributed Word Count

Introducing elements of a Storm topology – streams, spouts, and bolts

Streams

Spouts

Bolts

Introducing the word count topology data flow

Sentence spout

Introducing the split sentence bolt

Introducing the word count bolt

Introducing the report bolt

Implementing the word count topology

Setting up a development environment

Implementing the sentence spout

Implementing the split sentence bolt

Implementing the word count bolt

Implementing the report bolt

Implementing the word count topology

Introducing parallelism in Storm

WordCountTopology parallelism

Adding workers to a topology

Configuring executors and tasks

Understanding stream groupings

Guaranteed processing

Reliability in spouts

Reliability in bolts

Reliable word count

Summary

2. Configuring Storm Clusters

Introducing the anatomy of a Storm cluster

Understanding the nimbus daemon

Working with the supervisor daemon

Introducing Apache ZooKeeper

Working with Storm's DRPC server

Introducing the Storm UI

Introducing the Storm technology stack

Java and Clojure

Python

Installing Storm on Linux

Installing the base operating system

Installing Java

ZooKeeper installation

Storm installation

Running the Storm daemons

Configuring Storm

Mandatory settings

Optional settings

The Storm executable

Setting up the Storm executable on a workstation

The daemon commands

Nimbus

Supervisor

DRPC

The management commands

Jar

Kill

Deactivate

Activate

Rebalance

Remoteconfvalue

Local debug/development commands

REPL

Classpath

Localconfvalue

Submitting topologies to a Storm cluster

Automating the cluster configuration

A rapid introduction to Puppet

Puppet manifests

Puppet classes and modules

Puppet templates

Managing environments with Puppet Hiera

Introducing Hiera

Summary

3. Trident Topologies and Sensor Data

Examining our use case

Introducing Trident topologies

Introducing Trident spouts

Introducing Trident operations – filters and functions

Introducing Trident filters

Introducing Trident functions

Introducing Trident aggregators – Combiners and Reducers

CombinerAggregator

ReducerAggregator

Aggregator

Introducing the Trident state

The Repeat Transactional state

The Opaque state

Executing the topology

Summary

4. Real-time Trend Analysis

Use case

Architecture

The source application

The logback Kafka appender

Apache Kafka

Kafka spout

The XMPP server

Installing the required software

Installing Kafka

Installing OpenFire

Introducing the sample application

Sending log messages to Kafka

Introducing the log analysis topology

Kafka spout

The JSON project function

Calculating a moving average

Adding a sliding window

Implementing the moving average function

Filtering on thresholds

Sending notifications with XMPP

The final topology

Running the log analysis topology

Summary

5. Real-time Graph Analysis

Use case

Architecture

The Twitter client

Kafka spout

Titan, a distributed graph database

A brief introduction to graph databases

Accessing the graph – the TinkerPop stack

Manipulating the graph with the Blueprints API

Manipulating the graph with the Gremlin shell

Software installation

Titan installation

Setting up Titan to use the Cassandra storage backend

Installing Cassandra

Starting Titan with the Cassandra backend

Graph data model

Connecting to the Twitter stream

Setting up the Twitter4J client

The OAuth configuration

The TwitterStreamConsumer class

The TwitterStatusListener class

Twitter graph topology

The JSONProjectFunction class

Implementing GraphState

GraphFactory

GraphTupleProcessor

GraphStateFactory

GraphState

GraphUpdater

Implementing GraphFactory

Implementing GraphTupleProcessor

Putting it all together – the TwitterGraphTopology class

The TwitterGraphTopology class

Querying the graph with Gremlin

Summary

6. Artificial Intelligence

Designing for our use case

Establishing the architecture

Examining the design challenges

Implementing the recursion

Accessing the function's return values

Immutable tuple field values

Upfront field declaration

Tuple acknowledgement in recursion

Output to multiple streams

Read-before-write

Solving the challenges

Implementing the architecture

The data model

Examining the recursive topology

The queue interaction

Functions and filters

Examining the Scoring Topology

Addressing read-before-write

Distributed locking

Retry when stale

Executing the topology

Enumerating the game tree

Distributed Remote Procedure Call (DRPC)

Remote deployment

Summary

7. Integrating Druid for Financial Analytics

Use case

Integrating a non-transactional system

The topology

The spout

The filter

The state design

Implementing the architecture

DruidState

Implementing the StormFirehose object

Implementing the partition status in ZooKeeper

Executing the implementation

Examining the analytics

Summary

8. Natural Language Processing

Motivating a Lambda architecture

Examining our use case

Realizing a Lambda architecture

Designing the topology for our use case

Implementing the design

TwitterSpout/TweetEmitter

Functions

TweetSplitterFunction

WordFrequencyFunction

PersistenceFunction

Examining the analytics

Batch processing / historical analysis

Hadoop

An overview of MapReduce

The Druid setup

HadoopDruidIndexer

Summary

9. Deploying Storm on Hadoop for Advertising Analysis

Examining the use case

Establishing the architecture

Examining HDFS

Examining YARN

Configuring the infrastructure

The Hadoop infrastructure

Configuring HDFS

Configuring the NameNode

Configuring the DataNode

Configuring YARN

Configuring the ResourceManager

Configuring the NodeManager

Deploying the analytics

Performing a batch analysis with the Pig infrastructure

Performing a real-time analysis with the Storm-YARN infrastructure

Performing the analytics

Executing the batch analysis

Executing real-time analysis

Deploying the topology

Executing the topology

Summary

10. Storm in the Cloud

Introducing Amazon Elastic Compute Cloud (EC2)

Setting up an AWS account

The AWS Management Console

Creating an SSH key pair

Launching an EC2 instance manually

Logging in to the EC2 instance

Introducing Apache Whirr

Installing Whirr

Configuring a Storm cluster with Whirr

Launching the cluster

Introducing Whirr Storm

Setting up Whirr Storm

Cluster configuration

Customizing Storm's configuration

Customizing firewall rules

Introducing Vagrant

Installing Vagrant

Launching your first virtual machine

The Vagrantfile and shared filesystem

Vagrant provisioning

Configuring multimachine clusters with Vagrant

Creating Storm-provisioning scripts

ZooKeeper

Storm

Supervisord

The Storm Vagrantfile

Launching the Storm cluster

Summary

Index

