Mastering Apache Storm (e-book)

Author: Ankit Jain

Publisher: Packt Publishing

Publication date: 2017-08-16

Word count: 305,000

Master the intricacies of Apache Storm and develop real-time stream processing applications with ease

About This Book

  • Exploit the various real-time processing functionalities offered by Apache Storm, such as parallelism, data partitioning, and more
  • Integrate Storm with other Big Data technologies like Hadoop, HBase, and Apache Kafka
  • An easy-to-understand guide to effortlessly create distributed applications with Storm

Who This Book Is For

If you are a Java developer who wants to enter the world of real-time stream processing applications using Apache Storm, then this book is for you. No previous experience with Storm is required, as this book starts from the basics. After finishing this book, you will be able to develop not-so-complex Storm applications.

What You Will Learn

  • Understand the core concepts of Apache Storm and real-time processing
  • Follow the steps to deploy a multi-node Storm cluster
  • Create Trident topologies to support various message-processing semantics
  • Share your cluster effectively using Storm scheduling
  • Integrate Apache Storm with other Big Data technologies such as Hadoop, HBase, Kafka, and more
  • Monitor the health of your Storm cluster

In Detail

Apache Storm is a real-time Big Data processing framework that processes large amounts of data reliably, guaranteeing that every message will be processed. Storm lets you scale as your data grows, making it an excellent platform for solving your big data problems. This extensive guide takes you from the basics to the advanced topics of Storm.

The book begins with a detailed introduction to real-time processing and where Storm fits in to solve these problems. You'll learn how to deploy Storm on clusters by writing a basic Storm Hello World example. Next, we'll introduce you to Trident, and you'll get a clear understanding of how to develop and deploy a Trident topology.

We cover topics such as monitoring, Storm parallelism, the scheduler, and log processing in an easy-to-understand manner. You will also learn how to integrate Storm with other well-known Big Data technologies such as HBase, Redis, Kafka, and Hadoop to realize the full potential of Storm. With real-world examples and clear explanations, this book will give you a thorough mastery of Apache Storm. You will be able to use this knowledge to develop efficient, distributed real-time applications that cater to your business needs.

Style and approach

This easy-to-follow guide is full of examples and real-world applications to help you get an in-depth understanding of Apache Storm. This book covers the basics thoroughly and also delves into the intermediate and slightly advanced concepts of application development with Apache Storm.
Table of Contents

Title Page

Copyright

Mastering Apache Storm

Credits

About the Author

About the Reviewers

www.PacktPub.com

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Real-Time Processing and Storm Introduction

Apache Storm

Features of Storm

Storm components

Nimbus

Supervisor nodes

The ZooKeeper cluster

The Storm data model

Definition of a Storm topology

Operation modes in Storm

Programming languages

Summary

Storm Deployment, Topology Development, and Topology Options

Storm prerequisites

Installing Java SDK 7

Deployment of the ZooKeeper cluster

Setting up the Storm cluster

Developing the hello world example

The different options of the Storm topology

Deactivate

Activate

Rebalance

Kill

Dynamic log level settings

Walkthrough of the Storm UI

Cluster Summary section

Nimbus Summary section

Supervisor Summary section

Nimbus Configuration section

Topology Summary section

Dynamic log level settings

Updating the log level from the Storm UI

Updating the log level from the Storm CLI

Summary

Storm Parallelism and Data Partitioning

Parallelism of a topology

Worker process

Executor

Task

Configure parallelism at the code level

Worker process, executor, and task distribution

Rebalance the parallelism of a topology

Rebalance the parallelism of a SampleStormClusterTopology topology

Different types of stream grouping in the Storm cluster

Shuffle grouping

Field grouping

All grouping

Global grouping

Direct grouping

Local or shuffle grouping

None grouping

Custom grouping

Guaranteed message processing

Tick tuple

Summary

Trident Introduction

Trident introduction

Understanding Trident's data model

Writing Trident functions, filters, and projections

Trident function

Trident filter

Trident projection

Trident repartitioning operations

Utilizing shuffle operation

Utilizing partitionBy operation

Utilizing global operation

Utilizing broadcast operation

Utilizing batchGlobal operation

Utilizing partition operation

Trident aggregator

partitionAggregate

aggregate

ReducerAggregator

Aggregator

CombinerAggregator

persistentAggregate

Aggregator chaining

Utilizing the groupBy operation

When to use Trident

Summary

Trident Topology and Uses

Trident groupBy operation

groupBy before partitionAggregate

groupBy before aggregate

Non-transactional topology

Trident hello world topology

Trident state

Distributed RPC

When to use Trident

Summary

Storm Scheduler

Introduction to Storm scheduler

Default scheduler

Isolation scheduler

Resource-aware scheduler

Component-level configuration

Memory usage example

CPU usage example

Worker-level configuration

Node-level configuration

Global component configuration

Custom scheduler

Configuration changes in the supervisor node

Configuration setting at component level

Writing a custom supervisor class

Converting component IDs to executors

Converting supervisors to slots

Registering a CustomScheduler class

Summary

Monitoring of Storm Cluster

Cluster statistics using the Nimbus thrift client

Fetching information with Nimbus thrift

Monitoring the Storm cluster using JMX

Monitoring the Storm cluster using Ganglia

Summary

Integration of Storm and Kafka

Introduction to Kafka

Kafka architecture

Producer

Replication

Consumer

Broker

Data retention

Installation of Kafka brokers

Setting up a single node Kafka cluster

Setting up a three node Kafka cluster

Multiple Kafka brokers on a single node

Share ZooKeeper between Storm and Kafka

Kafka producers and publishing data into Kafka

Kafka Storm integration

Deploy the Kafka topology on Storm cluster

Summary

Storm and Hadoop Integration

Introduction to Hadoop

Hadoop Common

Hadoop Distributed File System

Namenode

Datanode

HDFS client

Secondary namenode

YARN

ResourceManager (RM)

NodeManager (NM)

ApplicationMaster (AM)

Installation of Hadoop

Setting passwordless SSH

Getting the Hadoop bundle and setting up environment variables

Setting up HDFS

Setting up YARN

Write Storm topology to persist data into HDFS

Integration of Storm with Hadoop

Setting up Storm-YARN

Storm-Starter topologies on Storm-YARN

Summary

Storm Integration with Redis, Elasticsearch, and HBase

Integrating Storm with HBase

Integrating Storm with Redis

Integrating Storm with Elasticsearch

Integrating Storm with Esper

Summary

Apache Log Processing with Storm

Apache log processing elements

Producing Apache log in Kafka using Logstash

Installation of Logstash

What is Logstash?

Why are we using Logstash?

Installation of Logstash

Configuration of Logstash

Why are we using Kafka between Logstash and Storm?

Splitting the Apache log line

Identifying country, operating system type, and browser type from the log file

Calculate the search keyword

Persisting the process data

Kafka spout and define topology

Deploy topology

MySQL queries

Calculate the page hit from each country

Calculate the count for each browser

Calculate the count for each operating system

Summary

Twitter Tweet Collection and Machine Learning

Exploring machine learning

Twitter sentiment analysis

Using Kafka producer to store the tweets in a Kafka cluster

Kafka spout, sentiments bolt, and HDFS bolt

Summary
