Real-Time Big Data Analytics
Table of Contents
Real-Time Big Data Analytics
Credits
About the Authors
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introducing the Big Data Technology Landscape and Analytics Platform
Big Data – a phenomenon
The Big Data dimensional paradigm
The Big Data ecosystem
The Big Data infrastructure
Components of the Big Data ecosystem
The Big Data analytics architecture
Building business solutions
Dataset processing
Solution implementation
Presentation
Distributed batch processing
Batch processing in distributed mode
Push code to data
Distributed databases (NoSQL)
Advantages of NoSQL databases
Choosing a NoSQL database
Real-time processing
The telecoms or cellular arena
Transportation and logistics
The connected vehicle
The financial sector
Summary
2. Getting Acquainted with Storm
An overview of Storm
The journey of Storm
Storm abstractions
Streams
Topology
Spouts
Bolts
Tasks
Workers
Storm architecture and its components
A Zookeeper cluster
A Storm cluster
How and when to use Storm
Storm internals
Storm parallelism
Storm internal message processing
Summary
3. Processing Data with Storm
Storm input sources
Meet Kafka
Getting to know more about Kafka
Other sources for input to Storm
A file as an input source
A socket as an input source
Kafka as an input source
Reliability of data processing
The concept of anchoring and reliability
The Storm acking framework
Storm simple patterns
Joins
Batching
Storm persistence
Storm's JDBC persistence framework
Summary
4. Introduction to Trident and Optimizing Storm Performance
Working with Trident
Transactions
Trident topology
Trident tuples
Trident spout
Trident operations
Merging and joining
Filter
Function
Aggregation
Grouping
State maintenance
Understanding LMAX
Memory and cache
Ring buffer – the heart of the disruptor
Producers
Consumers
Storm internode communication
ZeroMQ
Storm ZeroMQ configurations
Netty
Understanding the Storm UI
Storm UI landing page
Topology home page
Optimizing Storm performance
Summary
5. Getting Acquainted with Kinesis
Architectural overview of Kinesis
Benefits and use cases of Amazon Kinesis
High-level architecture
Components of Kinesis
Creating a Kinesis streaming service
Access to AWS Kinesis
Configuring the development environment
Creating Kinesis streams
Creating Kinesis stream producers
Creating Kinesis stream consumers
Generating and consuming crime alerts
Summary
6. Getting Acquainted with Spark
An overview of Spark
Batch data processing
Real-time data processing
Apache Spark – a one-stop solution
When to use Spark – practical use cases
The architecture of Spark
High-level architecture
Spark extensions/libraries
Spark packaging structure and core APIs
The Spark execution model – master-worker view
Resilient distributed datasets (RDD)
RDD – by definition
Fault tolerance
Storage
Persistence
Shuffling
Writing and executing our first Spark program
Hardware requirements
Installation of the basic software
Spark
Java
Scala
Eclipse
Configuring the Spark cluster
Coding a Spark job in Scala
Coding a Spark job in Java
Troubleshooting – tips and tricks
Port numbers used by Spark
Classpath issues – class not found exception
Other common exceptions
Summary
7. Programming with RDDs
Understanding Spark transformations and actions
RDD APIs
RDD transformation operations
RDD action operations
Programming Spark transformations and actions
Handling persistence in Spark
Summary
8. SQL Query Engine for Spark – Spark SQL
The architecture of Spark SQL
The emergence of Spark SQL
The components of Spark SQL
The DataFrame API
DataFrames and RDD
User-defined functions
DataFrames and SQL
The Catalyst optimizer
SQL and Hive contexts
Coding our first Spark SQL job
Coding a Spark SQL job in Scala
Coding a Spark SQL job in Java
Converting RDDs to DataFrames
Automated process
The manual process
Working with Parquet
Persisting Parquet data in HDFS
Partitioning and schema evolution or merging
Partitioning
Schema evolution/merging
Working with Hive tables
Performance tuning and best practices
Partitioning and parallelism
Serialization
Caching
Memory tuning
Summary
9. Analysis of Streaming Data Using Spark Streaming
High-level architecture
The components of Spark Streaming
The packaging structure of Spark Streaming
Spark Streaming APIs
Spark Streaming operations
Coding our first Spark Streaming job
Creating a stream producer
Writing our Spark Streaming job in Scala
Writing our Spark Streaming job in Java
Executing our Spark Streaming job
Querying streaming data in real time
The high-level architecture of our job
Coding the crime producer
Coding the stream consumer and transformer
Executing the SQL Streaming Crime Analyzer
Deployment and monitoring
Cluster managers for Spark Streaming
Executing Spark Streaming applications on Yarn
Executing Spark Streaming applications on Apache Mesos
Monitoring Spark Streaming applications
Summary
10. Introducing Lambda Architecture
What is Lambda Architecture
The need for Lambda Architecture
Layers/components of Lambda Architecture
The technology matrix for Lambda Architecture
Realization of Lambda Architecture
High-level architecture
Configuring Apache Cassandra and Spark
Coding the custom producer
Coding the real-time layer
Coding the batch layer
Coding the serving layer
Executing all the layers
Summary
Index