Learning Apache Apex: Real-time streaming applications with Apex (e-book)

Authors: Thomas Weise, Munagala V. Ramanath, David Yan, Kenneth Knowles

Publisher: Packt Publishing

Publication date: 2017-11-30

Word count: 393,000

Designing and writing a real-time streaming application with Apache Apex

About This Book

  • Get a clear, practical approach to real-time data processing
  • Program Apache Apex streaming applications
  • See how Apex integrates with the open source Big Data ecosystem

Who This Book Is For

This book assumes knowledge of application development with Java and familiarity with distributed systems. Familiarity with other real-time streaming frameworks is not required, but some practical experience with other big data processing utilities might be helpful.

What You Will Learn

  • Put together a functioning Apex application from scratch
  • Scale an Apex application and configure it for optimal performance
  • Understand how to deal with failures via the fault tolerance features of the platform
  • Use Apex via other frameworks such as Beam
  • Understand the DevOps implications of deploying Apex

In Detail

Apache Apex is a next-generation stream processing framework designed to operate on data at large scale, with minimum latency, maximum reliability, and strict correctness guarantees.

Half of the book consists of Apex applications, showing you key aspects of data processing pipelines such as connectors for sources and sinks, and common data transformations. The other half is evenly split between explaining the Apex framework and tuning, testing, and scaling Apex applications.

Much of our economic world depends on growing streams of data, such as social media feeds, financial records, and data from mobile devices, sensors, and machines (the Internet of Things, IoT). The projects in the book show how to process such streams to gain valuable, timely, and actionable insights. Traditional use cases, such as ETL, that currently consume a significant chunk of data engineering resources are also covered.

The final chapter shows you future possibilities emerging in the streaming space, and how Apache Apex can contribute to them.

Style and approach

This book is divided into two major parts. The first explains what Apex is, what its relevant parts are, and how to write well-built Apex applications. The second is entirely application-driven, walking you through Apex applications of increasing complexity.
Table of Contents

Title Page

Copyright

Learning Apache Apex

Credits

About the Authors

About the Reviewer

www.PacktPub.com

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Introduction to Apex

Unbounded data and continuous processing

Stream processing

Stream processing systems

What is Apex and why is it important?

Use cases and case studies

Real-time insights for Advertising Tech (PubMatic)

Industrial IoT applications (GE)

Real-time threat detection (Capital One)

Silver Spring Networks (SSN)

Application Model and API

Directed Acyclic Graph (DAG)

Apex DAG Java API

High-level Stream Java API

SQL

JSON

Windowing and time

Value proposition of Apex

Low latency and stateful processing

Native streaming versus micro-batch

Performance

Where Apex excels

Where Apex is not suitable

Summary

Getting Started with Application Development

Development process and methodology

Setting up the development environment

Creating a new Maven project

Application specifications

Custom operator development

The Apex operator model

CheckpointListener/CheckpointNotificationListener

ActivationListener

IdleTimeHandler

Application configuration

Testing in the IDE

Writing the integration test

Running the application on YARN

Execution layer components

Installing Apex Docker sandbox

Running the application

Working on the cluster

YARN web UI

Apex CLI

Logging

Dynamically adjusting logging levels

Summary

The Apex Library

An overview of the library

Integrations

Apache Kafka

Kafka input

Kafka output

Other streaming integrations

JMS (ActiveMQ, SQS, and so on)

Kinesis streams

Files

File input

File splitter and block reader

File writer

Databases

JDBC input

JDBC output

Other databases

Transformations

Parser

Filter

Enrichment

Map transform

Custom functions

Windowed transformations

Windowing

Global Window

Time Windows

Sliding Time Windows

Session Windows

Window propagation

State

Accumulation

Accumulation Mode

State storage

Watermarks

Allowed lateness

Triggering

Merging of streams

The windowing example

Dedup

Join

State Management

Summary

Scalability, Low Latency, and Performance

Partitioning and how it works

Elasticity

Partitioning toolkit

Configuring and triggering partitioning

StreamCodec

Unifier

Custom dynamic partitioning

Performance optimizations

Affinity and anti-affinity

Low-latency versus throughput

Sample application for dynamic partitioning

Performance – other aspects for custom operators

Summary

Fault Tolerance and Reliability

Distributed systems need to be resilient

Fault-tolerance components and mechanism in Apex

Checkpointing

When to checkpoint

How to checkpoint

What to checkpoint

Incremental state saving

Incremental recovery

Processing guarantees

Example – exactly-once counting

The exactly-once output to JDBC

Summary

Example Project – Real-Time Aggregation and Visualization

Streaming ETL and beyond

The application pattern in a real-world use case

Analyzing Twitter feed

Top Hashtags

TweetStats

Running the application

Configuring Twitter API access

Enabling WebSocket output

The Pub/Sub server

Grafana visualization

Installing Grafana

Installing Grafana Simple JSON Datasource

The Grafana Pub/Sub adapter server

Setting up the dashboard

Summary

Example Project – Real-Time Ride Service Data Processing

The goal

Datasource

The pipeline

Simulation of a real-time feed using historical data

Parsing the data

Looking up of the zip code and preparing for the windowing operation

Windowed operator configuration

Serving the data with WebSocket

Running the application

Running the application on GCP Dataproc

Summary

Example Project – ETL Using SQL

The application pipeline

Building and running the application

Application configuration

The application code

Partitioning

Application testing

Understanding application logs

Calcite integration

Summary

Introduction to Apache Beam

Introduction to Apache Beam

Beam concepts

Pipelines, PTransforms, and PCollections

ParDo – elementwise computation

GroupByKey/CombinePerKey – aggregation across elements

Windowing, watermarks, and triggering in Beam

Windowing in Beam

Watermarks in Beam

Triggering in Beam

Advanced topic – stateful ParDo

WordCount in Apache Beam

Setting up your pipeline

Reading the works of Shakespeare in parallel

Splitting each line on spaces

Eliminating empty strings

Counting the occurrences of each word

Format your results

Writing to a sharded text file in parallel

Testing the pipeline at small scale with DirectRunner

Running Apache Beam WordCount on Apache Apex

Summary

The Future of Stream Processing

Lower barrier for building streaming pipelines

Visual development tools

Streaming SQL

Better programming API

Bridging the gap between data science and engineering

Machine learning integration

State management

State query and data consistency

Containerized infrastructure

Management tools

Summary
