Learning Apache Flink (e-book)

Author: Tanmay Deshpande

Publisher: Packt Publishing

Publication date: 2017-02-01

Word count: 2.513 million

Discover the definitive guide to crafting lightning-fast data processing for distributed systems with Apache Flink

About This Book

  • Build your expertise in processing real-time data with Apache Flink and its ecosystem
  • Gain insights into the workings of all components of Apache Flink, such as FlinkML, Gelly, and the Table API, illustrated with real-world use cases
  • Exploit Apache Flink's capabilities, such as distributed data streaming, in-memory processing, pipelining, and iteration operators, to improve performance
  • Solve real-world big data problems with the real-time in-memory and disk-based processing capabilities of Apache Flink

Who This Book Is For

Big data developers who are looking to process batch and real-time data on distributed systems. Basic knowledge of Hadoop and big data is assumed. Reasonable knowledge of Java or Scala is expected.

What You Will Learn

  • Learn how to build end-to-end real-time analytics projects
  • Integrate with the existing big data stack and utilize existing infrastructure
  • Build predictive analytics applications using FlinkML
  • Use the graph library to perform graph querying and search
  • Understand Flink's "Streaming First" architecture to implement real streaming applications
  • Learn Flink logging and monitoring best practices in order to efficiently design your data pipelines
  • Explore the detailed processes to deploy a Flink cluster on Amazon Web Services (AWS) and Google Cloud Platform (GCP)

In Detail

With the advent of massive computer systems, organizations in different domains generate large amounts of data on a real-time basis. The latest entrant to big data processing, Apache Flink, is designed to process continuous streams of data at a lightning-fast pace. This book will be your definitive guide to batch and stream data processing with Apache Flink.

The book begins by introducing the Apache Flink ecosystem, setting it up, and using the DataSet and DataStream APIs to process batch and streaming datasets. Bringing the power of SQL to Flink, the book then explores the Table API for querying and manipulating data. In the latter half of the book, readers learn the rest of the Apache Flink ecosystem to achieve complex tasks such as event processing, machine learning, and graph processing. The final part of the book covers topics such as scaling Flink solutions, performance optimization, and integrating Flink with other tools such as ElasticSearch.

Whether you want to dive deeper into Apache Flink or want to investigate how to get more out of this powerful technology, you'll find everything you need inside.

Style and approach

This book is a comprehensive guide that covers advanced features of Apache Flink and communicates them with a practical understanding of the underlying concepts: how, when, and why to use them.
Table of contents

Learning Apache Flink

Credits

About the Author

About the Reviewers

www.PacktPub.com

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Introduction to Apache Flink

History

Architecture

Distributed execution

Job Manager

Actor system

Scheduler

Check pointing

Task manager

Job client

Features

High performance

Exactly-once stateful computation

Flexible streaming windows

Fault tolerance

Memory management

Optimizer

Stream and batch in one platform

Libraries

Event time semantics

Quick start setup

Pre-requisite

Installing on Windows

Installing on Linux

Cluster setup

SSH configurations

Java installation

Flink installation

Configurations

Starting daemons

Adding additional Job/Task Managers

Stopping daemons and cluster

Running sample application

Summary

2. Data Processing Using the DataStream API

Execution environment

Data sources

Socket-based

File-based

Transformations

Map

FlatMap

Filter

KeyBy

Reduce

Fold

Aggregations

Window

Global windows

Tumbling windows

Sliding windows

Session windows

WindowAll

Union

Window join

Split

Select

Project

Physical partitioning

Custom partitioning

Random partitioning

Rebalancing partitioning

Rescaling

Broadcasting

Data sinks

Event time and watermarks

Event time

Processing time

Ingestion time

Connectors

Kafka connector

Twitter connector

RabbitMQ connector

ElasticSearch connector

Embedded node mode

Transport client mode

Cassandra connector

Use case - sensor data analytics

Summary

3. Data Processing Using the Batch Processing API

Data sources

File-based

Collection-based

Generic sources

Compressed files

Transformations

Map

Flat map

Filter

Project

Reduce on grouped datasets

Reduce on grouped datasets by field position key

Group combine

Aggregate on a grouped tuple dataset

MinBy on a grouped tuple dataset

MaxBy on a grouped tuple dataset

Reduce on full dataset

Group reduce on a full dataset

Aggregate on a full tuple dataset

MinBy on a full tuple dataset

MaxBy on a full tuple dataset

Distinct

Join

Cross

Union

Rebalance

Hash partition

Range partition

Sort partition

First-n

Broadcast variables

Data sinks

Connectors

Filesystems

HDFS

Amazon S3

Alluxio

Avro

Microsoft Azure storage

MongoDB

Iterations

Iterator operator

Delta iterator

Use case - Athletes data insights using Flink batch API

Summary

4. Data Processing Using the Table API

Registering tables

Registering a dataset

Registering a datastream

Registering a table

Registering external table sources

CSV table source

Kafka JSON table source

Accessing the registered table

Operators

The select operator

The where operator

The filter operator

The as operator

The groupBy operator

The join operator

The leftOuterJoin operator

The rightOuterJoin operator

The fullOuterJoin operator

The union operator

The unionAll operator

The intersect operator

The intersectAll operator

The minus operator

The minusAll operator

The distinct operator

The orderBy operator

The limit operator

Data types

SQL

SQL on datastream

Supported SQL syntax

Scalar functions

Scalar functions in the table API

Scala functions in SQL

Use case - Athletes data insights using Flink Table API

Summary

5. Complex Event Processing

What is complex event processing?

Flink CEP

Event streams

Pattern API

Begin

Filter

Subtype

OR

Continuity

Strict continuity

Non-strict continuity

Within

Detecting patterns

Selecting from patterns

Select

flatSelect

Handling timed-out partial patterns

Use case - complex event processing on a temperature sensor

Summary

6. Machine Learning Using FlinkML

What is machine learning?

Supervised learning

Regression

Classification

Unsupervised learning

Clustering

Association

Semi-supervised learning

FlinkML

Supported algorithms

Supervised learning

Support Vector Machine

Multiple Linear Regression

Optimization framework

Recommendations

Alternating Least Squares

Unsupervised learning

k Nearest Neighbour join

Utilities

Data pre processing and pipelines

Polynomial features

Standard scaler

MinMax scaler

Summary

7. Flink Graph API - Gelly

What is a graph?

Flink graph API - Gelly

Graph representation

Graph nodes

Graph edges

Graph creation

From dataset of edges and vertices

From dataset of tuples representing edges

From CSV files

From collection lists

Graph properties

Graph transformations

Map

Translate

Filter

Join

Reverse

Undirected

Union

Intersect

Graph mutations

Neighborhood methods

Graph validation

Iterative graph processing

Vertex-Centric iterations

Scatter-Gather iterations

Gather-Sum-Apply iterations

Use case - Airport Travel Optimization

Summary

8. Distributed Data Processing with Flink and Hadoop

Quick overview of Hadoop

HDFS

YARN

Flink on YARN

Configurations

Starting a Flink YARN session

Submitting a job to Flink

Stopping Flink YARN session

Running a single Flink job on YARN

Recovery behavior for Flink on YARN

Working details

Summary

9. Deploying Flink on Cloud

Flink on Google Cloud

Installing Google Cloud SDK

Installing BDUtil

Launching a Flink cluster

Executing a sample job

Shutting down the cluster

Flink on AWS

Launching an EMR cluster

Installing Flink on EMR

Executing Flink on EMR-YARN

Starting a Flink YARN session

Executing Flink job on YARN session

Shutting down the cluster

Flink on EMR 5.3+

Using S3 in Flink applications

Summary

10. Best Practices

Logging best practices

Configuring Log4j

Configuring Logback

Logging in applications

Using ParameterTool

From system properties

From command line arguments

From .properties file

Naming large TupleX types

Registering a custom serializer

Metrics

Registering metrics

Counters

Gauges

Histograms

Meters

Reporters

Monitoring REST API

Config API

Overview API

Overview of the jobs

Details of a specific job

User defined job configuration

Back pressure monitoring

Summary
