Title Page
Copyright
Apache Spark 2: Data Processing and Real-Time Analytics
About Packt
Why Subscribe?
Packt.com
Contributors
About the Authors
Packt Is Searching for Authors Like You
Preface
Who This Book Is For
What This Book Covers
To Get the Most out of This Book
Download the Example Code Files
Conventions Used
Get in Touch
Reviews
A First Taste and What's New in Apache Spark V2
Spark machine learning
Spark Streaming
Spark SQL
Spark graph processing
Extended ecosystem
What's new in Apache Spark V2?
Cluster design
Cluster management
Local
Standalone
Apache YARN
Apache Mesos
Cloud-based deployments
Performance
The cluster structure
Hadoop Distributed File System
Data locality
Memory
Coding
Cloud
Summary
Apache Spark Streaming
Overview
Errors and recovery
Checkpointing
Streaming sources
TCP stream
File streams
Flume
Kafka
Summary
Structured Streaming
The concept of continuous applications
True unification - same code, same engine
Windowing
How streaming engines use windowing
How Apache Spark improves windowing
Increased performance with good old friends
How transparent fault tolerance and exactly-once delivery guarantees are achieved
Replayable sources can replay streams from a given offset
Idempotent sinks prevent data duplication
State versioning guarantees consistent results after reruns
Example - connection to a MQTT message broker
Controlling continuous applications
More on stream life cycle management
Summary
Apache Spark MLlib
Architecture
The development environment
Classification with Naive Bayes
Theory on Classification
Naive Bayes in practice
Clustering with K-Means
Theory on Clustering
K-Means in practice
Artificial neural networks
ANN in practice
Summary
Apache SparkML
What does the new API look like?
The concept of pipelines
Transformers
String indexer
OneHotEncoder
VectorAssembler
Pipelines
Estimators
RandomForestClassifier
Model evaluation
CrossValidation and hyperparameter tuning
CrossValidation
Hyperparameter tuning
Winning a Kaggle competition with Apache SparkML
Data preparation
Feature engineering
Testing the feature engineering pipeline
Training the machine learning model
Model evaluation
CrossValidation and hyperparameter tuning
Using the evaluator to assess the quality of the cross-validated and tuned model
Summary
Apache SystemML
Why do we need yet another library?
Why on Apache Spark?
The history of Apache SystemML
A cost-based optimizer for machine learning algorithms
An example - alternating least squares
Apache SystemML architecture
Language parsing
High-level operators are generated
How low-level operators are optimized
Performance measurements
Apache SystemML in action
Summary
Apache Spark GraphX
Overview
Graph analytics/processing with GraphX
The raw data
Creating a graph
Example 1 – counting
Example 2 – filtering
Example 3 – PageRank
Example 4 – triangle counting
Example 5 – connected components
Summary
Spark Tuning
Monitoring Spark jobs
Spark web interface
Jobs
Stages
Storage
Environment
Executors
SQL
Visualizing Spark applications using the web UI
Observing the running and completed Spark jobs
Debugging Spark applications using logs
Logging with log4j with Spark
Spark configuration
Spark properties
Environment variables
Logging
Common mistakes in Spark app development
Application failure
Slow jobs or unresponsiveness
Optimization techniques
Data serialization
Memory tuning
Memory usage and management
Tuning the data structures
Serialized RDD storage
Garbage collection tuning
Level of parallelism
Broadcasting
Data locality
Summary
Testing and Debugging Spark
Testing in a distributed environment
Distributed environment
Issues in a distributed system
Challenges of software testing in a distributed environment
Testing Spark applications
Testing Scala methods
Unit testing
Testing Spark applications
Method 1: Using Scala JUnit test
Method 2: Testing Scala code using FunSuite
Method 3: Making life easier with Spark testing base
Configuring Hadoop runtime on Windows
Debugging Spark applications
Logging with log4j with Spark recap
Debugging the Spark application
Debugging Spark application on Eclipse as Scala debug
Debugging Spark jobs running as local and standalone mode
Debugging Spark applications on YARN or Mesos cluster
Debugging Spark application using SBT
Summary
Practical Machine Learning with Spark Using Scala
Introduction
Apache Spark
Machine learning
Scala
Software versions and libraries used in this book
Configuring IntelliJ to work with Spark and run Spark ML sample codes
Getting ready
How to do it...
There's more...
See also
Running a sample ML code from Spark
Getting ready
How to do it...
Identifying data sources for practical machine learning
Getting ready
How to do it...
See also
Running your first program using Apache Spark 2.0 with the IntelliJ IDE
How to do it...
How it works...
There's more...
See also
How to add graphics to your Spark program
How to do it...
How it works...
There's more...
See also
Spark's Three Data Musketeers for Machine Learning - Perfect Together
Introduction
RDDs - what started it all...
DataFrame - a natural evolution to unite API and SQL via a high-level API
Dataset - a high-level unifying Data API
Creating RDDs with Spark 2.0 using internal data sources
How to do it...
How it works...
Creating RDDs with Spark 2.0 using external data sources
How to do it...
How it works...
There's more...
See also
Transforming RDDs with Spark 2.0 using the filter() API
How to do it...
How it works...
There's more...
See also
Transforming RDDs with the super useful flatMap() API
How to do it...
How it works...
There's more...
See also
Transforming RDDs with set operation APIs
How to do it...
How it works...
See also
RDD transformation/aggregation with groupBy() and reduceByKey()
How to do it...
How it works...
There's more...
See also
Transforming RDDs with the zip() API
How to do it...
How it works...
See also
Join transformation with paired key-value RDDs
How to do it...
How it works...
There's more...
Reduce and grouping transformation with paired key-value RDDs
How to do it...
How it works...
See also
Creating DataFrames from Scala data structures
How to do it...
How it works...
There's more...
See also
Operating on DataFrames programmatically without SQL
How to do it...
How it works...
There's more...
See also
Loading DataFrames and setup from an external source
How to do it...
How it works...
There's more...
See also
Using DataFrames with standard SQL language - SparkSQL
How to do it...
How it works...
There's more...
See also
Working with the Dataset API using a Scala Sequence
How to do it...
How it works...
There's more...
See also
Creating and using Datasets from RDDs and back again
How to do it...
How it works...
There's more...
See also
Working with JSON using the Dataset API and SQL together
How to do it...
How it works...
There's more...
See also
Functional programming with the Dataset API using domain objects
How to do it...
How it works...
There's more...
See also
Common Recipes for Implementing a Robust Machine Learning System
Introduction
Spark's basic statistical API to help you build your own algorithms
How to do it...
How it works...
There's more...
See also
ML pipelines for real-life machine learning applications
How to do it...
How it works...
There's more...
See also
Normalizing data with Spark
How to do it...
How it works...
There's more...
See also
Splitting data for training and testing
How to do it...
How it works...
There's more...
See also
Common operations with the new Dataset API
How to do it...
How it works...
There's more...
See also
Creating and using RDD versus DataFrame versus Dataset from a text file in Spark 2.0
How to do it...
How it works...
There's more...
See also
LabeledPoint data structure for Spark ML
How to do it...
How it works...
There's more...
See also
Getting access to Spark cluster in Spark 2.0
How to do it...
How it works...
There's more...
See also
Getting access to Spark cluster pre-Spark 2.0
How to do it...
How it works...
There's more...
See also
Getting access to SparkContext vis-a-vis SparkSession object in Spark 2.0
How to do it...
How it works...
There's more...
See also
New model export and PMML markup in Spark 2.0
How to do it...
How it works...
There's more...
See also
Regression model evaluation using Spark 2.0
How to do it...
How it works...
There's more...
See also
Binary classification model evaluation using Spark 2.0
How to do it...
How it works...
There's more...
See also
Multiclass classification model evaluation using Spark 2.0
How to do it...
How it works...
There's more...
See also
Multilabel classification model evaluation using Spark 2.0
How to do it...
How it works...
There's more...
See also
Using the Scala Breeze library to do graphics in Spark 2.0
How to do it...
How it works...
There's more...
See also
Recommendation Engine that Scales with Spark
Introduction
Content filtering
Collaborative filtering
Neighborhood method
Latent factor model techniques
Setting up the required data for a scalable recommendation engine in Spark 2.0
How to do it...
How it works...
There's more...
See also
Exploring the movies data details for the recommendation system in Spark 2.0
How to do it...
How it works...
There's more...
See also
Exploring the ratings data details for the recommendation system in Spark 2.0
How to do it...
How it works...
There's more...
See also
Building a scalable recommendation engine using collaborative filtering in Spark 2.0
How to do it...
How it works...
There's more...
See also
Dealing with implicit input for training
Unsupervised Clustering with Apache Spark 2.0
Introduction
Building a KMeans classifying system in Spark 2.0
How to do it...
How it works...
KMeans (Lloyd Algorithm)
KMeans++ (Arthur's algorithm)
KMeans|| (pronounced as KMeans Parallel)
There's more...
See also
Bisecting KMeans, the new kid on the block in Spark 2.0
How to do it...
How it works...
There's more...
See also
Using Gaussian Mixture and Expectation Maximization (EM) in Spark to classify data
How to do it...
How it works...
New GaussianMixture()
There's more...
See also
Classifying the vertices of a graph using Power Iteration Clustering (PIC) in Spark 2.0
How to do it...
How it works...
There's more...
See also
Latent Dirichlet Allocation (LDA) to classify documents and text into topics
How to do it...
How it works...
There's more...
See also
Streaming KMeans to classify data in near real-time
How to do it...
How it works...
There's more...
See also
Implementing Text Analytics with Spark 2.0 ML Library
Introduction
Doing term frequency with Spark - everything that counts
How to do it...
How it works...
There's more...
See also
Displaying similar words with Spark using Word2Vec
How to do it...
How it works...
There's more...
See also
Downloading a complete dump of Wikipedia for a real-life Spark ML project
How to do it...
There's more...
See also
Using Latent Semantic Analysis for text analytics with Spark 2.0
How to do it...
How it works...
There's more...
See also
Topic modeling with Latent Dirichlet allocation in Spark 2.0
How to do it...
How it works...
There's more...
See also
Spark Streaming and Machine Learning Library
Introduction
Structured streaming for near real-time machine learning
How to do it...
How it works...
There's more...
See also
Streaming DataFrames for real-time machine learning
How to do it...
How it works...
There's more...
See also
Streaming Datasets for real-time machine learning
How to do it...
How it works...
There's more...
See also
Streaming data and debugging with queueStream
How to do it...
How it works...
See also
Downloading and understanding the famous Iris data for unsupervised classification
How to do it...
How it works...
There's more...
See also
Streaming KMeans for a real-time on-line classifier
How to do it...
How it works...
There's more...
See also
Downloading wine quality data for streaming regression
How to do it...
How it works...
There's more...
Streaming linear regression for a real-time regression
How to do it...
How it works...
There's more...
See also
Downloading Pima Diabetes data for supervised classification
How to do it...
How it works...
There's more...
See also
Streaming logistic regression for an on-line classifier
How to do it...
How it works...
There's more...
See also
Other Books You May Enjoy
Leave a review - let other readers know what you think