Title Page
Copyright and Credits
Machine Learning with Apache Spark Quick Start Guide
Dedication
About Packt
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
The Big Data Ecosystem
A brief history of data
Vertical scaling
Master/slave architecture
Sharding
Data processing and analysis
Data becomes big
Big data ecosystem
Horizontal scaling
Distributed systems
Distributed data stores
Distributed filesystems
Distributed databases
NoSQL databases
Document databases
Columnar databases
Key-value databases
Graph databases
CAP theorem
Distributed search engines
Distributed processing
MapReduce
Apache Spark
RDDs, DataFrames, and datasets
RDDs
DataFrames
Datasets
Jobs, stages, and tasks
Job
Stage
Tasks
Distributed messaging
Distributed streaming
Distributed ledgers
Artificial intelligence and machine learning
Cloud computing platforms
Data insights platform
Reference logical architecture
Data sources layer
Ingestion layer
Persistent data storage layer
Data processing layer
Serving data storage layer
Data intelligence layer
Unified access layer
Data insights and reporting layer
Platform governance, management, and administration
Open source implementation
Summary
Setting Up a Local Development Environment
CentOS Linux 7 virtual machine
Java SE Development Kit 8
Scala 2.11
Anaconda 5 with Python 3
Basic conda commands
Additional Python packages
Jupyter Notebook
Starting Jupyter Notebook
Troubleshooting Jupyter Notebook
Apache Spark 2.3
Spark binaries
Local working directories
Spark configuration
Spark properties
Environment variables
Standalone master server
Spark worker node
PySpark and Jupyter Notebook
Apache Kafka 2.0
Kafka binaries
Local working directories
Kafka configuration
Start the Kafka server
Testing Kafka
Summary
Artificial Intelligence and Machine Learning
Artificial intelligence
Machine learning
Supervised learning
Unsupervised learning
Reinforcement learning
Deep learning
Natural neuron
Artificial neuron
Weights
Activation function
Heaviside step function
Sigmoid function
Hyperbolic tangent function
Artificial neural network
Single-layer perceptron
Multi-layer perceptron
NLP
Cognitive computing
Machine learning pipelines in Apache Spark
Summary
Supervised Learning Using Apache Spark
Linear regression
Case study – predicting bike sharing demand
Univariate linear regression
Residuals
Root mean square error
R-squared
Univariate linear regression in Apache Spark
Multivariate linear regression
Correlation
Multivariate linear regression in Apache Spark
Logistic regression
Threshold value
Confusion matrix
Receiver operating characteristic curve
Area under the ROC curve
Case study – predicting breast cancer
Classification and Regression Trees
Case study – predicting political affiliation
Random forests
K-Fold cross validation
Summary
Unsupervised Learning Using Apache Spark
Clustering
Euclidean distance
Hierarchical clustering
K-means clustering
Case study – detecting brain tumors
Feature vectors from images
Image segmentation
K-means cost function
K-means clustering in Apache Spark
Principal component analysis
Case study – movie recommendation system
Covariance matrix
Identity matrix
Eigenvectors and eigenvalues
PCA in Apache Spark
Summary
Natural Language Processing Using Apache Spark
Feature transformers
Document
Corpus
Preprocessing pipeline
Tokenization
Stop words
Stemming
Lemmatization
Normalization
Feature extractors
Bag of words
Term frequency–inverse document frequency
Case study – sentiment analysis
NLP pipeline
NLP in Apache Spark
Summary
Deep Learning Using Apache Spark
Artificial neural networks
Multilayer perceptrons
MLP classifier
Input layer
Hidden layers
Output layer
Case study 1 – OCR
Input data
Training architecture
Detecting patterns in the hidden layer
Classifying in the output layer
MLPs in Apache Spark
Convolutional neural networks
End-to-end neural architecture
Input layer
Convolution layers
Rectified linear units
Pooling layers
Fully connected layer
Output layer
Case study 2 – image recognition
InceptionV3 via TensorFlow
Deep learning pipelines for Apache Spark
Image library
PySpark image recognition application
Spark submit
Image-recognition results
Case study 3 – image prediction
PySpark image-prediction application
Image-prediction results
Summary
Real-Time Machine Learning Using Apache Spark
Distributed streaming platform
Distributed stream processing engines
Streaming using Apache Spark
Spark Streaming (DStreams)
Structured Streaming
Stream processing pipeline
Case study – real-time sentiment analysis
Start Zookeeper and Kafka Servers
Kafka topic
Twitter developer account
Twitter apps and the Twitter API
Application configuration
Kafka Twitter producer application
Preprocessing and feature vectorization pipelines
Kafka Twitter consumer application
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think