Large Scale Machine Learning with Python
Table of Contents
Large Scale Machine Learning with Python
Credits
About the Authors
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. First Steps to Scalability
Explaining scalability in detail
Making large scale examples
Introducing Python
Scale up with Python
Scale out with Python
Python for large scale machine learning
Choosing between Python 2 and Python 3
Installing Python
Step-by-step installation
The installation of packages
Package upgrades
Scientific distributions
Introducing Jupyter/IPython
Python packages
NumPy
SciPy
Pandas
Scikit-learn
The matplotlib package
Gensim
H2O
XGBoost
Theano
TensorFlow
The sknn library
Theanets
Keras
Other useful packages to install on your system
Summary
2. Scalable Learning in Scikit-learn
Out-of-core learning
Subsampling as a viable option
Optimizing one instance at a time
Building an out-of-core learning system
Streaming data from sources
Datasets to try the real thing yourself
The first example – streaming the bike-sharing dataset
Using pandas I/O tools
Working with databases
Paying attention to the ordering of instances
Stochastic learning
Batch gradient descent
Stochastic gradient descent
The Scikit-learn SGD implementation
Defining SGD learning parameters
Feature management with data streams
Describing the target
The hashing trick
Other basic transformations
Testing and validation in a stream
Trying SGD in action
Summary
3. Fast SVM Implementations
Datasets to experiment with on your own
The bike-sharing dataset
The covertype dataset
Support Vector Machines
Hinge loss and its variants
Understanding the Scikit-learn SVM implementation
Pursuing nonlinear SVMs by subsampling
Achieving SVM at scale with SGD
Feature selection by regularization
Including non-linearity in SGD
Trying explicit high-dimensional mappings
Hyperparameter tuning
Other alternatives for SVM fast learning
Nonlinear and faster with Vowpal Wabbit
Installing VW
Understanding the VW data format
Python integration
A few examples using reductions for SVM and neural nets
Faster bike-sharing
The covertype dataset crunched by VW
Summary
4. Neural Networks and Deep Learning
The neural network architecture
What and how neural networks learn
Choosing the right architecture
The input layer
The hidden layer
The output layer
Neural networks in action
Parallelization for sknn
Neural networks and regularization
Neural networks and hyperparameter optimization
Neural networks and decision boundaries
Deep learning at scale with H2O
Large scale deep learning with H2O
Gridsearch on H2O
Deep learning and unsupervised pretraining
Deep learning with theanets
Autoencoders and unsupervised learning
Autoencoders
Summary
5. Deep Learning with TensorFlow
TensorFlow installation
TensorFlow operations
GPU computing
Linear regression with SGD
A neural network from scratch in TensorFlow
Machine learning on TensorFlow with SkFlow
Deep learning with large files – incremental learning
Keras and TensorFlow installation
Convolutional Neural Networks in TensorFlow through Keras
The convolution layer
The pooling layer
The fully connected layer
CNNs with an incremental approach
GPU computing
Summary
6. Classification and Regression Trees at Scale
Bootstrap aggregation
Random forest and extremely randomized forest
Fast parameter optimization with randomized search
Extremely randomized trees and large datasets
CART and boosting
Gradient Boosting Machines
max_depth
learning_rate
Subsample
Faster GBM with warm_start
Speeding up GBM with warm_start
Training and storing GBM models
XGBoost
XGBoost regression
XGBoost and variable importance
XGBoost streaming large datasets
XGBoost model persistence
Out-of-core CART with H2O
Random forest and gridsearch on H2O
Stochastic gradient boosting and gridsearch on H2O
Summary
7. Unsupervised Learning at Scale
Unsupervised methods
Feature decomposition – PCA
Randomized PCA
Incremental PCA
Sparse PCA
PCA with H2O
Clustering – K-means
Initialization methods
K-means assumptions
Selection of the best K
Scaling K-means – mini-batch
K-means with H2O
LDA
Scaling LDA – memory, CPUs, and machines
Summary
8. Distributed Environments – Hadoop and Spark
From a standalone machine to a bunch of nodes
Why do we need a distributed framework?
Setting up the VM
VirtualBox
Vagrant
Using the VM
The Hadoop ecosystem
Architecture
HDFS
MapReduce
YARN
Spark
pySpark
Summary
9. Practical Machine Learning with Spark
Setting up the VM for this chapter
Sharing variables across cluster nodes
Broadcast read-only variables
Accumulators write-only variables
Broadcast and accumulators together – an example
Data preprocessing in Spark
JSON files and Spark DataFrames
Dealing with missing data
Grouping and creating tables in-memory
Writing the preprocessed DataFrame or RDD to disk
Working with Spark DataFrames
Machine learning with Spark
Spark on the KDD99 dataset
Reading the dataset
Feature engineering
Training a learner
Evaluating a learner's performance
The power of the ML pipeline
Manual tuning
Cross-validation
Final cleanup
Summary
A. Introduction to GPUs and Theano
GPU computing
Theano – parallel computing on the GPU
Installing Theano
Index