Python: Real World Machine Learning (ebook)

Author: Prateek Joshi

Publisher: Packt Publishing

Publication date: 2016-11-01

Word count: 9.28 million

Book Description
Learn to solve challenging data science problems by building powerful machine learning models using Python.

About This Book

  • Understand which algorithms to use in a given context with the help of this recipe-based guide
  • Tackle real-world computing problems through a rigorous and effective approach with this practical tutorial
  • Build state-of-the-art models and develop personalized recommendations to perform machine learning at scale

Who This Book Is For

This Learning Path is for Python programmers who want to use machine learning algorithms to create real-world applications. It is ideal for Python professionals who want to work with large and complex datasets. It also suits Python developers, analysts, and data scientists who want to add some of the most powerful recent trends in data science to their existing skills. Readers are expected to have experience with Python, Jupyter Notebooks, and command-line execution, and enough mathematical knowledge to follow the concepts. Basic knowledge of machine learning is also expected.

What You Will Learn

  • Use predictive modeling and apply it to real-world problems
  • Understand how to perform market segmentation using unsupervised learning
  • Apply your newfound skills to solve real problems through clearly explained code for every technique and test
  • Compete with top data scientists by gaining a practical and theoretical understanding of cutting-edge deep learning algorithms
  • Increase predictive accuracy with deep learning and scalable data-handling techniques
  • Work with modern, state-of-the-art, large-scale machine learning techniques
  • Learn to use Python code to implement a range of machine learning algorithms and techniques

In Detail

Machine learning is spreading rapidly in today's data-driven world. It is used extensively in fields such as search engines, robotics, and self-driving cars, and it is transforming the way we understand and interact with the world around us.
In the first module, Python Machine Learning Cookbook, you will learn how to perform various machine learning tasks with a wide variety of algorithms to solve real-world problems. You will also learn how to implement these algorithms in Python. The second module, Advanced Machine Learning with Python, takes you on a guided tour of the most relevant and powerful machine learning techniques. Along the way, you'll acquire a broad set of skills in feature selection and feature engineering. The third module, Large Scale Machine Learning with Python, dives into scalable machine learning and the three forms of scalability. It covers the most effective machine learning techniques on MapReduce frameworks, using Hadoop and Spark from Python.

This Learning Path teaches you Python machine learning for the real world, and the techniques it covers are at the forefront of commercial practice. It combines some of the best that Packt has to offer in one complete, curated package, including content from the following Packt products:

  • Python Machine Learning Cookbook by Prateek Joshi
  • Advanced Machine Learning with Python by John Hearty
  • Large Scale Machine Learning with Python by Bastiaan Sjardin, Alberto Boschetti, and Luca Massaron

Style and approach

This course is a smooth learning path that teaches you how to get started with Python machine learning and develop solutions to real-world problems. Through this comprehensive course, you'll learn to implement the most effective machine learning techniques from scratch, and more.
Python: Real World Machine Learning

Table of Contents

Credits

Preface

What this learning path covers

What you need for this learning path

Who this learning path is for

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

I. Module 1

1. The Realm of Supervised Learning

Introduction

Preprocessing data using different techniques

Getting ready

How to do it…

Mean removal

Scaling

Normalization

Binarization

One-hot encoding

Label encoding

How to do it…

Building a linear regressor

Getting ready

How to do it…

Computing regression accuracy

Getting ready

How to do it…

Achieving model persistence

How to do it…

Building a ridge regressor

Getting ready

How to do it…

Building a polynomial regressor

Getting ready

How to do it…

Estimating housing prices

Getting ready

How to do it…

Computing the relative importance of features

How to do it…

Estimating bicycle demand distribution

Getting ready

How to do it…

There's more…

2. Constructing a Classifier

Introduction

Building a simple classifier

How to do it…

There's more…

Building a logistic regression classifier

How to do it…

Building a Naive Bayes classifier

How to do it…

Splitting the dataset for training and testing

How to do it…

Evaluating the accuracy using cross-validation

Getting ready

How to do it…

Visualizing the confusion matrix

How to do it…

Extracting the performance report

How to do it…

Evaluating cars based on their characteristics

Getting ready

How to do it…

Extracting validation curves

How to do it…

Extracting learning curves

How to do it…

Estimating the income bracket

How to do it…

3. Predictive Modeling

Introduction

Building a linear classifier using Support Vector Machines (SVMs)

Getting ready

How to do it…

Building a nonlinear classifier using SVMs

How to do it…

Tackling class imbalance

How to do it…

Extracting confidence measurements

How to do it…

Finding optimal hyperparameters

How to do it…

Building an event predictor

Getting ready

How to do it…

Estimating traffic

Getting ready

How to do it…

4. Clustering with Unsupervised Learning

Introduction

Clustering data using the k-means algorithm

How to do it…

Compressing an image using vector quantization

How to do it…

Building a Mean Shift clustering model

How to do it…

Grouping data using agglomerative clustering

How to do it…

Evaluating the performance of clustering algorithms

How to do it…

Automatically estimating the number of clusters using the DBSCAN algorithm

How to do it…

Finding patterns in stock market data

How to do it…

Building a customer segmentation model

How to do it…

5. Building Recommendation Engines

Introduction

Building function compositions for data processing

How to do it…

Building machine learning pipelines

How to do it…

How it works…

Finding the nearest neighbors

How to do it…

Constructing a k-nearest neighbors classifier

How to do it…

How it works…

Constructing a k-nearest neighbors regressor

How to do it…

How it works…

Computing the Euclidean distance score

How to do it…

Computing the Pearson correlation score

How to do it…

Finding similar users in the dataset

How to do it…

Generating movie recommendations

How to do it…

6. Analyzing Text Data

Introduction

Preprocessing data using tokenization

How to do it…

Stemming text data

How to do it…

How it works…

Converting text to its base form using lemmatization

How to do it…

Dividing text using chunking

How to do it…

Building a bag-of-words model

How to do it…

How it works…

Building a text classifier

How to do it…

How it works…

Identifying the gender

How to do it…

Analyzing the sentiment of a sentence

How to do it…

How it works…

Identifying patterns in text using topic modeling

How to do it…

How it works…

7. Speech Recognition

Introduction

Reading and plotting audio data

How to do it…

Transforming audio signals into the frequency domain

How to do it…

Generating audio signals with custom parameters

How to do it…

Synthesizing music

How to do it…

Extracting frequency domain features

How to do it…

Building Hidden Markov Models

How to do it…

Building a speech recognizer

How to do it…

8. Dissecting Time Series and Sequential Data

Introduction

Transforming data into the time series format

How to do it…

Slicing time series data

How to do it…

Operating on time series data

How to do it…

Extracting statistics from time series data

How to do it…

Building Hidden Markov Models for sequential data

Getting ready

How to do it…

Building Conditional Random Fields for sequential text data

Getting ready

How to do it…

Analyzing stock market data using Hidden Markov Models

How to do it…

9. Image Content Analysis

Introduction

Operating on images using OpenCV-Python

How to do it…

Detecting edges

How to do it…

Histogram equalization

How to do it…

Detecting corners

How to do it…

Detecting SIFT feature points

How to do it…

Building a Star feature detector

How to do it…

Creating features using visual codebook and vector quantization

How to do it…

Training an image classifier using Extremely Random Forests

How to do it…

Building an object recognizer

How to do it…

10. Biometric Face Recognition

Introduction

Capturing and processing video from a webcam

How to do it…

Building a face detector using Haar cascades

How to do it…

Building eye and nose detectors

How to do it…

Performing Principal Components Analysis

How to do it…

Performing Kernel Principal Components Analysis

How to do it…

Performing blind source separation

How to do it…

Building a face recognizer using Local Binary Patterns Histogram

How to do it…

11. Deep Neural Networks

Introduction

Building a perceptron

How to do it…

Building a single layer neural network

How to do it…

Building a deep neural network

How to do it…

Creating a vector quantizer

How to do it…

Building a recurrent neural network for sequential data analysis

How to do it…

Visualizing the characters in an optical character recognition database

How to do it…

Building an optical character recognizer using neural networks

How to do it…

12. Visualizing Data

Introduction

Plotting 3D scatter plots

How to do it…

Plotting bubble plots

How to do it…

Animating bubble plots

How to do it…

Drawing pie charts

How to do it…

Plotting date-formatted time series data

How to do it…

Plotting histograms

How to do it…

Visualizing heat maps

How to do it…

Animating dynamic signals

How to do it…

II. Module 2

1. Unsupervised Machine Learning

Principal component analysis

PCA – a primer

Employing PCA

Introducing k-means clustering

Clustering – a primer

Kick-starting clustering analysis

Tuning your clustering configurations

Self-organizing maps

SOM – a primer

Employing SOM

Further reading

Summary

2. Deep Belief Networks

Neural networks – a primer

The composition of a neural network

Network topologies

Restricted Boltzmann Machine

Introducing the RBM

Topology

Training

Applications of the RBM

Further applications of the RBM

Deep belief networks

Training a DBN

Applying the DBN

Validating the DBN

Further reading

Summary

3. Stacked Denoising Autoencoders

Autoencoders

Introducing the autoencoder

Topology

Training

Denoising autoencoders

Applying a dA

Stacked Denoising Autoencoders

Applying the SdA

Assessing SdA performance

Further reading

Summary

4. Convolutional Neural Networks

Introducing the CNN

Understanding the convnet topology

Understanding convolution layers

Understanding pooling layers

Training a convnet

Putting it all together

Applying a CNN

Further reading

Summary

5. Semi-Supervised Learning

Introduction

Understanding semi-supervised learning

Semi-supervised algorithms in action

Self-training

Implementing self-training

Finessing your self-training implementation

Improving the selection process

Contrastive Pessimistic Likelihood Estimation

Further reading

Summary

6. Text Feature Engineering

Introduction

Text feature engineering

Cleaning text data

Text cleaning with BeautifulSoup

Managing punctuation and tokenizing

Tagging and categorising words

Tagging with NLTK

Sequential tagging

Backoff tagging

Creating features from text data

Stemming

Bagging and random forests

Testing our prepared data

Further reading

Summary

7. Feature Engineering Part II

Introduction

Creating a feature set

Engineering features for ML applications

Using rescaling techniques to improve the learnability of features

Creating effective derived variables

Reinterpreting non-numeric features

Using feature selection techniques

Performing feature selection

Correlation

LASSO

Recursive Feature Elimination

Genetic models

Feature engineering in practice

Acquiring data via RESTful APIs

Testing the performance of our model

Twitter

Translink Twitter

Consumer comments

The Bing Traffic API

Deriving and selecting variables using feature engineering techniques

The weather API

Further reading

Summary

8. Ensemble Methods

Introducing ensembles

Understanding averaging ensembles

Using bagging algorithms

Using random forests

Applying boosting methods

Using XGBoost

Using stacking ensembles

Applying ensembles in practice

Using models in dynamic applications

Understanding model robustness

Identifying modeling risk factors

Strategies for managing model robustness

Further reading

Summary

9. Additional Python Machine Learning Tools

Alternative development tools

Introduction to Lasagne

Getting to know Lasagne

Introduction to TensorFlow

Getting to know TensorFlow

Using TensorFlow to iteratively improve our models

Knowing when to use these libraries

Further reading

Summary

A. Chapter Code Requirements

III. Module 3

1. First Steps to Scalability

Explaining scalability in detail

Making large scale examples

Introducing Python

Scale up with Python

Scale out with Python

Python for large scale machine learning

Choosing between Python 2 and Python 3

Package upgrades

Scientific distributions

Introducing Jupyter/IPython

Python packages

NumPy

SciPy

Pandas

Scikit-learn

The matplotlib package

Gensim

H2O

XGBoost

Theano

TensorFlow

The sknn library

Theanets

Keras

Other useful packages to install on your system

Summary

2. Scalable Learning in Scikit-learn

Out-of-core learning

Subsampling as a viable option

Optimizing one instance at a time

Building an out-of-core learning system

Streaming data from sources

Datasets to try the real thing yourself

The first example – streaming the bike-sharing dataset

Using pandas I/O tools

Working with databases

Paying attention to the ordering of instances

Stochastic learning

Batch gradient descent

Stochastic gradient descent

The Scikit-learn SGD implementation

Defining SGD learning parameters

Feature management with data streams

Describing the target

The hashing trick

Other basic transformations

Testing and validation in a stream

Trying SGD in action

Summary

3. Fast SVM Implementations

Datasets to experiment with on your own

The bike-sharing dataset

The covertype dataset

Support Vector Machines

Hinge loss and its variants

Understanding the Scikit-learn SVM implementation

Pursuing nonlinear SVMs by subsampling

Achieving SVM at scale with SGD

Feature selection by regularization

Including non-linearity in SGD

Trying explicit high-dimensional mappings

Hyperparameter tuning

Other alternatives for SVM fast learning

Nonlinear and faster with Vowpal Wabbit

Installing VW

Understanding the VW data format

Python integration

A few examples using reductions for SVM and neural nets

Faster bike-sharing

The covertype dataset crunched by VW

Summary

4. Neural Networks and Deep Learning

The neural network architecture

What and how neural networks learn

Choosing the right architecture

The input layer

The hidden layer

The output layer

Neural networks in action

Parallelization for sknn

Neural networks and regularization

Neural networks and hyperparameter optimization

Neural networks and decision boundaries

Deep learning at scale with H2O

Large scale deep learning with H2O

Gridsearch on H2O

Deep learning and unsupervised pretraining

Deep learning with theanets

Autoencoders and unsupervised learning

Autoencoders

Summary

5. Deep Learning with TensorFlow

TensorFlow installation

TensorFlow operations

GPU computing

Linear regression with SGD

A neural network from scratch in TensorFlow

Machine learning on TensorFlow with SkFlow

Deep learning with large files – incremental learning

Keras and TensorFlow installation

Convolutional Neural Networks in TensorFlow through Keras

The convolution layer

The pooling layer

The fully connected layer

CNNs with an incremental approach

GPU computing

Summary

6. Classification and Regression Trees at Scale

Bootstrap aggregation

Random forest and extremely randomized forest

Fast parameter optimization with randomized search

Extremely randomized trees and large datasets

CART and boosting

Gradient Boosting Machines

max_depth

learning_rate

subsample

Faster GBM with warm_start

Speeding up GBM with warm_start

Training and storing GBM models

XGBoost

XGBoost regression

XGBoost and variable importance

XGBoost streaming large datasets

XGBoost model persistence

Out-of-core CART with H2O

Random forest and gridsearch on H2O

Stochastic gradient boosting and gridsearch on H2O

Summary

7. Unsupervised Learning at Scale

Unsupervised methods

Feature decomposition – PCA

Randomized PCA

Incremental PCA

Sparse PCA

PCA with H2O

Clustering – K-means

Initialization methods

K-means assumptions

Selection of the best K

Scaling K-means – mini-batch

K-means with H2O

LDA

Scaling LDA – memory, CPUs, and machines

Summary

8. Distributed Environments – Hadoop and Spark

From a standalone machine to a bunch of nodes

Why do we need a distributed framework?

Setting up the VM

VirtualBox

Vagrant

Using the VM

The Hadoop ecosystem

Architecture

HDFS

MapReduce

YARN

Spark

pySpark

Summary

9. Practical Machine Learning with Spark

Setting up the VM for this chapter

Sharing variables across cluster nodes

Broadcast – read-only variables

Accumulators – write-only variables

Broadcast and accumulators together – an example

Data preprocessing in Spark

JSON files and Spark DataFrames

Dealing with missing data

Grouping and creating tables in-memory

Writing the preprocessed DataFrame or RDD to disk

Working with Spark DataFrames

Machine learning with Spark

Spark on the KDD99 dataset

Reading the dataset

Feature engineering

Training a learner

Evaluating a learner's performance

The power of the ML pipeline

Manual tuning

Cross-validation

Final cleanup

Summary

A. Introduction to GPUs and Theano

GPU computing

Theano – parallel computing on the GPU

Installing Theano

A. Bibliography

Index
