scikit-learn: Machine Learning Simplified (e-book)

Authors: Raul Garreta, Guillermo Moncecchi, Trent Hauck, Gavin Hackeling

Publisher: Packt Publishing

Publication date: 2017-11-10

Word count: 2,054,000

Category: Imported books > Foreign-language originals > Fiction

Implement scikit-learn into every step of the data science pipeline

About This Book

  • Use Python and scikit-learn to create intelligent applications
  • Discover how to apply algorithms in a variety of situations to tackle common and not-so-common challenges in the machine learning domain
  • A practical, example-based guide to help you gain expertise in implementing and evaluating machine learning systems using scikit-learn

Who This Book Is For

If you are a programmer and want to explore machine learning and data-based methods to build intelligent applications and enhance your programming skills, this is the course for you. No previous experience with machine-learning algorithms is required.

What You Will Learn

  • Review fundamental concepts including supervised and unsupervised experiences, common tasks, and performance metrics
  • Classify objects (from documents to human faces and flower species) based on some of their features, using a variety of methods from Support Vector Machines to Naive Bayes
  • Use Decision Trees to explain the main causes of certain phenomena such as passenger survival on the Titanic
  • Evaluate the performance of machine learning systems in common tasks
  • Master algorithms of various levels of complexity and learn how to analyze data at the same time
  • Learn just enough math to think about the connections between various algorithms
  • Customize machine learning algorithms to fit your problem, and learn how to modify them when the situation calls for it
  • Incorporate other packages from the Python ecosystem to munge and visualize your dataset
  • Improve the way you build your models using parallelization techniques

In Detail

Machine learning, the art of creating applications that learn from experience and data, has been around for many years. Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility; moreover, within the Python data space, scikit-learn is the unequivocal choice for machine learning.

The course combines an introduction to some of the main concepts and methods in machine learning with practical, hands-on examples of real-world problems. The course starts by walking through different methods to prepare your data, be it a dataset with missing values or text columns that require the categories to be turned into indicator variables. After the data is ready, you'll learn different techniques aligned with different objectives, be it a dataset with known outcomes such as sales by state, or more complicated problems such as clustering similar customers. Finally, you'll learn how to polish your algorithm to ensure that it's both accurate and resilient to new datasets.

You will learn to incorporate machine learning in your applications. Ranging from handwritten digit recognition to document classification, examples are solved step by step using scikit-learn and Python. By the end of this course you will have learned how to build applications that learn from experience, by applying the main concepts and techniques of machine learning.

Style and Approach

Implement scikit-learn using engaging examples and fun exercises, and with a gentle and friendly but comprehensive "learn-by-doing" approach. This is a practical course, which analyzes compelling data about life, health, and death with the help of tutorials. It offers you a useful way of interpreting the data that's specific to this course, but that can also be applied to any other data. This course is designed to be both a guide and a reference for moving beyond the basics of scikit-learn.
Table of Contents

scikit-learn: Machine Learning Simplified

Credits

Preface

What this learning path covers

What you need for this learning path

Who this learning path is for

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Module 1

1. Machine Learning – A Gentle Introduction

Installing scikit-learn

Linux

Mac

Windows

Checking your installation

Datasets

Our first machine learning method – linear classification

Evaluating our results

Machine learning categories

Important concepts related to machine learning

Summary

2. Supervised Learning

Image recognition with Support Vector Machines

Training a Support Vector Machine

Text classification with Naïve Bayes

Preprocessing the data

Training a Naïve Bayes classifier

Evaluating the performance

Explaining Titanic hypothesis with decision trees

Preprocessing the data

Training a decision tree classifier

Interpreting the decision tree

Random Forests – randomizing decisions

Evaluating the performance

Predicting house prices with regression

First try – a linear model

Second try – Support Vector Machines for regression

Third try – Random Forests revisited

Evaluation

Summary

3. Unsupervised Learning

Principal Component Analysis

Clustering handwritten digits with k-means

Alternative clustering methods

Summary

4. Advanced Features

Feature extraction

Feature selection

Model selection

Grid search

Parallel grid search

Summary

2. Module 2

1. Premodel Workflow

Introduction

Getting sample data from external sources

Getting ready

How to do it…

How it works…

There's more…

See also

Creating sample data for toy analysis

Getting ready

How to do it...

How it works...

Scaling data to the standard normal

Getting ready

How to do it...

How it works...

There's more...

Creating idempotent scalar objects

Handling sparse imputations

Creating binary features through thresholding

Getting ready

How to do it...

How it works...

There's more...

Sparse matrices

The fit method

Working with categorical variables

Getting ready

How to do it...

How it works...

There's more...

DictVectorizer

Patsy

Binarizing label features

Getting ready

How to do it...

How it works...

There's more...

Imputing missing values through various strategies

Getting ready

How to do it...

How it works...

There's more...

Using Pipelines for multiple preprocessing steps

Getting ready

How to do it...

How it works...

Reducing dimensionality with PCA

Getting ready

How to do it...

How it works...

There's more...

Using factor analysis for decomposition

Getting ready

How to do it...

How it works...

Kernel PCA for nonlinear dimensionality reduction

Getting ready

How to do it...

How it works...

Using truncated SVD to reduce dimensionality

Getting ready

How to do it...

How it works...

There's more...

Sign flipping

Sparse matrices

Decomposition to classify with DictionaryLearning

Getting ready

How to do it...

How it works...

Putting it all together with Pipelines

Getting ready

How to do it...

How it works...

There's more...

Using Gaussian processes for regression

Getting ready

How to do it…

How it works…

There's more…

Defining the Gaussian process object directly

Getting ready

How to do it…

How it works…

Using stochastic gradient descent for regression

Getting ready

How to do it…

How it works…

2. Working with Linear Models

Introduction

Fitting a line through data

Getting ready

How to do it...

How it works...

There's more...

Evaluating the linear regression model

Getting ready

How to do it...

How it works...

There's more...

Using ridge regression to overcome linear regression's shortfalls

Getting ready

How to do it...

How it works...

Optimizing the ridge regression parameter

Getting ready

How to do it...

How it works...

There's more...

Using sparsity to regularize models

Getting ready

How to do it...

How it works...

Lasso cross-validation

Lasso for feature selection

Taking a more fundamental approach to regularization with LARS

Getting ready

How to do it...

How it works...

There's more...

Using linear methods for classification – logistic regression

Getting ready

How to do it...

There's more...

Directly applying Bayesian ridge regression

Getting ready

How to do it...

How it works...

There's more...

Using boosting to learn from errors

Getting ready

How to do it...

How it works...

3. Building Models with Distance Metrics

Introduction

Using KMeans to cluster data

Getting ready

How to do it…

How it works...

Optimizing the number of centroids

Getting ready

How to do it…

How it works…

Assessing cluster correctness

Getting ready

How to do it...

There's more...

Using MiniBatch KMeans to handle more data

Getting ready

How to do it...

How it works...

Quantizing an image with KMeans clustering

Getting ready

How to do it…

How it works…

Finding the closest objects in the feature space

Getting ready

How to do it...

How it works...

There's more...

Probabilistic clustering with Gaussian Mixture Models

Getting ready

How to do it...

How it works...

Using KMeans for outlier detection

Getting ready

How to do it...

How it works...

Using k-NN for regression

Getting ready

How to do it…

How it works...

4. Classifying Data with scikit-learn

Introduction

Doing basic classifications with Decision Trees

Getting ready

How to do it…

How it works…

Tuning a Decision Tree model

Getting ready

How to do it…

How it works…

Using many Decision Trees – random forests

Getting ready

How to do it…

How it works…

There's more…

Tuning a random forest model

Getting ready

How to do it…

How it works…

There's more…

Classifying data with support vector machines

Getting ready

How to do it…

How it works…

There's more…

Generalizing with multiclass classification

Getting ready

How to do it…

How it works…

Using LDA for classification

Getting ready

How to do it…

How it works…

Working with QDA – a nonlinear LDA

Getting ready

How to do it…

How it works…

Using Stochastic Gradient Descent for classification

Getting ready

How to do it…

Classifying documents with Naïve Bayes

Getting ready

How to do it…

How it works…

There's more…

Label propagation with semi-supervised learning

Getting ready

How to do it…

How it works…

5. Postmodel Workflow

Introduction

K-fold cross validation

Getting ready

How to do it...

How it works...

Automatic cross validation

Getting ready

How to do it...

How it works...

Cross validation with ShuffleSplit

Getting ready

How to do it...

Stratified k-fold

Getting ready

How to do it...

How it works...

Poor man's grid search

Getting ready

How to do it...

How it works...

Brute force grid search

Getting ready

How to do it...

How it works...

Using dummy estimators to compare results

Getting ready

How to do it...

How it works...

Regression model evaluation

Getting ready

How to do it...

How it works...

Feature selection

Getting ready

How to do it...

How it works...

Feature selection on L1 norms

Getting ready

How to do it...

How it works...

Persisting models with joblib

Getting ready

How to do it...

How it works...

There's more...

3. Module 3

1. The Fundamentals of Machine Learning

Learning from experience

Machine learning tasks

Training data and test data

Performance measures, bias, and variance

An introduction to scikit-learn

Installing scikit-learn

Installing scikit-learn on Windows

Installing scikit-learn on Linux

Installing scikit-learn on OS X

Verifying the installation

Installing pandas and matplotlib

Summary

2. Linear Regression

Simple linear regression

Evaluating the fitness of a model with a cost function

Solving ordinary least squares for simple linear regression

Evaluating the model

Multiple linear regression

Polynomial regression

Regularization

Applying linear regression

Exploring the data

Fitting and evaluating the model

Fitting models with gradient descent

Summary

3. Feature Extraction and Preprocessing

Extracting features from categorical variables

Extracting features from text

The bag-of-words representation

Stop-word filtering

Stemming and lemmatization

Extending bag-of-words with TF-IDF weights

Space-efficient feature vectorizing with the hashing trick

Extracting features from images

Extracting features from pixel intensities

Extracting points of interest as features

SIFT and SURF

Data standardization

Summary

4. From Linear Regression to Logistic Regression

Binary classification with logistic regression

Spam filtering

Binary classification performance metrics

Accuracy

Precision and recall

Calculating the F1 measure

ROC AUC

Tuning models with grid search

Multi-class classification

Multi-class classification performance metrics

Multi-label classification and problem transformation

Multi-label classification performance metrics

Summary

5. Nonlinear Classification and Regression with Decision Trees

Decision trees

Training decision trees

Selecting the questions

Information gain

Gini impurity

Decision trees with scikit-learn

Tree ensembles

The advantages and disadvantages of decision trees

Summary

6. Clustering with K-Means

Clustering with the K-Means algorithm

Local optima

The elbow method

Evaluating clusters

Image quantization

Clustering to learn features

Summary

7. Dimensionality Reduction with PCA

An overview of PCA

Performing Principal Component Analysis

Variance, Covariance, and Covariance Matrices

Eigenvectors and eigenvalues

Dimensionality reduction with Principal Component Analysis

Using PCA to visualize high-dimensional data

Face recognition with PCA

Summary

8. The Perceptron

Activation functions

The perceptron learning algorithm

Binary classification with the perceptron

Document classification with the perceptron

Limitations of the perceptron

Summary

9. From the Perceptron to Support Vector Machines

Kernels and the kernel trick

Maximum margin classification and support vectors

Classifying characters in scikit-learn

Classifying handwritten digits

Classifying characters in natural images

Summary

10. From the Perceptron to Artificial Neural Networks

Nonlinear decision boundaries

Feedforward and feedback artificial neural networks

Multilayer perceptrons

Minimizing the cost function

Forward propagation

Backpropagation

Approximating XOR with Multilayer perceptrons

Classifying handwritten digits

Summary

Bibliography

Index
