Regression Analysis with Python
Table of Contents
Regression Analysis with Python
Credits
About the Authors
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Regression – The Workhorse of Data Science
Regression analysis and data science
Exploring the promise of data science
The challenge
The linear models
What you are going to find in the book
Python for data science
Installing Python
Choosing between Python 2 and Python 3
Step-by-step installation
Installing packages
Package upgrades
Scientific distributions
Introducing Jupyter or IPython
Python packages and functions for linear models
NumPy
SciPy
Statsmodels
Scikit-learn
Summary
2. Approaching Simple Linear Regression
Defining a regression problem
Linear models and supervised learning
Reflecting on predictive variables
Reflecting on response variables
The family of linear models
Preparing to discover simple linear regression
Starting from the basics
A measure of linear relationship
Extending to linear regression
Regressing with Statsmodels
The coefficient of determination
Meaning and significance of coefficients
Evaluating the fitted values
Correlation is not causation
Predicting with a regression model
Regressing with Scikit-learn
Minimizing the cost function
Explaining the reason for using squared errors
Pseudoinverse and other optimization methods
Gradient descent at work
Summary
3. Multiple Regression in Action
Using multiple features
Model building with Statsmodels
Using formulas as an alternative
The correlation matrix
Revisiting gradient descent
Feature scaling
Unstandardizing coefficients
Estimating feature importance
Inspecting standardized coefficients
Comparing models by R-squared
Interaction models
Discovering interactions
Polynomial regression
Testing linear versus cubic transformation
Going for higher-degree solutions
Introducing underfitting and overfitting
Summary
4. Logistic Regression
Defining a classification problem
Formalization of the problem: binary classification
Assessing the classifier's performance
Defining a probability-based approach
More on the logistic and logit functions
Let's see some code
Pros and cons of logistic regression
Revisiting gradient descent
Multiclass Logistic Regression
An example
Summary
5. Data Preparation
Numeric feature scaling
Mean centering
Standardization
Normalization
The logistic regression case
Qualitative feature encoding
Dummy coding with Pandas
DictVectorizer and one-hot encoding
Feature hasher
Numeric feature transformation
Observing residuals
Summarizations by binning
Missing data
Missing data imputation
Keeping track of missing values
Outliers
Outliers on the response
Outliers among the predictors
Removing or replacing outliers
Summary
6. Achieving Generalization
Checking on out-of-sample data
Testing by sample split
Cross-validation
Bootstrapping
Greedy selection of features
The Madelon dataset
Univariate selection of features
Recursive feature selection
Regularization optimized by grid-search
Ridge (L2 regularization)
Grid search for optimal parameters
Random grid search
Lasso (L1 regularization)
Elastic net
Stability selection
Experimenting with the Madelon
Summary
7. Online and Batch Learning
Batch learning
Online mini-batch learning
A real example
Streaming scenario without a test set
Summary
8. Advanced Regression Methods
Least Angle Regression
Visual showcase of LARS
A code example
LARS wrap up
Bayesian regression
Bayesian regression wrap up
SGD classification with hinge loss
Comparison with logistic regression
SVR
SVM wrap up
Regression trees (CART)
Regression tree wrap up
Bagging and boosting
Bagging
Boosting
Ensemble wrap up
Gradient Boosting Regressor with LAD
GBM with LAD wrap up
Summary
9. Real-world Applications for Regression Models
Downloading the datasets
Time series problem dataset
Regression problem dataset
Multiclass classification problem dataset
Ranking problem dataset
A regression problem
Testing a classifier instead of a regressor
An imbalanced and multiclass classification problem
A ranking problem
A time series problem
Open questions
Summary
Index