Regression Analysis with Python (e-book)

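Price: ¥

5 people reading | 0 reviews | 6.2

Author: Luca Massaron

Publisher: Packt Publishing

Publication date: 2016-02-29

Word count: 1.707 million

Category: Imported books > Foreign-language originals > Computers/Internet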

Learn the art of regression analysis with Python

About This Book

  • Become competent at implementing regression analysis in Python
  • Solve some of the complex data science problems related to predicting outcomes
  • Get to grips with various types of regression for effective data analysis

Who This Book Is For

The book targets Python developers, with a basic understanding of data science, statistics, and math, who want to learn how to do regression analysis on a dataset. It is beneficial if you have some knowledge of statistics and data science.

What You Will Learn

  • Format a dataset for regression and evaluate its performance
  • Apply multiple linear regression to real-world problems
  • Learn to classify training points
  • Create an observation matrix, using different techniques of data analysis and cleaning
  • Apply several techniques to decrease (and eventually fix) any overfitting problem
  • Learn to scale linear models to a big dataset and deal with incremental data

In Detail

Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. There are many kinds of regression algorithms, and the aim of this book is to explain which is the right one to use for each set of problems and how to prepare real-world data for it. With this book you will learn to define a simple regression problem and evaluate its performance. The book will help you understand how to properly parse a dataset, clean it, and create an output matrix optimally built for regression. You will begin with a simple regression algorithm to solve some data science problems and then progress to more complex algorithms. The book will enable you to use regression models to predict outcomes and take critical business decisions. Through the book, you will gain knowledge to use Python for building fast, better linear models and to apply the results in Python or in any computer language you prefer.

Style and approach

This is a practical tutorial-based book. You will be given an example problem and then supplied with the relevant code and how to walk through it. The details are provided in a step by step manner, followed by a thorough explanation of the math underlying the solution. This approach will help you leverage your own data using the same techniques.
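To make the description's definition of regression concrete (learning a relationship between inputs and a continuous output from example data, then predicting for novel inputs), here is a minimal sketch in Python. It is not taken from the book: the synthetic data, the true slope and intercept, and the helper name `predict` are all assumptions chosen for illustration, and the fit uses NumPy's generic least-squares solver rather than any particular method the authors present.

```python
import numpy as np

# Synthetic example data (an assumption for illustration, not from the book):
# a noisy linear relationship y = 2.0 * x + 1.0 + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=200)

# Design matrix with a column of ones so the model can learn an intercept.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares fit via NumPy's least-squares solver.
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef


def predict(x_new):
    """Predict the continuous output for new inputs (hypothetical helper)."""
    return intercept + slope * np.asarray(x_new, dtype=float)


# Coefficient of determination (R-squared), computed on the training data.
y_hat = predict(x)
r_squared = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"intercept={intercept:.3f}, slope={slope:.3f}, R^2={r_squared:.3f}")
print("prediction for a novel input x=12:", predict([12.0]))
```

The R-squared here is measured on the same data used for fitting, so it says nothing about generalization; checking performance on out-of-sample data is a separate step.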
Regression Analysis with Python

Table of Contents

Regression Analysis with Python

Credits

About the Authors

About the Reviewers

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Regression – The Workhorse of Data Science

Regression analysis and data science

Exploring the promise of data science

The challenge

The linear models

What you are going to find in the book

Python for data science

Installing Python

Choosing between Python 2 and Python 3

Step-by-step installation

Installing packages

Package upgrades

Scientific distributions

Introducing Jupyter or IPython

Python packages and functions for linear models

NumPy

SciPy

Statsmodels

Scikit-learn

Summary

2. Approaching Simple Linear Regression

Defining a regression problem

Linear models and supervised learning

Reflecting on predictive variables

Reflecting on response variables

The family of linear models

Preparing to discover simple linear regression

Starting from the basics

A measure of linear relationship

Extending to linear regression

Regressing with Statsmodels

The coefficient of determination

Meaning and significance of coefficients

Evaluating the fitted values

Correlation is not causation

Predicting with a regression model

Regressing with Scikit-learn

Minimizing the cost function

Explaining the reason for using squared errors

Pseudoinverse and other optimization methods

Gradient descent at work

Summary

3. Multiple Regression in Action

Using multiple features

Model building with Statsmodels

Using formulas as an alternative

The correlation matrix

Revisiting gradient descent

Feature scaling

Unstandardizing coefficients

Estimating feature importance

Inspecting standardized coefficients

Comparing models by R-squared

Interaction models

Discovering interactions

Polynomial regression

Testing linear versus cubic transformation

Going for higher-degree solutions

Introducing underfitting and overfitting

Summary

4. Logistic Regression

Defining a classification problem

Formalization of the problem: binary classification

Assessing the classifier's performance

Defining a probability-based approach

More on the logistic and logit functions

Let's see some code

Pros and cons of logistic regression

Revisiting gradient descent

Multiclass Logistic Regression

An example

Summary

5. Data Preparation

Numeric feature scaling

Mean centering

Standardization

Normalization

The logistic regression case

Qualitative feature encoding

Dummy coding with Pandas

DictVectorizer and one-hot encoding

Feature hasher

Numeric feature transformation

Observing residuals

Summarizations by binning

Missing data

Missing data imputation

Keeping track of missing values

Outliers

Outliers on the response

Outliers among the predictors

Removing or replacing outliers

Summary

6. Achieving Generalization

Checking on out-of-sample data

Testing by sample split

Cross-validation

Bootstrapping

Greedy selection of features

The Madelon dataset

Univariate selection of features

Recursive feature selection

Regularization optimized by grid-search

Ridge (L2 regularization)

Grid search for optimal parameters

Random grid search

Lasso (L1 regularization)

Elastic net

Stability selection

Experimenting with the Madelon

Summary

7. Online and Batch Learning

Batch learning

Online mini-batch learning

A real example

Streaming scenario without a test set

Summary

8. Advanced Regression Methods

Least Angle Regression

Visual showcase of LARS

A code example

LARS wrap up

Bayesian regression

Bayesian regression wrap up

SGD classification with hinge loss

Comparison with logistic regression

SVR

SVM wrap up

Regression trees (CART)

Regression tree wrap up

Bagging and boosting

Bagging

Boosting

Ensemble wrap up

Gradient Boosting Regressor with LAD

GBM with LAD wrap up

Summary

9. Real-world Applications for Regression Models

Downloading the datasets

Time series problem dataset

Regression problem dataset

Multiclass classification problem dataset

Ranking problem dataset

A regression problem

Testing a classifier instead of a regressor

An imbalanced and multiclass classification problem

A ranking problem

A time series problem

Open questions

Summary

Index
