Learning Predictive Analytics with Python (e-book)


Author: Ashish Kumar

Publisher: Packt Publishing

Publication date: 2016-02-15

Word count: 2.198 million

Category: Imported books > Foreign-language originals > Computers/Internet

Gain practical insights into predictive modelling by implementing Predictive Analytics algorithms on public datasets with Python

About This Book

  • A step-by-step guide to predictive modeling including lots of tips, tricks, and best practices
  • Get to grips with the basics of Predictive Analytics with Python
  • Learn how to use the popular predictive modeling algorithms such as Linear Regression, Decision Trees, Logistic Regression, and Clustering

Who This Book Is For

If you wish to learn how to implement Predictive Analytics algorithms using Python libraries, then this is the book for you. If you are familiar with coding in Python (or some other programming/statistical/scripting language) but have never used or read about Predictive Analytics algorithms, this book will also help you. The book will be beneficial to and can be read by any Data Science enthusiast. Some familiarity with Python will be useful to get the most out of this book, but it is certainly not a prerequisite.

What You Will Learn

  • Understand the statistical and mathematical concepts behind Predictive Analytics algorithms and implement Predictive Analytics algorithms using Python libraries
  • Analyze the result parameters arising from the implementation of Predictive Analytics algorithms
  • Write Python modules/functions from scratch to execute segments or the whole of these algorithms
  • Recognize and mitigate various contingencies and issues related to the implementation of Predictive Analytics algorithms
  • Get to know various methods of importing, cleaning, sub-setting, merging, joining, concatenating, exploring, grouping, and plotting data with pandas and numpy
  • Create dummy datasets and simple mathematical simulations using the Python numpy and pandas libraries
  • Understand the best practices while handling datasets in Python and creating predictive models out of them

In Detail

Social Media and the Internet of Things have resulted in an avalanche of data. Data is powerful but not in its raw form: it needs to be processed and modeled, and Python is one of the most robust tools out there to do so. It has an array of packages for predictive modeling and a suite of IDEs to choose from. Learning to predict who would win, lose, buy, lie, or die with Python is an indispensable skill set to have in this data age.

This book is your guide to getting started with Predictive Analytics using Python. You will see how to process data and make predictive models from it. We balance both statistical and mathematical concepts, and implement them in Python using libraries such as pandas, scikit-learn, and numpy.

You'll start by getting an understanding of the basics of predictive modeling, then you will see how to cleanse your data of impurities and get it ready for predictive modeling. You will also learn more about the best predictive modeling algorithms such as Linear Regression, Decision Trees, and Logistic Regression. Finally, you will see the best practices in predictive modeling, as well as the different applications of predictive modeling in the modern world.

Style and approach

All the concepts in this book have been explained and illustrated using a dataset, and in a step-by-step manner. The Python code snippet to implement a method or concept is followed by the output, such as charts, dataset heads, pictures, and so on. The statistical concepts are explained in detail wherever required.
Learning Predictive Analytics with Python

Table of Contents

Learning Predictive Analytics with Python

Credits

Foreword

About the Author

Acknowledgments

About the Reviewer

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Predictive Modelling

Introducing predictive modelling

Scope of predictive modelling

Ensemble of statistical algorithms

Statistical tools

Historical data

Mathematical function

Business context

Knowledge matrix for predictive modelling

Task matrix for predictive modelling

Applications and examples of predictive modelling

LinkedIn's "People also viewed" feature

What does it do?

How is it done?

Correct targeting of online ads

How is it done?

Santa Cruz predictive policing

How is it done?

Determining the activity of a smartphone user using accelerometer data

How is it done?

Sport and fantasy leagues

How was it done?

Python and its packages – download and installation

Anaconda

Standalone Python

Installing a Python package

Installing pip

Installing Python packages with pip

Python and its packages for predictive modelling

IDEs for Python

Summary

2. Data Cleaning

Reading the data – variations and examples

Data frames

Delimiters

Various methods of importing data in Python

Case 1 – reading a dataset using the read_csv method

The read_csv method

Use cases of the read_csv method

Passing the directory address and filename as variables

Reading a .txt dataset with a comma delimiter

Specifying the column names of a dataset from a list

Case 2 – reading a dataset using the open method of Python

Reading a dataset line by line

Changing the delimiter of a dataset

Case 3 – reading data from a URL

Case 4 – miscellaneous cases

Reading from an .xls or .xlsx file

Writing to a CSV or Excel file

Basics – summary, dimensions, and structure

Handling missing values

Checking for missing values

What constitutes missing data?

How missing values are generated and propagated

Treating missing values

Deletion

Imputation

Creating dummy variables

Visualizing a dataset by basic plotting

Scatter plots

Histograms

Boxplots

Summary

3. Data Wrangling

Subsetting a dataset

Selecting columns

Selecting rows

Selecting a combination of rows and columns

Creating new columns

Generating random numbers and their usage

Various methods for generating random numbers

Seeding a random number

Generating random numbers following probability distributions

Probability density function

Cumulative distribution function

Uniform distribution

Normal distribution

Using the Monte-Carlo simulation to find the value of pi

Geometry and mathematics behind the calculation of pi

Generating a dummy data frame

Grouping the data – aggregation, filtering, and transformation

Aggregation

Filtering

Transformation

Miscellaneous operations

Random sampling – splitting a dataset in training and testing datasets

Method 1 – using the Customer Churn Model

Method 2 – using sklearn

Method 3 – using the shuffle function

Concatenating and appending data

Merging/joining datasets

Inner Join

Left Join

Right Join

An example of the Inner Join

An example of the Left Join

An example of the Right Join

Summary of Joins in terms of their length

Summary

4. Statistical Concepts for Predictive Modelling

Random sampling and the central limit theorem

Hypothesis testing

Null versus alternate hypothesis

Z-statistic and t-statistic

Confidence intervals, significance levels, and p-values

Different kinds of hypothesis test

A step-by-step guide to do a hypothesis test

An example of a hypothesis test

Chi-square tests

Correlation

Summary

5. Linear Regression with Python

Understanding the maths behind linear regression

Linear regression using simulated data

Fitting a linear regression model and checking its efficacy

Finding the optimum value of variable coefficients

Making sense of result parameters

p-values

F-statistics

Residual Standard Error

Implementing linear regression with Python

Linear regression using the statsmodels library

Multiple linear regression

Multi-collinearity

Variance Inflation Factor

Model validation

Training and testing data split

Summary of models

Linear regression with scikit-learn

Feature selection with scikit-learn

Handling other issues in linear regression

Handling categorical variables

Transforming a variable to fit non-linear relations

Handling outliers

Other considerations and assumptions for linear regression

Summary

6. Logistic Regression with Python

Linear regression versus logistic regression

Understanding the math behind logistic regression

Contingency tables

Conditional probability

Odds ratio

Moving on to logistic regression from linear regression

Estimation using the Maximum Likelihood Method

Likelihood function:

Log likelihood function:

Building the logistic regression model from scratch

Making sense of logistic regression parameters

Wald test

Likelihood Ratio Test statistic

Chi-square test

Implementing logistic regression with Python

Processing the data

Data exploration

Data visualization

Creating dummy variables for categorical variables

Feature selection

Implementing the model

Model validation and evaluation

Cross validation

Model validation

The ROC curve

Confusion matrix

Summary

7. Clustering with Python

Introduction to clustering – what, why, and how?

What is clustering?

How is clustering used?

Why do we do clustering?

Mathematics behind clustering

Distances between two observations

Euclidean distance

Manhattan distance

Minkowski distance

The distance matrix

Normalizing the distances

Linkage methods

Single linkage

Complete linkage

Average linkage

Centroid linkage

Ward's method

Hierarchical clustering

K-means clustering

Implementing clustering using Python

Importing and exploring the dataset

Normalizing the values in the dataset

Hierarchical clustering using scikit-learn

K-Means clustering using scikit-learn

Interpreting the cluster

Fine-tuning the clustering

The elbow method

Silhouette Coefficient

Summary

8. Trees and Random Forests with Python

Introducing decision trees

A decision tree

Understanding the mathematics behind decision trees

Homogeneity

Entropy

Information gain

ID3 algorithm to create a decision tree

Gini index

Reduction in Variance

Pruning a tree

Handling a continuous numerical variable

Handling a missing value of an attribute

Implementing a decision tree with scikit-learn

Visualizing the tree

Cross-validating and pruning the decision tree

Understanding and implementing regression trees

Regression tree algorithm

Implementing a regression tree using Python

Understanding and implementing random forests

The random forest algorithm

Implementing a random forest using Python

Why do random forests work?

Important parameters for random forests

Summary

9. Best Practices for Predictive Modelling

Best practices for coding

Commenting the code

Defining functions for substantial individual tasks

Example 1

Example 2

Example 3

Avoid hard-coding of variables as much as possible

Version control

Using standard libraries, methods, and formulas

Best practices for data handling

Best practices for algorithms

Best practices for statistics

Best practices for business contexts

Summary

A. A List of Links

Index
