Preface
About the Book
About the Author
Objectives
Audience
Approach
Hardware Requirements
Software Requirements
Installation and Setup
Conventions
Chapter 1: Data Exploration and Cleaning
Introduction
Python and the Anaconda Package Management System
Indexing and the Slice Operator
Exercise 1: Examining Anaconda and Getting Familiar with Python
Different Types of Data Science Problems
Loading the Case Study Data with Jupyter and pandas
Exercise 2: Loading the Case Study Data in a Jupyter Notebook
Getting Familiar with Data and Performing Data Cleaning
The Business Problem
Data Exploration Steps
Exercise 3: Verifying Basic Data Integrity
Boolean Masks
Exercise 4: Continuing Verification of Data Integrity
Exercise 5: Exploring and Cleaning the Data
Data Quality Assurance and Exploration
Exercise 6: Exploring the Credit Limit and Demographic Features
Deep Dive: Categorical Features
Exercise 7: Implementing OHE for a Categorical Feature
Exploring the Financial History Features in the Dataset
Activity 1: Exploring Remaining Financial Features in the Dataset
Summary
Chapter 2: Introduction to Scikit-Learn and Model Evaluation
Introduction
Exploring the Response Variable and Concluding the Initial Exploration
Introduction to Scikit-Learn
Generating Synthetic Data
Data for a Linear Regression
Exercise 8: Linear Regression in Scikit-Learn
Model Performance Metrics for Binary Classification
Splitting the Data: Training and Testing Sets
Classification Accuracy
True Positive Rate, False Positive Rate, and Confusion Matrix
Exercise 9: Calculating the True and False Positive and Negative Rates and Confusion Matrix in Python
Discovering Predicted Probabilities: How Does Logistic Regression Make Predictions?
Exercise 10: Obtaining Predicted Probabilities from a Trained Logistic Regression Model
The Receiver Operating Characteristic (ROC) Curve
Precision
Activity 2: Performing Logistic Regression with a New Feature and Creating a Precision-Recall Curve
Summary
Chapter 3: Details of Logistic Regression and Feature Exploration
Introduction
Examining the Relationships between Features and the Response
Pearson Correlation
F-test
Exercise 11: F-test and Univariate Feature Selection
Finer Points of the F-test: Equivalence to t-test for Two Classes and Cautions
Hypotheses and Next Steps
Exercise 12: Visualizing the Relationship between Features and Response
Univariate Feature Selection: What It Does and Doesn't Do
Understanding Logistic Regression with Function Syntax in Python and the Sigmoid Function
Exercise 13: Plotting the Sigmoid Function
Scope of Functions
Why is Logistic Regression Considered a Linear Model?
Exercise 14: Examining the Appropriateness of Features for Logistic Regression
From Logistic Regression Coefficients to Predictions Using the Sigmoid
Exercise 15: Linear Decision Boundary of Logistic Regression
Activity 3: Fitting a Logistic Regression Model and Directly Using the Coefficients
Summary
Chapter 4: The Bias-Variance Trade-off
Introduction
Estimating the Coefficients and Intercepts of Logistic Regression
Gradient Descent to Find Optimal Parameter Values
Exercise 16: Using Gradient Descent to Minimize a Cost Function
Assumptions of Logistic Regression
The Motivation for Regularization: The Bias-Variance Trade-off
Exercise 17: Generating and Modeling Synthetic Classification Data
Lasso (L1) and Ridge (L2) Regularization
Cross-Validation: Choosing the Regularization Parameter and Other Hyperparameters
Exercise 18: Reducing Overfitting on the Synthetic Data Classification Problem
Options for Logistic Regression in Scikit-Learn
Scaling Data, Pipelines, and Interaction Features in Scikit-Learn
Activity 4: Cross-Validation and Feature Engineering with the Case Study Data
Summary
Chapter 5: Decision Trees and Random Forests
Introduction
Decision Trees
The Terminology of Decision Trees and Connections to Machine Learning
Exercise 19: A Decision Tree in scikit-learn
Training Decision Trees: Node Impurity
Features Used for the First Splits: Connections to Univariate Feature Selection and Interactions
Training Decision Trees: A Greedy Algorithm
Training Decision Trees: Different Stopping Criteria
Using Decision Trees: Advantages and Predicted Probabilities
A More Convenient Approach to Cross-Validation
Exercise 20: Finding Optimal Hyperparameters for a Decision Tree
Random Forests: Ensembles of Decision Trees
Random Forest: Predictions and Interpretability
Exercise 21: Fitting a Random Forest
Checkerboard Graph
Activity 5: Cross-Validation Grid Search with Random Forest
Summary
Chapter 6: Imputation of Missing Data, Financial Analysis, and Delivery to Client
Introduction
Review of Modeling Results
Dealing with Missing Data: Imputation Strategies
Preparing Samples with Missing Data
Exercise 22: Cleaning the Dataset
Exercise 23: Mode and Random Imputation of PAY_1
A Predictive Model for PAY_1
Exercise 24: Building a Multiclass Classification Model for Imputation
Using the Imputation Model and Comparing it to Other Methods
Confirming Model Performance on the Unseen Test Set
Financial Analysis
Financial Conversation with the Client
Exercise 25: Characterizing Costs and Savings
Activity 6: Deriving Financial Insights
Final Thoughts on Delivering the Predictive Model to the Client
Summary
Appendix
Chapter 1: Data Exploration and Cleaning
Activity 1: Exploring Remaining Financial Features in the Dataset
Chapter 2: Introduction to Scikit-Learn and Model Evaluation
Activity 2: Performing Logistic Regression with a New Feature and Creating a Precision-Recall Curve
Chapter 3: Details of Logistic Regression and Feature Exploration
Activity 3: Fitting a Logistic Regression Model and Directly Using the Coefficients
Chapter 4: The Bias-Variance Trade-off
Activity 4: Cross-Validation and Feature Engineering with the Case Study Data
Chapter 5: Decision Trees and Random Forests
Activity 5: Cross-Validation Grid Search with Random Forest
Chapter 6: Imputation of Missing Data, Financial Analysis, and Delivery to Client
Activity 6: Deriving Financial Insights