Preface
About the Book
About the Authors
Learning Objectives
Audience
Approach
Minimum Hardware Requirements
Software Requirements
Conventions
Installation and Setup
Installing the Code Bundle
Additional Resources
Chapter 1: R for Advanced Analytics
Introduction
Working with Real-World Datasets
Exercise 1: Using the unzip Method for Unzipping a Downloaded File
Reading Data from Various Data Formats
CSV Files
Exercise 2: Reading a CSV File and Summarizing its Column
JSON
Exercise 3: Reading a JSON File and Storing the Data in a DataFrame
Text
Exercise 4: Reading a CSV File with a Text Column and Storing the Data in a VCorpus
Write R Markdown Files for Code Reproducibility
Activity 1: Create an R Markdown File to Read a CSV File and Write a Summary of Data
Data Structures in R
Vector
Matrix
Exercise 5: Performing Transformation on the Data to Make it Available for the Analysis
List
Exercise 6: Using the List Method for Storing Integers and Characters Together
Activity 2: Create a List of Two Matrices and Access the Values
DataFrame
Exercise 7: Performing Integrity Checks Using DataFrame
Data Table
Exercise 8: Exploring the File Read Operation
Data Processing and Transformation
cbind
Exercise 9: Exploring the cbind Function
rbind
Exercise 10: Exploring the rbind Function
The merge Function
Exercise 11: Exploring the merge Function
Inner Join
Left Join
Right Join
Full Join
The reshape Function
Exercise 12: Exploring the reshape Function
The aggregate Function
The Apply Family of Functions
The apply Function
Exercise 13: Implementing the apply Function
The lapply Function
Exercise 14: Implementing the lapply Function
The sapply Function
The tapply Function
Useful Packages
The dplyr Package
Exercise 15: Implementing the dplyr Package
The tidyr Package
Exercise 16: Implementing the tidyr Package
Activity 3: Create a DataFrame with Five Summary Statistics for All Numeric Variables from Bank Data Using dplyr and tidyr
The plyr Package
Exercise 17: Exploring the plyr Package
The caret Package
Data Visualization
Scatterplot
Scatter Plot between Age and Balance Split by Marital Status
Line Charts
Histogram
Boxplot
Summary
Chapter 2: Exploratory Analysis of Data
Introduction
Defining the Problem Statement
Problem-Designing Artifacts
Understanding the Science Behind EDA
Exploratory Data Analysis
Exercise 18: Studying the Data Dimensions
Univariate Analysis
Exploring Numeric/Continuous Features
Exercise 19: Visualizing Data Using a Box Plot
Exercise 20: Visualizing Data Using a Histogram
Exercise 21: Visualizing Data Using a Density Plot
Exercise 22: Visualizing Multiple Variables Using a Histogram
Activity 4: Plotting Multiple Density Plots and Boxplots
Exercise 23: Plotting a Histogram for the nr.employed, euribor3m, cons.conf.idx, and duration Variables
Exploring Categorical Features
Exercise 24: Exploring Categorical Features
Exercise 25: Exploring Categorical Features Using a Bar Chart
Exercise 26: Exploring Categorical Features Using a Pie Chart
Exercise 27: Automate Plotting Categorical Variables
Exercise 28: Automate Plotting for the Remaining Categorical Variables
Exercise 29: Exploring the Last Remaining Categorical Variable and the Target Variable
Bivariate Analysis
Studying the Relationship between Two Numeric Variables
Exercise 30: Studying the Relationship between Employee Variance Rate and Number of Employees
Studying the Relationship between a Categorical and a Numeric Variable
Exercise 31: Studying the Relationship between the y and age Variables
Exercise 32: Studying the Relationship between the Average Value and the y Variable
Exercise 33: Studying the Relationship between the cons.price.idx, cons.conf.idx, euribor3m, and nr.employed Variables
Studying the Relationship Between Two Categorical Variables
Exercise 34: Studying the Relationship Between the Target y and marital status Variables
Exercise 35: Studying the Relationship between the job and education Variables
Multivariate Analysis
Validating Insights Using Statistical Tests
Categorical Dependent and Numeric/Continuous Independent Variables
Exercise 36: Hypothesis 1 Testing for Categorical Dependent Variables and Continuous Independent Variables
Exercise 37: Hypothesis 2 Testing for Categorical Dependent Variables and Continuous Independent Variables
Categorical Dependent and Categorical Independent Variables
Exercise 38: Hypothesis 3 Testing for Categorical Dependent Variables and Categorical Independent Variables
Exercise 39: Hypothesis 4 and 5 Testing for a Categorical Dependent Variable and a Categorical Independent Variable
Collating Insights – Refine the Solution to the Problem
Summary
Chapter 3: Introduction to Supervised Learning
Introduction
Summary of the Beijing PM2.5 Dataset
Exercise 40: Exploring the Data
Regression and Classification Problems
Machine Learning Workflow
Design the Problem
Source and Prepare Data
Code the Model
Train and Evaluate
Exercise 41: Creating Randomly Generated Train and Test Datasets from the Beijing PM2.5 Dataset
Deploy the Model
Regression
Simple and Multiple Linear Regression
Assumptions in Linear Regression Models
Exploratory Data Analysis (EDA)
Exercise 42: Exploring the Time Series Views of the PM2.5, DEWP, TEMP, and PRES Variables of the Beijing PM2.5 Dataset
Exercise 43: Undertaking Correlation Analysis
Exercise 44: Drawing a Scatterplot to Explore the Relationship between PM2.5 Levels and Other Factors
Activity 5: Draw a Scatterplot between PRES and PM2.5 Split by Months
Model Building
Exercise 45: Exploring Simple and Multiple Regression Models
Model Interpretation
Classification
Logistic Regression
A Brief Introduction
Mechanics of Logistic Regression
Model Building
Exercise 46: Storing the Rolling 3-Hour Average in the Beijing PM2.5 Dataset
Activity 6: Transforming Variables and Deriving New Variables to Build a Model
Interpreting a Model
Evaluation Metrics
Mean Absolute Error (MAE)
Root Mean Squared Error (RMSE)
R-squared
Adjusted R-squared
Mean Reciprocal Rank (MRR)
Exercise 47: Finding Evaluation Metrics
Confusion Matrix-Based Metrics
Accuracy
Sensitivity
Specificity
F1 Score
Exercise 48: Working with Model Evaluation on Training Data
Receiver Operating Characteristic (ROC) Curve
Exercise 49: Creating an ROC Curve
Summary
Chapter 4: Regression
Introduction
Linear Regression
Exercise 50: Print the Coefficient and Residual Values Using the multiple_PM_25_linear_model Object
Activity 7: Printing Various Attributes Using Model Object Without Using the summary Function
Exercise 51: Add the Interaction Term DEWP:TEMP:month in the lm() Function
Model Diagnostics
Exercise 52: Generating and Fitting Models Using the Linear and Quadratic Equations
Residual versus Fitted Plot
Normal Q-Q Plot
Scale-Location Plot
Residual versus Leverage
Improving the Model
Transform the Predictor or Target Variable
Choose a Non-Linear Model
Remove an Outlier or Influential Point
Adding the Interaction Effect
Quantile Regression
Exercise 53: Fit a Quantile Regression on the Beijing PM2.5 Dataset
Exercise 54: Plotting Various Quantiles with More Granularity
Polynomial Regression
Exercise 55: Performing Uniform Distribution Using the runif() Function
Ridge Regression
Regularization Term – L2 Norm
Exercise 56: Ridge Regression on the Beijing PM2.5 Dataset
LASSO Regression
Exercise 57: LASSO Regression
Elastic Net Regression
Exercise 58: Elastic Net Regression
Comparison between Coefficients and Residual Standard Error
Exercise 59: Computing the RSE of Linear, Ridge, LASSO, and Elastic Net Regressions
Poisson Regression
Exercise 60: Performing Poisson Regression
Exercise 61: Computing Overdispersion
Cox Proportional-Hazards Regression Model
NCCTG Lung Cancer Data
Exercise 62: Exploring the NCCTG Lung Cancer Data Using Cox-Regression
Summary
Chapter 5: Classification
Introduction
Getting Started with the Use Case
Some Background on the Use Case
Defining the Problem Statement
Data Gathering
Exercise 63: Exploring Data for the Use Case
Exercise 64: Calculating the Null Value Percentage in All Columns
Exercise 65: Removing Null Values from the Dataset
Exercise 66: Engineer Time-Based Features from the Date Variable
Exercise 67: Exploring the Location Frequency
Exercise 68: Engineering the New Location with Reduced Levels
Classification Techniques for Supervised Learning
Logistic Regression
How Does Logistic Regression Work?
Exercise 69: Build a Logistic Regression Model
Interpreting the Results of Logistic Regression
Evaluating Classification Models
Confusion Matrix and Its Derived Metrics
What Metric Should You Choose?
Evaluating Logistic Regression
Exercise 70: Evaluate a Logistic Regression Model
Exercise 71: Develop a Logistic Regression Model with All of the Independent Variables Available in Our Use Case
Activity 8: Building a Logistic Regression Model with Additional Features
Decision Trees
How Do Decision Trees Work?
Exercise 72: Create a Decision Tree Model in R
Activity 9: Create a Decision Tree Model with Additional Control Parameters
Ensemble Modelling
Random Forest
Why Are Ensemble Models Used?
Bagging – Predecessor to Random Forest
How Does Random Forest Work?
Exercise 73: Building a Random Forest Model in R
Activity 10: Build a Random Forest Model with a Greater Number of Trees
XGBoost
How Does the Boosting Process Work?
What Are Some Popular Boosting Techniques?
How Does XGBoost Work?
Implementing XGBoost in R
Exercise 74: Building an XGBoost Model in R
Exercise 75: Improving the XGBoost Model's Performance
Deep Neural Networks
A Deeper Look into Deep Neural Networks
How Does the Deep Learning Model Work?
What Framework Do We Use for Deep Learning Models?
Building a Deep Neural Network in Keras
Exercise 76: Build a Deep Neural Network in R Using R Keras
Choosing the Right Model for Your Use Case
Summary
Chapter 6: Feature Selection and Dimensionality Reduction
Introduction
Feature Engineering
Discretization
Exercise 77: Performing Binary Discretization
Multi-Category Discretization
Exercise 78: Demonstrating the Use of Quantile Function
One-Hot Encoding
Exercise 79: Using One-Hot Encoding
Activity 11: Converting the CBWD Feature of the Beijing PM2.5 Dataset into One-Hot Encoded Columns
Log Transformation
Exercise 80: Performing Log Transformation
Feature Selection
Univariate Feature Selection
Exercise 81: Exploring Chi-Squared
Highly Correlated Variables
Exercise 82: Plotting a Correlated Matrix
Model-Based Feature Importance Ranking
Exercise 83: Exploring RFE Using RF
Exercise 84: Exploring the Variable Importance Using the Random Forest Model
Feature Reduction
Principal Component Analysis (PCA)
Exercise 85: Performing PCA
Variable Clustering
Exercise 86: Using Variable Clustering
Linear Discriminant Analysis for Feature Reduction
Exercise 87: Exploring LDA
Summary
Chapter 7: Model Improvements
Introduction
Bias-Variance Trade-off
What is Bias and Variance in Machine Learning Models?
Underfitting and Overfitting
Defining a Sample Use Case
Exercise 88: Loading and Exploring Data
Cross-Validation
Holdout Approach/Validation
Exercise 89: Performing Model Assessment Using Holdout Validation
K-Fold Cross-Validation
Exercise 90: Performing Model Assessment Using K-Fold Cross-Validation
Hold-One-Out Validation
Exercise 91: Performing Model Assessment Using Hold-One-Out Validation
Hyperparameter Optimization
Grid Search Optimization
Exercise 92: Performing Grid Search Optimization – Random Forest
Exercise 93: Grid Search Optimization – XGBoost
Random Search Optimization
Exercise 94: Using Random Search Optimization on a Random Forest Model
Exercise 95: Random Search Optimization – XGBoost
Bayesian Optimization
Exercise 96: Performing Bayesian Optimization on the Random Forest Model
Exercise 97: Performing Bayesian Optimization Using XGBoost
Activity 12: Performing Repeated K-Fold Cross Validation and Grid Search Optimization
Summary
Chapter 8: Model Deployment
Introduction
What is an API?
Introduction to plumber
Exercise 98: Developing an ML Model and Deploying It as a Web Service Using plumber
Challenges in Deploying Models with plumber
A Brief History of the Pre-Docker Era
Docker
Deploying the ML Model Using Docker and plumber
Exercise 99: Create a Docker Container for the R plumber Application
Disadvantages of Using plumber to Deploy R Models
Amazon Web Services
Introducing AWS SageMaker
Deploying an ML Model Endpoint Using SageMaker
Exercise 100: Deploy the ML Model as a SageMaker Endpoint
What is AWS Lambda?
What is Amazon API Gateway?
Building Serverless ML Applications
Exercise 101: Building a Serverless Application Using API Gateway, AWS Lambda, and SageMaker
Deleting All Cloud Resources to Stop Billing
Activity 13: Deploy an R Model Using plumber
Summary
Chapter 9: Capstone Project - Based on Research Papers
Introduction
Exploring Research Work
The mlr Package
The OpenML Package
Problem Design from the Research Paper
Features in Scene Dataset
Implementing Multilabel Classifier Using the mlr and OpenML Packages
Exercise 102: Downloading the Scene Dataset from OpenML
Constructing a Learner
Adaptation Methods
Transformation Methods
Binary Relevance Method
Classifier Chains Method
Nested Stacking
Dependent Binary Relevance Method
Stacking
Exercise 103: Generating Decision Tree Model Using the classif.rpart Method
Train the Model
Exercise 104: Train the Model
Predicting the Output
Performance of the Model
Resampling the Data
Binary Performance for Each Label
Benchmarking Model
Conducting Benchmark Experiments
Exercise 105: Exploring How to Conduct a Benchmarking on Various Learners
Accessing Benchmark Results
Learner Performances
Predictions
Learners and Measures
Activity 14: Getting the Binary Performance Step with classif.C50 Learner Instead of classif.rpart
Working with OpenML Upload Functions
Summary
Appendix
Chapter 1: R for Advanced Analytics
Activity 1: Create an R Markdown File to Read a CSV File and Write a Summary of Data
Activity 2: Create a List of Two Matrices and Access the Values
Activity 3: Create a DataFrame with Five Summary Statistics for All Numeric Variables from Bank Data Using dplyr and tidyr
Chapter 2: Exploratory Analysis of Data
Activity 4: Plotting Multiple Density Plots and Boxplots
Chapter 3: Introduction to Supervised Learning
Activity 5: Draw a Scatterplot between PRES and PM2.5 Split by Months
Activity 6: Transforming Variables and Deriving New Variables to Build a Model
Chapter 4: Regression
Activity 7: Printing Various Attributes Using Model Object Without Using the summary Function
Chapter 5: Classification
Activity 8: Building a Logistic Regression Model with Additional Features
Activity 9: Create a Decision Tree Model with Additional Control Parameters
Activity 10: Build a Random Forest Model with a Greater Number of Trees
Chapter 6: Feature Selection and Dimensionality Reduction
Activity 11: Converting the CBWD Feature of the Beijing PM2.5 Dataset into One-Hot Encoded Columns
Chapter 7: Model Improvements
Activity 12: Performing Repeated K-Fold Cross Validation and Grid Search Optimization
Chapter 8: Model Deployment
Activity 13: Deploy an R Model Using plumber
Chapter 9: Capstone Project - Based on Research Papers
Activity 14: Getting the Binary Performance Step with classif.C50 Learner Instead of classif.rpart