Machine Learning with R (e-book)

Author: Brett Lantz

Publisher: Packt Publishing

Publication date: 2019-04-15

Word count: 5.718 million

Category: Imported books > Foreign-language originals > Computers/Internet

Solve real-world data problems with R and machine learning

Key Features

* Third edition of the bestselling, widely acclaimed R machine learning book, updated and improved for R 3.5 and beyond
* Harness the power of R to build flexible, effective, and transparent machine learning models
* Learn quickly with a clear, hands-on guide by experienced machine learning teacher and practitioner, Brett Lantz

Book Description

Machine learning, at its core, is concerned with transforming data into actionable knowledge. R offers a powerful set of machine learning methods to quickly and easily gain insight from your data. Machine Learning with R, Third Edition provides a hands-on, readable guide to applying machine learning to real-world problems. Whether you are an experienced R user or new to the language, Brett Lantz teaches you everything you need to uncover key insights, make new predictions, and visualize your findings. This new 3rd edition updates the classic R data science book with newer and better libraries, advice on ethical and bias issues in machine learning, and an introduction to deep learning. Find powerful new insights in your data; discover machine learning with R.

What you will learn

* Discover the origins of machine learning and how exactly a computer learns by example
* Prepare your data for machine learning work with the R programming language
* Classify important outcomes using nearest neighbor and Bayesian methods
* Predict future events using decision trees, rules, and support vector machines
* Forecast numeric data and estimate financial values using regression methods
* Model complex processes with artificial neural networks, the basis of deep learning
* Avoid bias in machine learning models
* Evaluate your models and improve their performance
* Connect R to SQL databases and emerging big data technologies such as Spark, H2O, and TensorFlow

Who this book is for

Data scientists, students, and other practitioners who want a clear, accessible guide to machine learning with R.

Table of contents

Machine Learning with R - Third Edition

Why subscribe?

Packt.com

Contributors

About the authors

About the reviewer

Preface

Who this book is for

What this book covers

What you need for this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

1. Introducing Machine Learning

The origins of machine learning

Uses and abuses of machine learning

Machine learning successes

The limits of machine learning

Machine learning ethics

How machines learn

Data storage

Abstraction

Generalization

Evaluation

Machine learning in practice

Types of input data

Types of machine learning algorithms

Matching input data to algorithms

Machine learning with R

Installing R packages

Loading and unloading R packages

Installing RStudio

Summary

2. Managing and Understanding Data

R data structures

Vectors

Factors

Lists

Data frames

Matrices and arrays

Managing data with R

Saving, loading, and removing R data structures

Importing and saving data from CSV files

Exploring and understanding data

Exploring the structure of data

Exploring numeric variables

Measuring the central tendency – mean and median

Measuring spread – quartiles and the five-number summary

Visualizing numeric variables – boxplots

Visualizing numeric variables – histograms

Understanding numeric data – uniform and normal distributions

Measuring spread – variance and standard deviation

Exploring categorical variables

Measuring the central tendency – the mode

Exploring relationships between variables

Visualizing relationships – scatterplots

Examining relationships – two-way cross-tabulations

Summary

3. Lazy Learning – Classification Using Nearest Neighbors

Understanding nearest neighbor classification

The k-NN algorithm

Measuring similarity with distance

Choosing an appropriate k

Preparing data for use with k-NN

Why is the k-NN algorithm lazy?

Example – diagnosing breast cancer with the k-NN algorithm

Step 1 – collecting data

Step 2 – exploring and preparing the data

Transformation – normalizing numeric data

Data preparation – creating training and test datasets

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Transformation – z-score standardization

Testing alternative values of k

Summary

4. Probabilistic Learning – Classification Using Naive Bayes

Understanding Naive Bayes

Basic concepts of Bayesian methods

Understanding probability

Understanding joint probability

Computing conditional probability with Bayes' theorem

The Naive Bayes algorithm

Classification with Naive Bayes

The Laplace estimator

Using numeric features with Naive Bayes

Example – filtering mobile phone spam with the Naive Bayes algorithm

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – cleaning and standardizing text data

Data preparation – splitting text documents into words

Data preparation – creating training and test datasets

Visualizing text data – word clouds

Data preparation – creating indicator features for frequent words

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Summary

5. Divide and Conquer – Classification Using Decision Trees and Rules

Understanding decision trees

Divide and conquer

The C5.0 decision tree algorithm

Choosing the best split

Pruning the decision tree

Example – identifying risky bank loans using C5.0 decision trees

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – creating random training and test datasets

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Boosting the accuracy of decision trees

Making some mistakes cost more than others

Understanding classification rules

Separate and conquer

The 1R algorithm

The RIPPER algorithm

Rules from decision trees

What makes trees and rules greedy?

Example – identifying poisonous mushrooms with rule learners

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Summary

6. Forecasting Numeric Data – Regression Methods

Understanding regression

Simple linear regression

Ordinary least squares estimation

Correlations

Multiple linear regression

Example – predicting medical expenses using linear regression

Step 1 – collecting data

Step 2 – exploring and preparing the data

Exploring relationships among features – the correlation matrix

Visualizing relationships among features – the scatterplot matrix

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Model specification – adding nonlinear relationships

Transformation – converting a numeric variable to a binary indicator

Model specification – adding interaction effects

Putting it all together – an improved regression model

Making predictions with a regression model

Understanding regression trees and model trees

Adding regression to trees

Example – estimating the quality of wines with regression trees and model trees

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Visualizing decision trees

Step 4 – evaluating model performance

Measuring performance with the mean absolute error

Step 5 – improving model performance

Summary

7. Black Box Methods – Neural Networks and Support Vector Machines

Understanding neural networks

From biological to artificial neurons

Activation functions

Network topology

The number of layers

The direction of information travel

The number of nodes in each layer

Training neural networks with backpropagation

Example – modeling the strength of concrete with ANNs

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Understanding support vector machines

Classification with hyperplanes

The case of linearly separable data

The case of nonlinearly separable data

Using kernels for nonlinear spaces

Example – performing OCR with SVMs

Step 1 – collecting data

Step 2 – exploring and preparing the data

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Changing the SVM kernel function

Identifying the best SVM cost parameter

Summary

8. Finding Patterns – Market Basket Analysis Using Association Rules

Understanding association rules

The Apriori algorithm for association rule learning

Measuring rule interest – support and confidence

Building a set of rules with the Apriori principle

Example – identifying frequently purchased groceries with association rules

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – creating a sparse matrix for transaction data

Visualizing item support – item frequency plots

Visualizing the transaction data – plotting the sparse matrix

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Sorting the set of association rules

Taking subsets of association rules

Saving association rules to a file or data frame

Summary

9. Finding Groups of Data – Clustering with k-means

Understanding clustering

Clustering as a machine learning task

The k-means clustering algorithm

Using distance to assign and update clusters

Choosing the appropriate number of clusters

Finding teen market segments using k-means clustering

Step 1 – collecting data

Step 2 – exploring and preparing the data

Data preparation – dummy coding missing values

Data preparation – imputing the missing values

Step 3 – training a model on the data

Step 4 – evaluating model performance

Step 5 – improving model performance

Summary

10. Evaluating Model Performance

Measuring performance for classification

Understanding a classifier's predictions

A closer look at confusion matrices

Using confusion matrices to measure performance

Beyond accuracy – other measures of performance

The kappa statistic

Sensitivity and specificity

Precision and recall

The F-measure

Visualizing performance tradeoffs with ROC curves

Estimating future performance

The holdout method

Cross-validation

Bootstrap sampling

Summary

11. Improving Model Performance

Tuning stock models for better performance

Using caret for automated parameter tuning

Creating a simple tuned model

Customizing the tuning process

Improving model performance with meta-learning

Understanding ensembles

Bagging

Boosting

Random forests

Training random forests

Evaluating random forest performance in a simulated competition

Summary

12. Specialized Machine Learning Topics

Managing and preparing real-world data

Making data "tidy" with the tidyverse packages

Generalizing tabular data structures with tibble

Speeding and simplifying data preparation with dplyr

Reading and writing to external data files

Importing tidy tables with readr

Importing Microsoft Excel, SAS, SPSS, and Stata files with rio

Querying data in SQL databases

The tidy approach to managing database connections

Using a database backend with dplyr

A traditional approach to SQL connectivity with RODBC

Working with online data and services

Downloading the complete text of web pages

Parsing the data within web pages

Parsing XML documents

Parsing JSON from web APIs

Working with domain-specific data

Analyzing bioinformatics data

Analyzing and visualizing network data

Improving the performance of R

Managing very large datasets

Making data frames faster with data.table

Creating disk-based data frames with ff

Using massive matrices with bigmemory

Learning faster with parallel computing

Measuring execution time

Working in parallel with multicore and snow

Taking advantage of parallel with foreach and doParallel

Training and evaluating models in parallel with caret

Parallel cloud computing with MapReduce and Hadoop

Parallel cloud computing with Apache Spark

Deploying optimized learning algorithms

Building bigger regression models with biglm

Growing random forests faster with ranger

Growing massive random forests with bigrf

A faster machine learning computing engine with H2O

GPU computing

Flexible numeric computing and machine learning with TensorFlow

An interface for deep learning with Keras

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index
