Learning Predictive Analytics with R (e-book)

Author: Eric Mayor

Publisher: Packt Publishing

Publication date: 2015-09-24

Word count: 1.658 million

Category: Imported books > Foreign-language originals > Computers/Internet

Get to grips with key data visualization and predictive analytics skills using R

About This Book

  • Acquire predictive analytics skills using various tools of R
  • Make predictions about future events by discovering valuable information from data using R
  • Comprehensible guidelines that focus on predictive model design with real-world data

Who This Book Is For

If you are a statistician, chief information officer, data scientist, ML engineer, ML practitioner, quantitative analyst, or student of machine learning, this is the book for you. You should have basic knowledge of the use of R. Readers without previous experience of programming in R will also be able to use the tools in the book.

What You Will Learn

  • Customize R by installing and loading new packages
  • Explore the structure of data using clustering algorithms
  • Turn unstructured text into ordered data, and acquire knowledge from the data
  • Classify your observations using Naïve Bayes, k-NN, and decision trees
  • Reduce the dimensionality of your data using principal component analysis
  • Discover association rules using Apriori
  • Understand how statistical distributions can help retrieve information from data using correlations, linear regression, and multilevel regression
  • Use PMML to deploy the models generated in R

In Detail

R is statistical software that is used for data analysis. There are two main types of learning from data: unsupervised learning, where the structure of data is extracted automatically; and supervised learning, where a labeled part of the data is used to learn the relationship or scores in a target attribute. As important information is often hidden in a lot of data, R helps to extract that information with its many standard and cutting-edge statistical functions. This book is packed with easy-to-follow guidelines that explain the workings of the many key data mining tools of R, which are used to discover knowledge from your data.

You will learn how to perform key predictive analytics tasks using R, such as training and testing predictive models for classification and regression tasks, scoring new data sets, and so on. All chapters will guide you in acquiring the skills in a practical way. Most chapters also include a theoretical introduction that will sharpen your understanding of the subject matter and invite you to go further. The book familiarizes you with the most common data mining tools of R, such as k-means, hierarchical clustering, linear regression, association rules, principal component analysis, multilevel modeling, k-NN, Naïve Bayes, decision trees, and text mining. It also provides a description of visualization techniques using the basic visualization tools of R as well as lattice for visualizing patterns in data organized in groups. This book is invaluable for anyone fascinated by the data mining opportunities offered by GNU R and its packages.

Style and approach

This is a practical book, which analyzes compelling data about life, health, and death with the help of tutorials. It offers you a useful way of interpreting the data that's specific to this book, but that can also be applied to any other data.
Learning Predictive Analytics with R

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

Prediction

Supervised and unsupervised learning

Unsupervised learning

Supervised learning

Classification and regression problems

Classification

Regression

The role of field knowledge in data modeling

Caveats

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

eBooks, discount offers, and more

Questions

1. Setting GNU R for Predictive Analytics

Installing GNU R

The R graphic user interface

The menu bar of the R console

A quick look at the File menu

A quick look at the Misc menu

Packages

Installing packages in R

Loading packages in R

Summary

2. Visualizing and Manipulating Data Using R

The roulette case

Histograms and bar plots

Scatterplots

Boxplots

Line plots

Application – Outlier detection

Formatting plots

Summary

3. Data Visualization with Lattice

Loading and discovering the lattice package

Discovering multipanel conditioning with xyplot()

Discovering other lattice plots

Histograms

Stacked bars

Dotplots

Displaying data points as text

Updating graphics

Case study – exploring cancer-related deaths in the US

Discovering the dataset

Integrating supplementary external data

Summary

4. Cluster Analysis

Distance measures

Learning by doing – partition clustering with kmeans()

Setting the centroids

Computing distances to centroids

Computing the closest cluster for each case

Tasks performed by the main function

Internal validation

Using k-means with public datasets

Understanding the data with the all.us.city.crime.1970 dataset

Finding the best number of clusters in the life.expectancy.1971 dataset

External validation

Summary

5. Agglomerative Clustering Using hclust()

The inner working of agglomerative clustering

Agglomerative clustering with hclust()

Exploring the results of votes in Switzerland

The use of hierarchical clustering on binary attributes

Summary

6. Dimensionality Reduction with Principal Component Analysis

The inner working of Principal Component Analysis

Learning PCA in R

Dealing with missing values

Selecting how many components are relevant

Naming the components using the loadings

PCA scores

Accessing the PCA scores

PCA scores for analysis

PCA diagnostics

Summary

7. Exploring Association Rules with Apriori

Apriori – basic concepts

Association rules

Itemsets

Support

Confidence

Lift

The inner working of apriori

Generating itemsets with support-based pruning

Generating rules by using confidence-based pruning

Analyzing data with apriori in R

Using apriori for basic analysis

Detailed analysis with apriori

Preparing the data

Analyzing the data

Coercing association rules to a data frame

Visualizing association rules

Summary

8. Probability Distributions, Covariance, and Correlation

Probability distributions

Introducing probability distributions

Discrete uniform distribution

The normal distribution

The Student's t-distribution

The binomial distribution

The importance of distributions

Covariance and correlation

Covariance

Correlation

Pearson's correlation

Spearman's correlation

Summary

9. Linear Regression

Understanding simple regression

Computing the intercept and slope coefficient

Obtaining the residuals

Computing the significance of the coefficient

Working with multiple regression

Analyzing data in R: correlation and regression

First steps in the data analysis

Performing the regression

Checking for the normality of residuals

Checking for variance inflation

Examining potential mediations and comparing models

Predicting new data

Robust regression

Bootstrapping

Summary

10. Classification with k-Nearest Neighbors and Naïve Bayes

Understanding k-NN

Working with k-NN in R

How to select k

Understanding Naïve Bayes

Working with Naïve Bayes in R

Computing the performance of classification

Summary

11. Classification Trees

Understanding decision trees

ID3

Entropy

Information gain

C4.5

The gain ratio

Post-pruning

C5.0

Classification and regression trees and random forest

CART

Random forest

Bagging

Conditional inference trees and forests

Installing the packages containing the required functions

Installing C4.5

Installing C5.0

Installing CART

Installing random forest

Installing conditional inference trees

Loading and preparing the data

Performing the analyses in R

Classification with C4.5

The unpruned tree

The pruned tree

C50

CART

Pruning

Random forests in R

Examining the predictions on the testing set

Conditional inference trees in R

Caret – a unified framework for classification

Summary

12. Multilevel Analyses

Nested data

Multilevel regression

Random intercepts and fixed slopes

Random intercepts and random slopes

Multilevel modeling in R

The null model

Random intercepts and fixed slopes

Random intercepts and random slopes

Predictions using multilevel models

Using the predict() function

Assessing prediction quality

Summary

13. Text Analytics with R

An introduction to text analytics

Loading the corpus

Data preparation

Preprocessing and inspecting the corpus

Computing new attributes

Creating the training and testing data frames

Classification of the reviews

Document classification with k-NN

Document classification with Naïve Bayes

Classification using logistic regression

Document classification with support vector machines

Mining the news with R

A successful document classification

Extracting the topics of the articles

Collecting news articles in R from the New York Times article search API

Summary

14. Cross-validation and Bootstrapping Using Caret and Exporting Predictive Models Using PMML

Cross-validation and bootstrapping of predictive models using the caret package

Cross-validation

Performing cross-validation in R with caret

Bootstrapping

Performing bootstrapping in R with caret

Predicting new data

Exporting models using PMML

What is PMML?

A brief description of the structure of PMML objects

Examples of predictive model exportation

Exporting k-means objects

Hierarchical clustering

Exporting association rules (apriori objects)

Exporting Naïve Bayes objects

Exporting decision trees (rpart objects)

Exporting random forest objects

Exporting logistic regression objects

Exporting support vector machine objects

Summary

A. Exercises and Solutions

Exercises

Chapter 1 – Setting GNU R for Predictive Modeling

Chapter 2 – Visualizing and Manipulating Data Using R

Chapter 3 – Data Visualization with Lattice

Chapter 4 – Cluster Analysis

Chapter 5 – Agglomerative Clustering Using hclust()

Chapter 6 – Dimensionality Reduction with Principal Component Analysis

Chapter 7 – Exploring Association Rules with Apriori

Chapter 8 – Probability Distributions, Covariance, and Correlation

Chapter 9 – Linear Regression

Chapter 10 – Classification with k-Nearest Neighbors and Naïve Bayes

Chapter 11 – Classification Trees

Chapter 12 – Multilevel Analyses

Chapter 13 – Text Analytics with R

Solutions

Chapter 1 – Setting GNU R for Predictive Modeling

Chapter 2 – Visualizing and Manipulating Data Using R

Chapter 3 – Data Visualization with Lattice

Chapter 4 – Cluster Analysis

Chapter 5 – Agglomerative Clustering Using hclust()

Chapter 6 – Dimensionality Reduction with Principal Component Analysis

Chapter 7 – Exploring Association Rules with Apriori

Chapter 8 – Probability Distributions, Covariance, and Correlation

Chapter 9 – Linear Regression

Chapter 10 – Classification with k-Nearest Neighbors and Naïve Bayes

Chapter 11 – Classification Trees

Chapter 12 – Multilevel Analyses

Chapter 13 – Text Analytics with R

B. Further Reading and References

Preface

Chapter 1 – Setting GNU R for Predictive Modeling

Chapter 2 – Visualizing and Manipulating Data Using R

Chapter 3 – Data Visualization with Lattice

Chapter 4 – Cluster Analysis

Chapter 5 – Agglomerative Clustering Using hclust()

Chapter 6 – Dimensionality Reduction with Principal Component Analysis

Chapter 7 – Exploring Association Rules with Apriori

Chapter 8 – Probability Distributions, Covariance, and Correlation

Chapter 9 – Linear Regression

Chapter 10 – Classification with k-Nearest Neighbors and Naïve Bayes

Chapter 11 – Classification Trees

Chapter 12 – Multilevel Analyses

Chapter 13 – Text Analytics with R

Chapter 14 – Cross-validation and Bootstrapping Using Caret and Exporting Predictive Models Using PMML

Index
