Title Page
Copyright and Credits
Data Analysis with R Second Edition
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
RefresheR
Navigating the basics
Arithmetic and assignment
Logicals and characters
Flow of control
Getting help in R
Vectors
Subsetting
Vectorized functions
Advanced subsetting
Recycling
Functions
Matrices
Loading data into R
Working with packages
Exercises
Summary
The Shape of Data
Univariate data
Frequency distributions
Central tendency
Spread
Populations, samples, and estimation
Probability distributions
Visualization methods
Exercises
Summary
Describing Relationships
Multivariate data
Relationships between a categorical and continuous variable
Relationships between two categorical variables
The relationship between two continuous variables
Covariance
Correlation coefficients
Comparing multiple correlations
Visualization methods
Categorical and continuous variables
Two categorical variables
Two continuous variables
More than two continuous variables
Exercises
Summary
Probability
Basic probability
A tale of two interpretations
Sampling from distributions
Parameters
The binomial distribution
The normal distribution
The three-sigma rule and using z-tables
Exercises
Summary
Using Data To Reason About The World
Estimating means
The sampling distribution
Interval estimation
How did we get 1.96?
Smaller samples
Exercises
Summary
Testing Hypotheses
The null hypothesis significance testing framework
One and two-tailed tests
Errors in NHST
A warning about significance
A warning about p-values
Testing the mean of one sample
Assumptions of the one sample t-test
Testing two means
Assumptions of the independent samples t-test
Testing more than two means
Assumptions of ANOVA
Testing independence of proportions
What if my assumptions are unfounded?
Exercises
Summary
Bayesian Methods
The big idea behind Bayesian analysis
Choosing a prior
Who cares about coin flips
Enter MCMC – stage left
Using JAGS and runjags
Fitting distributions the Bayesian way
The Bayesian independent samples t-test
Exercises
Summary
The Bootstrap
What's... uhhh... the deal with the bootstrap?
Performing the bootstrap in R (more elegantly)
Confidence intervals
A one-sample test of means
Bootstrapping statistics other than the mean
Busting bootstrap myths
What have we left out?
Exercises
Summary
Predicting Continuous Variables
Linear models
Simple linear regression
Simple linear regression with a binary predictor
A word of warning
Multiple regression
Regression with a non-binary predictor
Kitchen sink regression
The bias-variance trade-off
Cross-validation
Striking a balance
Linear regression diagnostics
Second Anscombe relationship
Third Anscombe relationship
Fourth Anscombe relationship
Advanced topics
Exercises
Summary
Predicting Categorical Variables
k-Nearest neighbors
Using k-NN in R
Confusion matrices
Limitations of k-NN
Logistic regression
Generalized Linear Model (GLM)
Using logistic regression in R
Decision trees
Random forests
Choosing a classifier
The vertical decision boundary
The diagonal decision boundary
The crescent decision boundary
The circular decision boundary
Exercises
Summary
Predicting Changes with Time
What is a time series?
What is forecasting?
Uncertainty
Difficulties in forecasting
Creating and plotting time series
Components of time series
Time series decomposition
White noise
Autocorrelation
Smoothing
Simple exponential smoothing for forecasting
Accuracy assessment
Double exponential smoothing
Triple exponential smoothing
ETS and the state space model
Interventions for improvement
What we didn't cover
Citations for the climate change data
Exercises
Summary
Sources of Data
Relational databases
Why didn't we just do that in SQL?
Using JSON
XML
Other data formats
Online repositories
Exercises
Summary
Dealing with Missing Data
Analysis with missing data
Visualizing missing data
Types of missing data
So which one is it?
Unsophisticated methods for dealing with missing data
Complete case analysis
Pairwise deletion
Mean substitution
Hot deck imputation
Regression imputation
Stochastic regression imputation
Multiple imputation
So how does mice come up with the imputed values?
Methods of imputation
Multiple imputation in practice
Exercises
Summary
Dealing with Messy Data
Checking unsanitized data
Checking for out-of-bounds data
Checking the data type of a column
Checking for unexpected categories
Checking for outliers, entry errors, or unlikely data points
Chaining assertions
Regular expressions
What are regular expressions?
Getting started
Regex for data normalization
More normalization
Other tools for messy data
OpenRefine
Fuzzy matching
Exercises
Summary
Dealing with Large Data
Wait to optimize
Using a bigger and faster machine
Be smart about your code
Allocation of memory
Vectorization
Using optimized packages
Using another R implementation
Using parallelization
Getting started with parallel R
An example of (some) substance
Using Rcpp
Being smarter about your code
Exercises
Summary
Working with Popular R Packages
The data.table package
The i in DT[i, j, by]
What in the world are by reference semantics?
The j in DT[i, j, by]
Using both i and j
Using the by argument for grouping
Joining data tables
Reshaping, melting, and pivoting data
Using dplyr and tidyr to manipulate data
Functional programming as a main tidyverse principle
Loading data for use in dplyr
Manipulating rows
Selecting and renaming columns
Computing on columns
Grouping in dplyr
Joining data
Reshaping data with tidyr
Exercises
Summary
Reproducibility and Best Practices
R scripting
RStudio
Running R scripts
An example script
Scripting and reproducibility
R projects
Version control
Package version management
Communicating results
Exercises
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think