Data Analysis with R - Second Edition (e-book)

Author: Tony Fischetti

Publisher: Packt Publishing

Publication date: 2018-03-28

Word count: 709,000

Learn, by example, the fundamentals of data analysis as well as several intermediate to advanced methods and techniques, ranging from classification and regression to Bayesian methods and MCMC, which can be put to immediate use.

About This Book

  • Analyze your data using R, the most powerful statistical programming language
  • Learn how to implement applied statistics using practical use-cases
  • Use popular R packages to work with unstructured and structured data

Who This Book Is For

Budding data scientists and data analysts who are new to the concept of data analysis, or who want to build efficient analytical models in R, will find this book useful. No prior exposure to data analysis is needed, although a fundamental understanding of the R programming language is required to get the best out of this book.

What You Will Learn

  • Gain a thorough understanding of statistical reasoning and sampling theory
  • Employ hypothesis testing to draw inferences from your data
  • Learn Bayesian methods for estimating parameters
  • Train regression, classification, and time series models
  • Handle missing data gracefully using multiple imputation
  • Identify and manage problematic data points
  • Learn how to scale your analyses to larger data with Rcpp, data.table, dplyr, and parallelization
  • Put best practices into effect to make your job easier and facilitate reproducibility

In Detail

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples.

Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties relating to performing data analysis in practice and find solutions to working with messy data, large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone's career as a data analyst.

Style and Approach

An easy-to-follow, step-by-step guide that will help you get to grips with real-world applications of data analysis with R.
Table of Contents

Title Page

Copyright and Credits

Data Analysis with R Second Edition

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Reviews

RefresheR

Navigating the basics

Arithmetic and assignment

Logicals and characters

Flow of control

Getting help in R

Vectors

Subsetting

Vectorized functions

Advanced subsetting

Recycling

Functions

Matrices

Loading data into R

Working with packages

Exercises

Summary

The Shape of Data

Univariate data

Frequency distributions

Central tendency

Spread

Populations, samples, and estimation

Probability distributions

Visualization methods

Exercises

Summary

Describing Relationships

Multivariate data

Relationships between a categorical and continuous variable

Relationships between two categorical variables

The relationship between two continuous variables

Covariance

Correlation coefficients

Comparing multiple correlations

Visualization methods

Categorical and continuous variables

Two categorical variables

Two continuous variables

More than two continuous variables

Exercises

Summary

Probability

Basic probability

A tale of two interpretations

Sampling from distributions

Parameters

The binomial distribution

The normal distribution

The three-sigma rule and using z-tables

Exercises

Summary

Using Data To Reason About The World

Estimating means

The sampling distribution

Interval estimation

How did we get 1.96?

Smaller samples

Exercises

Summary

Testing Hypotheses

The null hypothesis significance testing framework

One and two-tailed tests

Errors in NHST

A warning about significance

A warning about p-values

Testing the mean of one sample

Assumptions of the one sample t-test

Testing two means

Assumptions of the independent samples t-test

Testing more than two means

Assumptions of ANOVA

Testing independence of proportions

What if my assumptions are unfounded?

Exercises

Summary

Bayesian Methods

The big idea behind Bayesian analysis

Choosing a prior

Who cares about coin flips

Enter MCMC – stage left

Using JAGS and runjags

Fitting distributions the Bayesian way

The Bayesian independent samples t-test

Exercises

Summary

The Bootstrap

What's... uhhh... the deal with the bootstrap?

Performing the bootstrap in R (more elegantly)

Confidence intervals

A one-sample test of means

Bootstrapping statistics other than the mean

Busting bootstrap myths

What have we left out?

Exercises

Summary

Predicting Continuous Variables

Linear models

Simple linear regression

Simple linear regression with a binary predictor

A word of warning

Multiple regression

Regression with a non-binary predictor

Kitchen sink regression

The bias-variance trade-off

Cross-validation

Striking a balance

Linear regression diagnostics

Second Anscombe relationship

Third Anscombe relationship

Fourth Anscombe relationship

Advanced topics

Exercises

Summary

Predicting Categorical Variables

k-Nearest neighbors

Using k-NN in R

Confusion matrices

Limitations of k-NN

Logistic regression

Generalized Linear Model (GLM)

Using logistic regression in R

Decision trees

Random forests

Choosing a classifier

The vertical decision boundary

The diagonal decision boundary

The crescent decision boundary

The circular decision boundary

Exercises

Summary

Predicting Changes with Time

What is a time series?

What is forecasting?

Uncertainty

Difficulties in forecasting

Creating and plotting time series

Components of time series

Time series decomposition

White noise

Autocorrelation

Smoothing

Simple exponential smoothing for forecasting

Accuracy assessment

Double exponential smoothing

Triple exponential smoothing

ETS and the state space model

Interventions for improvement

What we didn't cover

Citations for the climate change data

Exercises

Summary

Sources of Data

Relational databases

Why didn't we just do that in SQL?

Using JSON

XML

Other data formats

Online repositories

Exercises

Summary

Dealing with Missing Data

Analysis with missing data

Visualizing missing data

Types of missing data

So which one is it?

Unsophisticated methods for dealing with missing data

Complete case analysis

Pairwise deletion

Mean substitution

Hot deck imputation

Regression imputation

Stochastic regression imputation

Multiple imputation

So how does mice come up with the imputed values?

Methods of imputation

Multiple imputation in practice

Exercises

Summary

Dealing with Messy Data

Checking unsanitized data

Checking for out-of-bounds data

Checking the data type of a column

Checking for unexpected categories

Checking for outliers, entry errors, or unlikely data points

Chaining assertions

Regular expressions

What are regular expressions?

Getting started

Regex for data normalization

More normalization

Other tools for messy data

OpenRefine

Fuzzy matching

Exercises

Summary

Dealing with Large Data

Wait to optimize

Using a bigger and faster machine

Be smart about your code

Allocation of memory

Vectorization

Using optimized packages

Using another R implementation

Using parallelization

Getting started with parallel R

An example of (some) substance

Using Rcpp

Being smarter about your code

Exercises

Summary

Working with Popular R Packages

The data.table package

The i in DT[i, j, by]

What in the world are by reference semantics?

The j in DT[i, j, by]

Using both i and j

Using the by argument for grouping

Joining data tables

Reshaping, melting, and pivoting data

Using dplyr and tidyr to manipulate data

Functional programming as a main tidyverse principle

Loading data for use in dplyr

Manipulating rows

Selecting and renaming columns

Computing on columns

Grouping in dplyr

Joining data

Reshaping data with tidyr

Exercises

Summary

Reproducibility and Best Practices

R scripting

RStudio

Running R scripts

An example script

Scripting and reproducibility

R projects

Version control

Package version management

Communicating results

Exercises

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think
