Mastering Data Analysis with R
Table of Contents
Mastering Data Analysis with R
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Hello, Data!
Loading text files of a reasonable size
Data files larger than the physical memory
Benchmarking text file parsers
Loading a subset of text files
Filtering flat files before loading to R
Loading data from databases
Setting up the test environment
MySQL and MariaDB
PostgreSQL
Oracle database
ODBC database access
Using a graphical user interface to connect to databases
Other database backends
Importing data from other statistical systems
Loading Excel spreadsheets
Summary
2. Getting Data from the Web
Loading datasets from the Internet
Other popular online data formats
Reading data from HTML tables
Reading tabular data from static Web pages
Scraping data from other online sources
R packages to interact with data source APIs
Socrata Open Data API
Finance APIs
Fetching time series with Quandl
Google documents and analytics
Online search trends
Historical weather data
Other online data sources
Summary
3. Filtering and Summarizing Data
Drop needless data
Drop needless data in an efficient way
Drop needless data in another efficient way
Aggregation
Quicker aggregation with base R commands
Convenient helper functions
High-performance helper functions
Aggregate with data.table
Running benchmarks
Summary functions
Adding up the number of cases in subgroups
Summary
4. Restructuring Data
Transposing matrices
Filtering data by string matching
Rearranging data
dplyr versus data.table
Computing new variables
Memory profiling
Creating multiple variables at a time
Computing new variables with dplyr
Merging datasets
Reshaping data in a flexible way
Converting wide tables to the long table format
Converting long tables to the wide table format
Tweaking performance
The evolution of the reshape packages
Summary
5. Building Models (authored by Renata Nemeth and Gergely Toth)
The motivation behind multivariate models
Linear regression with continuous predictors
Model interpretation
Multiple predictors
Model assumptions
How well does the line fit in the data?
Discrete predictors
Summary
6. Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth)
The modeling workflow
Logistic regression
Data considerations
Goodness of model fit
Model comparison
Models for count data
Poisson regression
Negative binomial regression
Multivariate non-linear models
Summary
7. Unstructured Data
Importing the corpus
Cleaning the corpus
Visualizing the most frequent words in the corpus
Further cleanup
Stemming words
Lemmatisation
Analyzing the associations among terms
Some other metrics
The segmentation of documents
Summary
8. Polishing Data
The types and origins of missing data
Identifying missing data
By-passing missing values
Overriding the default arguments of a function
Getting rid of missing data
Filtering missing data before or during the actual analysis
Data imputation
Modeling missing values
Comparing different imputation methods
Not imputing missing values
Multiple imputation
Extreme values and outliers
Testing extreme values
Using robust methods
Summary
9. From Big to Small Data
Adequacy tests
Normality
Multivariate normality
Dependence of variables
KMO and Bartlett's test
Principal Component Analysis
PCA algorithms
Determining the number of components
Interpreting components
Rotation methods
Outlier-detection with PCA
Factor analysis
Principal Component Analysis versus Factor Analysis
Multidimensional Scaling
Summary
10. Classification and Clustering
Cluster analysis
Hierarchical clustering
Determining the ideal number of clusters
K-means clustering
Visualizing clusters
Latent class models
Latent Class Analysis
LCR models
Discriminant analysis
Logistic regression
Machine learning algorithms
The K-Nearest Neighbors algorithm
Classification trees
Random forest
Other algorithms
Summary
11. Social Network Analysis of the R Ecosystem
Loading network data
Centrality measures of networks
Visualizing network data
Interactive network plots
Custom plot layouts
Analyzing R package dependencies with an R package
Further network analysis resources
Summary
12. Analyzing Time-series
Creating time-series objects
Visualizing time-series
Seasonal decomposition
Holt-Winters filtering
Autoregressive Integrated Moving Average models
Outlier detection
More complex time-series objects
Advanced time-series analysis
Summary
13. Data Around Us
Geocoding
Visualizing point data in space
Finding polygon overlays of point data
Plotting thematic maps
Rendering polygons around points
Contour lines
Voronoi diagrams
Satellite maps
Interactive maps
Querying Google Maps
JavaScript mapping libraries
Alternative map designs
Spatial statistics
Summary
14. Analyzing the R Community
R Foundation members
Visualizing supporting members around the world
R package maintainers
The number of packages per maintainer
The R-help mailing list
Volume of the R-help mailing list
Forecasting the e-mail volume in the future
Analyzing overlaps between our lists of R users
Further ideas on extending the capture-recapture models
The number of R users in social media
R-related posts in social media
Summary
A. References
General good readings on R
Chapter 1 – Hello, Data!
Chapter 2 – Getting Data from the Web
Chapter 3 – Filtering and Summarizing Data
Chapter 4 – Restructuring Data
Chapter 5 – Building Models (authored by Renata Nemeth and Gergely Toth)
Chapter 6 – Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth)
Chapter 7 – Unstructured Data
Chapter 8 – Polishing Data
Chapter 9 – From Big to Small Data
Chapter 10 – Classification and Clustering
Chapter 11 – Social Network Analysis of the R Ecosystem
Chapter 12 – Analyzing Time-series
Chapter 13 – Data Around Us
Chapter 14 – Analyzing the R Community
Index