Mastering Data Analysis with R (e-book)

Author: Gergely Daróczi

Publisher: Packt Publishing

Publication date: 2015-09-30

Word count: 1.519 million

Gain sharp insights into your data and solve real-world data science problems with R, from data munging to modeling and visualization.

About This Book

  • Handle your data with precision and care for optimal business intelligence
  • Restructure and transform your data to inform decision-making
  • Get practical advice and tips to help you get to grips with data mining

Who This Book Is For

If you are a data scientist or R developer who wants to explore and optimize your use of R's advanced features and tools, this is the book for you. A basic knowledge of R is required, along with an understanding of database logic.

What You Will Learn

  • Connect to and load data from the range of databases that R supports
  • Fetch and parse structured and unstructured data
  • Transform and restructure your data with efficient R packages
  • Define and build complex statistical models with glm
  • Develop and train machine learning algorithms
  • Visualize social networks and graph data
  • Deploy supervised and unsupervised classification algorithms
  • Visualize spatial data with R

In Detail

R is an essential language for sharp and successful data analysis. Its numerous features and ease of use make it a powerful way of mining, managing, and interpreting large sets of data. In a world where understanding big data has become key, mastering R lets you deal with your data effectively and efficiently.

This book gives you the guidance you need to build and develop your knowledge and expertise. Bridging the gap between theory and practice, it helps you understand and use data for a competitive advantage.

The book begins by taking you through essential data mining and management tasks such as munging, fetching, cleaning, and restructuring. It then explores different model designs and the core components of effective analysis. Finally, you will discover how to optimize your use of machine learning algorithms for classification and recommendation systems, alongside traditional and more recent statistical methods.

Style and Approach

Covering the essential tasks and skills within data science, Mastering Data Analysis with R provides you with solutions to the challenges of data science. Each section gives you a theoretical overview before demonstrating how to put the theory to work with real-world use cases and hands-on examples.
Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Hello, Data!

Loading text files of a reasonable size

Data files larger than the physical memory

Benchmarking text file parsers

Loading a subset of text files

Filtering flat files before loading to R

Loading data from databases

Setting up the test environment

MySQL and MariaDB

PostgreSQL

Oracle database

ODBC database access

Using a graphical user interface to connect to databases

Other database backends

Importing data from other statistical systems

Loading Excel spreadsheets

Summary

2. Getting Data from the Web

Loading datasets from the Internet

Other popular online data formats

Reading data from HTML tables

Reading tabular data from static Web pages

Scraping data from other online sources

R packages to interact with data source APIs

Socrata Open Data API

Finance APIs

Fetching time series with Quandl

Google documents and analytics

Online search trends

Historical weather data

Other online data sources

Summary

3. Filtering and Summarizing Data

Drop needless data

Drop needless data in an efficient way

Drop needless data in another efficient way

Aggregation

Quicker aggregation with base R commands

Convenient helper functions

High-performance helper functions

Aggregate with data.table

Running benchmarks

Summary functions

Adding up the number of cases in subgroups

Summary

4. Restructuring Data

Transposing matrices

Filtering data by string matching

Rearranging data

dplyr versus data.table

Computing new variables

Memory profiling

Creating multiple variables at a time

Computing new variables with dplyr

Merging datasets

Reshaping data in a flexible way

Converting wide tables to the long table format

Converting long tables to the wide table format

Tweaking performance

The evolution of the reshape packages

Summary

5. Building Models (authored by Renata Nemeth and Gergely Toth)

The motivation behind multivariate models

Linear regression with continuous predictors

Model interpretation

Multiple predictors

Model assumptions

How well does the line fit in the data?

Discrete predictors

Summary

6. Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth)

The modeling workflow

Logistic regression

Data considerations

Goodness of model fit

Model comparison

Models for count data

Poisson regression

Negative binomial regression

Multivariate non-linear models

Summary

7. Unstructured Data

Importing the corpus

Cleaning the corpus

Visualizing the most frequent words in the corpus

Further cleanup

Stemming words

Lemmatisation

Analyzing the associations among terms

Some other metrics

The segmentation of documents

Summary

8. Polishing Data

The types and origins of missing data

Identifying missing data

By-passing missing values

Overriding the default arguments of a function

Getting rid of missing data

Filtering missing data before or during the actual analysis

Data imputation

Modeling missing values

Comparing different imputation methods

Not imputing missing values

Multiple imputation

Extreme values and outliers

Testing extreme values

Using robust methods

Summary

9. From Big to Small Data

Adequacy tests

Normality

Multivariate normality

Dependence of variables

KMO and Bartlett's test

Principal Component Analysis

PCA algorithms

Determining the number of components

Interpreting components

Rotation methods

Outlier-detection with PCA

Factor analysis

Principal Component Analysis versus Factor Analysis

Multidimensional Scaling

Summary

10. Classification and Clustering

Cluster analysis

Hierarchical clustering

Determining the ideal number of clusters

K-means clustering

Visualizing clusters

Latent class models

Latent Class Analysis

LCR models

Discriminant analysis

Logistic regression

Machine learning algorithms

The K-Nearest Neighbors algorithm

Classification trees

Random forest

Other algorithms

Summary

11. Social Network Analysis of the R Ecosystem

Loading network data

Centrality measures of networks

Visualizing network data

Interactive network plots

Custom plot layouts

Analyzing R package dependencies with an R package

Further network analysis resources

Summary

12. Analyzing Time-series

Creating time-series objects

Visualizing time-series

Seasonal decomposition

Holt-Winters filtering

Autoregressive Integrated Moving Average models

Outlier detection

More complex time-series objects

Advanced time-series analysis

Summary

13. Data Around Us

Geocoding

Visualizing point data in space

Finding polygon overlays of point data

Plotting thematic maps

Rendering polygons around points

Contour lines

Voronoi diagrams

Satellite maps

Interactive maps

Querying Google Maps

JavaScript mapping libraries

Alternative map designs

Spatial statistics

Summary

14. Analyzing the R Community

R Foundation members

Visualizing supporting members around the world

R package maintainers

The number of packages per maintainer

The R-help mailing list

Volume of the R-help mailing list

Forecasting the e-mail volume in the future

Analyzing overlaps between our lists of R users

Further ideas on extending the capture-recapture models

The number of R users in social media

R-related posts in social media

Summary

A. References

General good readings on R

Chapter 1 – Hello, Data!

Chapter 2 – Getting Data from the Web

Chapter 3 – Filtering and Summarizing Data

Chapter 4 – Restructuring Data

Chapter 5 – Building Models (authored by Renata Nemeth and Gergely Toth)

Chapter 6 – Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth)

Chapter 7 – Unstructured Data

Chapter 8 – Polishing Data

Chapter 9 – From Big to Small Data

Chapter 10 – Classification and Clustering

Chapter 11 – Social Network Analysis of the R Ecosystem

Chapter 12 – Analyzing Time-series

Chapter 13 – Data Around Us

Chapter 14 – Analyzing the R Community

Index
