Statistics for Data Science (e-book)

Author: James D. Miller

Publisher: Packt Publishing

Publication date: 2017-11-17

Word count: 347,000

Get your statistics basics right before diving into the world of data science.

About This Book

  • No need for a degree in statistics: read this book and build a strong statistics base for data science and real-world programs
  • Implement statistics in data science tasks such as data cleaning, mining, and analysis
  • Learn about probability, statistics, numerical computations, and more with the help of R programs

Who This Book Is For

This book is intended for developers who want to enter the field of data science and are looking for concise coverage of statistics, supported by insightful programs and simple explanations. Some basic hands-on experience with R will be useful.

What You Will Learn

  • Analyze the transition from a data developer mindset to a data scientist mindset
  • Get acquainted with the R programs and the logic used for statistical computations
  • Understand mathematical concepts such as variance, standard deviation, probability, matrix calculations, and more
  • Learn to implement statistics in data science tasks such as data cleaning, mining, and analysis
  • Learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks
  • Get comfortable with performing various statistical computations for data science programmatically

In Detail

Data science is an ever-evolving field that is growing rapidly in popularity. It draws on techniques and theories from statistics, computer science, and, most importantly, machine learning, databases, and data visualization. This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable using various statistical methods for data science tasks. It starts off with simple statistics and then moves on to the statistical methods used in data science algorithms.

The R programs for statistical computation are clearly explained, along with their logic. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. By the end of the book, you will be comfortable performing various statistical computations for data science programmatically.

Style and approach

A step-by-step, comprehensive guide with real-world examples.
Table of contents

Title Page

Copyright

Statistics for Data Science

Credits

About the Author

About the Reviewer

www.PacktPub.com

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Transitioning from Data Developer to Data Scientist

Data developer thinking

Objectives of a data developer

Querying or mining

Data quality or data cleansing

Data modeling

Issue or insights

Thought process

Developer versus scientist

New data, new source

Quality questions

Querying and mining

Performance

Financial reporting

Visualizing

Tools of the trade

Advantages of thinking like a data scientist

Developing a better approach to understanding data

Using statistical thinking during program or database designing

Adding to your personal toolbox

Increased marketability

Perpetual learning

Seeing the future

Transitioning to a data scientist

Let's move ahead

Summary

Declaring the Objectives

Key objectives of data science

Collecting data

Processing data

Exploring and visualizing data

Analyzing the data and/or applying machine learning to the data

Deciding (or planning) based upon acquired insight

Thinking like a data scientist

Bringing statistics into data science

Common terminology

Statistical population

Probability

False positives

Statistical inference

Regression

Fitting

Categorical data

Classification

Clustering

Statistical comparison

Coding

Distributions

Data mining

Decision trees

Machine learning

Munging and wrangling

Visualization

D3

Regularization

Assessment

Cross-validation

Neural networks

Boosting

Lift

Mode

Outlier

Predictive modeling

Big Data

Confidence interval

Writing

Summary

A Developer's Approach to Data Cleaning

Understanding basic data cleaning

Common data issues

Contextual data issues

Cleaning techniques

R and common data issues

Outliers

Step 1 – Profiling the data

Step 2 – Addressing the outliers

Domain expertise

Validity checking

Enhancing data

Harmonization

Standardization

Transformations

Deductive correction

Deterministic imputation

Summary

Data Mining and the Database Developer

Data mining

Common techniques

Visualization

Cluster analysis

Correlation analysis

Discriminant analysis

Factor analysis

Regression analysis

Logistic analysis

Purpose

Mining versus querying

Choosing R for data mining

Visualizations

Current smokers

Missing values

A cluster analysis

Dimensional reduction

Calculating statistical significance

Frequent patterning

Frequent item-setting

Sequence mining

Summary

Statistical Analysis for the Database Developer

Data analysis

Looking closer

Statistical analysis

Summarization

Comparing groups

Samples

Group comparison conclusions

Summarization modeling

Establishing the nature of data

Successful statistical analysis

R and statistical analysis

Summary

Database Progression to Database Regression

Introducing statistical regression

Techniques and approaches for regression

Choosing your technique

Does it fit?

Identifying opportunities for statistical regression

Summarizing data

Exploring relationships

Testing significance of differences

Project profitability

R and statistical regression

A working example

Establishing the data profile

The graphical analysis

Predicting with our linear model

Step 1: Chunking the data

Step 2: Creating the model on the training data

Step 3: Predicting the projected profit on test data

Step 4: Reviewing the model

Step 5: Accuracy and error

Summary

Regularization for Database Improvement

Statistical regularization

Various statistical regularization methods

Ridge

Lasso

Least angles

Opportunities for regularization

Collinearity

Sparse solutions

High-dimensional data

Classification

Using data to understand statistical regularization

Improving data or a data model

Simplification

Relevance

Speed

Transformation

Variation of coefficients

Causal inference

Back to regularization

Reliability

Using R for statistical regularization

Parameter Setup

Summary

Database Development and Assessment

Assessment and statistical assessment

Objectives

Baselines

Planning for assessment

Evaluation

Development versus assessment

Planning

Data assessment and data quality assurance

Categorizing quality

Relevance

Cross-validation

Preparing data

R and statistical assessment

Questions to ask

Learning curves

Example of a learning curve

Summary

Databases and Neural Networks

Ask any data scientist

Defining neural network

Nodes

Layers

Training

Solution

Understanding the concepts

Neural network models and database models

No single or main node

Not serial

No memory address to store results

R-based neural networks

References

Data prep and preprocessing

Data splitting

Model parameters

Cross-validation

R packages for ANN development

ANN

ANN2

NNET

Black boxes

A use case

Popular use cases

Character recognition

Image compression

Stock market prediction

Fraud detection

Neuroscience

Summary

Boosting your Database

Definition and purpose

Bias

Categorizing bias

Causes of bias

Bias data collection

Bias sample selection

Variance

ANOVA

Noise

Noisy data

Weak and strong learners

Weak to strong

Model bias

Training and prediction time

Complexity

Which way?

Back to boosting

How it started

AdaBoost

What you can learn from boosting (to help) your database

Using R to illustrate boosting methods

Prepping the data

Training

Ready for boosting

Example results

Summary

Database Classification using Support Vector Machines

Database classification

Data classification in statistics

Guidelines for classifying data

Common guidelines

Definitions

Definition and purpose of an SVM

The trick

Feature space and cheap computations

Drawing the line

More than classification

Downside

Reference resources

Predicting credit scores

Using R and an SVM to classify data in a database

Moving on

Summary

Database Structures and Machine Learning

Data structures and data models

Data structures

Data models

What's the difference?

Relationships

Machine learning

Overview of machine learning concepts

Key elements of machine learning

Representation

Evaluation

Optimization

Types of machine learning

Supervised learning

Unsupervised learning

Semi-supervised learning

Reinforcement learning

Most popular

Applications of machine learning

Machine learning in practice

Understanding

Preparation

Learning

Interpretation

Deployment

Iteration

Using R to apply machine learning techniques to a database

Understanding the data

Preparing

Data developer

Understanding the challenge

Cross-tabbing and plotting

Summary
