Principles of Data Science (ebook)

Author: Sinan Ozdemir

Publisher: Packt Publishing

Publication date: 2016-12-01

Word count: 2,671,000

Learn the techniques and math you need to start making sense of your data

About This Book

  • Enhance your knowledge of coding with data science theory for practical insight into data science and analysis
  • More than just a math class, learn how to perform real-world data science tasks with R and Python
  • Create actionable insights and transform raw data into tangible value

Who This Book Is For

You should be fairly well acquainted with basic algebra and should feel comfortable reading snippets of R/Python as well as pseudocode. You should have the urge to learn and apply the techniques put forth in this book on either your own datasets or those provided to you. If you have basic math skills but want to apply them in data science, or you have good programming skills but lack the math, then this book is for you.

What You Will Learn

  • Get to know the five most important steps of data science
  • Use your data intelligently and learn how to handle it with care
  • Bridge the gap between mathematics and programming
  • Learn about probability, calculus, and how to use statistical models to control and clean your data and drive actionable results
  • Build and evaluate baseline machine learning models
  • Explore the most effective metrics to determine the success of your machine learning models
  • Create data visualizations that communicate actionable insights
  • Read and apply machine learning concepts to your problems and make actual predictions

In Detail

Need to turn your programming skills into effective data science skills? Principles of Data Science is created to help you join the dots between mathematics, programming, and business analysis. With this book, you'll feel confident about asking, and answering, complex and sophisticated questions of your data to move from abstract and raw statistics to actionable ideas.

With a unique approach that bridges the gap between mathematics and computer science, this book takes you through the entire data science pipeline. Beginning with cleaning and preparing data, and effective data mining strategies and techniques, you'll move on to build a comprehensive picture of how every piece of the data science puzzle fits together. Learn the fundamentals of computational mathematics and statistics, as well as some of the pseudocode used today by data scientists and analysts. You'll get to grips with machine learning, discover the statistical models that help you take control of and navigate even the densest datasets, and find out how to create powerful visualizations that communicate what your data means.

Style and approach

This is an easy-to-understand and accessible tutorial. It is a step-by-step guide with use cases, examples, and illustrations to get you well-versed with the concepts of data science. Along with explaining the fundamentals, the book also introduces slightly more advanced concepts later on and helps you implement these techniques in the real world.
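The blurb promises to "build and evaluate baseline machine learning models". As a minimal sketch (not taken from the book, standard library only), one common baseline is the majority-class model. It predicts the most frequent training label for every row, and its accuracy (often called null accuracy) is the score any real classifier should beat. The function name `majority_class_baseline` and the toy labels below are hypothetical, chosen only for illustration.

```python
from collections import Counter

def majority_class_baseline(y_train, y_test):
    """Hypothetical helper: predict the most common training label for
    every test row and return (predicted_label, accuracy_on_test)."""
    # Most frequent label in the training data
    majority_label, _ = Counter(y_train).most_common(1)[0]
    # Fraction of test labels that match the constant prediction
    correct = sum(1 for y in y_test if y == majority_label)
    return majority_label, correct / len(y_test)

# Hypothetical binary labels, invented for illustration only
y_train = [0, 0, 1, 0, 1, 0, 0, 1]
y_test = [0, 1, 0, 0, 1]

label, acc = majority_class_baseline(y_train, y_test)
print(label, acc)  # prints: 0 0.6
```

If a trained model scores no better than this constant prediction on the same test set, it has not learned anything useful beyond the class balance.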
Table of Contents

Principles of Data Science

Credits

About the Author

About the Reviewers

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. How to Sound Like a Data Scientist

What is data science?

Basic terminology

Why data science?

Example – Sigma Technologies

The data science Venn diagram

The math

Example – spawner-recruit models

Computer programming

Why Python?

Python practices

Example of basic Python

Example – parsing a single tweet

Domain knowledge

Some more terminology

Data science case studies

Case study – automating government paper pushing

Fire all humans, right?

Case study – marketing dollars

Case study – what's in a job description?

Summary

2. Types of Data

Flavors of data

Why look at these distinctions?

Structured versus unstructured data

Example of data preprocessing

Word/phrase counts

Presence of certain special characters

Relative length of text

Picking out topics

Quantitative versus qualitative data

Example – coffee shop data

Example – world alcohol consumption data

Digging deeper

The road thus far…

The four levels of data

The nominal level

Mathematical operations allowed

Measures of center

What data is like at the nominal level

The ordinal level

Examples

Mathematical operations allowed

Measures of center

Quick recap and check

The interval level

Example

Mathematical operations allowed

Measures of center

Measures of variation

Standard deviation

The ratio level

Examples

Measures of center

Problems with the ratio level

Data is in the eye of the beholder

Summary

3. The Five Steps of Data Science

Introduction to data science

Overview of the five steps

Ask an interesting question

Obtain the data

Explore the data

Model the data

Communicate and visualize the results

Explore the data

Basic questions for data exploration

Dataset 1 – Yelp

Dataframes

Series

Exploration tips for qualitative data

Nominal level columns

Filtering in Pandas

Ordinal level columns

Dataset 2 – Titanic

Summary

4. Basic Mathematics

Mathematics as a discipline

Basic symbols and terminology

Vectors and matrices

Quick exercises

Answers

Arithmetic symbols

Summation

Proportional

Dot product

Graphs

Logarithms/exponents

Set theory

Linear algebra

Matrix multiplication

How to multiply matrices

Summary

5. Impossible or Improbable – A Gentle Introduction to Probability

Basic definitions

Probability

Bayesian versus Frequentist

Frequentist approach

The law of large numbers

Compound events

Conditional probability

The rules of probability

The addition rule

Mutual exclusivity

The multiplication rule

Independence

Complementary events

A bit deeper

Summary

6. Advanced Probability

Collectively exhaustive events

Bayesian ideas revisited

Bayes' theorem

More applications of Bayes' theorem

Example – Titanic

Example – medical studies

Random variables

Discrete random variables

Types of discrete random variables

Binomial random variables

Poisson random variables

Continuous random variables

Summary

7. Basic Statistics

What are statistics?

How do we obtain and sample data?

Obtaining data

Observational

Experimental

Sampling data

Probability sampling

Random sampling

Unequal probability sampling

How do we measure statistics?

Measures of center

Measures of variation

Definition

Example – employee salaries

Measures of relative standing

The insightful part – correlations in data

The empirical rule

Summary

8. Advanced Statistics

Point estimates

Sampling distributions

Confidence intervals

Hypothesis tests

Conducting a hypothesis test

One-sample t-tests

Example of a one-sample t-test

Assumptions of the one-sample t-test

Type I and type II errors

Hypothesis test for categorical variables

Chi-square goodness of fit test

Assumptions of the chi-square goodness of fit test

Example of a chi-square test for goodness of fit

Chi-square test for association/independence

Assumptions of the chi-square independence test

Summary

9. Communicating Data

Why does communication matter?

Identifying effective and ineffective visualizations

Scatter plots

Line graphs

Bar charts

Histograms

Box plots

When graphs and statistics lie

Correlation versus causation

Simpson's paradox

If correlation doesn't imply causation, then what does?

Verbal communication

It's about telling a story

On the more formal side of things

The why/how/what strategy of presenting

Summary

10. How to Tell If Your Toaster Is Learning – Machine Learning Essentials

What is machine learning?

Machine learning isn't perfect

How does machine learning work?

Types of machine learning

Supervised learning

It's not only about predictions

Types of supervised learning

Regression

Classification

Data is in the eye of the beholder

Unsupervised learning

Reinforcement learning

Overview of the types of machine learning

How does statistical modeling fit into all of this?

Linear regression

Adding more predictors

Regression metrics

Logistic regression

Probability, odds, and log odds

The math of logistic regression

Dummy variables

Summary

11. Predictions Don't Grow on Trees – or Do They?

Naïve Bayes classification

Decision trees

How does a computer build a regression tree?

How does a computer fit a classification tree?

Unsupervised learning

When to use unsupervised learning

K-means clustering

Illustrative example – data points

Illustrative example – beer!

Choosing an optimal number for K and cluster validation

The Silhouette Coefficient

Feature extraction and principal component analysis

Summary

12. Beyond the Essentials

The bias/variance tradeoff

Error due to bias

Error due to variance

Two extreme cases of the bias/variance tradeoff

Underfitting

Overfitting

How bias/variance play into error functions

K-fold cross-validation

Grid searching

Visualizing training error versus cross-validation error

Ensembling techniques

Random forests

Comparing random forests with decision trees

Neural networks

Basic structure

Summary

13. Case Studies

Case study 1 – predicting stock prices based on social media

Text sentiment analysis

Exploratory data analysis

Regression route

Classification route

Going beyond with this example

Case study 2 – why do some people cheat on their spouses?

Case study 3 – using TensorFlow

TensorFlow and neural networks

Summary

Index
