Mastering Python for Data Science (e-book)

Author: Samir Madhavan

Publisher: Packt Publishing

Publication date: 2015-08-31

Word count: 898,000

Category: Imported books > Foreign-language originals > Computers/Internet

If you are a Python developer who wants to master the world of data science, then this book is for you. Some knowledge of data science is assumed.

Mastering Python for Data Science

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Raw Data

The world of arrays with NumPy

Creating an array

Mathematical operations

Array subtraction

Squaring an array

A trigonometric function performed on the array

Conditional operations

Matrix multiplication

Indexing and slicing

Shape manipulation

Empowering data analysis with pandas

The data structure of pandas

Series

DataFrame

Panel

Inserting and exporting data

CSV

XLS

JSON

Database

Data cleansing

Checking the missing data

Filling the missing data

String operations

Merging data

Data operations

Aggregation operations

Joins

The inner join

The left outer join

The full outer join

The groupby function

Summary

2. Inferential Statistics

Various forms of distribution

A normal distribution

A normal distribution from a binomial distribution

A Poisson distribution

A Bernoulli distribution

A z-score

A p-value

One-tailed and two-tailed tests

Type 1 and Type 2 errors

A confidence interval

Correlation

Z-test vs T-test

The F distribution

The chi-square distribution

Chi-square for the goodness of fit

The chi-square test of independence

ANOVA

Summary

3. Finding a Needle in a Haystack

What is data mining?

Presenting an analysis

Studying the Titanic

Which passenger class has the maximum number of survivors?

What is the distribution of survivors based on gender among the various classes?

What is the distribution of nonsurvivors among the various classes who have family aboard the ship?

What was the survival percentage among different age groups?

Summary

4. Making Sense of Data through Advanced Visualization

Controlling the line properties of a chart

Using keyword arguments

Using the setter methods

Using the setp() command

Creating multiple plots

Playing with text

Styling your plots

Box plots

Heatmaps

Scatter plots with histograms

A scatter plot matrix

Area plots

Bubble charts

Hexagon bin plots

Trellis plots

A 3D plot of a surface

Summary

5. Uncovering Machine Learning

Different types of machine learning

Supervised learning

Unsupervised learning

Reinforcement learning

Decision trees

Linear regression

Logistic regression

The naive Bayes classifier

The k-means clustering

Hierarchical clustering

Summary

6. Performing Predictions with a Linear Regression

Simple linear regression

Multiple regression

Training and testing a model

Summary

7. Estimating the Likelihood of Events

Logistic regression

Data preparation

Creating training and testing sets

Building a model

Model evaluation

Evaluating a model based on test data

Model building and evaluation with SciKit

Summary

8. Generating Recommendations with Collaborative Filtering

Recommendation data

User-based collaborative filtering

Finding similar users

The Euclidean distance score

The Pearson correlation score

Ranking the users

Recommending items

Item-based collaborative filtering

Summary

9. Pushing Boundaries with Ensemble Models

The census income dataset

Exploring the census data

Hypothesis 1: People who are older earn more

Hypothesis 2: Income bias based on working class

Hypothesis 3: People with more education earn more

Hypothesis 4: Married people tend to earn more

Hypothesis 5: There is a bias in income based on race

Hypothesis 6: There is a bias in the income based on occupation

Hypothesis 7: Men earn more

Hypothesis 8: People who clock in more hours earn more

Hypothesis 9: There is a bias in income based on the country of origin

Decision trees

Random forests

Summary

10. Applying Segmentation with k-means Clustering

The k-means algorithm and its working

A simple example

The k-means clustering with countries

Determining the number of clusters

Clustering the countries

Summary

11. Analyzing Unstructured Data with Text Mining

Preprocessing data

Creating a wordcloud

Word and sentence tokenization

Parts of speech tagging

Stemming and lemmatization

Stemming

Lemmatization

The Stanford Named Entity Recognizer

Performing sentiment analysis on world leaders using Twitter

Summary

12. Leveraging Python in the World of Big Data

What is Hadoop?

The programming model

The MapReduce architecture

The Hadoop DFS

Hadoop's DFS architecture

Python MapReduce

The basic word count

A sentiment score for each review

The overall sentiment score

Deploying the MapReduce code on Hadoop

File handling with Hadoopy

Pig

Python with Apache Spark

Scoring the sentiment

The overall sentiment

Summary

Index
