Python Data Science Essentials (e-book)

Author: Alberto Boschetti

Publisher: Packt Publishing

Publication date: 2015-04-30

Word count: 1.848 million

If you are an aspiring data scientist with at least a working knowledge of data analysis and Python, this book will get you started in data science. Data analysts with experience in R or MATLAB will also find it a comprehensive reference for enhancing their data manipulation and machine learning skills.
Python Data Science Essentials

Table of Contents

Python Data Science Essentials

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. First Steps

Introducing data science and Python

Installing Python

Python 2 or Python 3?

Step-by-step installation

A glance at the essential Python packages

NumPy

SciPy

pandas

Scikit-learn

IPython

Matplotlib

Statsmodels

Beautiful Soup

NetworkX

NLTK

Gensim

PyPy

The installation of packages

Package upgrades

Scientific distributions

Anaconda

Enthought Canopy

PythonXY

WinPython

Introducing IPython

The IPython Notebook

Datasets and code used in the book

Scikit-learn toy datasets

The MLdata.org public repository

LIBSVM data examples

Loading data directly from CSV or text files

Scikit-learn sample generators

Summary

2. Data Munging

The data science process

Data loading and preprocessing with pandas

Fast and easy data loading

Dealing with problematic data

Dealing with big datasets

Accessing other data formats

Data preprocessing

Data selection

Working with categorical and textual data

A special type of data – text

Data processing with NumPy

NumPy's n-dimensional array

The basics of NumPy ndarray objects

Creating NumPy arrays

From lists to unidimensional arrays

Controlling the memory size

Heterogeneous lists

From lists to multidimensional arrays

Resizing arrays

Arrays derived from NumPy functions

Getting an array directly from a file

Extracting data from pandas

NumPy fast operation and computations

Matrix operations

Slicing and indexing with NumPy arrays

Stacking NumPy arrays

Summary

3. The Data Science Pipeline

Introducing EDA

Feature creation

Dimensionality reduction

The covariance matrix

Principal Component Analysis (PCA)

A variation of PCA for big data – RandomizedPCA

Latent Factor Analysis (LFA)

Linear Discriminant Analysis (LDA)

Latent Semantical Analysis (LSA)

Independent Component Analysis (ICA)

Kernel PCA

Restricted Boltzmann Machine (RBM)

The detection and treatment of outliers

Univariate outlier detection

EllipticEnvelope

OneClassSVM

Scoring functions

Multilabel classification

Binary classification

Regression

Testing and validating

Cross-validation

Using cross-validation iterators

Sampling and bootstrapping

Hyper-parameters' optimization

Building custom scoring functions

Reducing the grid search runtime

Feature selection

Univariate selection

Recursive elimination

Stability and L1-based selection

Summary

4. Machine Learning

Linear and logistic regression

Naive Bayes

The k-Nearest Neighbors

Advanced nonlinear algorithms

SVM for classification

SVM for regression

Tuning SVM

Ensemble strategies

Pasting by random samples

Bagging with weak ensembles

Random Subspaces and Random Patches

Sequences of models – AdaBoost

Gradient tree boosting (GTB)

Dealing with big data

Creating some big datasets as examples

Scalability with volume

Keeping up with velocity

Dealing with variety

A quick overview of Stochastic Gradient Descent (SGD)

A peek into Natural Language Processing (NLP)

Word tokenization

Stemming

Word Tagging

Named Entity Recognition (NER)

Stopwords

A complete data science example – text classification

An overview of unsupervised learning

Summary

5. Social Network Analysis

Introduction to graph theory

Graph algorithms

Graph loading, dumping, and sampling

Summary

6. Visualization

Introducing the basics of matplotlib

Curve plotting

Using panels

Scatterplots

Histograms

Bar graphs

Image visualization

Selected graphical examples with pandas

Boxplots and histograms

Scatterplots

Parallel coordinates

Advanced data learning representation

Learning curves

Validation curves

Feature importance

GBT partial dependence plot

Summary

Index
