Python Data Analysis Cookbook (e-book)

Author: Ivan Idris

Publisher: Packt Publishing

Publication date: 2016-07-01

Word count: 1.92 million

Over 140 practical recipes to help you make sense of your data with ease and build production-ready data apps

About This Book

  • Analyze Big Data sets, create attractive visualizations, and manipulate and process various data types
  • Packed with rich recipes to help you learn and explore amazing algorithms for statistics and machine learning
  • Authored by Ivan Idris, expert in Python programming and proud author of eight highly reviewed books

Who This Book Is For

This book teaches Python data analysis at an intermediate level with the goal of transforming you from journeyman to master. Basic Python and data analysis skills and affinity are assumed.

What You Will Learn

  • Set up reproducible data analysis
  • Clean and transform data
  • Apply advanced statistical analysis
  • Create attractive data visualizations
  • Web scrape and work with databases, Hadoop, and Spark
  • Analyze images and time series data
  • Mine text and analyze social networks
  • Use machine learning and evaluate the results
  • Take advantage of parallelism and concurrency

In Detail

Data analysis is a rapidly evolving field, and Python is a multi-paradigm programming language suitable for object-oriented application development and functional design patterns. Because Python offers a range of tools and libraries for all purposes, it has slowly evolved into the primary language for data science, including data analysis, visualization, and machine learning. Python Data Analysis Cookbook focuses on reproducibility and creating production-ready systems. You will start with recipes that set the foundation for data analysis with libraries such as matplotlib, NumPy, and pandas. You will learn to create visualizations by choosing color maps and palettes, then dive into statistical data analysis using distribution algorithms and correlations. You will then find your way around different data and numerical problems, get to grips with Spark and HDFS, and set up migration scripts for web mining.

In this book, you will dive deeper into recipes on spectral analysis, smoothing, and bootstrapping methods. Moving on, you will learn to rank stocks and check market efficiency, then work with metrics and clusters. You will achieve parallelism to improve system performance by using multiple threads and speeding up your code. By the end of the book, you will be capable of handling various data analysis techniques in Python and devising solutions for problem scenarios.

Style and Approach

The book is written in “cookbook” style, striving for high realism in data analysis. Through the recipe-based format, you can read each recipe separately as required and immediately apply the knowledge gained.
Python Data Analysis Cookbook

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Preface

Why do you need this book?

Data analysis, data science, big data – what is the big deal?

A brief history of data analysis with Python

A conjecture about the future

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Laying the Foundation for Reproducible Data Analysis

Introduction

Setting up Anaconda

Getting ready

How to do it...

There's more...

See also

Installing the Data Science Toolbox

Getting ready

How to do it...

How it works...

See also

Creating a virtual environment with virtualenv and virtualenvwrapper

Getting ready

How to do it...

See also

Sandboxing Python applications with Docker images

Getting ready

How to do it...

How it works...

See also

Keeping track of package versions and history in IPython Notebook

Getting ready

How to do it...

How it works...

See also

Configuring IPython

Getting ready

How to do it...

See also

Learning to log for robust error checking

Getting ready

How to do it...

How it works...

See also

Unit testing your code

Getting ready

How to do it...

How it works...

See also

Configuring pandas

Getting ready

How to do it...

Configuring matplotlib

Getting ready

How to do it...

How it works...

See also

Seeding random number generators and NumPy print options

Getting ready

How to do it...

See also

Standardizing reports, code style, and data access

Getting ready

How to do it...

See also

2. Creating Attractive Data Visualizations

Introduction

Graphing Anscombe's quartet

How to do it...

See also

Choosing seaborn color palettes

How to do it...

See also

Choosing matplotlib color maps

How to do it...

See also

Interacting with IPython Notebook widgets

How to do it...

See also

Viewing a matrix of scatterplots

How to do it...

Visualizing with d3.js via mpld3

Getting ready

How to do it...

Creating heatmaps

Getting ready

How to do it...

See also

Combining box plots and kernel density plots with violin plots

How to do it...

See also

Visualizing network graphs with hive plots

Getting ready

How to do it...

Displaying geographical maps

Getting ready

How to do it...

Using ggplot2-like plots

Getting ready

How to do it...

Highlighting data points with influence plots

How to do it...

See also

3. Statistical Data Analysis and Probability

Introduction

Fitting data to the exponential distribution

How to do it...

How it works…

See also

Fitting aggregated data to the gamma distribution

How to do it...

See also

Fitting aggregated counts to the Poisson distribution

How to do it...

See also

Determining bias

How to do it...

See also

Estimating kernel density

How to do it...

See also

Determining confidence intervals for mean, variance, and standard deviation

How to do it...

See also

Sampling with probability weights

How to do it...

See also

Exploring extreme values

How to do it...

See also

Correlating variables with Pearson's correlation

How to do it...

See also

Correlating variables with the Spearman rank correlation

How to do it...

See also

Correlating a binary and a continuous variable with the point biserial correlation

How to do it...

See also

Evaluating relations between variables with ANOVA

How to do it...

See also

4. Dealing with Data and Numerical Issues

Introduction

Clipping and filtering outliers

How to do it...

See also

Winsorizing data

How to do it...

See also

Measuring central tendency of noisy data

How to do it...

See also

Normalizing with the Box-Cox transformation

How to do it...

How it works

See also

Transforming data with the power ladder

How to do it...

Transforming data with logarithms

How to do it...

Rebinning data

How to do it...

Applying logit() to transform proportions

How to do it...

Fitting a robust linear model

How to do it...

See also

Taking variance into account with weighted least squares

How to do it...

See also

Using arbitrary precision for optimization

Getting ready

How to do it...

See also

Using arbitrary precision for linear algebra

Getting ready

How to do it...

See also

5. Web Mining, Databases, and Big Data

Introduction

Simulating web browsing

Getting ready

How to do it…

See also

Scraping the Web

Getting ready

How to do it…

Dealing with non-ASCII text and HTML entities

Getting ready

How to do it…

See also

Implementing association tables

Getting ready

How to do it…

Setting up database migration scripts

Getting ready

How to do it…

See also

Adding a table column to an existing table

Getting ready

How to do it…

Adding indices after table creation

Getting ready

How to do it…

How it works…

See also

Setting up a test web server

Getting ready

How to do it…

Implementing a star schema with fact and dimension tables

How to do it…

See also

Using HDFS

Getting ready

How to do it…

See also

Setting up Spark

Getting ready

How to do it…

See also

Clustering data with Spark

Getting ready

How to do it…

How it works…

There's more…

See also

6. Signal Processing and Timeseries

Introduction

Spectral analysis with periodograms

How to do it...

See also

Estimating power spectral density with the Welch method

How to do it...

See also

Analyzing peaks

How to do it...

See also

Measuring phase synchronization

How to do it...

See also

Exponential smoothing

How to do it...

See also

Evaluating smoothing

How to do it...

See also

Using the Lomb-Scargle periodogram

How to do it...

See also

Analyzing the frequency spectrum of audio

How to do it...

See also

Analyzing signals with the discrete cosine transform

How to do it...

See also

Block bootstrapping time series data

How to do it...

See also

Moving block bootstrapping time series data

How to do it...

See also

Applying the discrete wavelet transform

Getting started

How to do it...

See also

7. Selecting Stocks with Financial Data Analysis

Introduction

Computing simple and log returns

How to do it...

See also

Ranking stocks with the Sharpe ratio and liquidity

How to do it...

See also

Ranking stocks with the Calmar and Sortino ratios

How to do it...

See also

Analyzing returns statistics

How to do it...

Correlating individual stocks with the broader market

How to do it...

Exploring risk and return

How to do it...

See also

Examining the market with the non-parametric runs test

How to do it...

See also

Testing for random walks

How to do it...

See also

Determining market efficiency with autoregressive models

How to do it...

See also

Creating tables for a stock prices database

How to do it...

Populating the stock prices database

How to do it...

Optimizing an equal weights two-asset portfolio

How to do it...

See also

8. Text Mining and Social Network Analysis

Introduction

Creating a categorized corpus

Getting ready

How to do it...

See also

Tokenizing news articles in sentences and words

Getting ready

How to do it...

See also

Stemming, lemmatizing, filtering, and TF-IDF scores

Getting ready

How to do it...

How it works

See also

Recognizing named entities

Getting ready

How to do it...

How it works

See also

Extracting topics with non-negative matrix factorization

How to do it...

How it works

See also

Implementing a basic terms database

How to do it...

How it works

See also

Computing social network density

Getting ready

How to do it...

See also

Calculating social network closeness centrality

Getting ready

How to do it...

See also

Determining the betweenness centrality

Getting ready

How to do it...

See also

Estimating the average clustering coefficient

Getting ready

How to do it...

See also

Calculating the assortativity coefficient of a graph

Getting ready

How to do it...

See also

Getting the clique number of a graph

Getting ready

How to do it...

See also

Creating a document graph with cosine similarity

How to do it...

See also

9. Ensemble Learning and Dimensionality Reduction

Introduction

Recursively eliminating features

How to do it...

How it works

See also

Applying principal component analysis for dimension reduction

How to do it...

See also

Applying linear discriminant analysis for dimension reduction

How to do it...

See also

Stacking and majority voting for multiple models

How to do it...

See also

Learning with random forests

How to do it...

There's more…

See also

Fitting noisy data with the RANSAC algorithm

How to do it...

See also

Bagging to improve results

How to do it...

See also

Boosting for better learning

How to do it...

See also

Nesting cross-validation

How to do it...

See also

Reusing models with joblib

How to do it...

See also

Hierarchically clustering data

How to do it...

See also

Taking a Theano tour

Getting ready

How to do it...

See also

10. Evaluating Classifiers, Regressors, and Clusters

Introduction

Getting classification straight with the confusion matrix

How to do it...

How it works

See also

Computing precision, recall, and F1-score

How to do it...

See also

Examining a receiver operating characteristic and the area under a curve

How to do it...

See also

Visualizing the goodness of fit

How to do it...

See also

Computing MSE and median absolute error

How to do it...

See also

Evaluating clusters with the mean silhouette coefficient

How to do it...

See also

Comparing results with a dummy classifier

How to do it...

See also

Determining MAPE and MPE

How to do it...

See also

Comparing with a dummy regressor

How to do it...

See also

Calculating the mean absolute error and the residual sum of squares

How to do it...

See also

Examining the kappa of classification

How to do it...

How it works

See also

Taking a look at the Matthews correlation coefficient

How to do it...

See also

11. Analyzing Images

Introduction

Setting up OpenCV

Getting ready

How to do it...

How it works

There's more

Applying Scale-Invariant Feature Transform (SIFT)

Getting ready

How to do it...

See also

Detecting features with SURF

Getting ready

How to do it...

See also

Quantizing colors

Getting ready

How to do it...

See also

Denoising images

Getting ready

How to do it...

See also

Extracting patches from an image

Getting ready

How to do it...

See also

Detecting faces with Haar cascades

Getting ready

How to do it...

See also

Searching for bright stars

Getting ready

How to do it...

See also

Extracting metadata from images

Getting ready

How to do it...

See also

Extracting texture features from images

Getting ready

How to do it...

See also

Applying hierarchical clustering on images

How to do it...

See also

Segmenting images with spectral clustering

How to do it...

See also

12. Parallelism and Performance

Introduction

Just-in-time compiling with Numba

Getting ready

How to do it...

How it works

See also

Speeding up numerical expressions with Numexpr

How to do it...

How it works

See also

Running multiple threads with the threading module

How to do it...

See also

Launching multiple tasks with the concurrent.futures module

How to do it...

See also

Accessing resources asynchronously with the asyncio module

How to do it...

See also

Distributed processing with execnet

Getting ready

How to do it...

See also

Profiling memory usage

Getting ready

How to do it...

See also

Calculating the mean, variance, skewness, and kurtosis on the fly

Getting ready

How to do it...

See also

Caching with a least recently used cache

Getting ready

How to do it...

See also

Caching HTTP requests

Getting ready

How to do it...

See also

Streaming counting with the Count-min sketch

How to do it...

See also

Harnessing the power of the GPU with OpenCL

Getting ready

How to do it...

See also

A. Glossary

B. Function Reference

IPython

Matplotlib

NumPy

pandas

Scikit-learn

SciPy

Seaborn

Statsmodels

C. Online Resources

IPython notebooks and open data

Mathematics and statistics

Presentations

D. Tips and Tricks for Command-Line and Miscellaneous Tools

IPython notebooks

Command-line tools

The alias command

Command-line history

Reproducible sessions

Docker tips

Index
