Hands-On Data Analysis with Pandas (e-book)

Author: Stefanie Molin

Publisher: Packt Publishing

Publication date: 2019-07-26

Word count: 886,000

Get to grips with pandas, a versatile and high-performance Python library for data manipulation, analysis, and discovery.

Key Features

* Perform efficient data analysis and manipulation tasks using pandas
* Apply pandas to different real-world domains using step-by-step demonstrations
* Get accustomed to using pandas as an effective data exploration tool

Book Description

Data analysis has become a necessary skill in a variety of positions where knowing how to work with data and extract insights can generate significant value. Hands-On Data Analysis with Pandas will show you how to analyze your data, get started with machine learning, and work effectively with Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn.

Using real-world datasets, you will learn how to use the pandas library to perform data wrangling: reshaping, cleaning, and aggregating your data. Then, you will learn how to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding chapters, you will explore some applications of anomaly detection, regression, clustering, and classification, using scikit-learn, to make predictions based on past data.

By the end of this book, you will be equipped with the skills you need to use pandas to ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets.

What you will learn

* Understand how data analysts and scientists gather and analyze data
* Perform data analysis and data wrangling in Python
* Combine, group, and aggregate data from multiple sources
* Create data visualizations with pandas, matplotlib, and seaborn
* Apply machine learning (ML) algorithms to identify patterns and make predictions
* Use Python data science libraries to analyze real-world datasets
* Use pandas to solve common data representation and analysis problems
* Build Python scripts, modules, and packages for reusable analysis code

Who this book is for

This book is for data analysts, data science beginners, and Python developers who want to explore each stage of data analysis and scientific computing using a wide range of datasets. You will also find this book useful if you are a data scientist who is looking to implement pandas in machine learning. Working knowledge of the Python programming language will be beneficial.
Table of contents

Dedication

About Packt

Why subscribe?

Foreword

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the color images

Conventions used

Get in touch

Reviews

Section 1: Getting Started with Pandas

Introduction to Data Analysis

Chapter materials

Fundamentals of data analysis

Data collection

Data wrangling

Exploratory data analysis

Drawing conclusions

Statistical foundations

Sampling

Descriptive statistics

Measures of central tendency

Mean

Median

Mode

Measures of spread

Range

Variance

Standard deviation

Coefficient of variation

Interquartile range

Quartile coefficient of dispersion

Summarizing data

Common distributions

Scaling data

Quantifying relationships between variables

Pitfalls of summary statistics

Prediction and forecasting

Inferential statistics

Setting up a virtual environment

Virtual environments

venv

Windows

Linux/macOS

Anaconda

Installing the required Python packages

Why pandas?

Jupyter Notebooks

Launching JupyterLab

Validating the virtual environment

Closing JupyterLab

Summary

Exercises

Further reading

Working with Pandas DataFrames

Chapter materials

Pandas data structures

Series

Index

DataFrame

Bringing data into a pandas DataFrame

From a Python object

From a file

From a database

From an API

Inspecting a DataFrame object

Examining the data

Describing and summarizing the data

Grabbing subsets of the data

Selection

Slicing

Indexing

Filtering

Adding and removing data

Creating new data

Deleting unwanted data

Summary

Exercises

Further reading

Section 2: Using Pandas for Data Analysis

Data Wrangling with Pandas

Chapter materials

What is data wrangling?

Data cleaning

Data transformation

The wide data format

The long data format

Data enrichment

Collecting temperature data

Cleaning up the data

Renaming columns

Type conversion

Reordering, reindexing, and sorting data

Restructuring the data

Pivoting DataFrames

Melting DataFrames

Handling duplicate, missing, or invalid data

Finding the problematic data

Mitigating the issues

Summary

Exercises

Further reading

Aggregating Pandas DataFrames

Chapter materials

Database-style operations on DataFrames

Querying DataFrames

Merging DataFrames

DataFrame operations

Arithmetic and statistics

Binning and thresholds

Applying functions

Window calculations

Pipes

Aggregations with pandas and numpy

Summarizing DataFrames

Using groupby

Pivot tables and crosstabs

Time series

Time-based selection and filtering

Shifting for lagged data

Differenced data

Resampling

Merging

Summary

Exercises

Further reading

Visualizing Data with Pandas and Matplotlib

Chapter materials

An introduction to matplotlib

The basics

Plot components

Additional options

Plotting with pandas

Evolution over time

Relationships between variables

Distributions

Counts and frequencies

The pandas.plotting subpackage

Scatter matrices

Lag plots

Autocorrelation plots

Bootstrap plots

Summary

Exercises

Further reading

Plotting with Seaborn and Customization Techniques

Chapter materials

Utilizing seaborn for advanced plotting

Categorical data

Correlations and heatmaps

Regression plots

Distributions

Faceting

Formatting

Titles and labels

Legends

Formatting axes

Customizing visualizations

Adding reference lines

Shading regions

Annotations

Colors

Summary

Exercises

Further reading

Section 3: Applications - Real-World Analyses Using Pandas

Financial Analysis - Bitcoin and the Stock Market

Chapter materials

Building a Python package

Package structure

Overview of the stock_analysis package

Data extraction with pandas

The StockReader class

Bitcoin historical data from HTML

S&P 500 historical data from Yahoo! Finance

FAANG historical data from IEX

Exploratory data analysis

The Visualizer class family

Visualizing a stock

Visualizing multiple assets

Technical analysis of financial instruments

The StockAnalyzer class

The AssetGroupAnalyzer class

Comparing assets

Modeling performance

The StockModeler class

Time series decomposition

ARIMA

Linear regression with statsmodels

Comparing models

Summary

Exercises

Further reading

Rule-Based Anomaly Detection

Chapter materials

Simulating login attempts

Assumptions

The login_attempt_simulator package

Helper functions

The LoginAttemptSimulator class

Simulating from the command line

Exploratory data analysis

Rule-based anomaly detection

Percent difference

Tukey fence

Z-score

Evaluating performance

Summary

Exercises

Further reading

Section 4: Introduction to Machine Learning with Scikit-Learn

Getting Started with Machine Learning in Python

Chapter materials

Learning the lingo

Exploratory data analysis

Red wine quality data

White and red wine chemical properties data

Planets and exoplanets data

Preprocessing data

Training and testing sets

Scaling and centering data

Encoding data

Imputing

Additional transformers

Pipelines

Clustering

k-means

Grouping planets by orbit characteristics

Elbow point method for determining k

Interpreting centroids and visualizing the cluster space

Evaluating clustering results

Regression

Linear regression

Predicting the length of a year on a planet

Interpreting the linear regression equation

Making predictions

Evaluating regression results

Analyzing residuals

Metrics

Classification

Logistic regression

Predicting red wine quality

Determining wine type by chemical properties

Evaluating classification results

Confusion matrix

Classification metrics

Accuracy and error rate

Precision and recall

F score

Sensitivity and specificity

ROC curve

Precision-recall curve

Summary

Exercises

Further reading

Making Better Predictions - Optimizing Models

Chapter materials

Hyperparameter tuning with grid search

Feature engineering

Interaction terms and polynomial features

Dimensionality reduction

Feature unions

Feature importances

Ensemble methods

Random forest

Gradient boosting

Voting

Inspecting classification prediction confidence

Addressing class imbalance

Under-sampling

Over-sampling

Regularization

Summary

Exercises

Further reading

Machine Learning Anomaly Detection

Chapter materials

Exploring the data

Unsupervised methods

Isolation forest

Local outlier factor

Comparing models

Supervised methods

Baselining

Dummy classifier

Naive Bayes

Logistic regression

Online learning

Creating the PartialFitPipeline subclass

Stochastic gradient descent classifier

Building our initial model

Evaluating the model

Updating the model

Presenting our results

Further improvements

Summary

Exercises

Further reading

Section 5: Additional Resources

The Road Ahead

Data resources

Python packages

Seaborn

Scikit-learn

Searching for data

APIs

Websites

Finance

Government data

Health and economy

Social networks

Sports

Miscellaneous

Practicing working with data

Python practice

Summary

Exercises

Further reading

Solutions

Appendix

Data analysis workflow

Choosing the appropriate visualization

Machine learning workflow

Other Books You May Enjoy

Leave a review - let other readers know what you think
