Dedication
About Packt
Why subscribe?
Foreword
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Getting Started with Pandas
Introduction to Data Analysis
Chapter materials
Fundamentals of data analysis
Data collection
Data wrangling
Exploratory data analysis
Drawing conclusions
Statistical foundations
Sampling
Descriptive statistics
Measures of central tendency
Mean
Median
Mode
Measures of spread
Range
Variance
Standard deviation
Coefficient of variation
Interquartile range
Quartile coefficient of dispersion
Summarizing data
Common distributions
Scaling data
Quantifying relationships between variables
Pitfalls of summary statistics
Prediction and forecasting
Inferential statistics
Setting up a virtual environment
Virtual environments
venv
Windows
Linux/macOS
Anaconda
Installing the required Python packages
Why pandas?
Jupyter Notebooks
Launching JupyterLab
Validating the virtual environment
Closing JupyterLab
Summary
Exercises
Further reading
Working with Pandas DataFrames
Chapter materials
Pandas data structures
Series
Index
DataFrame
Bringing data into a pandas DataFrame
From a Python object
From a file
From a database
From an API
Inspecting a DataFrame object
Examining the data
Describing and summarizing the data
Grabbing subsets of the data
Selection
Slicing
Indexing
Filtering
Adding and removing data
Creating new data
Deleting unwanted data
Summary
Exercises
Further reading
Section 2: Using Pandas for Data Analysis
Data Wrangling with Pandas
Chapter materials
What is data wrangling?
Data cleaning
Data transformation
The wide data format
The long data format
Data enrichment
Collecting temperature data
Cleaning up the data
Renaming columns
Type conversion
Reordering, reindexing, and sorting data
Restructuring the data
Pivoting DataFrames
Melting DataFrames
Handling duplicate, missing, or invalid data
Finding the problematic data
Mitigating the issues
Summary
Exercises
Further reading
Aggregating Pandas DataFrames
Chapter materials
Database-style operations on DataFrames
Querying DataFrames
Merging DataFrames
DataFrame operations
Arithmetic and statistics
Binning and thresholds
Applying functions
Window calculations
Pipes
Aggregations with pandas and numpy
Summarizing DataFrames
Using groupby
Pivot tables and crosstabs
Time series
Time-based selection and filtering
Shifting for lagged data
Differenced data
Resampling
Merging
Summary
Exercises
Further reading
Visualizing Data with Pandas and Matplotlib
Chapter materials
An introduction to matplotlib
The basics
Plot components
Additional options
Plotting with pandas
Evolution over time
Relationships between variables
Distributions
Counts and frequencies
The pandas.plotting subpackage
Scatter matrices
Lag plots
Autocorrelation plots
Bootstrap plots
Summary
Exercises
Further reading
Plotting with Seaborn and Customization Techniques
Chapter materials
Utilizing seaborn for advanced plotting
Categorical data
Correlations and heatmaps
Regression plots
Distributions
Faceting
Formatting
Titles and labels
Legends
Formatting axes
Customizing visualizations
Adding reference lines
Shading regions
Annotations
Colors
Summary
Exercises
Further reading
Section 3: Applications - Real-World Analyses Using Pandas
Financial Analysis - Bitcoin and the Stock Market
Chapter materials
Building a Python package
Package structure
Overview of the stock_analysis package
Data extraction with pandas
The StockReader class
Bitcoin historical data from HTML
S&P 500 historical data from Yahoo! Finance
FAANG historical data from IEX
Exploratory data analysis
The Visualizer class family
Visualizing a stock
Visualizing multiple assets
Technical analysis of financial instruments
The StockAnalyzer class
The AssetGroupAnalyzer class
Comparing assets
Modeling performance
The StockModeler class
Time series decomposition
ARIMA
Linear regression with statsmodels
Comparing models
Summary
Exercises
Further reading
Rule-Based Anomaly Detection
Chapter materials
Simulating login attempts
Assumptions
The login_attempt_simulator package
Helper functions
The LoginAttemptSimulator class
Simulating from the command line
Exploratory data analysis
Rule-based anomaly detection
Percent difference
Tukey fence
Z-score
Evaluating performance
Summary
Exercises
Further reading
Section 4: Introduction to Machine Learning with Scikit-Learn
Getting Started with Machine Learning in Python
Chapter materials
Learning the lingo
Exploratory data analysis
Red wine quality data
White and red wine chemical properties data
Planets and exoplanets data
Preprocessing data
Training and testing sets
Scaling and centering data
Encoding data
Imputing
Additional transformers
Pipelines
Clustering
k-means
Grouping planets by orbit characteristics
Elbow point method for determining k
Interpreting centroids and visualizing the cluster space
Evaluating clustering results
Regression
Linear regression
Predicting the length of a year on a planet
Interpreting the linear regression equation
Making predictions
Evaluating regression results
Analyzing residuals
Metrics
Classification
Logistic regression
Predicting red wine quality
Determining wine type by chemical properties
Evaluating classification results
Confusion matrix
Classification metrics
Accuracy and error rate
Precision and recall
F score
Sensitivity and specificity
ROC curve
Precision-recall curve
Summary
Exercises
Further reading
Making Better Predictions - Optimizing Models
Chapter materials
Hyperparameter tuning with grid search
Feature engineering
Interaction terms and polynomial features
Dimensionality reduction
Feature unions
Feature importances
Ensemble methods
Random forest
Gradient boosting
Voting
Inspecting classification prediction confidence
Addressing class imbalance
Under-sampling
Over-sampling
Regularization
Summary
Exercises
Further reading
Machine Learning Anomaly Detection
Chapter materials
Exploring the data
Unsupervised methods
Isolation forest
Local outlier factor
Comparing models
Supervised methods
Baselining
Dummy classifier
Naive Bayes
Logistic regression
Online learning
Creating the PartialFitPipeline subclass
Stochastic gradient descent classifier
Building our initial model
Evaluating the model
Updating the model
Presenting our results
Further improvements
Summary
Exercises
Further reading
Section 5: Additional Resources
The Road Ahead
Data resources
Python packages
Seaborn
Scikit-learn
Searching for data
APIs
Websites
Finance
Government data
Health and economy
Social networks
Sports
Miscellaneous
Practicing working with data
Python practice
Summary
Exercises
Further reading
Solutions
Appendix
Data analysis workflow
Choosing the appropriate visualization
Machine learning workflow
Other Books You May Enjoy
Leave a review - let other readers know what you think