Dedication
About Packt
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Data Mining and Getting Started with Python Tools
Descriptive, predictive, and prescriptive analytics
What will and will not be covered in this book
Recommended readings for further explanation
Setting up Python environments for data mining
Installing the Anaconda distribution and Conda package manager
Installing on Linux
Installing on Windows
Installing on macOS
Launching the Spyder IDE
Launching a Jupyter Notebook
Installing high-performance Python distribution
Recommended libraries and how to install
Recommended libraries
Summary
Basic Terminology and Our End-to-End Example
Basic data terminology
Sample spaces
Variable types
Data types
Basic summary statistics
An end-to-end example of data mining in Python
Loading data into memory – viewing and managing with ease using pandas
Plotting and exploring data – harnessing the power of Seaborn
Transforming data – PCA and LDA with scikit-learn
Quantifying separations – k-means clustering and the silhouette score
Making decisions or predictions
Summary
Collecting, Exploring, and Visualizing Data
Types of data sources and loading into pandas
Databases
Basic Structured Query Language (SQL) queries
Disks
Web sources
From URLs
From Scikit-learn and Seaborn-included sets
Access, search, and sanity checks with pandas
Basic plotting in Seaborn
Popular types of plots for visualizing data
Scatter plots
Histograms
Jointplots
Violin plots
Pairplots
Summary
Cleaning and Readying Data for Analysis
The scikit-learn transformer API
Cleaning input data
Missing values
Finding and removing missing values
Imputing to replace the missing values
Feature scaling
Normalization
Standardization
Handling categorical data
Ordinal encoding
One-hot encoding
Label encoding
High-dimensional data
Dimension reduction
Feature selection
Feature filtering
The variance threshold
The correlation coefficient
Wrapper methods
Sequential feature selection
Transformation
PCA
LDA
Summary
Grouping and Clustering Data
Introducing clustering concepts
Location of the group
Euclidean space (centroids)
Non-Euclidean space (medoids)
Similarity
Euclidean space
The Euclidean distance
The Manhattan distance
Maximum distance
Non-Euclidean space
The cosine distance
The Jaccard distance
Termination condition
With known number of groupings
Without known number of groupings
Quality score and silhouette score
Clustering methods
Means separation
K-means
Finding k
K-means++
Mini batch K-means
Hierarchical clustering
Reuse the dendrogram to find number of clusters
Plot dendrogram
Density clustering
Spectral clustering
Summary
Prediction with Regression and Classification
Scikit-learn Estimator API
Introducing prediction concepts
Prediction nomenclature
Mathematical machinery
Loss function
Gradient descent
Fit quality regimes
Regression
Metrics of regression model prediction
Regression example dataset
Linear regression
Extension to multivariate form
Regularization with penalized regression
Regularization penalties
Classification
Classification example dataset
Metrics of classification model prediction
Multi-class classification
One-versus-all
One-versus-one
Logistic regression
Regularized logistic regression
Support vector machines
Soft-margin with C
The kernel trick
Tree-based classification
Decision trees
Node splitting with Gini
Random forest
Avoid overfitting and speed up the fits
Built-in validation with bagging
Tuning a prediction model
Cross-validation
Introduction of the validation set
Multiple validation sets with k-fold method
Grid search for hyperparameter tuning
Summary
Advanced Topics - Building a Data Processing Pipeline and Deploying It
Pipelining your analysis
Scikit-learn's pipeline object
Deploying the model
Serializing a model and storing with the pickle module
Loading a serialized model and predicting
Python-specific deployment concerns
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think