Julia for Data Science
Credits
About the Author
About the Reviewer
www.PacktPub.com
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. The Groundwork – Julia's Environment
Julia is different
Setting up the environment
Installing Julia (Linux)
Installing Julia (Mac)
Installing Julia (Windows)
Exploring the source code
Using REPL
Using Jupyter Notebook
Package management
Pkg.status() – package status
Pkg.add() – adding packages
Working with unregistered packages
Pkg.update() – package update
METADATA repository
Developing packages
Creating a new package
Parallel computation using Julia
Julia's key feature – multiple dispatch
Methods in multiple dispatch
Ambiguities – method definitions
Facilitating language interoperability
Calling Python code in Julia
Summary
References
2. Data Munging
What is data munging?
The data munging process
What is a DataFrame?
The NA data type and its importance
DataArray – a series-like data structure
DataFrames – tabular data structures
Installation and using DataFrames.jl
Writing the data to a file
Working with DataFrames
Understanding DataFrames joins
The Split-Apply-Combine strategy
Reshaping the data
Sorting a dataset
Formula – a special data type for mathematical expressions
Pooling data
Web scraping
Summary
References
3. Data Exploration
Sampling
Population
Weight vectors
Inferring column types
Basic statistical summaries
Calculating the mean of the array or dataframe
Scalar statistics
Standard deviations and variances
Measures of variation
Z-scores
Entropy
Quantiles
Modes
Summary of datasets
Scatter matrix and covariance
Computing deviations
Rankings
Counting functions
Histograms
Correlation analysis
Summary
References
4. Deep Dive into Inferential Statistics
Installation
Understanding the sampling distribution
Understanding the normal distribution
Parameter estimation
Type hierarchy in Distributions.jl
Understanding Sampleable
Representing probabilistic distributions
Univariate distributions
Retrieving parameters
Statistical functions
Evaluation of probability
Sampling in Univariate distributions
Understanding Discrete Univariate distributions and types
Bernoulli distribution
Binomial distribution
Continuous distributions
Cauchy distribution
Chi distribution
Chi-square distribution
Truncated distributions
Truncated normal distributions
Understanding multivariate distributions
Multinomial distribution
Multivariate normal distribution
Dirichlet distribution
Understanding matrixvariate distributions
Wishart distribution
Inverse-Wishart distribution
Distribution fitting
Distribution selection
Symmetrical distributions
Skew distributions to the right
Skew distributions to the left
Maximum Likelihood Estimation
Sufficient statistics
Maximum-a-Posteriori estimation
Confidence interval
Interpreting the confidence intervals
Usage
Understanding z-score
Interpreting z-scores
Understanding the significance of the P-value
One-tailed and two-tailed test
Summary
References
5. Making Sense of Data Using Visualization
Difference between using and importall
Pyplot for Julia
Multimedia I/O
Installation
Basic plotting
Plot using sine and cosine
Unicode plots
Installation
Examples
Generating Unicode scatterplots
Generating Unicode line plots
Visualizing using Vega
Installation
Examples
Scatterplot
Heatmaps in Vega
Data visualization using Gadfly
Installing Gadfly
Interacting with Gadfly using plot function
Example
Using Gadfly to plot DataFrames
Using Gadfly to visualize functions and expressions
Generating an image with multiple layers
Generating plots with different aesthetics using statistics
The step function
The quantile-quantile function
Ticks in Gadfly
Generating plots with different aesthetics using Geometry
Boxplots
Using Geometry to create density plots
Using Geometry to create histograms
Bar plots
Histogram2d – the two-dimensional histogram
Smooth line plot
Subplot grid
Horizontal and vertical lines
Plotting a ribbon
Violin plots
Beeswarm plots
Elements – scale
x_continuous and y_continuous
x_discrete and y_discrete
Continuous color scale
Elements – guide
Understanding how Gadfly works
Summary
References
6. Supervised Machine Learning
What is machine learning?
Uses of machine learning
Machine learning and ethics
Machine learning – the process
Different types of machine learning
What is bias-variance trade-off?
Effects of overfitting and underfitting on a model
Understanding decision trees
Building decision trees – divide and conquer
Where should we use decision tree learning?
Advantages of decision trees
Disadvantages of decision trees
Decision tree learning algorithms
How a decision tree algorithm works
Understanding and measuring purity of node
An example
Supervised learning using Naïve Bayes
Advantages of Naïve Bayes
Disadvantages of Naïve Bayes
Uses of Naïve Bayes classification
How Bayesian methods work
Posterior probabilities
Class-conditional probabilities
Prior probabilities
Evidence
The bag of words
Advantages of using Naïve Bayes as a spam filter
Disadvantages of Naïve Bayes filters
Examples of Naïve Bayes
Summary
References
7. Unsupervised Machine Learning
Understanding clustering
How are clusters formed?
Types of clustering
Hierarchical clustering
Overlapping, exclusive, and fuzzy clustering
Differences between partial versus complete clustering
K-means clustering
K-means algorithm
Algorithm of K-means
Associating the data points with the closest centroid
How to choose the initial centroids?
Time-space complexity of K-means algorithms
Issues with K-means
Empty clusters in K-means
Outliers in the dataset
Different types of cluster
K-means – strengths and weaknesses
Bisecting K-means algorithm
Getting deep into hierarchical clustering
Agglomerative hierarchical clustering
How proximity is computed
Strengths and weaknesses of hierarchical clustering
Understanding the DBSCAN technique
So, what is density?
How are points classified using center-based density
DBSCAN algorithm
Strengths and weaknesses of the DBSCAN algorithm
Cluster validation
Example
Summary
References
8. Creating Ensemble Models
What is ensemble learning?
Understanding ensemble learning
How to construct an ensemble
Combination strategies
Subsampling training dataset
Bagging
When does bagging work?
Boosting
Boosting approach
Boosting algorithm
AdaBoost – boosting by sampling
What is boosting doing?
The bias and variance decomposition
Manipulating the input features
Injecting randomness
Random forests
Features of random forests
How do random forests work?
The out-of-bag (oob) error estimate
Gini importance
Proximities
Implementation in Julia
Learning and prediction
Why is ensemble learning superior?
Applications of ensemble learning
Summary
References
9. Time Series
What is forecasting?
Decision-making process
The dynamics of a system
What is TimeSeries?
Trends, seasonality, cycles, and residuals
Difference from standard linear regression
Basic objectives of the analysis
Types of models
Important characteristics to consider first
Systematic pattern and random noise
Two general aspects of time series patterns
Trend analysis
Smoothing
Fitting a function
Analysis of seasonality
Autocorrelation correlogram
Examining correlograms
Partial autocorrelations
Removing serial dependency
ARIMA
Common processes
ARIMA methodology
Identification
Estimation and forecasting
The constant in ARIMA models
Identification phase
Seasonal models
Parameter estimation
Evaluation of the model
Interrupted time series ARIMA
Exponential smoothing
Simple exponential smoothing
Indices of lack of fit (error)
Implementation in Julia
The TimeArray time series type
Using time constraints
when
from
to
findwhen
find
Mathematical, comparison, and logical operators
Applying methods to TimeSeries
Lag
Lead
Percentage
Combining methods in TimeSeries
Merge
Collapse
Map
Summary
References
10. Collaborative Filtering and Recommendation System
What is a recommendation system?
The utility matrix
Association rule mining
Measures of association rules
How to generate the item sets
How to generate the rules
Content-based filtering
Steps involved in content-based filtering
Advantages of content-based filtering
Limitations of content-based filtering
Collaborative filtering
Baseline prediction methods
User-based collaborative filtering
Item-item collaborative filtering
Algorithm of item-based collaborative filtering
Building a movie recommender system
Summary
11. Introduction to Deep Learning
Revisiting linear algebra
A gist of scalars
A brief outline of vectors
The importance of matrices
What are tensors?
Probability and information theory
Why probability?
Differences between machine learning and deep learning
What is deep learning?
Deep feedforward networks
Understanding the hidden layers in a neural network
The motivation of neural networks
Understanding regularization
Optimizing deep learning models
The case of optimization
Implementation in Julia
Network architecture
Types of layers
Neurons (activation functions)
Understanding regularizers for ANN
Norm constraints
Using solvers in deep neural networks
Coffee breaks
Image classification with pre-trained Imagenet CNN
Summary
References