Julia for Data Science (e-book)

Author: Anshul Joshi

Publisher: Packt Publishing

Publication date: 2016-09-01

Word count: 3,197,000

Category: Imported books > Foreign-language original editions > Computers/Internet

Book Description

Explore the world of data science from scratch with Julia by your side.

About This Book
  • An in-depth exploration of Julia's growing ecosystem of packages
  • Work with the most powerful open-source libraries for deep learning, data wrangling, and data visualization
  • Learn about deep learning using Mocha.jl, and bring speed and high performance to data analysis on large datasets

Who This Book Is For
This book is aimed at data analysts and aspiring data scientists who have a basic knowledge of Julia or are completely new to it. It also appeals to readers who are competent in R and Python and wish to adopt Julia to improve their data science skill set. A good background in statistics and computational mathematics will be beneficial.

What You Will Learn
  • Apply statistical models in Julia to make data-driven decisions
  • Understand the process of data munging and data preparation using Julia
  • Explore techniques to visualize data using Julia and D3-based packages
  • Use Julia to create self-learning systems with cutting-edge machine learning algorithms
  • Create supervised and unsupervised machine learning systems in Julia, and explore ensemble models
  • Build a recommendation engine in Julia
  • Dive into Julia's deep learning framework and build a system using Mocha.jl

In Detail
Julia is a fast, high-performance language that is perfectly suited to data science. It has a mature package ecosystem and is now feature complete, which makes it a good tool for data science practitioners. A well-known 2012 Harvard Business Review article called data scientist "the sexiest job of the 21st century" (https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century). This book will help you become familiar with Julia's rich and continuously evolving ecosystem, so that you can stay on top of your game. It covers the essentials of data science and gives a high-level overview of advanced statistics and techniques.

You will generate insights by performing inferential statistics, and you will reveal hidden patterns and trends using data mining. The book offers practical coverage of both statistics and machine learning. You will learn to build statistical models and machine learning systems in Julia, accompanied by attractive visualizations. You will then delve into deep learning in Julia and learn the Mocha.jl framework, which you can use to create artificial neural networks and implement deep learning. Throughout, the book addresses the challenges of real-world data science problems, including data cleaning, data preparation, inferential statistics, statistical modeling, building high-performance machine learning systems, and creating effective visualizations using Julia.

Style and approach
This practical, easy-to-follow, yet comprehensive guide teaches Julia from a data science perspective. Each topic is explained thoroughly and placed in context, and for the more inquisitive, we dive deeper into the language and its use cases. This is the one true guide to working with Julia in data science.

Table of Contents

Julia for Data Science

Credits

About the Author

About the Reviewer

www.PacktPub.com

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. The Groundwork – Julia's Environment

Julia is different

Setting up the environment

Installing Julia (Linux)

Installing Julia (Mac)

Installing Julia (Windows)

Exploring the source code

Using REPL

Using Jupyter Notebook

Package management

Pkg.status() – package status

Pkg.add() – adding packages

Working with unregistered packages

Pkg.update() – package update

METADATA repository

Developing packages

Creating a new package

Parallel computation using Julia

Julia's key feature – multiple dispatch

Methods in multiple dispatch

Ambiguities – method definitions

Facilitating language interoperability

Calling Python code in Julia

Summary

References

2. Data Munging

What is data munging?

The data munging process

What is a DataFrame?

The NA data type and its importance

DataArray – a series-like data structure

DataFrames – tabular data structures

Installation and using DataFrames.jl

Writing the data to a file

Working with DataFrames

Understanding DataFrames joins

The Split-Apply-Combine strategy

Reshaping the data

Sorting a dataset

Formula - a special data type for mathematical expressions

Pooling data

Web scraping

Summary

References

3. Data Exploration

Sampling

Population

Weight vectors

Inferring column types

Basic statistical summaries

Calculating the mean of the array or DataFrame

Scalar statistics

Standard deviations and variances

Measures of variation

Z-scores

Entropy

Quantiles

Modes

Summary of datasets

Scatter matrix and covariance

Computing deviations

Rankings

Counting functions

Histograms

Correlation analysis

Summary

References

4. Deep Dive into Inferential Statistics

Installation

Understanding the sampling distribution

Understanding the normal distribution

Parameter estimation

Type hierarchy in Distributions.jl

Understanding Sampleable

Representing probabilistic distributions

Univariate distributions

Retrieving parameters

Statistical functions

Evaluation of probability

Sampling in Univariate distributions

Understanding Discrete Univariate distributions and types

Bernoulli distribution

Binomial distribution

Continuous distributions

Cauchy distribution

Chi distribution

Chi-square distribution

Truncated distributions

Truncated normal distributions

Understanding multivariate distributions

Multinomial distribution

Multivariate normal distribution

Dirichlet distribution

Understanding matrixvariate distributions

Wishart distribution

Inverse-Wishart distribution

Distribution fitting

Distribution selection

Symmetrical distributions

Skew distributions to the right

Skew distributions to the left

Maximum Likelihood Estimation

Sufficient statistics

Maximum-a-Posteriori estimation

Confidence interval

Interpreting the confidence intervals

Usage

Understanding z-score

Interpreting z-scores

Understanding the significance of the P-value

One-tailed and two-tailed test

Summary

References

5. Making Sense of Data Using Visualization

Difference between using and importall

PyPlot for Julia

Multimedia I/O

Installation

Basic plotting

Plot using sine and cosine

Unicode plots

Installation

Examples

Generating Unicode scatterplots

Generating Unicode line plots

Visualizing using Vega

Installation

Examples

Scatterplot

Heatmaps in Vega

Data visualization using Gadfly

Installing Gadfly

Interacting with Gadfly using plot function

Example

Using Gadfly to plot DataFrames

Using Gadfly to visualize functions and expressions

Generating an image with multiple layers

Generating plots with different aesthetics using statistics

The step function

The quantile-quantile function

Ticks in Gadfly

Generating plots with different aesthetics using Geometry

Boxplots

Using Geometry to create density plots

Using Geometry to create histograms

Bar plots

Histogram2d - the two-dimensional histogram

Smooth line plot

Subplot grid

Horizontal and vertical lines

Plotting a ribbon

Violin plots

Beeswarm plots

Elements - scale

x_continuous and y_continuous

x_discrete and y_discrete

Continuous color scale

Elements - guide

Understanding how Gadfly works

Summary

References

6. Supervised Machine Learning

What is machine learning?

Uses of machine learning

Machine learning and ethics

Machine learning – the process

Different types of machine learning

What is bias-variance trade-off?

Effects of overfitting and underfitting on a model

Understanding decision trees

Building decision trees – divide and conquer

Where should we use decision tree learning?

Advantages of decision trees

Disadvantages of decision trees

Decision tree learning algorithms

How a decision tree algorithm works

Understanding and measuring purity of node

An example

Supervised learning using Naïve Bayes

Advantages of Naïve Bayes

Disadvantages of Naïve Bayes

Uses of Naïve Bayes classification

How Bayesian methods work

Posterior probabilities

Class-conditional probabilities

Prior probabilities

Evidence

The bag of words

Advantages of using Naïve Bayes as a spam filter

Disadvantages of Naïve Bayes filters

Examples of Naïve Bayes

Summary

References

7. Unsupervised Machine Learning

Understanding clustering

How are clusters formed?

Types of clustering

Hierarchical clustering

Overlapping, exclusive, and fuzzy clustering

Differences between partial and complete clustering

K-means clustering

K-means algorithm

Algorithm of K-means

Associating the data points with the closest centroid

How to choose the initial centroids?

Time-space complexity of K-means algorithms

Issues with K-means

Empty clusters in K-means

Outliers in the dataset

Different types of clusters

K-means – strengths and weaknesses

Bisecting K-means algorithm

Getting deep into hierarchical clustering

Agglomerative hierarchical clustering

How proximity is computed

Strengths and weaknesses of hierarchical clustering

Understanding the DBSCAN technique

So, what is density?

How are points classified using center-based density?

DBSCAN algorithm

Strengths and weaknesses of the DBSCAN algorithm

Cluster validation

Example

Summary

References

8. Creating Ensemble Models

What is ensemble learning?

Understanding ensemble learning

How to construct an ensemble

Combination strategies

Subsampling training dataset

Bagging

When does bagging work?

Boosting

Boosting approach

Boosting algorithm

AdaBoost – boosting by sampling

What is boosting doing?

The bias and variance decomposition

Manipulating the input features

Injecting randomness

Random forests

Features of random forests

How do random forests work?

The out-of-bag (oob) error estimate

Gini importance

Proximities

Implementation in Julia

Learning and prediction

Why is ensemble learning superior?

Applications of ensemble learning

Summary

References

9. Time Series

What is forecasting?

Decision-making process

The dynamics of a system

What is TimeSeries?

Trends, seasonality, cycles, and residuals

Difference from standard linear regression

Basic objectives of the analysis

Types of models

Important characteristics to consider first

Systematic pattern and random noise

Two general aspects of time series patterns

Trend analysis

Smoothing

Fitting a function

Analysis of seasonality

Autocorrelation correlogram

Examining correlograms

Partial autocorrelations

Removing serial dependency

ARIMA

Common processes

ARIMA methodology

Identification

Estimation and forecasting

The constant in ARIMA models

Identification phase

Seasonal models

Parameter estimation

Evaluation of the model

Interrupted time series ARIMA

Exponential smoothing

Simple exponential smoothing

Indices of lack of fit (error)

Implementation in Julia

The TimeArray time series type

Using time constraints

when

from

to

findwhen

find

Mathematical, comparison, and logical operators

Applying methods to TimeSeries

Lag

Lead

Percentage

Combining methods in TimeSeries

Merge

Collapse

Map

Summary

References

10. Collaborative Filtering and Recommendation System

What is a recommendation system?

The utility matrix

Association rule mining

Measures of association rules

How to generate the item sets

How to generate the rules

Content-based filtering

Steps involved in content-based filtering

Advantages of content-based filtering

Limitations of content-based filtering

Collaborative filtering

Baseline prediction methods

User-based collaborative filtering

Item-item collaborative filtering

Algorithm of item-based collaborative filtering

Building a movie recommender system

Summary

11. Introduction to Deep Learning

Revisiting linear algebra

A gist of scalars

A brief outline of vectors

The importance of matrices

What are tensors?

Probability and information theory

Why probability?

Differences between machine learning and deep learning

What is deep learning?

Deep feedforward networks

Understanding the hidden layers in a neural network

The motivation of neural networks

Understanding regularization

Optimizing deep learning models

The case of optimization

Implementation in Julia

Network architecture

Types of layers

Neurons (activation functions)

Understanding regularizers for ANN

Norm constraints

Using solvers in deep neural networks

Coffee breaks

Image classification with pre-trained ImageNet CNN

Summary

References
