Scala for Machine Learning - Second Edition (e-book)

Author: Patrick R. Nicolas

Publisher: Packt Publishing

Publication date: 2017-09-26

Word count: 11.684 million

Leverage Scala and machine learning to study and construct systems that can learn from data

About This Book

  • Explore a broad variety of data processing, machine learning, and genetic algorithms through diagrams, mathematical formulation, and updated source code in Scala
  • Take your expertise in Scala programming to the next level by creating and customizing AI applications
  • Experiment with different techniques and evaluate their benefits and limitations using real-world applications in a tutorial style

Who This Book Is For

If you're a data scientist or a data analyst with a fundamental knowledge of Scala who wants to learn and implement various machine learning techniques, this book is for you. All you need is a good understanding of the Scala programming language, a basic knowledge of statistics, and a keen interest in big data processing.

What You Will Learn

  • Build dynamic workflows for scientific computing
  • Leverage open source libraries to extract patterns from time series
  • Write your own classification, clustering, or evolutionary algorithm
  • Perform relative performance tuning and evaluation of Spark
  • Master probabilistic models for sequential data
  • Experiment with advanced techniques such as regularization and kernelization
  • Dive into neural networks and some deep learning architectures
  • Apply some basic multiarmed bandit algorithms
  • Solve big data problems with Scala parallel collections, Akka actors, and Apache Spark clusters
  • Apply key learning strategies to a technical analysis of financial markets

In Detail

The discovery of information through data clustering and classification is becoming a key differentiator for competitive organizations. Machine learning applications are everywhere, from self-driving cars, engineering design, logistics, manufacturing, and trading strategies to the detection of genetic anomalies.

This book is a one-stop guide to the features of the Scala programming language, such as dependency injection and implicits, that are critical to the creation of machine learning algorithms. You start by learning data preprocessing and filtering techniques. You then move on to unsupervised learning techniques such as clustering and dimension reduction, followed by probabilistic graphical models such as Naïve Bayes, hidden Markov models, and Monte Carlo inference. The book then covers discriminative algorithms such as linear and logistic regression with regularization, kernelization, support vector machines, neural networks, and deep learning. Next come evolutionary computing, multiarmed bandit algorithms, and reinforcement learning. Finally, the book gives a comprehensive overview of parallel computing in Scala and Akka, followed by a description of Apache Spark and its ML library. With updated code based on the latest version of Scala and comprehensive examples, this book aims to give you more than just a solid fundamental knowledge of machine learning with Scala.

Style and approach

This book is designed as a tutorial with hands-on exercises using technical analysis of financial markets and corporate data. Each chapter is structured so that you can understand key concepts easily.
Scala for Machine Learning Second Edition

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started

Mathematical notations for the curious

Why machine learning?

Classification

Prediction

Optimization

Regression

Why Scala?

Scala as a functional language

Abstraction

Higher kinded types

Functors

Monads

Scala as an object oriented language

Scala as a scalable language

Model categorization

Taxonomy of machine learning algorithms

Unsupervised learning

Clustering

Dimension reduction

Supervised learning

Generative models

Discriminative models

Semi-supervised learning

Reinforcement learning

Leveraging Java libraries

Tools and frameworks

Java

Scala

Eclipse Scala IDE

IntelliJ IDEA Scala plugin

Simple build tool

Apache Commons Math

Description

Licensing

Installation

JFreeChart

Description

Licensing

Installation

Other libraries and frameworks

Source code

Convention

Context bounds

Presentation

Primitives and implicits

Immutability

Let's kick the tires

Writing a simple workflow

Step 1 – scoping the problem

Step 2 – loading data

Step 3 – preprocessing data

Immutable normalization

Step 4 – discovering patterns

Analyzing data

Plotting data

Visualizing model features

Visualizing label

Step 5 – implementing the classifier

Selecting an optimizer

Training the model

Classifying observations

Step 6 – evaluating the model

Summary

2. Data Pipelines

Modeling

What is a model?

Model versus design

Selecting features

Extracting features

Defining a methodology

Monadic data transformation

Error handling

Monads to the rescue

Implicit models

Explicit models

Workflow computational model

Supporting mathematical abstractions

Step 1 – variable declaration

Step 2 – model definition

Step 3 – instantiation

Composing mixins to build workflow

Understanding the problem

Defining modules

Instantiating the workflow

Modularizing

Profiling data

Immutable statistics

Z-score and Gauss

Assessing a model

Validation

Key quality metrics

F-score for binomial classification

F-score for multinomial classification

Area under the curves

Area under PRC

Area under ROC

Cross-validation

One-fold cross-validation

K-fold cross-validation

Bias-variance decomposition

Overfitting

Summary

3. Data Preprocessing

Time series in Scala

Context bounds

Types and operations

Transpose operator

Differential operator

Lazy views

Moving averages

Simple moving average

Weighted moving average

Exponential moving average

Fourier analysis

Discrete Fourier transform (DFT)

DFT-based filtering

Detection of market cycles

The discrete Kalman filter

The state space estimation

The transition equation

The measurement equation

The recursive algorithm

Prediction

Correction

Kalman smoothing

Fixed lag smoothing

Experimentation

Benefits and drawbacks

Alternative preprocessing techniques

Summary

4. Unsupervised Learning

K-mean clustering

K-means

Measuring similarity

Defining the algorithm

Step 1 – Clusters configuration

Defining clusters

Initializing clusters

Step 2 – Clusters assignment

Step 3 – Reconstruction error minimization

Creating K-means components

Tail recursive implementation

Iterative implementation

Step 4 – Classification

Curse of dimensionality

Evaluation

The results

Tuning the number of clusters

Validation

Expectation-Maximization (EM)

Gaussian mixture model

EM overview

Implementation

Classification

Testing

Online EM

Summary

5. Dimension Reduction

Challenging model complexity

The divergences

The Kullback-Leibler divergence

Overview

Implementation

Testing

The mutual information

Principal components analysis (PCA)

Algorithm

Implementation

Test case

Evaluation

Extending PCA

Validation

Categorical features

Performance

Nonlinear models

Kernel PCA

Manifolds

Summary

6. Naïve Bayes Classifiers

Probabilistic graphical models

Naïve Bayes classifiers

Introducing the multinomial Naïve Bayes

Formalism

The frequentist perspective

The predictive model

The zero-frequency problem

Implementation

Design

Training

Class likelihood

Binomial model

Multinomial model

Classifier components

Classification

F1 Validation

Features extraction

Testing

Multivariate Bernoulli classification

Model

Implementation

Naïve Bayes and text mining

Basic information retrieval

Implementation

Analyzing documents

Extracting relative terms frequency

Generating the features

Testing

Retrieving textual information

Evaluating text mining classifier

Pros and cons

Summary

7. Sequential Data Models

Markov decision processes

The Markov property

The first-order discrete Markov chain

The hidden Markov model (HMM)

Notation

The lambda model

Design

Evaluation (CF-1)

Alpha (forward pass)

Beta (backward pass)

Training (CF-2)

Baum-Welch estimator (EM)

Decoding (CF-3)

The Viterbi algorithm

Putting it all together

Test case 1 – Training

Test case 2 – Evaluation

HMM as filtering technique

Conditional random fields

Introduction to CRF

Linear chain CRF

Regularized CRF and text analytics

The feature functions model

Design

Implementation

Configuring the CRF classifier

Training the CRF model

Applying the CRF model

Tests

The training convergence profile

Impact of the size of the training set

Impact of L2 regularization factor

Comparing CRF and HMM

Performance consideration

Summary

8. Monte Carlo Inference

The purpose of sampling

Gaussian sampling

Box-Muller transform

Monte Carlo approximation

Overview

Implementation

Bootstrapping with replacement

Overview

Resampling

Implementation

Pros and cons of bootstrap

Markov Chain Monte Carlo (MCMC)

Overview

Metropolis-Hastings (MH)

Implementation

Test

Summary

9. Regression and Regularization

Linear regression

Univariate linear regression

Implementation

Test case

Ordinary least squares (OLS) regression

Design

Implementation

Test case 1 – trending

Test case 2 – features selection

Regularization

Ln roughness penalty

Ridge regression

Design

Implementation

Test case

Numerical optimization

Logistic regression

Logistic function

Design

Training workflow

Step 1 – configuring the optimizer

Step 2 – computing the Jacobian matrix

Step 3 – managing the convergence of optimizer

Step 4 – defining the least squares problem

Step 5 – minimizing the sum of square errors

Test

Classification

Summary

10. Multilayer Perceptron

Feed-forward neural networks (FFNN)

The biological background

Mathematical background

The multilayer perceptron (MLP)

Activation function

Network topology

Design

Configuration

Network components

Network topology

Input and hidden layers

Output layer

Synapses

Connections

Weights initialization

Model

Problem types (modes)

Online versus batch training

Training epoch

Step 1 – input forward propagation

Computational flow

Error functions

Operating modes

Softmax

Step 2 – error backpropagation

Weights adjustment

Error propagation

The computational model

Step 3 – exit condition

Putting it all together

Training and classification

Regularization

Model generation

Fast Fisher-Yates shuffle

Prediction

Model fitness

Evaluation

Execution profile

Impact of learning rate

Impact of the momentum factor

Impact of the number of hidden layers

Test case

Implementation

Models evaluation

Impact of hidden layers' architecture

Benefits and limitations

Summary

11. Deep Learning

Sparse autoencoder

Undercomplete autoencoder

Deterministic autoencoder

Categorization

Feed-forward sparse, undercomplete autoencoder

Sparsity updating equations

Implementation

Restricted Boltzmann Machines (RBMs)

Boltzmann machine

Binary restricted Boltzmann machines

Conditional probabilities

Sampling

Log-likelihood gradient

Contrastive divergence

Configuration parameters

Unsupervised learning

Convolution neural networks

Local receptive fields

Weight sharing

Convolution layers

Sub-sampling layers

Putting it all together

Summary

12. Kernel Models and SVM

Kernel functions

Overview

Common discriminative kernels

Kernel monadic composition

The support vector machine (SVM)

The linear SVM

The separable case (hard margin)

The non-separable case (soft margin)

The nonlinear SVM

Max-margin classification

The kernel trick

Support vector classifier (SVC)

The binary SVC

LIBSVM

Design

Configuration parameters

The SVM formulation

The SVM kernel function

The SVM execution

Interface to LIBSVM

Training

Classification

C-penalty and margin

Kernel evaluation

Application to risk analysis

Anomaly detection with one-class SVC

Support vector regression (SVR)

Overview

SVR versus linear regression

Performance considerations

Summary

13. Evolutionary Computing

Evolution

The origin

NP problems

Evolutionary computing

Genetic algorithms and machine learning

Genetic algorithm components

Encodings

Value encoding

Predicate encoding

Solution encoding

The encoding scheme

Flat encoding

Hierarchical encoding

Genetic operators

Selection

Crossover

Mutation

Fitness score

Implementation

Software design

Key components

Population

Chromosomes

Genes

Selection

Controlling population growth

GA configuration

Crossover

Population

Chromosomes

Genes

Mutation

Population

Chromosomes

Genes

Reproduction

Solver

GA for trading strategies

Definition of trading strategies

Trading operators

The cost function

Market signals

Trading strategies

Signal encoding

Test case – Fall 2008 market crash

Creating trading strategies

Configuring the optimizer

Finding the best trading strategy

Tests

The weighted score

The unweighted score

Advantages and risks of genetic algorithms

Summary

14. Multiarmed Bandits

K-armed bandit

Exploration-exploitation trade-offs

Expected cumulative regret

Bayesian Bernoulli bandits

Epsilon-greedy algorithm

Thompson sampling

Bandit context

Prior/posterior beta distribution

Implementation

Simulated exploration and exploitation

Upper bound confidence

Confidence interval

Implementation

Summary

15. Reinforcement Learning

Reinforcement learning

Understanding the challenge

A solution – Q-learning

Terminology

Concept

Value of policy

Bellman optimality equations

Temporal difference for model-free learning

Action-value iterative update

Implementation

Software design

The states and actions

The search space

The policy and action-value

The Q-learning components

The Q-learning training

Tail recursion to the rescue

Validation

The prediction

Option trading using Q-learning

Option property

Option model

Quantization

Putting it all together

Evaluation

Pros and cons of reinforcement learning

Learning classifier systems

Introduction to LCS

Combining learning and evolution

Terminology

Extended learning classifier systems

XCS components

Application to portfolio management

XCS core data

XCS rules

Covering

Example of implementation

Benefits and limitations of learning classifier systems

Summary

16. Parallelism in Scala and Akka

Overview

Scala

Object creation

Streams

Memory on demand

Design for reusing Streams memory

Parallel collections

Processing a parallel collection

Benchmark framework

Performance evaluation

Scalability with Actors

The Actor model

Partitioning

Beyond Actors – reactive programming

Akka

Master-workers

Messages exchange

Worker Actors

The workflow controller

The master Actor

Master with routing

Distributed discrete Fourier transform

Limitations

Futures

Blocking on futures

Future callbacks

Putting it all together

Summary

17. Apache Spark MLlib

Overview

Apache Spark core

Why Spark?

Design principles

In-memory persistency

Laziness

Transforms and actions

Shared variables

Experimenting with Spark

Deploying Spark

Using Spark shell

MLlib library

Overview

Creating RDDs

K-means using MLlib

Tests

Reusable ML pipelines

Reusable ML transforms

Encoding features

Training the model

Predictive model

Training summary statistics

Validating the model

Grid search

Apache Spark and ScalaTest

Extending Spark

Kullback-Leibler divergence

Implementation

Kullback-Leibler evaluator

Streaming engine

Why streaming?

Batch and real-time processing

Architecture overview

Discretized streams

Use case – continuous parsing

Checkpointing

Performance evaluation

Tuning parameters

Performance considerations

Pros and cons

Summary

A. Basic Concepts

Scala programming

List of libraries and tools

Code snippets format

Best practices

Encapsulation

Class constructor template

Companion objects versus case classes

Enumerations versus case classes

Overloading

Design template for immutable classifiers

Utility classes

Data extraction

Financial data sources

Documents extraction

DMatrix class

Counter

Monitor

Mathematics

Linear algebra

QR decomposition

LU factorization

LDL decomposition

Cholesky factorization

Singular Value Decomposition (SVD)

Eigenvalue decomposition

Algebraic and numerical libraries

First order predicate logic

Jacobian and Hessian matrices

Summary of optimization techniques

Gradient descent methods

Steepest descent

Conjugate gradient

Stochastic gradient descent

Quasi-Newton algorithms

BFGS

L-BFGS

Nonlinear least squares minimization

Gauss-Newton

Levenberg-Marquardt

Lagrange multipliers

Overview of dynamic programming

Finances 101

Fundamental analysis

Technical analysis

Terminology

Trading data

Trading signal and strategy

Price patterns

Options trading

Financial data sources

Suggested online courses

References

B. References

Chapter 1

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7

Chapter 8

Chapter 9

Chapter 10

Chapter 11

Chapter 12

Chapter 13

Chapter 14

Chapter 15

Chapter 16

Chapter 17

Index
