Scala for Machine Learning (eBook)

Author: Patrick R. Nicolas

Publisher: Packt Publishing

Publication date: 2014-12-17

Word count: 7.571 million

Category: Imported books > Foreign-language originals > Computers/Internet

Are you curious about AI? All you need is a good understanding of the Scala programming language, a basic knowledge of statistics, a keen interest in Big Data processing, and this book!

Scala for Machine Learning

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Getting Started

Mathematical notation for the curious

Why machine learning?

Classification

Prediction

Optimization

Regression

Why Scala?

Abstraction

Scalability

Configurability

Maintainability

Computation on demand

Model categorization

Taxonomy of machine learning algorithms

Unsupervised learning

Clustering

Dimension reduction

Supervised learning

Generative models

Discriminative models

Reinforcement learning

Tools and frameworks

Java

Scala

Apache Commons Math

Description

Licensing

Installation

JFreeChart

Description

Licensing

Installation

Other libraries and frameworks

Source code

Context versus view bounds

Presentation

Primitives and implicits

Primitive types

Type conversions

Operators

Immutability

Performance of Scala iterators

Let's kick the tires

Overview of computational workflows

Writing a simple workflow

Selecting a dataset

Loading the dataset

Preprocessing the dataset

Basic statistics

Normalization and Gauss distribution

Plotting data

Creating a model (learning)

Classify the data

Summary

2. Hello World!

Modeling

A model by any other name

Model versus design

Selecting a model's features

Extracting features

Designing a workflow

The computational framework

The pipe operator

Monadic data transformation

Dependency injection

Workflow modules

The workflow factory

Examples of workflow components

The preprocessing module

The clustering module

Assessing a model

Validation

Key metrics

Implementation

K-fold cross-validation

Bias-variance decomposition

Overfitting

Summary

3. Data Preprocessing

Time series

Moving averages

The simple moving average

The weighted moving average

The exponential moving average

Fourier analysis

Discrete Fourier transform (DFT)

DFT-based filtering

Detection of market cycles

The Kalman filter

The state space estimation

The transition equation

The measurement equation

The recursive algorithm

Prediction

Correction

Kalman smoothing

Experimentation

Alternative preprocessing techniques

Summary

4. Unsupervised Learning

Clustering

K-means clustering

Measuring similarity

Overview of the K-means algorithm

Step 1 – cluster configuration

Defining clusters

Defining K-means

Initializing clusters

Step 2 – cluster assignment

Step 3 – iterative reconstruction

Curse of dimensionality

Experiment

Tuning the number of clusters

Validation

Expectation-maximization (EM) algorithm

Gaussian mixture model

EM overview

Implementation

Testing

Online EM

Dimension reduction

Principal components analysis (PCA)

Algorithm

Implementation

Test case

Evaluation

Other dimension reduction techniques

Performance considerations

K-means

EM

PCA

Summary

5. Naïve Bayes Classifiers

Probabilistic graphical models

Naïve Bayes classifiers

Introducing the multinomial Naïve Bayes

Formalism

The frequentist perspective

The predictive model

The zero-frequency problem

Implementation

Software design

Training

Classification

Labeling

Results

Multivariate Bernoulli classification

Model

Implementation

Naïve Bayes and text mining

Basics of information retrieval

Implementation

Extraction of terms

Scoring of terms

Testing

Retrieving textual information

Evaluation

Pros and cons

Summary

6. Regression and Regularization

Linear regression

One-variate linear regression

Implementation

Test case

Ordinary least squares (OLS) regression

Design

Implementation

Test case 1 – trending

Test case 2 – features selection

Regularization

Ln roughness penalty

The ridge regression

Implementation

The test case

Numerical optimization

The logistic regression

The logit function

Binomial classification

Software design

The training workflow

Configuring the least squares optimizer

Computing the Jacobian matrix

Defining the exit conditions

Defining the least squares problem

Minimizing the loss function

Test

Classification

Summary

7. Sequential Data Models

Markov decision processes

The Markov property

The first-order discrete Markov chain

The hidden Markov model (HMM)

Notation

The lambda model

HMM execution state

Evaluation (CF-1)

Alpha class (the forward variable)

Beta class (the backward variable)

Training (CF-2)

Baum-Welch estimator (EM)

Decoding (CF-3)

The Viterbi algorithm

Putting it all together

Test case

The hidden Markov model for time series analysis

Conditional random fields

Introduction to CRF

Linear chain CRF

CRF and text analytics

The feature functions model

Software design

Implementation

Building the training set

Generating tags

Extracting data sequences

CRF control parameters

Putting it all together

Tests

The training convergence profile

Impact of the size of the training set

Impact of the L2 regularization factor

Comparing CRF and HMM

Performance consideration

Summary

8. Kernel Models and Support Vector Machines

Kernel functions

Overview

Common discriminative kernels

The support vector machine (SVM)

The linear SVM

The separable case (hard margin)

The nonseparable case (soft margin)

The nonlinear SVM

Max-margin classification

The kernel trick

Support vector classifier (SVC)

The binary SVC

LIBSVM

Software design

Configuration parameters

SVM Formulation

The SVM kernel function

SVM execution

SVM implementation

C-penalty and margin

Kernel evaluation

Application to risk analysis

Features and labels

Anomaly detection with one-class SVC

Support vector regression (SVR)

Overview

SVR versus linear regression

Performance considerations

Summary

9. Artificial Neural Networks

Feed-forward neural networks (FFNN)

The Biological background

The mathematical background

The multilayer perceptron (MLP)

The activation function

The network architecture

Software design

Model definition

Layers

Synapses

Connections

Training cycle/epoch

Step 1 – input forward propagation

The computational model

Objective

Softmax

Step 2 – sum of squared errors

Step 3 – error backpropagation

Error propagation

The computational model

Step 4 – synapse/weights adjustment

Momentum factor for gradient descent

Implementation

Step 5 – convergence criteria

Configuration

Putting all together

Training strategies and classification

Online versus batch training

Regularization

Model instantiation

Prediction

Evaluation

Impact of learning rate

Impact of the momentum factor

Test case

Implementation

Models evaluation

Impact of hidden layers architecture

Benefits and limitations

Summary

10. Genetic Algorithms

Evolution

The origin

NP problems

Evolutionary computing

Genetic algorithms and machine learning

Genetic algorithm components

Encodings

Value encoding

Predicate encoding

Solution encoding

The encoding scheme

Flat encoding

Hierarchical encoding

Genetic operators

Selection

Crossover

Mutation

Fitness score

Implementation

Software design

Key components

Selection

Controlling population growth

GA configuration

Crossover

Population

Chromosomes

Genes

Mutation

Population

Chromosomes

Genes

The reproduction cycle

GA for trading strategies

Definition of trading strategies

Trading operators

The cost/unfitness function

Trading signals

Trading strategies

Signal encoding

Test case

Data extraction

Initial population

Configuration

GA instantiation

GA execution

Tests

The unweighted score

The weighted score

Advantages and risks of genetic algorithms

Summary

11. Reinforcement Learning

Introduction

The problem

A solution – Q-learning

Terminology

Concept

Value of policy

Bellman optimality equations

Temporal difference for model-free learning

Action-value iterative update

Implementation

Software design

States and actions

Search space

Policy and action-value

The Q-learning training

Tail recursion to the rescue

Prediction

Option trading using Q-learning

Option property

Option model

Function approximation

Constrained state-transition

Putting it all together

Evaluation

Pros and cons of reinforcement learning

Learning classifier systems

Introduction to LCS

Why LCS

Terminology

Extended learning classifier systems (XCS)

XCS components

Application to portfolio management

XCS core data

XCS rules

Covering

Example of implementation

Benefits and limitation of learning classifier systems

Summary

12. Scalable Frameworks

Overview

Scala

Controlling object creation

Parallel collections

Processing a parallel collection

Benchmark framework

Performance evaluation

Scalability with Actors

The Actor model

Partitioning

Beyond actors – reactive programming

Akka

Master-workers

Messages exchange

Worker actors

The workflow controller

The master Actor

Master with routing

Distributed discrete Fourier transform

Limitations

Futures

The Actor life cycle

Blocking on futures

Handling future callbacks

Putting all together

Apache Spark

Why Spark

Design principles

In-memory persistency

Laziness

Transforms and Actions

Shared variables

Experimenting with Spark

Deploying Spark

Using Spark shell

MLlib

RDD generation

K-means using Spark

Performance evaluation

Tuning parameters

Tests

Performance considerations

Pros and cons

0xdata Sparkling Water

Summary

A. Basic Concepts

Scala programming

List of libraries

Format of code snippets

Encapsulation

Class constructor template

Companion objects versus case classes

Enumerations versus case classes

Overloading

Design template for classifiers

Data extraction

Data sources

Extraction of documents

Matrix class

Mathematics

Linear algebra

QR Decomposition

LU factorization

LDL decomposition

Cholesky factorization

Singular value decomposition

Eigenvalue decomposition

Algebraic and numerical libraries

First order predicate logic

Jacobian and Hessian matrices

Summary of optimization techniques

Gradient descent methods

Steepest descent

Conjugate gradient

Stochastic gradient descent

Quasi-Newton algorithms

BFGS

L-BFGS

Nonlinear least squares minimization

Gauss-Newton

Levenberg-Marquardt

Lagrange multipliers

Overview of dynamic programming

Finances 101

Fundamental analysis

Technical analysis

Terminology

Trading signals and strategy

Price patterns

Options trading

Financial data sources

Suggested online courses

References

Index
