Apache Spark 2.x Machine Learning Cookbook (e-book)

Authors: Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei

Publisher: Packt Publishing

Publication date: 2017-09-22

Length: about 702,000 characters

Simplify machine learning model implementations with Spark

About This Book

  • Solve the day-to-day problems of data science with Spark
  • This unique cookbook consists of exciting and intuitive numerical recipes
  • Optimize your work by acquiring, cleaning, analyzing, predicting, and visualizing your data

Who This Book Is For

This book is for Scala developers who have a fairly good exposure to and understanding of machine learning techniques but lack practical implementation experience with Spark. A solid knowledge of machine learning algorithms is assumed, as well as hands-on experience of implementing ML algorithms with Scala. However, you do not need to be acquainted with the Spark ML libraries and ecosystem.

What You Will Learn

  • Get to know how Scala and Spark go hand-in-hand for developers when developing ML systems with Spark
  • Build a recommendation engine that scales with Spark
  • Find out how to build unsupervised clustering systems to classify data in Spark
  • Build machine learning systems with the Decision Tree and Ensemble models in Spark
  • Deal with the curse of high-dimensionality in big data using Spark
  • Implement text analytics for search engines in Spark
  • Implement a streaming machine learning system using Spark

In Detail

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning about algorithms enables a wide range of applications, from everyday tasks such as product recommendations and spam filtering to cutting-edge applications such as self-driving cars and personalized medicine. You will gain hands-on experience of applying these principles using Apache Spark, a resilient cluster computing system well suited for large-scale machine learning tasks.

This book begins with a quick overview of setting up the necessary IDEs to facilitate the execution of the code examples covered in various chapters. It also highlights some key issues developers face while working with machine learning algorithms on the Spark platform. We progress by uncovering the various Spark APIs and the implementation of ML algorithms by developing classification systems, recommendation engines, text analytics, clustering, and learning systems. Toward the final chapters, we’ll focus on building high-end applications and explain various unsupervised methodologies and the challenges to tackle when implementing big data ML systems.

Style and Approach

This book is packed with intuitive recipes supported by line-by-line explanations to help you understand how to optimize your workflow and resolve problems when working with complex data modeling tasks and predictive algorithms. This is a valuable resource for data scientists and those working on large-scale data projects.
Table of Contents

Title Page

Copyright

Apache Spark 2.x Machine Learning Cookbook

Credits

About the Authors

About the Reviewer

www.PacktPub.com

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Practical Machine Learning with Spark Using Scala

Introduction

Apache Spark

Machine learning

Scala

Software versions and libraries used in this book

Downloading and installing the JDK

Getting ready

How to do it...

Downloading and installing IntelliJ

Getting ready

How to do it...

Downloading and installing Spark

Getting ready

How to do it...

Configuring IntelliJ to work with Spark and run Spark ML sample codes

Getting ready

How to do it...

There's more...

See also

Running a sample ML code from Spark

Getting ready

How to do it...

Identifying data sources for practical machine learning

Getting ready

How to do it...

See also

Running your first program using Apache Spark 2.0 with the IntelliJ IDE

How to do it...

How it works...

There's more...

See also

How to add graphics to your Spark program

How to do it...

How it works...

There's more...

See also

Just Enough Linear Algebra for Machine Learning with Spark

Introduction

Package imports and initial setup for vectors and matrices

How to do it...

There's more...

See also

Creating DenseVector and setup with Spark 2.0

How to do it...

How it works...

There's more...

See also

Creating SparseVector and setup with Spark

How to do it...

How it works...

There's more...

See also

Creating dense matrix and setup with Spark 2.0

Getting ready

How to do it...

How it works...

There's more...

See also

Using sparse local matrices with Spark 2.0

How to do it...

How it works...

There's more...

See also

Performing vector arithmetic using Spark 2.0

How to do it...

How it works...

There's more...

See also

Performing matrix arithmetic using Spark 2.0

How to do it...

How it works...

Exploring RowMatrix in Spark 2.0

How to do it...

How it works...

There's more...

See also

Exploring Distributed IndexedRowMatrix in Spark 2.0

How to do it...

How it works...

See also

Exploring distributed CoordinateMatrix in Spark 2.0

How to do it...

How it works...

See also

Exploring distributed BlockMatrix in Spark 2.0

How to do it...

How it works...

See also

Spark's Three Data Musketeers for Machine Learning - Perfect Together

Introduction

RDDs - what started it all...

DataFrame - a natural evolution to unite API and SQL via a high-level API

Dataset - a high-level unifying Data API

Creating RDDs with Spark 2.0 using internal data sources

How to do it...

How it works...

Creating RDDs with Spark 2.0 using external data sources

How to do it...

How it works...

There's more...

See also

Transforming RDDs with Spark 2.0 using the filter() API

How to do it...

How it works...

There's more...

See also

Transforming RDDs with the super useful flatMap() API

How to do it...

How it works...

There's more...

See also

Transforming RDDs with set operation APIs

How to do it...

How it works...

See also

RDD transformation/aggregation with groupBy() and reduceByKey()

How to do it...

How it works...

There's more...

See also

Transforming RDDs with the zip() API

How to do it...

How it works...

See also

Join transformation with paired key-value RDDs

How to do it...

How it works...

There's more...

Reduce and grouping transformation with paired key-value RDDs

How to do it...

How it works...

See also

Creating DataFrames from Scala data structures

How to do it...

How it works...

There's more...

See also

Operating on DataFrames programmatically without SQL

How to do it...

How it works...

There's more...

See also

Loading DataFrames and setup from an external source

How to do it...

How it works...

There's more...

See also

Using DataFrames with standard SQL language - SparkSQL

How to do it...

How it works...

There's more...

See also

Working with the Dataset API using a Scala Sequence

How to do it...

How it works...

There's more...

See also

Creating and using Datasets from RDDs and back again

How to do it...

How it works...

There's more...

See also

Working with JSON using the Dataset API and SQL together

How to do it...

How it works...

There's more...

See also

Functional programming with the Dataset API using domain objects

How to do it...

How it works...

There's more...

See also

Common Recipes for Implementing a Robust Machine Learning System

Introduction

Spark's basic statistical API to help you build your own algorithms

How to do it...

How it works...

There's more...

See also

ML pipelines for real-life machine learning applications

How to do it...

How it works...

There's more...

See also

Normalizing data with Spark

How to do it...

How it works...

There's more...

See also

Splitting data for training and testing

How to do it...

How it works...

There's more...

See also

Common operations with the new Dataset API

How to do it...

How it works...

There's more...

See also

Creating and using RDD versus DataFrame versus Dataset from a text file in Spark 2.0

How to do it...

How it works...

There's more...

See also

LabeledPoint data structure for Spark ML

How to do it...

How it works...

There's more...

See also

Getting access to Spark cluster in Spark 2.0

How to do it...

How it works...

There's more...

See also

Getting access to Spark cluster pre-Spark 2.0

How to do it...

How it works...

There's more...

See also

Getting access to SparkContext vis-a-vis SparkSession object in Spark 2.0

How to do it...

How it works...

There's more...

See also

New model export and PMML markup in Spark 2.0

How to do it...

How it works...

There's more...

See also

Regression model evaluation using Spark 2.0

How to do it...

How it works...

There's more...

See also

Binary classification model evaluation using Spark 2.0

How to do it...

How it works...

There's more...

See also

Multiclass classification model evaluation using Spark 2.0

How to do it...

How it works...

There's more...

See also

Multilabel classification model evaluation using Spark 2.0

How to do it...

How it works...

There's more...

See also

Using the Scala Breeze library to do graphics in Spark 2.0

How to do it...

How it works...

There's more...

See also

Practical Machine Learning with Regression and Classification in Spark 2.0 - Part I

Introduction

Fitting a linear regression line to data the old fashioned way

How to do it...

How it works...

There's more...

See also

Generalized linear regression in Spark 2.0

How to do it...

How it works...

There's more...

See also

Linear regression API with Lasso and L-BFGS in Spark 2.0

How to do it...

How it works...

There's more...

See also

Linear regression API with Lasso and 'auto' optimization selection in Spark 2.0

How to do it...

How it works...

There's more...

See also

Linear regression API with ridge regression and 'auto' optimization selection in Spark 2.0

How to do it...

How it works...

There's more...

See also

Isotonic regression in Apache Spark 2.0

How to do it...

How it works...

There's more...

See also

Multilayer perceptron classifier in Apache Spark 2.0

How to do it...

How it works...

There's more...

See also

One-vs-Rest classifier (One-vs-All) in Apache Spark 2.0

How to do it...

How it works...

There's more...

See also

Survival regression – parametric AFT model in Apache Spark 2.0

How to do it...

How it works...

There's more...

See also

Practical Machine Learning with Regression and Classification in Spark 2.0 - Part II

Introduction

Linear regression with SGD optimization in Spark 2.0

How to do it...

How it works...

There's more...

See also

Logistic regression with SGD optimization in Spark 2.0

How to do it...

How it works...

There's more...

See also

Ridge regression with SGD optimization in Spark 2.0

How to do it...

How it works...

There's more...

See also

Lasso regression with SGD optimization in Spark 2.0

How to do it...

How it works...

There's more...

See also

Logistic regression with L-BFGS optimization in Spark 2.0

How to do it...

How it works...

There's more...

See also

Support Vector Machine (SVM) with Spark 2.0

How to do it...

How it works...

There's more...

See also

Naive Bayes machine learning with Spark 2.0 MLlib

How to do it...

How it works...

There's more...

See also

Exploring ML pipelines and DataFrames using logistic regression in Spark 2.0

Getting ready

How to do it...

How it works...

There's more...

Pipeline

Vectors

See also

Recommendation Engine that Scales with Spark

Introduction

Content filtering

Collaborative filtering

Neighborhood method

Latent factor models techniques

Setting up the required data for a scalable recommendation engine in Spark 2.0

How to do it...

How it works...

There's more...

See also

Exploring the movies data details for the recommendation system in Spark 2.0

How to do it...

How it works...

There's more...

See also

Exploring the ratings data details for the recommendation system in Spark 2.0

How to do it...

How it works...

There's more...

See also

Building a scalable recommendation engine using collaborative filtering in Spark 2.0

How to do it...

How it works...

There's more...

See also

Dealing with implicit input for training

Unsupervised Clustering with Apache Spark 2.0

Introduction

Building a KMeans classifying system in Spark 2.0

How to do it...

How it works...

KMeans (Lloyd Algorithm)

KMeans++ (Arthur's algorithm)

KMeans|| (pronounced as KMeans Parallel)

There's more...

See also

Bisecting KMeans, the new kid on the block in Spark 2.0

How to do it...

How it works...

There's more...

See also

Using Gaussian Mixture and Expectation Maximization (EM) in Spark to classify data

How to do it...

How it works...

New GaussianMixture()

There's more...

See also

Classifying the vertices of a graph using Power Iteration Clustering (PIC) in Spark 2.0

How to do it...

How it works...

There's more...

See also

Latent Dirichlet Allocation (LDA) to classify documents and text into topics

How to do it...

How it works...

There's more...

See also

Streaming KMeans to classify data in near real-time

How to do it...

How it works...

There's more...

See also

Optimization - Going Down the Hill with Gradient Descent

Introduction

How do machines learn using an error-based system?

Optimizing a quadratic cost function and finding the minima using just math to gain insight

How to do it...

How it works...

There's more...

See also

Coding a quadratic cost function optimization using Gradient Descent (GD) from scratch

How to do it...

How it works...

There's more...

See also

Coding Gradient Descent optimization to solve Linear Regression from scratch

How to do it...

How it works...

There's more...

See also

Normal equations as an alternative for solving Linear Regression in Spark 2.0

How to do it...

How it works...

There's more...

See also

Building Machine Learning Systems with Decision Tree and Ensemble Models

Introduction

Ensemble models

Measures of impurity

Getting and preparing real-world medical data for exploring Decision Trees and Ensemble models in Spark 2.0

How to do it...

There's more...

Building a classification system with Decision Trees in Spark 2.0

How to do it...

How it works...

There's more...

See also

Solving Regression problems with Decision Trees in Spark 2.0

How to do it...

How it works...

See also

Building a classification system with Random Forest Trees in Spark 2.0

How to do it...

How it works...

See also

Solving regression problems with Random Forest Trees in Spark 2.0

How to do it...

How it works...

See also

Building a classification system with Gradient Boosted Trees (GBT) in Spark 2.0

How to do it...

How it works...

There's more...

See also

Solving regression problems with Gradient Boosted Trees (GBT) in Spark 2.0

How to do it...

How it works...

There's more...

See also

Curse of High-Dimensionality in Big Data

Introduction

Feature selection versus feature extraction

Two methods of ingesting and preparing a CSV file for processing in Spark

How to do it...

How it works...

There's more...

See also

Singular Value Decomposition (SVD) to reduce high-dimensionality in Spark

How to do it...

How it works...

There's more...

See also

Principal Component Analysis (PCA) to pick the most effective latent factor for machine learning in Spark

How to do it...

How it works...

There's more...

See also

Implementing Text Analytics with Spark 2.0 ML Library

Introduction

Doing term frequency with Spark - everything that counts

How to do it...

How it works...

There's more...

See also

Displaying similar words with Spark using Word2Vec

How to do it...

How it works...

There's more...

See also

Downloading a complete dump of Wikipedia for a real-life Spark ML project

How to do it...

There's more...

See also

Using Latent Semantic Analysis for text analytics with Spark 2.0

How to do it...

How it works...

There's more...

See also

Topic modeling with Latent Dirichlet allocation in Spark 2.0

How to do it...

How it works...

There's more...

See also

Spark Streaming and Machine Learning Library

Introduction

Structured streaming for near real-time machine learning

How to do it...

How it works...

There's more...

See also

Streaming DataFrames for real-time machine learning

How to do it...

How it works...

There's more...

See also

Streaming Datasets for real-time machine learning

How to do it...

How it works...

There's more...

See also

Streaming data and debugging with queueStream

How to do it...

How it works...

See also

Downloading and understanding the famous Iris data for unsupervised classification

How to do it...

How it works...

There's more...

See also

Streaming KMeans for a real-time on-line classifier

How to do it...

How it works...

There's more...

See also

Downloading wine quality data for streaming regression

How to do it...

How it works...

There's more...

Streaming linear regression for a real-time regression

How to do it...

How it works...

There's more...

See also

Downloading Pima Diabetes data for supervised classification

How to do it...

How it works...

There's more...

See also

Streaming logistic regression for an on-line classifier

How to do it...

How it works...

There's more...

See also
