Machine Learning with Apache Spark Quick Start Guide (e-book)

Author: Jillur Quddus

Publisher: Packt Publishing

Publication date: 2018-12-26

Word count: 309,000

Category: Imported books > Foreign-language original editions > Computers/Internet

Synopsis

Combine advanced analytics, including Machine Learning, Deep Learning Neural Networks and Natural Language Processing, with modern scalable technologies, including Apache Spark, to derive actionable insights from Big Data in real time.

Key Features

* Make a hands-on start in the fields of Big Data, Distributed Technologies and Machine Learning
* Learn how to design, develop and interpret the results of common Machine Learning algorithms
* Uncover hidden patterns in your data in order to derive real actionable insights and business value

Book Description

Every person and every organization in the world manages data, whether they realize it or not. Data is used to describe the world around us and can be used for almost any purpose, from analyzing consumer habits to fighting disease and serious organized crime. Ultimately, we manage data in order to derive value from it, and many organizations around the world have traditionally invested in technology to help process their data faster and more efficiently. But we now live in an interconnected world driven by mass data creation and consumption, where data is no longer rows and columns restricted to a spreadsheet, but an organic and evolving asset in its own right. With this realization come major challenges for organizations: how do we manage the sheer size of data being created every second (think not only spreadsheets and databases, but also social media posts, images, videos, music, blogs and so on)? And once we can manage all of this data, how do we derive real value from it?

The focus of Machine Learning with Apache Spark is to help us answer these questions in a hands-on manner. We introduce the latest scalable technologies to help us manage and process big data. We then introduce advanced analytical algorithms applied to real-world use cases in order to uncover patterns, derive actionable insights, and learn from this big data.

What you will learn

* Understand how Spark fits in the context of the big data ecosystem
* Understand how to deploy and configure a local development environment using Apache Spark
* Understand how to design supervised and unsupervised learning models
* Build models to perform NLP, deep learning, and cognitive services using Spark ML libraries
* Design real-time machine learning pipelines in Apache Spark
* Become familiar with advanced techniques for processing a large volume of data by applying machine learning algorithms

Who this book is for

This book is aimed at Business Analysts, Data Analysts and Data Scientists who wish to make a hands-on start in order to take advantage of modern Big Data technologies combined with Advanced Analytics.

Table of contents

Title Page

Machine Learning with Apache Spark Quick Start Guide

Dedication

About Packt

Why subscribe?

Packt.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Reviews

The Big Data Ecosystem

A brief history of data

Vertical scaling

Master/slave architecture

Sharding

Data processing and analysis

Data becomes big

Big data ecosystem

Horizontal scaling

Distributed systems

Distributed data stores

Distributed filesystems

Distributed databases

NoSQL databases

Document databases

Columnar databases

Key-value databases

Graph databases

CAP theorem

Distributed search engines

Distributed processing

MapReduce

Apache Spark

RDDs, DataFrames, and datasets

RDDs

DataFrames

Datasets

Jobs, stages, and tasks

Job

Stage

Tasks

Distributed messaging

Distributed streaming

Distributed ledgers

Artificial intelligence and machine learning

Cloud computing platforms

Data insights platform

Reference logical architecture

Data sources layer

Ingestion layer

Persistent data storage layer

Data processing layer

Serving data storage layer

Data intelligence layer

Unified access layer

Data insights and reporting layer

Platform governance, management, and administration

Open source implementation

Summary

Setting Up a Local Development Environment

CentOS Linux 7 virtual machine

Java SE Development Kit 8

Scala 2.11

Anaconda 5 with Python 3

Basic conda commands

Additional Python packages

Jupyter Notebook

Starting Jupyter Notebook

Troubleshooting Jupyter Notebook

Apache Spark 2.3

Spark binaries

Local working directories

Spark configuration

Spark properties

Environmental variables

Standalone master server

Spark worker node

PySpark and Jupyter Notebook

Apache Kafka 2.0

Kafka binaries

Local working directories

Kafka configuration

Start the Kafka server

Testing Kafka

Summary

Artificial Intelligence and Machine Learning

Artificial intelligence

Machine learning

Supervised learning

Unsupervised learning

Reinforced learning

Deep learning

Natural neuron

Artificial neuron

Weights

Activation function

Heaviside step function

Sigmoid function

Hyperbolic tangent function

Artificial neural network

Single-layer perceptron

Multi-layer perceptron

NLP

Cognitive computing

Machine learning pipelines in Apache Spark

Summary

Supervised Learning Using Apache Spark

Linear regression

Case study – predicting bike sharing demand

Univariate linear regression

Residuals

Root mean square error

R-squared

Univariate linear regression in Apache Spark

Multivariate linear regression

Correlation

Multivariate linear regression in Apache Spark

Logistic regression

Threshold value

Confusion matrix

Receiver operator characteristic curve

Area under the ROC curve

Case study – predicting breast cancer

Classification and Regression Trees

Case study – predicting political affiliation

Random forests

K-Fold cross validation

Summary

Unsupervised Learning Using Apache Spark

Clustering

Euclidean distance

Hierarchical clustering

K-means clustering

Case study – detecting brain tumors

Feature vectors from images

Image segmentation

K-means cost function

K-means clustering in Apache Spark

Principal component analysis

Case study – movie recommendation system

Covariance matrix

Identity matrix

Eigenvectors and eigenvalues

PCA in Apache Spark

Summary

Natural Language Processing Using Apache Spark

Feature transformers

Document

Corpus

Preprocessing pipeline

Tokenization

Stop words

Stemming

Lemmatization

Normalization

Feature extractors

Bag of words

Term frequency–inverse document frequency

Case study – sentiment analysis

NLP pipeline

NLP in Apache Spark

Summary

Deep Learning Using Apache Spark

Artificial neural networks

Multilayer perceptrons

MLP classifier

Input layer

Hidden layers

Output layer

Case study 1 – OCR

Input data

Training architecture

Detecting patterns in the hidden layer

Classifying in the output layer

MLPs in Apache Spark

Convolutional neural networks

End-to-end neural architecture

Input layer

Convolution layers

Rectified linear units

Pooling layers

Fully connected layer

Output layer

Case study 2 – image recognition

InceptionV3 via TensorFlow

Deep learning pipelines for Apache Spark

Image library

PySpark image recognition application

Spark submit

Image-recognition results

Case study 3 – image prediction

PySpark image-prediction application

Image-prediction results

Summary

Real-Time Machine Learning Using Apache Spark

Distributed streaming platform

Distributed stream processing engines

Streaming using Apache Spark

Spark Streaming (DStreams)

Structured Streaming

Stream processing pipeline

Case study – real-time sentiment analysis

Start Zookeeper and Kafka Servers

Kafka topic

Twitter developer account

Twitter apps and the Twitter API

Application configuration

Kafka Twitter producer application

Preprocessing and feature vectorization pipelines

Kafka Twitter consumer application

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think
