Apache Mahout Essentials (e-book)

Author: Jayani Withanawasam

Publisher: Packt Publishing

Publication date: 2015-06-19

Word count: 943,000

If you are a Java developer or data scientist, haven't worked with Apache Mahout before, and want to get up to speed on implementing machine learning on big data, then this is the perfect guide for you.

Apache Mahout Essentials

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Introducing Apache Mahout

Machine learning in a nutshell

Features

Supervised learning versus unsupervised learning

Machine learning applications

Information retrieval

Business

Market segmentation (clustering)

Stock market predictions (regression)

Health care

Using a mammogram for cancer tissue detection

Machine learning libraries

Open source or commercial

Scalability

Languages used

Algorithm support

Batch processing versus stream processing

The story so far

Apache Mahout

Setting up Apache Mahout

How Apache Mahout works?

The high-level design

The distribution

From Hadoop MapReduce to Spark

Problems with Hadoop MapReduce

In-memory data processing with Spark and H2O

Why is Mahout shifting from Hadoop MapReduce to Spark?

When is it appropriate to use Apache Mahout?

Summary

2. Clustering

Unsupervised learning and clustering

Applications of clustering

Computer vision and image processing

Types of clustering

Hard clustering versus soft clustering

Flat clustering versus hierarchical clustering

Model-based clustering

K-Means clustering

Getting your hands dirty!

Running K-Means using Java programming

Data preparation

Understanding important parameters

Cluster visualization

Distance measure

Writing a custom distance measure

K-Means clustering with MapReduce

MapReduce in Apache Mahout

The map function

The reduce function

Additional clustering algorithms

Canopy clustering

Fuzzy K-Means

Streaming K-Means

The streaming step

The ball K-Means step

Spectral clustering

Dirichlet clustering

Text clustering

The vector space model and TF-IDF

N-grams and collocations

Preprocessing text with Lucene

Text clustering with the K-Means algorithm

Topic modeling

Optimizing clustering performance

Selecting the right features

Selecting the right algorithms

Selecting the right distance measure

Evaluating clusters

The initialization of centroids and the number of clusters

Tuning up parameters

The decision on infrastructure

Summary

3. Regression and Classification

Supervised learning

Target variables and predictor variables

Predictive analytics' techniques

Regression-based prediction

Model-based prediction

Tree-based prediction

Classification versus regression

Linear regression with Apache Spark

How does linear regression work?

A real-world example

The impact of smoking on mortality and different diseases

Linear regression with one variable and multiple variables

The integration of Apache Spark

Setting up Apache Spark with Apache Mahout

An example script

Distributed row matrix

An explanation of the code

Mahout references

The bias-variance trade-off

How to avoid over-fitting and under-fitting

Logistic regression with SGD

Logistic functions

Minimizing the cost function

Multinomial logistic regression versus binary logistic regression

A real-world example

An example script

Testing and evaluation

The confusion matrix

The area under the curve

The Naïve Bayes algorithm

The Bayes theorem

Text classification

Naïve assumption and its pros and cons in text classification

Improvements that Apache Mahout has made to the Naïve Bayes classification

A text classification coding example using the 20 newsgroups' example

Understand the 20 newsgroups' dataset

Text classification using Naïve Bayes – a MapReduce implementation with Hadoop

Text classification using Naïve Bayes – the Spark implementation

The Markov chain

Hidden Markov Model

A real-world example – developing a POS tagger using HMM supervised learning

POS tagging

HMM for POS tagging

HMM implementation in Apache Mahout

HMM supervised learning

The important parameters

Returns

The Baum-Welch algorithm

A code example

The important parameters

The Viterbi evaluator

The Apache Mahout references

Summary

4. Recommendations

Collaborative versus content-based filtering

Content-based filtering

Collaborative filtering

Hybrid filtering

User-based recommenders

A real-world example – movie recommendations

Data models

The similarity measure

The neighborhood

Recommenders

Evaluation techniques

The IR-based method (precision/recall)

Addressing the issues with inaccurate recommendation results

Item-based recommenders

Item-based recommenders with Spark

Matrix factorization-based recommenders

Alternating least squares

Singular value decomposition

Algorithm usage tips and tricks

Summary

5. Apache Mahout in Production

Introduction

Apache Mahout with Hadoop

YARN with MapReduce 2.0

The resource manager

The application manager

A node manager

The application master

Containers

Managing storage with HDFS

The life cycle of a Hadoop application

Setting up Hadoop

Setting up Mahout in local mode

Prerequisites

Java installation

Setting up Mahout in Hadoop distributed mode

Prerequisites

Creating a Hadoop user

Passwordless SSH configuration

The pseudo-distributed mode

Configuration changes

Formatting the DFS filesystem

Starting the servers

The fully-distributed mode

Prerequisites

Host file configuration

Hadoop configuration changes

Formatting the DFS filesystem

Starting servers

Monitoring Hadoop

Commands/scripts

Data nodes

Node managers

Web UIs

Setting up Mahout with Hadoop's fully-distributed mode

Troubleshooting Hadoop

Optimization tips

Summary

6. Visualization

The significance of visualization in machine learning

D3.js

A visualization example for K-Means clustering

Summary

Index
