Apache Mahout Clustering Designs (e-book)

Author: Ashish Gupta

Publisher: Packt Publishing

Publication date: 2015-10-08

Word count: 417,000

Explore clustering algorithms used with Apache Mahout

About This Book

  • Use Mahout to cluster datasets and gain useful insights
  • Explore the different clustering algorithms used in day-to-day work
  • A practical guide to creating and evaluating your own clustering models using real-world datasets

Who This Book Is For

This book is for developers who want to try out clustering on large datasets using Mahout. It is also useful for readers who have no background in Mahout but know basic programming and are familiar with the basics of machine learning and clustering. Prior experience with clustering techniques in another tool is helpful.

What You Will Learn

  • Explore clustering algorithms and cluster evaluation techniques
  • Learn different types of clustering and distance measuring techniques
  • Perform clustering on your data using K-Means clustering
  • Discover how canopy clustering is used as a preprocessing step for K-Means
  • Use the Fuzzy K-Means algorithm in Apache Mahout
  • Implement Streaming K-Means clustering in Mahout
  • Learn Mahout's implementation of Spectral K-Means clustering

In Detail

As more organizations discover the uses of big data analytics, interest in platforms that provide storage, computation, and analytic capabilities has increased. Apache Mahout caters to this need and paves the way for implementing complex machine learning algorithms so that you can better analyze your data and gain useful insights from it.

Starting with an introduction to clustering algorithms, this book provides an insight into Apache Mahout and the different algorithms it uses for clustering data. It gives a general introduction to algorithms such as K-Means, Fuzzy K-Means, and StreamingKMeans, and shows how to use Mahout to cluster your data with a particular algorithm. You will study the different types of clustering and learn how to use Apache Mahout with real-world datasets to implement and evaluate your clusters.

This book discusses cluster improvement and visualization using Mahout APIs, and also explores model-based clustering and topic modeling using the Dirichlet process. Finally, you will learn how to build and deploy a model for production use.

Style and approach

This book is a hands-on guide with examples that use real-world datasets. Each chapter begins by explaining the algorithm in detail and then shows how to use Mahout for that algorithm with example datasets.
Apache Mahout Clustering Designs

Table of Contents

Apache Mahout Clustering Designs

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Understanding Clustering

The clustering concept

Application of clustering

Understanding distance measures

Understanding different clustering techniques

Hierarchical methods

The partitioning method

The density-based method

Probabilistic clustering

Algorithm support in Mahout

Clustering algorithms in Mahout

Installing Mahout

Building Mahout code using Maven

Setting up the development environment using Eclipse

Setting up Mahout for Windows users

Preparing data for use with clustering techniques

Summary

2. Understanding K-means Clustering

Learning K-means

Running K-means on Mahout

Dataset selection

Executing K-means

The clusterdump result

Visualizing clusters

Summary

3. Understanding Canopy Clustering

Running Canopy clustering on Mahout

The Canopy generation phase

The Canopy clustering phase

Running Canopy clustering

Using the Canopy output for K-means

Visualizing clusters

Working with CSV files

Summary

4. Understanding the Fuzzy K-means Algorithm Using Mahout

Learning Fuzzy K-means clustering

Running Fuzzy K-means on Mahout

Dataset

Creating a vector for the dataset

Vector reader

Visualizing clusters

Summary

5. Understanding Model-based Clustering

Learning model-based clustering

Understanding Dirichlet clustering

Topic modeling

Running LDA using Mahout

Dataset selection

Steps to execute CVB (LDA)

Summary

6. Understanding Streaming K-means

Learning Streaming K-means

The Streaming step

The BallKMeans step

Using Mahout for streaming K-means

Dataset selection

Converting CSV to a vector file

Running Streaming K-means

Summary

7. Spectral Clustering

Understanding spectral clustering

Affinity (similarity) graph

Getting graph Laplacian from the affinity matrix

Eigenvectors and eigenvalues

The spectral clustering algorithm

Normalized spectral clustering

Mahout implementation of spectral clustering

Summary

8. Improving Cluster Quality

Evaluating clusters

Extrinsic methods

Intrinsic methods

Using DistanceMeasure interface

Summary

9. Creating a Cluster Model for Production

Preparing the dataset

Launching the Mahout job on the cluster

Performance tuning for the job

Summary

Index
