Hadoop MapReduce v2 Cookbook - Second Edition (eBook)

Price: ¥

Author: Thilina Gunarathne

Publisher: Packt Publishing

Publication date: 2015-02-25

Word count: 1.825 million

Category: Imported Books > Foreign-Language Originals > Computers/Networking

If you are a Big Data enthusiast who wishes to use Hadoop v2 to solve your problems, this book is for you. It is aimed at Java programmers with little to moderate knowledge of Hadoop MapReduce, and also serves as a one-stop reference for developers and system admins who want to get up to speed with Hadoop v2 quickly. Basic knowledge of software development in Java and a working knowledge of Linux are helpful.
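The book's opening recipes center on the classic WordCount application. As a flavor of the programming model it teaches, here is a minimal plain-Python sketch of the map, shuffle, and reduce phases (an illustration only, not code from the book, and with no Hadoop dependency):

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every token, as a WordCount Mapper would.
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Group values by key, mimicking Hadoop's shuffle-and-sort phase.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Sum the counts for one word, as the WordCount Reducer (and combiner) does.
    return key, sum(values)

def word_count(lines):
    pairs = [p for line in lines for p in mapper(line)]
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())

print(word_count(["to be or not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In real Hadoop MapReduce, the framework supplies the shuffle phase and the Mapper/Reducer classes run distributed across the cluster; the chapters below cover that, along with combiners, partitioners, and custom data types.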
Hadoop MapReduce v2 Cookbook Second Edition

Table of Contents

Hadoop MapReduce v2 Cookbook Second Edition

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Getting Started with Hadoop v2

Introduction

Hadoop Distributed File System – HDFS

Hadoop YARN

Hadoop MapReduce

Hadoop installation modes

Setting up Hadoop v2 on your local machine

Getting ready

How to do it...

How it works...

Writing a WordCount MapReduce application, bundling it, and running it using the Hadoop local mode

Getting ready

How to do it...

How it works...

There's more...

See also

Adding a combiner step to the WordCount MapReduce program

How to do it...

How it works...

There's more...

Setting up HDFS

Getting ready

How to do it...

See also

Setting up Hadoop YARN in a distributed cluster environment using Hadoop v2

Getting ready

How to do it...

How it works...

See also

Setting up the Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution

Getting ready

How to do it...

There's more...

HDFS command-line file operations

Getting ready

How to do it...

How it works...

There's more...

Running the WordCount program in a distributed cluster environment

Getting ready

How to do it...

How it works...

There's more...

Benchmarking HDFS using DFSIO

Getting ready

How to do it...

How it works...

There's more...

Benchmarking Hadoop MapReduce using TeraSort

Getting ready

How to do it...

How it works...

2. Cloud Deployments – Using Hadoop YARN on Cloud Environments

Introduction

Running Hadoop MapReduce v2 computations using Amazon Elastic MapReduce

Getting ready

How to do it...

See also

Saving money using Amazon EC2 Spot Instances to execute EMR job flows

How to do it...

There's more...

See also

Executing a Pig script using EMR

How to do it...

There's more...

Starting a Pig interactive session

Executing a Hive script using EMR

How to do it...

There's more...

Starting a Hive interactive session

See also

Creating an Amazon EMR job flow using the AWS Command Line Interface

Getting ready

How to do it...

There's more...

See also

Deploying an Apache HBase cluster on Amazon EC2 using EMR

Getting ready

How to do it...

See also

Using EMR bootstrap actions to configure VMs for the Amazon EMR jobs

How to do it...

There's more...

Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment

How to do it...

How it works...

See also

3. Hadoop Essentials – Configurations, Unit Tests, and Other APIs

Introduction

Optimizing Hadoop YARN and MapReduce configurations for cluster deployments

Getting ready

How to do it...

How it works...

There's more...

Shared user Hadoop clusters – using Fair and Capacity schedulers

How to do it...

How it works...

There's more...

Setting classpath precedence to user-provided JARs

How to do it...

How it works...

Speculative execution of straggling tasks

How to do it...

There's more...

Unit testing Hadoop MapReduce applications using MRUnit

Getting ready

How to do it...

See also

Integration testing Hadoop MapReduce applications using MiniYarnCluster

Getting ready

How to do it...

See also

Adding a new DataNode

Getting ready

How to do it...

There's more...

Rebalancing HDFS

See also

Decommissioning DataNodes

How to do it...

How it works...

See also

Using multiple disks/volumes and limiting HDFS disk usage

How to do it...

Setting the HDFS block size

How to do it...

There's more...

See also

Setting the file replication factor

How to do it...

How it works...

There's more...

See also

Using the HDFS Java API

How to do it...

How it works...

There's more...

Configuring the FileSystem object

Retrieving the list of data blocks of a file

4. Developing Complex Hadoop MapReduce Applications

Introduction

Choosing appropriate Hadoop data types

How to do it...

There's more...

See also

Implementing a custom Hadoop Writable data type

How to do it...

How it works...

There's more...

See also

Implementing a custom Hadoop key type

How to do it...

How it works...

See also

Emitting data of different value types from a Mapper

How to do it...

How it works...

There's more...

See also

Choosing a suitable Hadoop InputFormat for your input data format

How to do it...

How it works...

There's more...

See also

Adding support for new input data formats – implementing a custom InputFormat

How to do it...

How it works...

There's more...

See also

Formatting the results of MapReduce computations – using Hadoop OutputFormats

How to do it...

How it works...

There's more...

Writing multiple outputs from a MapReduce computation

How to do it...

How it works...

Using multiple input data types and multiple Mapper implementations in a single MapReduce application

See also

Hadoop intermediate data partitioning

How to do it...

How it works...

There's more...

TotalOrderPartitioner

KeyFieldBasedPartitioner

Secondary sorting – sorting Reduce input values

How to do it...

How it works...

See also

Broadcasting and distributing shared resources to tasks in a MapReduce job – Hadoop DistributedCache

How to do it...

How it works...

There's more...

Distributing archives using the DistributedCache

Adding resources to the DistributedCache from the command line

Adding resources to the classpath using the DistributedCache

Using Hadoop with legacy applications – Hadoop streaming

How to do it...

How it works...

There's more...

See also

Adding dependencies between MapReduce jobs

How to do it...

How it works...

There's more...

Hadoop counters to report custom metrics

How to do it...

How it works...

5. Analytics

Introduction

Simple analytics using MapReduce

Getting ready

How to do it...

How it works...

There's more...

Performing GROUP BY using MapReduce

Getting ready

How to do it...

How it works...

Calculating frequency distributions and sorting using MapReduce

Getting ready

How to do it...

How it works...

There's more...

Plotting the Hadoop MapReduce results using gnuplot

Getting ready

How to do it...

How it works...

There's more...

Calculating histograms using MapReduce

Getting ready

How to do it...

How it works...

Calculating Scatter plots using MapReduce

Getting ready

How to do it...

How it works...

Parsing a complex dataset with Hadoop

Getting ready

How to do it...

How it works...

There's more...

Joining two datasets using MapReduce

Getting ready

How to do it...

How it works...

6. Hadoop Ecosystem – Apache Hive

Introduction

Getting started with Apache Hive

How to do it...

See also

Creating databases and tables using Hive CLI

Getting ready

How to do it...

How it works...

There's more...

Hive data types

Hive external tables

Using the describe formatted command to inspect the metadata of Hive tables

Simple SQL-style data querying using Apache Hive

Getting ready

How to do it...

How it works...

There's more...

Using Apache Tez as the execution engine for Hive

See also

Creating and populating Hive tables and views using Hive query results

Getting ready

How to do it...

Utilizing different storage formats in Hive – storing table data using ORC files

Getting ready

How to do it...

How it works...

Using Hive built-in functions

Getting ready

How to do it...

How it works...

There's more...

See also

Hive batch mode – using a query file

How to do it...

How it works...

There's more...

See also

Performing a join with Hive

Getting ready

How to do it...

How it works...

See also

Creating partitioned Hive tables

Getting ready

How to do it...

Writing Hive User-defined Functions (UDF)

Getting ready

How to do it...

How it works...

HCatalog – performing Java MapReduce computations on data mapped to Hive tables

Getting ready

How to do it...

How it works...

HCatalog – writing data to Hive tables from Java MapReduce computations

Getting ready

How to do it...

How it works...

7. Hadoop Ecosystem II – Pig, HBase, Mahout, and Sqoop

Introduction

Getting started with Apache Pig

Getting ready

How to do it...

How it works...

There's more...

See also

Joining two datasets using Pig

How to do it...

How it works...

There's more...

Accessing Hive table data in Pig using HCatalog

Getting ready

How to do it...

There's more...

See also

Getting started with Apache HBase

Getting ready

How to do it...

There's more...

See also

Data random access using Java client APIs

Getting ready

How to do it...

How it works...

Running MapReduce jobs on HBase

Getting ready

How to do it...

How it works...

Using Hive to insert data into HBase tables

Getting ready

How to do it...

See also

Getting started with Apache Mahout

How to do it...

How it works...

There's more...

Running K-means with Mahout

Getting ready

How to do it...

How it works...

Importing data to HDFS from a relational database using Apache Sqoop

Getting ready

How to do it...

Exporting data from HDFS to a relational database using Apache Sqoop

Getting ready

How to do it...

8. Searching and Indexing

Introduction

Generating an inverted index using Hadoop MapReduce

Getting ready

How to do it...

How it works...

There's more...

Outputting a randomly accessible indexed InvertedIndex

See also

Intradomain web crawling using Apache Nutch

Getting ready

How to do it...

See also

Indexing and searching web documents using Apache Solr

Getting ready

How to do it...

How it works...

See also

Configuring Apache HBase as the backend data store for Apache Nutch

Getting ready

How to do it...

How it works...

See also

Whole web crawling with Apache Nutch using a Hadoop/HBase cluster

Getting ready

How to do it...

How it works...

See also

Elasticsearch for indexing and searching

Getting ready

How to do it...

How it works...

See also

Generating the in-links graph for crawled web pages

Getting ready

How to do it...

How it works...

See also

9. Classifications, Recommendations, and Finding Relationships

Introduction

Performing content-based recommendations

How to do it...

How it works...

There's more...

Classification using the naïve Bayes classifier

How to do it...

How it works...

Assigning advertisements to keywords using the Adwords balance algorithm

How to do it...

How it works...

There's more...

10. Mass Text Data Processing

Introduction

Data preprocessing using Hadoop streaming and Python

Getting ready

How to do it...

How it works...

There's more...

See also

De-duplicating data using Hadoop streaming

Getting ready

How to do it...

How it works...

See also

Loading large datasets to an Apache HBase data store – importtsv and bulkload

Getting ready

How to do it...

How it works...

There's more...

Data de-duplication using HBase

See also

Creating TF and TF-IDF vectors for the text data

Getting ready

How to do it...

How it works...

See also

Clustering text data using Apache Mahout

Getting ready

How to do it...

How it works...

See also

Topic discovery using Latent Dirichlet Allocation (LDA)

Getting ready

How to do it...

How it works...

See also

Document classification using Mahout Naive Bayes Classifier

Getting ready

How to do it...

How it works...

See also

Index
