Optimizing Hadoop for MapReduce (e-book)

Author: Khaled Tannir

Publisher: Packt Publishing

Publication date: 2014-02-21

Word count: 536,000

Category: Imported books > Foreign-language originals > Computers/Internet

This book is an example-based tutorial on optimizing Hadoop for MapReduce job performance. It is aimed at Hadoop administrators, developers, MapReduce users, and beginners who want to optimize their clusters and applications. Prior experience creating MapReduce applications is not required, but it will help you better understand the concepts and the snippets of MapReduce class template code.
Optimizing Hadoop for MapReduce

Table of Contents

Optimizing Hadoop for MapReduce

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Errata

Piracy

Questions

1. Understanding Hadoop MapReduce

The MapReduce model

An overview of Hadoop MapReduce

Hadoop MapReduce internals

Factors affecting the performance of MapReduce

Summary

2. An Overview of the Hadoop Parameters

Investigating the Hadoop parameters

The mapred-site.xml configuration file

The CPU-related parameters

The disk I/O related parameters

The memory-related parameters

The network-related parameters

The hdfs-site.xml configuration file

The core-site.xml configuration file

Hadoop MapReduce metrics

Performance monitoring tools

Using Chukwa to monitor Hadoop

Using Ganglia to monitor Hadoop

Using Nagios to monitor Hadoop

Using Apache Ambari to monitor Hadoop

Summary

3. Detecting System Bottlenecks

Performance tuning

Creating a performance baseline

Identifying resource bottlenecks

Identifying RAM bottlenecks

Identifying CPU bottlenecks

Identifying storage bottlenecks

Identifying network bandwidth bottlenecks

Summary

4. Identifying Resource Weaknesses

Identifying cluster weakness

Checking the Hadoop cluster node's health

Checking the input data size

Checking massive I/O and network traffic

Checking for insufficient concurrent tasks

Checking for CPU contention

Sizing your Hadoop cluster

Configuring your cluster correctly

Summary

5. Enhancing Map and Reduce Tasks

Enhancing map tasks

Input data and block size impact

Dealing with small and unsplittable files

Reducing spilled records during the Map phase

Calculating map tasks' throughput

Enhancing reduce tasks

Calculating reduce tasks' throughput

Improving Reduce execution phase

Tuning map and reduce parameters

Summary

6. Optimizing MapReduce Tasks

Using Combiners

Using compression

Using appropriate Writable types

Reusing types smartly

Optimizing mappers and reducers code

Summary

7. Best Practices and Recommendations

Hardware tuning and OS recommendations

The Hadoop cluster checklist

The BIOS tuning checklist

OS configuration recommendations

Hadoop best practices and recommendations

Deploying Hadoop

Hadoop tuning recommendations

Using a MapReduce template class code

Summary

Index
