Hadoop 2.x Administration Cookbook (e-book)

Author: Gurmukh Singh

Publisher: Packt Publishing

Publication date: 2017-05-26

Word count: 1.957 million

Over 100 practical recipes to help you become an expert Hadoop administrator

About This Book

  • Become an expert Hadoop administrator and perform tasks to optimize your Hadoop cluster
  • Import data into and export data from Hive, and use Oozie to manage workflows
  • Practical recipes will help you plan and secure your Hadoop cluster, and make it highly available

Who This Book Is For

If you are a system administrator with a basic understanding of Hadoop and you want to get into Hadoop administration, this book is for you. It's also ideal if you are a Hadoop administrator who wants a quick reference guide to all the Hadoop administration-related tasks and solutions to commonly occurring problems.

What You Will Learn

  • Set up the Hadoop architecture to run a Hadoop cluster smoothly
  • Maintain a Hadoop cluster on HDFS, YARN, and MapReduce
  • Understand high availability with ZooKeeper and Journal Node
  • Configure Flume for data ingestion and Oozie to run various workflows
  • Tune the Hadoop cluster for optimal performance
  • Schedule jobs on a Hadoop cluster using the Fair and Capacity schedulers
  • Secure your cluster and troubleshoot it for various common pain points

In Detail

Hadoop enables the distributed storage and processing of large datasets across clusters of computers. Learning how to administer Hadoop is crucial to exploiting its unique features. With this book, you will be able to overcome common problems encountered in Hadoop administration.

The book begins by laying the foundation, showing you the steps needed to set up a Hadoop cluster and its various nodes. You will get a better understanding of how to maintain a Hadoop cluster, especially on the HDFS layer and using YARN and MapReduce. Further on, you will explore the durability and high availability of a Hadoop cluster. You'll get a better understanding of the schedulers in Hadoop and how to configure and use them for your tasks.

You will also get hands-on experience with the backup and recovery options and the performance tuning aspects of Hadoop. Finally, you will get a better understanding of troubleshooting, diagnostics, and best practices in Hadoop administration. By the end of this book, you will have a proper understanding of working with Hadoop clusters and will also be able to secure and encrypt them and configure auditing for them.

Style and approach

This book contains short recipes that will help you run a Hadoop cluster efficiently. The recipes are solutions to real-life problems that administrators encounter while working with a Hadoop cluster.

Hadoop 2.x Administration Cookbook

Table of Contents

Hadoop 2.x Administration Cookbook

Credits

About the Author

About the Reviewers

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Hadoop Architecture and Deployment

Introduction

Overview of Hadoop Architecture

Building and compiling Hadoop

Getting ready

How to do it...

How it works...

Installation methods

Getting ready

How to do it...

How it works...

Setting up host resolution

Getting ready

How to do it...

How it works...

Installing a single-node cluster - HDFS components

Getting ready

How to do it...

How it works...

There's more...

Setting up ResourceManager and NodeManager

Installing a single-node cluster - YARN components

Getting ready

How to do it...

How it works...

There's more...

See also

Installing a multi-node cluster

Getting ready

How to do it...

How it works...

Configuring the Hadoop Gateway node

Getting ready

How to do it...

How it works...

See also

Decommissioning nodes

Getting ready

How to do it...

How it works...

See also

Adding nodes to the cluster

Getting ready

How to do it...

How it works...

There's more...

2. Maintaining Hadoop Cluster HDFS

Introduction

Overview of HDFS

Configuring HDFS block size

Getting ready

How to do it...

How it works...

Setting up Namenode metadata location

Getting ready

How to do it...

How it works...

Loading data in HDFS

Getting ready

How to do it...

How it works...

Configuring HDFS replication

Getting ready

How to do it...

How it works...

See also

HDFS balancer

Getting ready

How to do it...

How it works...

Quota configuration

Getting ready

How to do it...

How it works...

HDFS health and FSCK

Getting ready

How to do it...

How it works...

See also

Configuring rack awareness

Getting ready

How to do it...

How it works...

See also

Recycle or trash bin configuration

Getting ready

How to do it...

How it works...

There's more...

Distcp usage

Getting ready

How to do it...

How it works...

Control block report storm

Getting ready

How to do it...

How it works...

Configuring Datanode heartbeat

Getting ready

How to do it...

How it works...

3. Maintaining Hadoop Cluster – YARN and MapReduce

Introduction

Running a simple MapReduce program

Getting ready

How to do it...

Hadoop streaming

Getting ready

How to do it...

How it works...

Configuring YARN history server

Getting ready

How to do it...

How it works...

There's more...

Job history web interface and metrics

Getting ready

How to do it...

How it works...

Configuring ResourceManager components

Getting ready

How to do it...

How it works...

There's more...

See also

YARN containers and resource allocations

Getting ready

How to do it...

How it works...

There's more...

See also

ResourceManager Web UI and JMX metrics

Getting ready

How to do it...

How it works...

Preserving ResourceManager states

Getting ready

How to do it...

How it works...

There's more...

4. High Availability

Introduction

Namenode HA using shared storage

Getting ready

How to do it...

How it works...

See also

ZooKeeper configuration

Getting ready

How to do it...

How it works...

Namenode HA using Journal node

Getting ready

How to do it...

How it works...

ResourceManager HA using ZooKeeper

Getting ready

How to do it...

How it works…

Rolling upgrade with HA

Getting ready

How to do it...

How it works...

Configure shared cache manager

Getting ready

How to do it...

There's more...

See also

Configure HDFS cache

Getting ready

How to do it...

How it works...

See also

HDFS snapshots

Getting ready

How to do it...

How it works...

Configuring storage based policies

Getting ready

How to do it...

How it works...

Configuring HA for Edge nodes

Getting ready

How to do it...

How it works...

5. Schedulers

Introduction

Configuring users and groups

Getting ready

How to do it...

How it works...

See also

Fair Scheduler configuration

Getting ready

How to do it...

How it works...

Fair Scheduler pools

Getting ready

How to do it...

How it works...

Configuring job queues

Getting ready

How to do it...

How it works...

See also

Job queue ACLs

Getting ready

How to do it...

How it works...

See also

Configuring Capacity Scheduler

Getting ready

How to do it...

How it works...

See also

Queuing mappings in Capacity Scheduler

Getting ready

How to do it...

How it works...

YARN and Mapred commands

Getting ready

How to do it...

How it works...

YARN label-based scheduling

Getting ready

How to do it...

How it works...

YARN SLS

Getting ready

How to do it...

How it works...

6. Backup and Recovery

Introduction

Initiating Namenode saveNamespace

Getting ready

How to do it...

How it works...

Using HDFS Image Viewer

Getting ready

How to do it...

How it works...

Fetching parameters which are in-effect

Getting ready

How to do it...

How it works...

Configuring HDFS and YARN logs

Getting ready

How to do it...

How it works...

See also

Backing up and recovering Namenode

Getting ready

How to do it...

How it works...

See also

Configuring Secondary Namenode

Getting ready

How to do it...

How it works…

Promoting Secondary Namenode to Primary

Getting ready

How to do it...

How it works...

See also

Namenode recovery

Getting ready

How to do it...

How it works...

Namenode roll edits – online mode

Getting ready

How to do it...

How it works...

Namenode roll edits – offline mode

Getting ready

How to do it...

How it works...

Datanode recovery – disk full

Getting ready

How to do it...

How it works...

Configuring NFS gateway to serve HDFS

Getting ready

How to do it...

How it works...

Recovering deleted files

Getting ready

How to do it...

How it works...

7. Data Ingestion and Workflow

Introduction

Hive server modes and setup

Getting ready

How to do it...

How it works...

Using MySQL for Hive metastore

How to do it…

How it works...

Operating Hive with ZooKeeper

Getting ready

How to do it...

How it works...

Loading data into Hive

Getting ready

How to do it...

How it works...

See also

Partitioning and Bucketing in Hive

Getting ready

How to do it...

How it works...

See also

Hive metastore database

Getting ready

How to do it...

How it works...

See also

Designing Hive with credential store

Getting ready

How to do it...

How it works...

Configuring Flume

Getting ready

How to do it...

How it works...

Configure Oozie and workflows

Getting ready

How to do it...

How it works...

8. Performance Tuning

Tuning the operating system

Getting ready

How to do it...

How it works...

See also

Tuning the disk

Getting ready

How to do it...

How it works...

Tuning the network

Getting ready

How to do it...

How it works...

Tuning HDFS

Getting ready

How to do it...

How it works...

Tuning Namenode

Getting ready

How to do it...

There's more...

See also

Tuning Datanode

Getting ready

How to do it...

How it works...

See also

Configuring YARN for performance

Getting ready

How to do it...

How it works...

Configuring MapReduce for performance

Getting ready

How to do it...

How it works...

Hive performance tuning

Getting ready

How to do it...

There's more...

How it works...

Benchmarking Hadoop cluster

Getting ready

How to do it...

Benchmark 1--Testing HDFS with TestDFSIO

Benchmark 2--Stress testing Namenode

Benchmark 3--MapReduce testing by generating small files

Benchmark 4--TeraGen, TeraSort, and TeraValidate benchmarks

There's more...

How it works...

9. HBase Administration

Introduction

Setting up single node HBase cluster

Getting ready

How to do it...

How it works...

Setting up multi-node HBase cluster

Getting ready

How to do it...

How it works...

Inserting data into HBase

Getting ready

How to do it...

How it works...

Integration with Hive

Getting ready

How to do it...

How it works...

See also

HBase administration commands

Getting ready

How to do it...

How it works...

See also

HBase backup and restore

Getting ready

How to do it...

How it works...

Tuning HBase

Getting ready

How to do it...

How it works...

HBase upgrade

Getting ready

How to do it...

How it works...

Migrating data from MySQL to HBase using Sqoop

Getting ready

How to do it...

10. Cluster Planning

Introduction

Disk space calculations

Getting ready

How to do it...

How it works...

Nodes needed in the cluster

Getting ready

How to do it...

How it works...

See also

Memory requirements

Getting ready

How to do it...

How it works...

See also

Sizing the cluster as per SLA

Getting ready

How to do it...

How it works...

See also

Network design

Getting ready

How to do it...

How it works...

Estimating the cost of the Hadoop cluster

How to do it...

How it works...

Hardware and software options

How it works...

11. Troubleshooting, Diagnostics, and Best Practices

Introduction

Namenode troubleshooting

Getting ready

How to do it...

How it works...

See also

Datanode troubleshooting

Getting ready

How to do it...

How it works...

See also

ResourceManager troubleshooting

Getting ready

How to do it…

How it works...

See also

Diagnose communication issues

Getting ready

How to do it...

How it works...

Parse logs for errors

Getting ready

How to do it...

How it works...

Hive troubleshooting

Getting ready

How to do it...

How it works...

See also

HBase troubleshooting

Getting ready

How to do it...

How it works...

Hadoop best practices

How it works...

12. Security

Introduction

Encrypting disk using LUKS

Getting ready

How to do it...

How it works...

See also

Configuring Hadoop users

Getting ready

How to do it...

How it works...

HDFS encryption at Rest

Getting ready

How to do it...

How it works...

Configuring SSL in Hadoop

Getting ready

How to do it...

How it works...

See also

In-transit encryption

Getting ready

How to do it...

There's more...

See also

Enabling service level authorization

Getting ready

How to do it...

How it works...

See also

Securing ZooKeeper

Getting ready

How to do it...

How it works...

Configuring auditing

Getting ready

How to do it...

How it works...

Configuring Kerberos server

Getting ready

How to do it...

How it works...

Configuring and enabling Kerberos for Hadoop

Getting ready

How to do it...

How it works...

Index
