Learning YARN (e-book)

Author: Akhil Arora

Publisher: Packt Publishing

Publication date: 2015-08-28

Word count: 2.017 million

This book is intended for those who want to understand what YARN is and how to use it efficiently for resource management on large clusters. For cluster administrators, it gives a detailed explanation of provisioning and managing YARN clusters. If you are a Java developer or an open source contributor, it will help you drill down into the YARN architecture, write your own YARN applications, and understand the application execution phases. It will also help big data engineers explore YARN integration with real-time analytics technologies such as Spark and Storm.
Learning YARN

Table of Contents

Learning YARN

Credits

About the Authors

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Starting with YARN Basics

Introduction to MapReduce v1

Shortcomings of MapReduce v1

An overview of YARN components

ResourceManager

NodeManager

ApplicationMaster

Container

The YARN architecture

How YARN satisfies big data needs

Projects powered by YARN

Summary

2. Setting up a Hadoop-YARN Cluster

Starting with the basics

Supported platforms

Hardware requirements

Software requirements

Basic Linux commands / utilities

Sudo

Nano editor

Source

Jps

Netstat

Man

Preparing a node for a Hadoop-YARN cluster

Install Java

Create a Hadoop dedicated user and group

Disable firewall or open Hadoop ports

Configure the domain name resolution

Install SSH and configure passwordless SSH from the master to all slaves

The Hadoop-YARN single node installation

Prerequisites

Installation steps

Step 1 – Download and extract the Hadoop bundle

Step 2 – Configure the environment variables

Step 3 – Configure the Hadoop configuration files

The core-site.xml file

The hdfs-site.xml file

The mapred-site.xml file

The yarn-site.xml file

The hadoop-env.sh and yarn-env.sh files

The slaves file

Step 4 – Format NameNode

Step 5 – Start Hadoop daemons

An overview of web user interfaces

Run a sample application

The Hadoop-YARN multi-node installation

Prerequisites

Installation steps

Step 1 – Configure the master node as a single-node Hadoop-YARN installation

Step 2 – Copy the Hadoop folder to all the slave nodes

Step 3 – Configure environment variables on slave nodes

Step 4 – Format NameNode

Step 5 – Start Hadoop daemons

An overview of the Hortonworks and Cloudera installations

Summary

3. Administering a Hadoop-YARN Cluster

Using the Hadoop-YARN commands

The user commands

Jar

Application

Command options

Sample output

Node

Command options

Sample output

Logs

Command options

Classpath

Version

Administration commands

ResourceManager / NodeManager / ProxyServer

RMAdmin

Command options

DaemonLog

Command options

Configuring the Hadoop-YARN services

The ResourceManager service

The NodeManager service

The Timeline server

The web application proxy server

Ports summary

Managing the Hadoop-YARN services

Managing service logs

Managing pid files

Monitoring the YARN services

JMX monitoring

The ResourceManager JMX beans

The NodeManager JMX beans

Ganglia monitoring

Ganglia daemons

Integrating Ganglia with Hadoop

Understanding ResourceManager's High Availability

Architecture

Failover mechanisms

Configuring ResourceManager's High Availability

Define nodes

The RM state store mechanism

The failover proxy provider

Automatic failover

High Availability admin commands

Monitoring NodeManager's health

The health checker script

Summary

4. Executing Applications Using YARN

Understanding application execution flow

Phase 1 – Application initialization and submission

Phase 2 – Allocate memory and start ApplicationMaster

Phase 3 – ApplicationMaster registration and resource allocation

Phase 4 – Launch and monitor containers

Phase 5 – Application progress report

Phase 6 – Application completion

Submitting a sample MapReduce application

Submitting an application to the cluster

Updates in the ResourceManager web UI

Understanding the application process

Tracking application details

The ApplicationMaster process

Cluster nodes information

Node's container list

YARN child processes

Application details after completion

Handling failures in YARN

The container failure

The NodeManager failure

The ResourceManager failure

YARN application logging

Services logs

Application logs

Summary

5. Understanding YARN Life Cycle Management

An introduction to state management analogy

The ResourceManager's view

View 1 – Node

View 2 – Application

View 3 – An application attempt

View 4 – Container

The NodeManager's view

View 1 – Application

View 2 – Container

View 3 – A localized resource

Analyzing transitions through logs

NodeManager registration with ResourceManager

Application submission

Container resource allocation

Resource localization

Summary

6. Migrating from MRv1 to MRv2

Introducing MRv1 and MRv2

High-level changes from MRv1 to MRv2

The evolution of the MRApplicationMaster service

Resource capability

Pluggable shuffle

Hierarchical queues and fair scheduler

Task execution as containers

The migration steps from MRv1 to MRv2

Configuration changes

The binary / source compatibility

Running and monitoring MRv1 apps on YARN

Summary

7. Writing Your Own YARN Applications

An introduction to the YARN API

YARNConfiguration

Load resources

Final properties

Variable expansion

ApplicationSubmissionContext

ContainerLaunchContext

Communication protocols

ApplicationClientProtocol

ApplicationMasterProtocol

ContainerManagementProtocol

ApplicationHistoryProtocol

YARN client API

Writing your own application

Step 1 – Create a new project and add Hadoop-YARN JAR files

Step 2 – Define the ApplicationMaster and client classes

Define an ApplicationMaster

Define a YARN client

Step 3 – Export the project and copy resources

Step 4 – Run the application using bin or the YARN command

Summary

8. Dive Deep into YARN Components

Understanding ResourceManager

The client and admin interfaces

The core interfaces

The NodeManager interfaces

The security and token managers

Understanding NodeManager

Status updates

State and health management

Container management

The security and token managers

The YARN Timeline server

The web application proxy server

YARN Scheduler Load Simulator (SLS)

Handling resource localization in YARN

Resource localization terminologies

The resource localization directory structure

Summary

9. Exploring YARN REST Services

Introduction to YARN REST services

HTTP request and response

Successful response

Response with an error

ResourceManager REST APIs

The cluster summary

Scheduler details

Nodes

Applications

NodeManager REST APIs

The node summary

Applications

Containers

MapReduce ApplicationMaster REST APIs

ApplicationMaster summary

Jobs

Tasks

MapReduce HistoryServer REST APIs

How to access REST services

RESTClient plugins

Curl command

Java API

Summary

10. Scheduling YARN Applications

An introduction to scheduling in YARN

An overview of queues

Types of queues

CapacityScheduler Queue (CSQueue)

FairScheduler Queue (FSQueue)

An introduction to schedulers

Fair scheduler

Hierarchical queues

Schedulable

Scheduling policy

Configuring a fair scheduler

CapacityScheduler

Configuring CapacityScheduler

Summary

11. Enabling Security in YARN

Adding security to a YARN cluster

Using a dedicated user group for Hadoop-YARN daemons

Validating permissions to YARN directories

Enabling the HTTPS protocol

Enabling authorization using Access Control Lists

Enabling authentication using Kerberos

Working with ACLs

Defining an ACL value

Type of ACLs

The administration ACL

The service-level ACL

The queue ACL

The application ACL

Other security frameworks

Apache Ranger

Apache Knox

Summary

12. Real-time Data Analytics Using YARN

The integration of Spark with YARN

Running Spark on YARN

The integration of Storm with YARN

Running Storm on YARN

Create a Zookeeper quorum

Download, extract, and prepare the Storm bundle

Copy Storm ZIP to HDFS

Configuring the storm.yaml file

Launching the Storm-YARN cluster

Managing Storm on YARN

The integration of HAMA and Giraph with YARN

Summary

Index
