Learning YARN
Table of Contents
Credits
About the Authors
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Starting with YARN Basics
Introduction to MapReduce v1
Shortcomings of MapReduce v1
An overview of YARN components
ResourceManager
NodeManager
ApplicationMaster
Container
The YARN architecture
How YARN satisfies big data needs
Projects powered by YARN
Summary
2. Setting up a Hadoop-YARN Cluster
Starting with the basics
Supported platforms
Hardware requirements
Software requirements
Basic Linux commands / utilities
Sudo
Nano editor
Source
Jps
Netstat
Man
Preparing a node for a Hadoop-YARN cluster
Install Java
Create a Hadoop dedicated user and group
Disable firewall or open Hadoop ports
Configure the domain name resolution
Install SSH and configure passwordless SSH from the master to all slaves
The Hadoop-YARN single node installation
Prerequisites
Installation steps
Step 1 – Download and extract the Hadoop bundle
Step 2 – Configure the environment variables
Step 3 – Configure the Hadoop configuration files
The core-site.xml file
The hdfs-site.xml file
The mapred-site.xml file
The yarn-site.xml file
The hadoop-env.sh and yarn-env.sh files
The slaves file
Step 4 – Format NameNode
Step 5 – Start Hadoop daemons
An overview of web user interfaces
Run a sample application
The Hadoop-YARN multi-node installation
Prerequisites
Installation steps
Step 1 – Configure the master node as a single-node Hadoop-YARN installation
Step 2 – Copy the Hadoop folder to all the slave nodes
Step 3 – Configure environment variables on slave nodes
Step 4 – Format NameNode
Step 5 – Start Hadoop daemons
An overview of the Hortonworks and Cloudera installations
Summary
3. Administering a Hadoop-YARN Cluster
Using the Hadoop-YARN commands
The user commands
Jar
Application
Command options
Sample output
Node
Command options
Sample output
Logs
Command options
Classpath
Version
Administration commands
ResourceManager / NodeManager / ProxyServer
RMAdmin
Command options
DaemonLog
Command options
Configuring the Hadoop-YARN services
The ResourceManager service
The NodeManager service
The Timeline server
The web application proxy server
Ports summary
Managing the Hadoop-YARN services
Managing service logs
Managing pid files
Monitoring the YARN services
JMX monitoring
The ResourceManager JMX beans
The NodeManager JMX beans
Ganglia monitoring
Ganglia daemons
Integrating Ganglia with Hadoop
Understanding ResourceManager's High Availability
Architecture
Failover mechanisms
Configuring ResourceManager's High Availability
Define nodes
The RM state store mechanism
The failover proxy provider
Automatic failover
High Availability admin commands
Monitoring NodeManager's health
The health checker script
Summary
4. Executing Applications Using YARN
Understanding application execution flow
Phase 1 – Application initialization and submission
Phase 2 – Allocate memory and start ApplicationMaster
Phase 3 – ApplicationMaster registration and resource allocation
Phase 4 – Launch and monitor containers
Phase 5 – Application progress report
Phase 6 – Application completion
Submitting a sample MapReduce application
Submitting an application to the cluster
Updates in the ResourceManager web UI
Understanding the application process
Tracking application details
The ApplicationMaster process
Cluster nodes information
Node's container list
YARN child processes
Application details after completion
Handling failures in YARN
The container failure
The NodeManager failure
The ResourceManager failure
YARN application logging
Services logs
Application logs
Summary
5. Understanding YARN Life Cycle Management
An introduction to state management analogy
The ResourceManager's view
View 1 – Node
View 2 – Application
View 3 – An application attempt
View 4 – Container
The NodeManager's view
View 1 – Application
View 2 – Container
View 3 – A localized resource
Analyzing transitions through logs
NodeManager registration with ResourceManager
Application submission
Container resource allocation
Resource localization
Summary
6. Migrating from MRv1 to MRv2
Introducing MRv1 and MRv2
High-level changes from MRv1 to MRv2
The evolution of the MRApplicationMaster service
Resource capability
Pluggable shuffle
Hierarchical queues and fair scheduler
Task execution as containers
The migration steps from MRv1 to MRv2
Configuration changes
The binary / source compatibility
Running and monitoring MRv1 apps on YARN
Summary
7. Writing Your Own YARN Applications
An introduction to the YARN API
YARNConfiguration
Load resources
Final properties
Variable expansion
ApplicationSubmissionContext
ContainerLaunchContext
Communication protocols
ApplicationClientProtocol
ApplicationMasterProtocol
ContainerManagementProtocol
ApplicationHistoryProtocol
YARN client API
Writing your own application
Step 1 – Create a new project and add Hadoop-YARN JAR files
Step 2 – Define the ApplicationMaster and client classes
Define an ApplicationMaster
Define a YARN client
Step 3 – Export the project and copy resources
Step 4 – Run the application using bin or the YARN command
Summary
8. Dive Deep into YARN Components
Understanding ResourceManager
The client and admin interfaces
The core interfaces
The NodeManager interfaces
The security and token managers
Understanding NodeManager
Status updates
State and health management
Container management
The security and token managers
The YARN Timeline server
The web application proxy server
YARN Scheduler Load Simulator (SLS)
Handling resource localization in YARN
Resource localization terminologies
The resource localization directory structure
Summary
9. Exploring YARN REST Services
Introduction to YARN REST services
HTTP request and response
Successful response
Response with an error
ResourceManager REST APIs
The cluster summary
Scheduler details
Nodes
Applications
NodeManager REST APIs
The node summary
Applications
Containers
MapReduce ApplicationMaster REST APIs
ApplicationMaster summary
Jobs
Tasks
MapReduce HistoryServer REST APIs
How to access REST services
RESTClient plugins
Curl command
Java API
Summary
10. Scheduling YARN Applications
An introduction to scheduling in YARN
An overview of queues
Types of queues
CapacityScheduler Queue (CSQueue)
FairScheduler Queue (FSQueue)
An introduction to schedulers
Fair scheduler
Hierarchical queues
Schedulable
Scheduling policy
Configuring a fair scheduler
CapacityScheduler
Configuring CapacityScheduler
Summary
11. Enabling Security in YARN
Adding security to a YARN cluster
Using a dedicated user group for Hadoop-YARN daemons
Validating permissions to YARN directories
Enabling the HTTPS protocol
Enabling authorization using Access Control Lists
Enabling authentication using Kerberos
Working with ACLs
Defining an ACL value
Type of ACLs
The administration ACL
The service-level ACL
The queue ACL
The application ACL
Other security frameworks
Apache Ranger
Apache Knox
Summary
12. Real-time Data Analytics Using YARN
The integration of Spark with YARN
Running Spark on YARN
The integration of Storm with YARN
Running Storm on YARN
Create a ZooKeeper quorum
Download, extract, and prepare the Storm bundle
Copy Storm ZIP to HDFS
Configuring the storm.yaml file
Launching the Storm-YARN cluster
Managing Storm on YARN
The integration of HAMA and Giraph with YARN
Summary
Index