Hadoop Essentials (e-book)

Author: Shiva Achari

Publisher: Packt Publishing

Publication date: 2015-04-29

Word count: 2.157 million

If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. This book is also meant for Hadoop professionals who want to find solutions to the different challenges they come across in their Hadoop projects.
Hadoop Essentials

Table of Contents

Hadoop Essentials

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Introduction to Big Data and Hadoop

V's of big data

Volume

Velocity

Variety

Understanding big data

NoSQL

Types of NoSQL databases

Analytical database

Who is creating big data?

Big data use cases

Big data use case patterns

Big data as a storage pattern

Big data as a data transformation pattern

Big data for a data analysis pattern

Big data for data in a real-time pattern

Big data for a low latency caching pattern

Hadoop

Hadoop history

Description

Advantages of Hadoop

Uses of Hadoop

Hadoop ecosystem

Apache Hadoop

Hadoop distributions

Pillars of Hadoop

Data access components

Data storage component

Data ingestion in Hadoop

Streaming and real-time analysis

Summary

2. Hadoop Ecosystem

Traditional systems

Database trend

The Hadoop use cases

Hadoop's basic data flow

Hadoop integration

The Hadoop ecosystem

Distributed filesystem

HDFS

Distributed programming

NoSQL databases

Apache HBase

Data ingestion

Service programming

Apache YARN

Apache Zookeeper

Scheduling

Data analytics and machine learning

System management

Apache Ambari

Summary

3. Pillars of Hadoop – HDFS, MapReduce, and YARN

HDFS

Features of HDFS

HDFS architecture

NameNode

DataNode

Checkpoint NameNode or Secondary NameNode

BackupNode

Data storage in HDFS

Read pipeline

Write pipeline

Rack awareness

Advantages of rack awareness in HDFS

HDFS federation

Limitations of HDFS 1.0

The benefit of HDFS federation

HDFS ports

HDFS commands

MapReduce

The MapReduce architecture

JobTracker

TaskTracker

Serialization data types

The Writable interface

WritableComparable interface

The MapReduce example

The MapReduce process

Mapper

Shuffle and sorting

Reducer

Speculative execution

FileFormats

InputFormats

RecordReader

OutputFormats

RecordWriter

Writing a MapReduce program

Mapper code

Reducer code

Driver code

Auxiliary steps

Combiner

Partitioner

Custom partitioner

YARN

YARN architecture

ResourceManager

NodeManager

ApplicationMaster

Applications powered by YARN

Summary

4. Data Access Components – Hive and Pig

Need of a data processing tool on Hadoop

Pig

Pig data types

The Pig architecture

The logical plan

The physical plan

The MapReduce plan

Pig modes

Grunt shell

Input data

Loading data

Dump

Store

FOREACH generate

Filter

Group By

Limit

Aggregation

Cogroup

DESCRIBE

EXPLAIN

ILLUSTRATE

Hive

The Hive architecture

Metastore

The Query compiler

The Execution engine

Data types and schemas

Installing Hive

Starting Hive shell

HiveQL

DDL (Data Definition Language) operations

DML (Data Manipulation Language) operations

The SQL operation

Joins

Aggregations

Built-in functions

Custom UDF (User Defined Functions)

Managing tables – external versus managed

SerDe

Partitioning

Bucketing

Summary

5. Storage Component – HBase

An Overview of HBase

Advantages of HBase

The Architecture of HBase

MasterServer

RegionServer

WAL

BlockCache

LRUBlockCache

SlabCache

BucketCache

Regions

MemStore

Zookeeper

The HBase data model

Logical components of a data model

ACID properties

The CAP theorem

The Schema design

The Write pipeline

The Read pipeline

Compaction

The Compaction policy

Minor compaction

Major compaction

Splitting

Pre-Splitting

Auto Splitting

Forced Splitting

Commands

help

Create

List

Put

Scan

Get

Disable

Drop

HBase Hive integration

Performance tuning

Compression

Filters

Counters

HBase coprocessors

Summary

6. Data Ingestion in Hadoop – Sqoop and Flume

Data sources

Challenges in data ingestion

Sqoop

Connectors and drivers

Sqoop 1 architecture

Limitation of Sqoop 1

Sqoop 2 architecture

Imports

Exports

Apache Flume

Reliability

Flume architecture

Multitier topology

Flume master

Flume nodes

Components in Agent

Source

Sink

Channels

Memory channel

File Channel

JDBC Channel

Examples of configuring Flume

The Single agent example

Multiple flows in an agent

Configuring a multiagent setup

Summary

7. Streaming and Real-time Analysis – Storm and Spark

An introduction to Storm

Features of Storm

Physical architecture of Storm

Data architecture of Storm

Storm topology

Storm on YARN

Topology configuration example

Spouts

Bolts

Topology

An introduction to Spark

Features of Spark

Spark framework

Spark SQL

GraphX

MLlib

Spark streaming

Spark architecture

Directed Acyclic Graph engine

Resilient Distributed Dataset

Physical architecture

Operations in Spark

Transformations

Actions

Spark example

Summary

Index
