Apache Flume: Distributed Log Collection for Hadoop - Second Edition (e-book)

Author: Steve Hoffman

Publisher: Packt Publishing

Publication date: 2015-02-25

Word count: 1.127 million

If you are a Hadoop programmer who wants to learn how to use Flume to move datasets into Hadoop in a timely and replicable manner, this book is ideal for you. No prior knowledge of Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop Distributed File System (HDFS) is assumed.

Apache Flume: Distributed Log Collection for Hadoop Second Edition

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Overview and Architecture

Flume 0.9

Flume 1.X (Flume-NG)

The problem with HDFS and streaming data/logs

Sources, channels, and sinks

Flume events

Interceptors, channel selectors, and sink processors

Tiered data collection (multiple flows and/or agents)

The Kite SDK

Summary

2. A Quick Start Guide to Flume

Downloading Flume

Flume in Hadoop distributions

An overview of the Flume configuration file

Starting up with "Hello, World!"

Summary

3. Channels

The memory channel

The file channel

Spillable Memory Channel

Summary

4. Sinks and Sink Processors

HDFS sink

Path and filename

File rotation

Compression codecs

Event Serializers

Text output

Text with headers

Apache Avro

User-provided Avro schema

File type

SequenceFile

DataStream

CompressedStream

Timeouts and workers

Sink groups

Load balancing

Failover

MorphlineSolrSink

Morphline configuration files

Typical SolrSink configuration

Sink configuration

ElasticSearchSink

LogStash Serializer

Dynamic Serializer

Summary

5. Sources and Channel Selectors

The problem with using tail

The Exec source

Spooling Directory Source

Syslog sources

The syslog UDP source

The syslog TCP source

The multiport syslog TCP source

JMS source

Channel selectors

Replicating

Multiplexing

Summary

6. Interceptors, ETL, and Routing

Interceptors

Timestamp

Host

Static

Regular expression filtering

Regular expression extractor

Morphline interceptor

Custom interceptors

The plugins directory

Tiering flows

The Avro source/sink

Compressing Avro

SSL Avro flows

The Thrift source/sink

Using command-line Avro

The Log4J appender

The Log4J load-balancing appender

The embedded agent

Configuration and startup

Sending data

Shutdown

Routing

Summary

7. Putting It All Together

Web logs to searchable UI

Setting up the web server

Configuring log rotation to the spool directory

Setting up the target – Elasticsearch

Setting up Flume on collector/relay

Setting up Flume on the client

Creating more search fields with an interceptor

Setting up a better user interface – Kibana

Archiving to HDFS

Summary

8. Monitoring Flume

Monitoring the agent process

Monit

Nagios

Monitoring performance metrics

Ganglia

Internal HTTP server

Custom monitoring hooks

Summary

9. There Is No Spoon – the Realities of Real-time Distributed Data Collection

Transport time versus log time

Time zones are evil

Capacity planning

Considerations for multiple data centers

Compliance and data expiry

Summary

Index
