Apache Flume: Distributed Log Collection for Hadoop (eBook)

Author: Steve Hoffman

Publisher: Packt Publishing

Publication date: 2013-07-16

Word count: 456,000

A starter guide that covers Apache Flume in detail. Apache Flume: Distributed Log Collection for Hadoop is intended for people responsible for moving datasets into Hadoop in a timely and reliable manner, such as software engineers, database administrators, and data warehouse administrators.
Apache Flume: Distributed Log Collection for Hadoop

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Errata

Piracy

Questions

1. Overview and Architecture

Flume 0.9

Flume 1.X (Flume-NG)

The problem with HDFS and streaming data/logs

Sources, channels, and sinks

Flume events

Interceptors, channel selectors, and sink processors

Tiered data collection (multiple flows and/or agents)

Summary

2. Flume Quick Start

Downloading Flume

Flume in Hadoop distributions

Flume configuration file overview

Starting up with "Hello World"

Summary

3. Channels

Memory channel

File channel

Summary

4. Sinks and Sink Processors

HDFS sink

Path and filename

File rotation

Compression codecs

Event serializers

Text output

Text with headers

Apache Avro

File type

Sequence file

Data stream

Compressed stream

Timeouts and workers

Sink groups

Load balancing

Failover

Summary

5. Sources and Channel Selectors

The problem with using tail

The exec source

The spooling directory source

Syslog sources

The syslog UDP source

The syslog TCP source

The multiport syslog TCP source

Channel selectors

Replicating

Multiplexing

Summary

6. Interceptors, ETL, and Routing

Interceptors

Timestamp

Host

Static

Regular expression filtering

Regular expression extractor

Custom interceptors

Tiering data flows

Avro Source/Sink

Command-line Avro

Log4J Appender

The Load Balancing Log4J Appender

Routing

Summary

7. Monitoring Flume

Monitoring the agent process

Monit

Nagios

Monitoring performance metrics

Ganglia

The internal HTTP server

Custom monitoring hooks

Summary

8. There Is No Spoon – The Realities of Real-time Distributed Data Collection

Transport time versus log time

Time zones are evil

Capacity planning

Considerations for multiple data centers

Compliance and data expiry

Summary

Index
