Elasticsearch for Hadoop (e-book)

Author: Vishal Shukla

Publisher: Packt Publishing

Publication date: 2015-10-27

Word count: 2.013 million

Integrate Elasticsearch into Hadoop to effectively visualize and analyze your data

About This Book

  • Build production-ready analytics applications by integrating the Hadoop ecosystem with Elasticsearch
  • Learn complex Elasticsearch queries and develop real-time monitoring Kibana dashboards to visualize your data
  • Use Elasticsearch and Kibana to search data in Hadoop easily with this comprehensive, step-by-step guide

Who This Book Is For

This book is targeted at Java developers with basic knowledge of Hadoop. No prior Elasticsearch experience is expected.

What You Will Learn

  • Set up the Elasticsearch-Hadoop environment
  • Import HDFS data into Elasticsearch with MapReduce jobs
  • Perform full-text search and aggregations efficiently using Elasticsearch
  • Visualize data and create interactive dashboards using Kibana
  • Check and detect anomalies in streaming data using Storm and Elasticsearch
  • Inject and classify real-time streaming data into Elasticsearch
  • Get production-ready for Elasticsearch-Hadoop based projects
  • Integrate with Hadoop ecosystem tools such as Pig, Storm, Hive, and Spark

In Detail

The Hadoop ecosystem is a de facto standard for processing terabytes and petabytes of data. Lucene-enabled Elasticsearch is becoming an industry standard for its full-text search and aggregation capabilities. Elasticsearch-Hadoop bridges Elasticsearch and the Hadoop ecosystem so that you get the best of both worlds. Combined with Kibana, this stack makes it easy to draw insights quickly from the massive amounts of data in your Hadoop ecosystem.

In this book, you'll learn to use Elasticsearch, Kibana, and Elasticsearch-Hadoop effectively to analyze and understand your HDFS and streaming data. You begin with an in-depth look at setting up Hadoop, Elasticsearch, Marvel, and Kibana. Right after this, you will learn to import Hadoop data into Elasticsearch by writing a MapReduce job in a real-world example.

This is followed by a comprehensive look at Elasticsearch essentials, such as full-text search analysis, queries, filters, and aggregations, after which you learn to create various visualizations and interactive dashboards using Kibana. The book also covers classifying real-world streaming data and identifying trends in it using Storm and Elasticsearch. You will gain insight into the key concepts of Elasticsearch and Elasticsearch-Hadoop in distributed mode, and into advanced configurations, along with some common configuration presets you may need for your production deployments. You will also get a "Go production" checklist and a high-level view of cluster administration after deployment. Towards the end, you will learn to integrate Elasticsearch with other Hadoop ecosystem tools, such as Pig, Hive, and Spark.

Style and approach

A concise yet comprehensive approach has been adopted, with real-time examples to help you grasp the concepts easily.

Elasticsearch for Hadoop

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Setting Up Environment

Setting up Hadoop for Elasticsearch

Setting up Java

Setting up a dedicated user

Installing SSH and setting up the certificate

Downloading Hadoop

Setting up environment variables

Configuring Hadoop

Configuring core-site.xml

Configuring hdfs-site.xml

Configuring yarn-site.xml

Configuring mapred-site.xml

The format distributed filesystem

Starting Hadoop daemons

Setting up Elasticsearch

Downloading Elasticsearch

Configuring Elasticsearch

Installing Elasticsearch's Head plugin

Installing the Marvel plugin

Running and testing

Running the WordCount example

Getting the examples and building the job JAR file

Importing the test file to HDFS

Running our first job

Exploring data in Head and Marvel

Viewing data in Head

Using the Marvel dashboard

Exploring the data in Sense

Summary

2. Getting Started with ES-Hadoop

Understanding the WordCount program

Understanding Mapper

Understanding the reducer

Understanding the driver

Using the old API – org.apache.hadoop.mapred

Going real — network monitoring data

Getting and understanding the data

Knowing the problems

Solution approaches

Approach 1 – Preaggregate the results

Approach 2 – Aggregate the results at query-time

Writing the NetworkLogsMapper job

Writing the mapper class

Writing Driver

Building the job

Getting the data into HDFS

Running the job

Viewing the Top N results

Getting data from Elasticsearch to HDFS

Understanding the Twitter dataset

Trying it yourself

Creating the MapReduce job to import data from Elasticsearch to HDFS

Writing the Tweets2Hdfs mapper

Running the example

Testing the job execution output

Summary

3. Understanding Elasticsearch

Knowing Search and Elasticsearch

The paradigm mismatch

Index

Type

Document

Field

Talking to Elasticsearch

CRUD with Elasticsearch

Creating the document request

The GET request

The Update request

The Delete request

Creating the index

Mappings

Data types

Create mapping API

Index templates

Controlling the indexing process

What is an inverted index?

The input data analysis

Removing stop words

Case insensitive

Stemming

Synonyms

Analyzers

Elastic searching

Writing search queries

The URI search

Matching all queries

The term query

The boolean query

The match query

The range query

The wildcard query

Filters

The exists filter

The geo distance filter

Aggregations

Executing the aggregation queries

The terms aggregation

Histograms

The range aggregation

The geo distance

Sub-aggregations

Try it yourself

Summary

4. Visualizing Big Data Using Kibana

Setting up and getting started

Setting up Kibana

Setting up datasets

Try it out

Getting started with Kibana

Discovering data

Visualizing the data

The pie chart

The stacked bar chart

The date histogram with the stacked bar chart

The area chart

The split pie chart

The sun burst chart

The geographical chart

Trying it out

Creating dynamic dashboards

Migrating the dashboards

Summary

5. Real-Time Analytics

Getting started with the Twitter Trend Analyser

What are we trying to do?

Setting up Apache Storm

Injecting streaming data into Storm

Writing a Storm spout

Writing Storm bolts

Creating a Storm topology

Building and running a Storm job

Analyzing trends

Significant terms aggregation

Viewing trends in Kibana

Classifying tweets using percolators

Percolator

Building a percolator query effectively

Classifying tweets

Summary

6. ES-Hadoop in Production

Elasticsearch in a distributed environment

Elasticsearch clusters and nodes

Node types

The master node

The data node

The client node

Tribe nodes

Node discovery

Multicast discovery

Unicast discovery

Data inside clusters

Shards

Replicas

Shard allocation

The ES-Hadoop architecture

Dynamic parallelism

Writing to Elasticsearch

Reads from Elasticsearch

Failure handling

Data colocation

Configuring the environment for production

Hardware

Memory

CPU

Disks

Network

Setting up the cluster

The recommended cluster topology

Set names

Paths

Memory configurations

The split-brain problem

Recovery configurations

Configuration presets

Rapid indexing

Lightening a full text search

Faster aggregations

Bonus – the production deployment checklist

Administration of clusters

Monitoring the cluster health

Snapshot and restore

Backing up your data

Restoring your data

Summary

7. Integrating with the Hadoop Ecosystem

Pigging out Elasticsearch

Setting up Apache Pig for Elasticsearch

Importing data to Elasticsearch

Writing from the JSON source

Type conversions

Reading data from Elasticsearch

SQLizing Elasticsearch with Hive

Setting up Apache Hive

Importing data to Elasticsearch

Writing from the JSON source

Type conversions

Reading data from Elasticsearch

Cascading with Elasticsearch

Importing data to Elasticsearch

Writing a cascading job

Running the job

Reading data from Elasticsearch

Writing a reader job

Using Lingual with Elasticsearch

Giving Spark to Elasticsearch

Setting up Spark

Importing data to Elasticsearch

Using SparkSQL

Reading data from Elasticsearch

Using SparkSQL

ES-Hadoop on YARN

Summary

A. Configurations

Basic configurations

es.resource

es.resource.read

es.resource.write

es.nodes

es.port

Write and query configurations

es.query

es.input.json

es.write.operation

es.update.script

es.update.script.lang

es.update.script.params

es.update.script.params.json

es.batch.size.bytes

es.batch.size.entries

es.batch.write.refresh

es.batch.write.retry.count

es.batch.write.retry.wait

es.ser.reader.value.class

es.ser.writer.value.class

es.update.retry.on.conflict

Mapping configurations

es.mapping.id

es.mapping.parent

es.mapping.version

es.mapping.version.type

es.mapping.routing

es.mapping.ttl

es.mapping.timestamp

es.mapping.date.rich

es.mapping.include

es.mapping.exclude

Index configurations

es.index.auto.create

es.index.read.missing.as.empty

es.field.read.empty.as.null

es.field.read.validate.presence

Network configurations

es.nodes.discovery

es.nodes.client.only

es.http.timeout

es.http.retries

es.scroll.keepalive

es.scroll.size

es.action.heart.beat.lead

Authentication configurations

es.net.http.auth.user

es.net.http.auth.pass

SSL configurations

es.net.ssl

es.net.ssl.keystore.location

es.net.ssl.keystore.pass

es.net.ssl.keystore.type

es.net.ssl.truststore.location

es.net.ssl.truststore.pass

es.net.ssl.cert.allow.self.signed

es.net.ssl.protocol

es.scroll.size

Proxy configurations

es.net.proxy.http.host

es.net.proxy.http.port

es.net.proxy.http.user

es.net.proxy.http.pass

es.net.proxy.http.use.system.props

es.net.proxy.socks.host

es.net.proxy.socks.port

es.net.proxy.socks.user

es.net.proxy.socks.pass

es.net.proxy.socks.use.system.props

Index
