Elasticsearch for Hadoop
Table of Contents
Elasticsearch for Hadoop
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Setting Up Environment
Setting up Hadoop for Elasticsearch
Setting up Java
Setting up a dedicated user
Installing SSH and setting up the certificate
Downloading Hadoop
Setting up environment variables
Configuring Hadoop
Configuring core-site.xml
Configuring hdfs-site.xml
Configuring yarn-site.xml
Configuring mapred-site.xml
Formatting the distributed filesystem
Starting Hadoop daemons
Setting up Elasticsearch
Downloading Elasticsearch
Configuring Elasticsearch
Installing Elasticsearch's Head plugin
Installing the Marvel plugin
Running and testing
Running the WordCount example
Getting the examples and building the job JAR file
Importing the test file to HDFS
Running our first job
Exploring data in Head and Marvel
Viewing data in Head
Using the Marvel dashboard
Exploring the data in Sense
Summary
2. Getting Started with ES-Hadoop
Understanding the WordCount program
Understanding the mapper
Understanding the reducer
Understanding the driver
Using the old API – org.apache.hadoop.mapred
Going real — network monitoring data
Getting and understanding the data
Knowing the problems
Solution approaches
Approach 1 – Preaggregate the results
Approach 2 – Aggregate the results at query-time
Writing the NetworkLogsMapper job
Writing the mapper class
Writing the driver
Building the job
Getting the data into HDFS
Running the job
Viewing the Top N results
Getting data from Elasticsearch to HDFS
Understanding the Twitter dataset
Trying it yourself
Creating the MapReduce job to import data from Elasticsearch to HDFS
Writing the Tweets2Hdfs mapper
Running the example
Testing the job execution output
Summary
3. Understanding Elasticsearch
Knowing Search and Elasticsearch
The paradigm mismatch
Index
Type
Document
Field
Talking to Elasticsearch
CRUD with Elasticsearch
Creating the document request
The GET request
The Update request
The Delete request
Creating the index
Mappings
Data types
Create mapping API
Index templates
Controlling the indexing process
What is an inverted index?
The input data analysis
Removing stop words
Case insensitive
Stemming
Synonyms
Analyzers
Elastic searching
Writing search queries
The URI search
Matching all queries
The term query
The boolean query
The match query
The range query
The wildcard query
Filters
The exists filter
The geo distance filter
Aggregations
Executing the aggregation queries
The terms aggregation
Histograms
The range aggregation
The geo distance
Sub-aggregations
Try it yourself
Summary
4. Visualizing Big Data Using Kibana
Setting up and getting started
Setting up Kibana
Setting up datasets
Try it out
Getting started with Kibana
Discovering data
Visualizing the data
The pie chart
The stacked bar chart
The date histogram with the stacked bar chart
The area chart
The split pie chart
The sunburst chart
The geographical chart
Trying it out
Creating dynamic dashboards
Migrating the dashboards
Summary
5. Real-Time Analytics
Getting started with the Twitter Trend Analyser
What are we trying to do?
Setting up Apache Storm
Injecting streaming data into Storm
Writing a Storm spout
Writing Storm bolts
Creating a Storm topology
Building and running a Storm job
Analyzing trends
Significant terms aggregation
Viewing trends in Kibana
Classifying tweets using percolators
Percolator
Building a percolator query effectively
Classifying tweets
Summary
6. ES-Hadoop in Production
Elasticsearch in a distributed environment
Elasticsearch clusters and nodes
Node types
The master node
The data node
The client node
Tribe nodes
Node discovery
Multicast discovery
Unicast discovery
Data inside clusters
Shards
Replicas
Shard allocation
The ES-Hadoop architecture
Dynamic parallelism
Writing to Elasticsearch
Reads from Elasticsearch
Failure handling
Data colocation
Configuring the environment for production
Hardware
Memory
CPU
Disks
Network
Setting up the cluster
The recommended cluster topology
Set names
Paths
Memory configurations
The split-brain problem
Recovery configurations
Configuration presets
Rapid indexing
Lightning-fast full-text search
Faster aggregations
Bonus – the production deployment checklist
Administration of clusters
Monitoring the cluster health
Snapshot and restore
Backing up your data
Restoring your data
Summary
7. Integrating with the Hadoop Ecosystem
Pigging out Elasticsearch
Setting up Apache Pig for Elasticsearch
Importing data to Elasticsearch
Writing from the JSON source
Type conversions
Reading data from Elasticsearch
SQLizing Elasticsearch with Hive
Setting up Apache Hive
Importing data to Elasticsearch
Writing from the JSON source
Type conversions
Reading data from Elasticsearch
Cascading with Elasticsearch
Importing data to Elasticsearch
Writing a cascading job
Running the job
Reading data from Elasticsearch
Writing a reader job
Using Lingual with Elasticsearch
Giving Spark to Elasticsearch
Setting up Spark
Importing data to Elasticsearch
Using SparkSQL
Reading data from Elasticsearch
Using SparkSQL
ES-Hadoop on YARN
Summary
A. Configurations
Basic configurations
es.resource
es.resource.read
es.resource.write
es.nodes
es.port
Write and query configurations
es.query
es.input.json
es.write.operation
es.update.script
es.update.script.lang
es.update.script.params
es.update.script.params.json
es.batch.size.bytes
es.batch.size.entries
es.batch.write.refresh
es.batch.write.retry.count
es.batch.write.retry.wait
es.ser.reader.value.class
es.ser.writer.value.class
es.update.retry.on.conflict
Mapping configurations
es.mapping.id
es.mapping.parent
es.mapping.version
es.mapping.version.type
es.mapping.routing
es.mapping.ttl
es.mapping.timestamp
es.mapping.date.rich
es.mapping.include
es.mapping.exclude
Index configurations
es.index.auto.create
es.index.read.missing.as.empty
es.field.read.empty.as.null
es.field.read.validate.presence
Network configurations
es.nodes.discovery
es.nodes.client.only
es.http.timeout
es.http.retries
es.scroll.keepalive
es.scroll.size
es.action.heart.beat.lead
Authentication configurations
es.net.http.auth.user
es.net.http.auth.pass
SSL configurations
es.net.ssl
es.net.ssl.keystore.location
es.net.ssl.keystore.pass
es.net.ssl.keystore.type
es.net.ssl.truststore.location
es.net.ssl.truststore.pass
es.net.ssl.cert.allow.self.signed
es.net.ssl.protocol
es.scroll.size
Proxy configurations
es.net.proxy.http.host
es.net.proxy.http.port
es.net.proxy.http.user
es.net.proxy.http.pass
es.net.proxy.http.use.system.props
es.net.proxy.socks.host
es.net.proxy.socks.port
es.net.proxy.socks.user
es.net.proxy.socks.pass
es.net.proxy.socks.use.system.props
Index