Mastering Elasticsearch Second Edition
Table of Contents
Mastering Elasticsearch Second Edition
Credits
About the Author
Acknowledgments
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction to Elasticsearch
Introducing Apache Lucene
Getting familiar with Lucene
Overall architecture
Getting deeper into Lucene index
Norms
Term vectors
Posting formats
Doc values
Analyzing your data
Indexing and querying
Lucene query language
Understanding the basics
Querying fields
Term modifiers
Handling special characters
Introducing Elasticsearch
Basic concepts
Index
Document
Type
Mapping
Node
Cluster
Shard
Replica
Key concepts behind Elasticsearch architecture
Workings of Elasticsearch
The startup process
Failure detection
Communicating with Elasticsearch
Indexing data
Querying data
The story
Summary
2. Power User Query DSL
Default Apache Lucene scoring explained
When a document is matched
TF/IDF scoring formula
Lucene conceptual scoring formula
Lucene practical scoring formula
Elasticsearch point of view
An example
Query rewrite explained
Prefix query as an example
Getting back to Apache Lucene
Query rewrite properties
Query templates
Introducing query templates
Templates as strings
The Mustache template engine
Conditional expressions
Loops
Default values
Storing templates in files
Handling filters and why it matters
Filters and query relevance
How filters work
Bool or and/or/not filters
Performance considerations
Post filtering and filtered query
Choosing the right filtering method
Choosing the right query for the job
Query categorization
Basic queries
Compound queries
Not analyzed queries
Full text search queries
Pattern queries
Similarity supporting queries
Score altering queries
Position aware queries
Structure aware queries
The use cases
Example data
Basic queries use cases
Searching for values in range
Simplified query for multiple terms
Compound queries use cases
Boosting some of the matched documents
Ignoring lower scoring partial queries
Not analyzed queries use cases
Limiting results to given tags
Efficient query time stopwords handling
Full text search queries use cases
Using Lucene query syntax in queries
Handling user queries without errors
Pattern queries use cases
Autocomplete using prefixes
Pattern matching
Similarity supporting queries use cases
Finding terms similar to a given one
Finding documents with similar field values
Score altering queries use cases
Favoring newer books
Decreasing importance of books with certain value
Position aware queries use cases
Matching phrases
Spans, spans everywhere
Structure aware queries use cases
Returning parent documents having a certain nested document
Affecting parent document score with the score of nested documents
Summary
3. Not Only Full Text Search
Query rescoring
What is query rescoring?
An example query
Structure of the rescore query
Rescore parameters
Choosing the scoring mode
To sum up
Controlling multimatching
Multimatch types
Best fields matching
Cross fields matching
Most fields matching
Phrase matching
Phrase with prefixes matching
Significant terms aggregation
An example
Choosing significant terms
Multiple values analysis
Significant terms aggregation and full text search fields
Additional configuration options
Controlling the number of returned buckets
Background set filtering
Minimum document count
Execution hint
More options
There are limits
Memory consumption
Shouldn't be used as a top-level aggregation
Counts are approximated
Floating point fields are not allowed
Document grouping
Top hits aggregation
An example
Additional parameters
Relations between documents
The object type
The nested documents
Parent–child relationship
Parent–child relationship in the cluster
A few words about alternatives
Scripting changes between Elasticsearch versions
Scripting changes
Security issues
Groovy – the new default scripting language
Removal of MVEL language
Short Groovy introduction
Using Groovy as your scripting language
Variable definition in scripts
Conditionals
Loops
An example
There is more
Scripting in full text context
Field-related information
Shard level information
Term level information
More advanced term information
Lucene expressions explained
The basics
An example
There is more
Summary
4. Improving the User Search Experience
Correcting user spelling mistakes
Testing data
Getting into technical details
Suggesters
Using the _suggest REST endpoint
Understanding the REST endpoint suggester response
Including suggestion requests in query
The term suggester
Configuration
Common term suggester options
Additional term suggester options
The phrase suggester
Usage example
Configuration
Basic configuration
Configuring smoothing models
Configuring candidate generators
Configuring direct generators
The completion suggester
The logic behind the completion suggester
Using the completion suggester
Indexing data
Querying data
Custom weights
Additional parameters
Improving the query relevance
Data
The quest for relevance improvement
The standard query
The multi match query
Phrases come into play
Let's throw the garbage away
Now, we boost
Performing a misspelling-proof search
Drill downs with faceting
Summary
5. The Index Distribution Architecture
Choosing the right number of shards and replicas
Sharding and overallocation
A positive example of overallocation
Multiple shards versus multiple indices
Replicas
Routing explained
Shards and data
Let's test routing
Indexing with routing
Routing in practice
Querying
Aliases
Multiple routing values
Altering the default shard allocation behavior
Allocation awareness
Forcing allocation awareness
Filtering
What include, exclude, and require mean
Runtime allocation updating
Index level updates
Cluster level updates
Defining total shards allowed per node
Defining total shards allowed per physical server
Inclusion
Requirement
Exclusion
Disk-based allocation
Query execution preference
Introducing the preference parameter
Summary
6. Low-level Index Control
Altering Apache Lucene scoring
Available similarity models
Setting a per-field similarity
Similarity model configuration
Choosing the default similarity model
Configuring the chosen similarity model
Configuring the TF/IDF similarity
Configuring the Okapi BM25 similarity
Configuring the DFR similarity
Configuring the IB similarity
Configuring the LM Dirichlet similarity
Configuring the LM Jelinek Mercer similarity
Choosing the right directory implementation – the store module
The store type
The simple filesystem store
The new I/O filesystem store
The MMap filesystem store
The hybrid filesystem store
The memory store
Additional properties
The default store type
The default store type for Elasticsearch 1.3.0 and higher
The default store type for Elasticsearch versions older than 1.3.0
NRT, flush, refresh, and transaction log
Updating the index and committing changes
Changing the default refresh time
The transaction log
The transaction log configuration
Near real-time GET
Segment merging under control
Choosing the right merge policy
The tiered merge policy
The log byte size merge policy
The log doc merge policy
Merge policies' configuration
The tiered merge policy
The log byte size merge policy
The log doc merge policy
Scheduling
The concurrent merge scheduler
The serial merge scheduler
Setting the desired merge scheduler
When it is too much for I/O – throttling explained
Controlling I/O throttling
Configuration
The throttling type
Maximum throughput per second
Node throttling defaults
Performance considerations
The configuration example
Understanding Elasticsearch caching
The filter cache
Filter cache types
Node-level filter cache configuration
Index-level filter cache configuration
The field data cache
Field data or doc values
Node-level field data cache configuration
Index-level field data cache configuration
The field data cache filtering
Adding field data filtering information
Filtering by term frequency
Filtering by regex
Filtering by regex and term frequency
The filtering example
Field data formats
String-based fields
Numeric fields
Geographical-based fields
Field data loading
The shard query cache
Setting up the shard query cache
Using circuit breakers
The field data circuit breaker
The request circuit breaker
The total circuit breaker
Clearing the caches
Index, indices, and all caches clearing
Clearing specific caches
Summary
7. Elasticsearch Administration
Discovery and recovery modules
Discovery configuration
Zen discovery
Multicast Zen discovery configuration
The unicast Zen discovery configuration
Master node
Configuring master and data nodes
Configuring data-only nodes
Configuring master-only nodes
Configuring the query processing-only nodes
The master election configuration
Zen discovery fault detection and configuration
The Amazon EC2 discovery
The EC2 plugin installation
The EC2 plugin's generic configuration
Optional EC2 discovery configuration options
The EC2 nodes scanning configuration
Other discovery implementations
The gateway and recovery configuration
The gateway recovery process
Configuration properties
Expectations on nodes
The local gateway
Low-level recovery configuration
Cluster-level recovery configuration
Index-level recovery settings
The indices recovery API
The human-friendly status API – using the Cat API
The basics
Using the Cat API
Common arguments
The examples
Getting information about the master node
Getting information about the nodes
Backing up
Saving backups in the cloud
The S3 repository
The HDFS repository
The Azure repository
Federated search
The test clusters
Creating the tribe node
Using the unicast discovery for tribes
Reading data with the tribe node
Master-level read operations
Writing data with the tribe node
Master-level write operations
Handling indices conflicts
Blocking write operations
Summary
8. Improving Performance
Using doc values to optimize your queries
The problem with field data cache
The example of doc values usage
Knowing about garbage collector
Java memory
The life cycle of Java objects and garbage collections
Dealing with garbage collection problems
Turning on logging of garbage collection work
Using JStat
Creating memory dumps
More information on the garbage collector work
Adjusting the garbage collector work in Elasticsearch
Using a standard start up script
Service wrapper
Avoid swapping on Unix-like systems
Benchmarking queries
Preparing your cluster configuration for benchmarking
Running benchmarks
Controlling currently run benchmarks
Very hot threads
Usage clarification for the Hot Threads API
The Hot Threads API response
Scaling Elasticsearch
Vertical scaling
Horizontal scaling
Automatically creating replicas
Redundancy and high availability
Cost and performance flexibility
Continuous upgrades
Multiple Elasticsearch instances on a single physical machine
Preventing the shard and its replicas from being on the same node
Designated nodes' roles for larger clusters
Query aggregator nodes
Data nodes
Master eligible nodes
Using Elasticsearch for high load scenarios
General Elasticsearch tuning advice
Choosing the right store
The index refresh rate
Thread pools tuning
Adjusting the merge process
Data distribution
Advice for high query rate scenarios
Filter caches and shard query caches
Think about the queries
Using routing
Parallelize your queries
Field data cache and breaking the circuit
Keeping size and shard_size under control
High indexing throughput scenarios and Elasticsearch
Bulk indexing
Doc values versus indexing speed
Keep your document fields under control
The index architecture and replication
Tuning the write-ahead log
Think about storage
RAM buffer for indexing
Summary
9. Developing Elasticsearch Plugins
Creating the Apache Maven project structure
Understanding the basics
The structure of the Maven Java project
The idea of POM
Running the build process
Introducing the assembly Maven plugin
Creating custom REST action
The assumptions
Implementation details
Using the REST action class
The constructor
Handling requests
Writing response
The plugin class
Informing Elasticsearch about our REST action
Time for testing
Building the REST action plugin
Installing the REST action plugin
Checking whether the REST action plugin works
Creating the custom analysis plugin
Implementation details
Implementing TokenFilter
Implementing the TokenFilter factory
Implementing the custom analyzer class
Implementing the analyzer provider
Implementing the analysis binder
Implementing the analyzer indices component
Implementing the analyzer module
Implementing the analyzer plugin
Informing Elasticsearch about our custom analyzer
Testing our custom analysis plugin
Building our custom analysis plugin
Installing the custom analysis plugin
Checking whether our analysis plugin works
Summary
Index