Scaling Apache Solr
Table of Contents
Scaling Apache Solr
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Understanding Apache Solr
Challenges in enterprise search
Apache Solr – an overview
Features of Apache Solr
Solr for end users
Powerful full text search
Search through rich information
Results ranking, pagination, and sorting
Facets for better browsing experience
Advanced search capabilities
Administration
Apache Solr architecture
Storage
Solr application
Integration
Client APIs and SolrJ client
Other interfaces
Practical use cases for Apache Solr
Enterprise search for a job search agency
Problem statement
Approach
Enterprise search for energy industry
Problem statement
Approach
Summary
2. Getting Started with Apache Solr
Setting up Apache Solr
Prerequisites
Running Solr on Jetty
Running Solr on Tomcat
Solr administration
What's next?
Common problems and solutions
Understanding the Solr structure
The Solr home directory structure
Solr navigation
Configuring the Apache Solr for enterprise
Defining a Solr schema
Solr fields
Dynamic Fields in Solr
Copying the fields
Field types
Other important elements in the Solr schema
Configuring Solr parameters
solr.xml and Solr core
solrconfig.xml
The Solr plugin
Other configurations
Understanding SolrJ
Summary
3. Analyzing Data with Apache Solr
Understanding enterprise data
Categorizing by characteristics
Categorizing by access pattern
Categorizing by data formats
Loading data using native handlers
Quick and simple data loading – post tool
Working with JSON, XML, and CSV
Handling JSON data
Working with CSV data
Working with XML data
Working with rich documents
Understanding Apache Tika
Using Solr Cell (ExtractingRequestHandler)
Adding metadata to your rich documents
Importing structured data from the database
Configuring the data source
Importing data in Solr
Full import
Delta import
Loading RDBMS tables in Solr
Advanced topics with Solr
Deduplication
Extracting information from scanned documents
Searching through images using LIRE
Summary
4. Designing Enterprise Search
Designing aspects for enterprise search
Identifying requirements
Matching user expectations through relevance
Access to searched entities and user interface
Improving search performance and ensuring instance scalability
Working with applications through federated search
Other differentiators – mobiles, linguistic search, and security
Enterprise search data-processing patterns
Standalone search engine server
Distributed enterprise search pattern
The replicated enterprise search pattern
Distributed and replicated
Data integrating pattern for search
Data import by enterprise search
Applications pushing data
Middleware-based integration
Case study – designing an enterprise knowledge repository search for software IT services
Gathering requirements
Designing the solution
Designing the schema
Integrating subsystems with Apache Solr
Working on end user interface
Summary
5. Integrating Apache Solr
Empowering the Java Enterprise application with Solr search
Embedding Apache Solr as a module (web application) in an enterprise application
How to do it?
Apache Solr in your web application
How to do it?
Integration with client technologies
Integrating Apache Solr with PHP for web portals
Interacting directly with Solr
Using the Solr PHP client
How to do it?
Advanced integration with Solarium
How to do it?
Integrating Apache Solr with JavaScript
Using simple XMLHttpRequest
Integrating Apache Solr using AJAX Solr
Parsing Solr XML with the help of XSLT
Case study – Apache Solr and Drupal
How to do it?
Summary
6. Distributed Search Using Apache Solr
Need for distributed search
Distributed search architecture
Apache Solr and distributed search
Understanding SolrCloud
Why Zookeeper?
SolrCloud architecture
Building enterprise distributed search using SolrCloud
Setting up a SolrCloud for development
Setting up a SolrCloud for production
Adding a document to SolrCloud
Creating shards, collections, and replicas in SolrCloud
Common problems and resolutions
Case study – distributed enterprise search server for the software industry
Summary
7. Scaling Solr through Sharding, Fault Tolerance, and Integration
Enabling search result clustering with Carrot2
Why Carrot2?
Enabling Carrot2-based document clustering
Understanding Carrot2 result clustering
Viewing Solr results in the Carrot2 workbench
FAQs and problems
Sharding and fault tolerance
Document routing and sharding
Shard splitting
Load balancing and fault tolerance in SolrCloud
Searching Solr documents in near real time
Strategies for near real-time search in Apache Solr
Explicit call to commit from a client
solrconfig.xml – autocommit
CommitWithin – delegating the responsibility to Solr
Real-time search in Apache Solr
Solr with MongoDB
Understanding MongoDB
Installing MongoDB
Creating Solr indexes from MongoDB
Scaling Solr through Storm
Getting along with Apache Storm
Solr and Apache Storm
Summary
8. Scaling Solr through High Performance
Monitoring performance of Apache Solr
What should be monitored?
Hardware and operating system
Java virtual machine
Apache Solr search runtime
Apache Solr indexing time
SolrCloud
Tools for monitoring Solr performance
Solr administration user interface
JConsole
SolrMeter
Tuning Solr JVM and container
Deciding heap size
How can we optimize JVM?
Optimizing JVM container
Optimizing Solr schema and indexing
Stored fields
Indexed fields and field lengths
Copy fields and dynamic fields
Fields for range queries
Index field updates
Synonyms, stemming, and stopwords
Tuning DataImportHandler
Speeding up index generation
Committing the change
Limiting indexing buffer size
SolrJ implementation classes
Speeding Solr through Solr caching
The filter cache
The query result cache
The document cache
The field value cache
The warming up cache
Improving runtime search for Solr
Pagination
Reducing Solr response footprint
Using filter queries
Search query and the parsers
Lazy field loading
Optimizing SolrCloud
Summary
9. Solr and Cloud Computing
Enterprise search on Cloud
Models of engagement
Enterprise search Cloud deployment models
Solr on Cloud strategies
Scaling Solr with a dedicated application
Advantages
Disadvantages
Scaling Solr horizontally as multiple applications
Advantages
Disadvantages
Scaling horizontally through the Solr multicore
Scaling horizontally with replication
Scaling horizontally with Zookeeper
Advantages
Disadvantages
Running Solr on Cloud (IaaS and PaaS)
Running Solr with Amazon Cloud
Running Solr on Windows Azure
Running Solr on Cloud (SaaS) and enterprise search as a service
Running Solr with OpenSolr Cloud
Running Solr with SolrHQ Cloud
Running Solr with Bitnami
Working with Amazon CloudSearch
Drupal-Solr SaaS with Acquia
Summary
10. Scaling Solr Capabilities with Big Data
Apache Solr and HDFS
Big Data search on Katta
How does Katta work?
Setting up Katta cluster
Creating Katta indexes
Using the Solr 1045 patch – map-side indexing
Using the Solr 1301 patch – reduce-side indexing
Apache Solr and Cassandra
Working with Cassandra and Solr
Single node configuration
Integrating with multinode Cassandra
Advanced analytics with Solr
Integrating Solr and R
Summary
A. Sample Configuration for Apache Solr
schema.xml
solrconfig.xml
spellings.txt
synonyms.txt
protwords.txt
stopwords.txt
Index