Scaling Apache Solr (eBook)

Author: Hrishikesh Vijay Karambelkar

Publisher: Packt Publishing

Publication date: 2014-07-25

Word count: 2.187 million

This book is a step-by-step guide for readers who would like to learn how to build complete enterprise search solutions, with ample real-world examples and case studies. If you are a developer, designer, or architect who would like to build enterprise search solutions for your customers or organization, but have no prior knowledge of Apache Solr/Lucene technologies, this is the book for you.
Contents

Scaling Apache Solr

Table of Contents

Scaling Apache Solr

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Understanding Apache Solr

Challenges in enterprise search

Apache Solr – an overview

Features of Apache Solr

Solr for end users

Powerful full text search

Search through rich information

Results ranking, pagination, and sorting

Facets for better browsing experience

Advanced search capabilities

Administration

Apache Solr architecture

Storage

Solr application

Integration

Client APIs and SolrJ client

Other interfaces

Practical use cases for Apache Solr

Enterprise search for a job search agency

Problem statement

Approach

Enterprise search for energy industry

Problem statement

Approach

Summary

2. Getting Started with Apache Solr

Setting up Apache Solr

Prerequisites

Running Solr on Jetty

Running Solr on Tomcat

Solr administration

What's next?

Common problems and solutions

Understanding the Solr structure

The Solr home directory structure

Solr navigation

Configuring the Apache Solr for enterprise

Defining a Solr schema

Solr fields

Dynamic Fields in Solr

Copying the fields

Field types

Other important elements in the Solr schema

Configuring Solr parameters

solr.xml and Solr core

solrconfig.xml

The Solr plugin

Other configurations

Understanding SolrJ

Summary

3. Analyzing Data with Apache Solr

Understanding enterprise data

Categorizing by characteristics

Categorizing by access pattern

Categorizing by data formats

Loading data using native handlers

Quick and simple data loading – post tool

Working with JSON, XML, and CSV

Handling JSON data

Working with CSV data

Working with XML data

Working with rich documents

Understanding Apache Tika

Using Solr Cell (ExtractingRequestHandler)

Adding metadata to your rich documents

Importing structured data from the database

Configuring the data source

Importing data in Solr

Full import

Delta import

Loading RDBMS tables in Solr

Advanced topics with Solr

Deduplication

Extracting information from scanned documents

Searching through images using LIRE

Summary

4. Designing Enterprise Search

Designing aspects for enterprise search

Identifying requirements

Matching user expectations through relevance

Access to searched entities and user interface

Improving search performance and ensuring instance scalability

Working with applications through federated search

Other differentiators – mobiles, linguistic search, and security

Enterprise search data-processing patterns

Standalone search engine server

Distributed enterprise search pattern

The replicated enterprise search pattern

Distributed and replicated

Data integrating pattern for search

Data import by enterprise search

Applications pushing data

Middleware-based integration

Case study – designing an enterprise knowledge repository search for software IT services

Gathering requirements

Designing the solution

Designing the schema

Integrating subsystems with Apache Solr

Working on end user interface

Summary

5. Integrating Apache Solr

Empowering the Java Enterprise application with Solr search

Embedding Apache Solr as a module (web application) in an enterprise application

How to do it?

Apache Solr in your web application

How to do it?

Integration with client technologies

Integrating Apache Solr with PHP for web portals

Interacting directly with Solr

Using the Solr PHP client

How to do it?

Advanced integration with Solarium

How to do it?

Integrating Apache Solr with JavaScript

Using simple XMLHTTPRequest

Integrating Apache Solr using AJAX Solr

Parsing Solr XML with the help of XSLT

Case study – Apache Solr and Drupal

How to do it?

Summary

6. Distributed Search Using Apache Solr

Need for distributed search

Distributed search architecture

Apache Solr and distributed search

Understanding SolrCloud

Why Zookeeper?

SolrCloud architecture

Building enterprise distributed search using SolrCloud

Setting up a SolrCloud for development

Setting up a SolrCloud for production

Adding a document to SolrCloud

Creating shards, collections, and replicas in SolrCloud

Common problems and resolutions

Case study – distributed enterprise search server for the software industry

Summary

7. Scaling Solr through Sharding, Fault Tolerance, and Integration

Enabling search result clustering with Carrot2

Why Carrot2?

Enabling Carrot2-based document clustering

Understanding Carrot2 result clustering

Viewing Solr results in the Carrot2 workbench

FAQs and problems

Sharding and fault tolerance

Document routing and sharding

Shard splitting

Load balancing and fault tolerance in SolrCloud

Searching Solr documents in near real time

Strategies for near real-time search in Apache Solr

Explicit call to commit from a client

solrconfig.xml – autocommit

CommitWithin – delegating the responsibility to Solr

Real-time search in Apache Solr

Solr with MongoDB

Understanding MongoDB

Installing MongoDB

Creating Solr indexes from MongoDB

Scaling Solr through Storm

Getting along with Apache Storm

Solr and Apache Storm

Summary

8. Scaling Solr through High Performance

Monitoring performance of Apache Solr

What should be monitored?

Hardware and operating system

Java virtual machine

Apache Solr search runtime

Apache Solr indexing time

SolrCloud

Tools for monitoring Solr performance

Solr administration user interface

JConsole

SolrMeter

Tuning Solr JVM and container

Deciding heap size

How can we optimize JVM?

Optimizing JVM container

Optimizing Solr schema and indexing

Stored fields

Indexed fields and field lengths

Copy fields and dynamic fields

Fields for range queries

Index field updates

Synonyms, stemming, and stopwords

Tuning DataImportHandler

Speeding up index generation

Committing the change

Limiting indexing buffer size

SolrJ implementation classes

Speeding Solr through Solr caching

The filter cache

The query result cache

The document cache

The field value cache

The warming up cache

Improving runtime search for Solr

Pagination

Reducing Solr response footprint

Using filter queries

Search query and the parsers

Lazy field loading

Optimizing SolrCloud

Summary

9. Solr and Cloud Computing

Enterprise search on Cloud

Models of engagement

Enterprise search Cloud deployment models

Solr on Cloud strategies

Scaling Solr with a dedicated application

Advantages

Disadvantages

Scaling Solr horizontal as multiple applications

Advantages

Disadvantages

Scaling horizontally through the Solr multicore

Scaling horizontally with replication

Scaling horizontally with Zookeeper

Advantages

Disadvantages

Running Solr on Cloud (IaaS and PaaS)

Running Solr with Amazon Cloud

Running Solr on Windows Azure

Running Solr on Cloud (SaaS) and enterprise search as a service

Running Solr with OpenSolr Cloud

Running Solr with SolrHQ Cloud

Running Solr with Bitnami

Working with Amazon CloudSearch

Drupal-Solr SaaS with Acquia

Summary

10. Scaling Solr Capabilities with Big Data

Apache Solr and HDFS

Big Data search on Katta

How Katta works?

Setting up Katta cluster

Creating Katta indexes

Using the Solr 1045 patch – map-side indexing

Using the Solr 1301 patch – reduce-side indexing

Apache Solr and Cassandra

Working with Cassandra and Solr

Single node configuration

Integrating with multinode Cassandra

Advanced analytics with Solr

Integrating Solr and R

Summary

A. Sample Configuration for Apache Solr

schema.xml

solrconfig.xml

spellings.txt

synonyms.txt

protwords.txt

stopwords.txt

Index
