Natural Language Processing with Java and LingPipe Cookbook
Table of Contents
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Simple Classifiers
Introduction
LingPipe and its installation
Projects similar to LingPipe
So, why use LingPipe?
Downloading the book code and data
Downloading LingPipe
Deserializing and running a classifier
How to do it...
How it works...
Getting confidence estimates from a classifier
Getting ready
How to do it…
How it works…
See also
Getting data from the Twitter API
Getting ready
How to do it...
How it works...
See also
Applying a classifier to a .csv file
How to do it...
How it works…
Evaluation of classifiers – the confusion matrix
Getting ready
How to do it...
How it works...
There's more...
Training your own language model classifier
Getting ready
How to do it...
How it works...
There's more...
See also
How to train and evaluate with cross validation
Getting ready
How to do it...
How it works…
There's more…
Viewing error categories – false positives
How to do it...
How it works…
Understanding precision and recall
How to serialize a LingPipe object – classifier example
Getting ready
How to do it...
How it works…
There's more…
Eliminate near duplicates with the Jaccard distance
How to do it…
How it works…
How to classify sentiment – simple version
How to do it…
How it works...
There's more…
Common problems as a classification problem
Topic detection
Question answering
Degree of sentiment
Non-exclusive category classification
Person/company/location detection
2. Finding and Working with Words
Introduction
Introduction to tokenizer factories – finding words in a character stream
Getting ready
How to do it...
How it works...
There's more…
Combining tokenizers – lowercase tokenizer
Getting ready
How to do it...
How it works...
See also
Combining tokenizers – stop word tokenizers
Getting ready
How to do it...
How it works...
See also
Using Lucene/Solr tokenizers
Getting ready
How to do it...
How it works...
See also
Using Lucene/Solr tokenizers with LingPipe
How to do it...
How it works...
Evaluating tokenizers with unit tests
How to do it...
Modifying tokenizer factories
How to do it...
How it works...
Finding words for languages without white spaces
Getting ready
How to do it...
How it works...
There's more...
See also
3. Advanced Classifiers
Introduction
A simple classifier
How to do it...
How it works...
There's more…
Language model classifier with tokens
How to do it...
There's more...
Naïve Bayes
Getting ready
How to do it...
See also
Feature extractors
How to do it...
How it works…
Logistic regression
How logistic regression works
Getting ready
How to do it...
Multithreaded cross validation
How to do it...
How it works…
Tuning parameters in logistic regression
How to do it...
How it works…
Tuning feature extraction
Priors
Annealing schedule and epochs
Customizing feature extraction
How to do it…
There's more…
Combining feature extractors
How to do it…
There's more…
Classifier-building life cycle
Getting ready
How to do it…
Sanity check – test on training data
Establishing a baseline with cross validation and metrics
Picking a single metric to optimize against
Implementing the evaluation metric
Linguistic tuning
How to do it…
Thresholding classifiers
How to do it...
How it works…
Train a little, learn a little – active learning
Getting ready
How to do it…
How it works...
Annotation
How to do it...
How it works…
There's more…
4. Tagging Words and Tokens
Introduction
Interesting phrase detection
How to do it...
How it works...
There's more...
Foreground- or background-driven interesting phrase detection
Getting ready
How to do it...
How it works...
There's more...
Hidden Markov Models (HMM) – part-of-speech
How to do it...
How it works...
N-best word tagging
How to do it...
How it works...
Confidence-based tagging
How to do it...
How it works…
Training word tagging
How to do it...
How it works…
There's more…
Word-tagging evaluation
Getting ready
How to do it…
There's more…
Conditional random fields (CRF) for word/token tagging
How to do it...
How it works…
SimpleCrfFeatureExtractor
There's more…
Modifying CRFs
How to do it...
How it works…
Candidate-edge features
Node features
There's more…
5. Finding Spans in Text – Chunking
Introduction
Sentence detection
How to do it...
How it works...
There's more...
Nested sentences
Evaluation of sentence detection
How to do it...
How it works...
Parsing annotated data
Tuning sentence detection
How to do it...
There's more...
Marking embedded chunks in a string – sentence chunk example
How to do it...
Paragraph detection
How to do it...
Simple noun phrases and verb phrases
How to do it…
How it works…
Regular expression-based chunking for NER
How to do it…
How it works…
See also
Dictionary-based chunking for NER
How to do it…
How it works…
Translating between word tagging and chunks – BIO codec
Getting ready
How to do it…
How it works…
There's more…
HMM-based NER
Getting ready
How to do it…
How it works…
There's more…
See also
Mixing the NER sources
How to do it…
How it works…
CRFs for chunking
Getting ready
How to do it...
How it works…
NER using CRFs with better features
How to do it…
How it works…
6. String Comparison and Clustering
Introduction
Distance and proximity – simple edit distance
How to do it...
How it works...
See also
Weighted edit distance
How to do it...
How it works...
See also
The Jaccard distance
How to do it...
How it works...
The Tf-Idf distance
How to do it...
How it works...
There's more...
Difference between supervised and unsupervised trainings
Training on test data is OK
Using edit distance and language models for spelling correction
How to do it...
How it works...
See also
The case restoring corrector
How to do it...
How it works...
See also
Automatic phrase completion
How to do it...
How it works...
See also
Single-link and complete-link clustering using edit distance
How to do it…
There's more…
See also…
Latent Dirichlet allocation (LDA) for multitopic clustering
Getting ready
How to do it…
7. Finding Coreference Between Concepts/People
Introduction
Named entity coreference with a document
Getting ready
How to do it…
How it works…
Adding pronouns to coreference
How to do it…
How it works…
See also
Cross-document coreference
How to do it...
How it works…
The batch process life cycle
Setting up the entity universe
ProcessDocuments() and ProcessDocument()
Computing XDoc
The promote() method
The createEntitySpeculative() method
The XDocCoref.addMentionChainToEntity() entity
The XDocCoref.resolveMentionChain() entity
The resolveCandidates() method
The John Smith problem
Getting ready
How to do it...
See also
Index