Natural Language Processing with Java and LingPipe Cookbook (e-book)

Author: Breck Baldwin

Publisher: Packt Publishing

Publication date: 2014-11-28

Word count: 2.123 million

This book is for experienced Java developers with NLP needs, whether academics, industrialists, or hobbyists. A basic knowledge of NLP terminology will be beneficial.

Natural Language Processing with Java and LingPipe Cookbook

Table of Contents

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Simple Classifiers

Introduction

LingPipe and its installation

Projects similar to LingPipe

So, why use LingPipe?

Downloading the book code and data

Downloading LingPipe

Deserializing and running a classifier

How to do it...

How it works...

Getting confidence estimates from a classifier

Getting ready

How to do it…

How it works…

See also

Getting data from the Twitter API

Getting ready

How to do it...

How it works...

See also

Applying a classifier to a .csv file

How to do it...

How it works…

Evaluation of classifiers – the confusion matrix

Getting ready

How to do it...

How it works...

There's more...

Training your own language model classifier

Getting ready

How to do it...

How it works...

There's more...

See also

How to train and evaluate with cross validation

Getting ready

How to do it...

How it works…

There's more…

Viewing error categories – false positives

How to do it...

How it works…

Understanding precision and recall

How to serialize a LingPipe object – classifier example

Getting ready

How to do it...

How it works…

There's more…

Eliminate near duplicates with the Jaccard distance

How to do it…

How it works…

How to classify sentiment – simple version

How to do it…

How it works...

There's more…

Common problems as a classification problem

Topic detection

Question answering

Degree of sentiment

Non-exclusive category classification

Person/company/location detection

2. Finding and Working with Words

Introduction

Introduction to tokenizer factories – finding words in a character stream

Getting ready

How to do it...

How it works...

There's more…

Combining tokenizers – lowercase tokenizer

Getting ready

How to do it...

How it works...

See also

Combining tokenizers – stop word tokenizers

Getting ready

How to do it...

How it works...

See also

Using Lucene/Solr tokenizers

Getting ready

How to do it...

How it works...

See also

Using Lucene/Solr tokenizers with LingPipe

How to do it...

How it works...

Evaluating tokenizers with unit tests

How to do it...

Modifying tokenizer factories

How to do it...

How it works...

Finding words for languages without white spaces

Getting ready

How to do it...

How it works...

There's more...

See also

3. Advanced Classifiers

Introduction

A simple classifier

How to do it...

How it works...

There's more…

Language model classifier with tokens

How to do it...

There's more...

Naïve Bayes

Getting ready

How to do it...

See also

Feature extractors

How to do it...

How it works…

Logistic regression

How logistic regression works

Getting ready

How to do it...

Multithreaded cross validation

How to do it...

How it works…

Tuning parameters in logistic regression

How to do it...

How it works…

Tuning feature extraction

Priors

Annealing schedule and epochs

Customizing feature extraction

How to do it…

There's more…

Combining feature extractors

How to do it…

There's more…

Classifier-building life cycle

Getting ready

How to do it…

Sanity check – test on training data

Establishing a baseline with cross validation and metrics

Picking a single metric to optimize against

Implementing the evaluation metric

Linguistic tuning

How to do it…

Thresholding classifiers

How to do it...

How it works…

Train a little, learn a little – active learning

Getting ready

How to do it…

How it works...

Annotation

How to do it...

How it works…

There's more…

4. Tagging Words and Tokens

Introduction

Interesting phrase detection

How to do it...

How it works...

There's more...

Foreground- or background-driven interesting phrase detection

Getting ready

How to do it...

How it works...

There's more...

Hidden Markov Models (HMM) – part-of-speech

How to do it...

How it works...

N-best word tagging

How to do it...

How it works...

Confidence-based tagging

How to do it...

How it works…

Training word tagging

How to do it...

How it works…

There's more…

Word-tagging evaluation

Getting ready

How to do it…

There's more…

Conditional random fields (CRF) for word/token tagging

How to do it...

How it works…

SimpleCrfFeatureExtractor

There's more…

Modifying CRFs

How to do it...

How it works…

Candidate-edge features

Node features

There's more…

5. Finding Spans in Text – Chunking

Introduction

Sentence detection

How to do it...

How it works...

There's more...

Nested sentences

Evaluation of sentence detection

How to do it...

How it works...

Parsing annotated data

Tuning sentence detection

How to do it...

There's more...

Marking embedded chunks in a string – sentence chunk example

How to do it...

Paragraph detection

How to do it...

Simple noun phrases and verb phrases

How to do it…

How it works…

Regular expression-based chunking for NER

How to do it…

How it works…

See also

Dictionary-based chunking for NER

How to do it…

How it works…

Translating between word tagging and chunks – BIO codec

Getting ready

How to do it…

How it works…

There's more…

HMM-based NER

Getting ready

How to do it…

How it works…

There's more…

See also

Mixing the NER sources

How to do it…

How it works…

CRFs for chunking

Getting ready

How to do it...

How it works…

NER using CRFs with better features

How to do it…

How it works…

6. String Comparison and Clustering

Introduction

Distance and proximity – simple edit distance

How to do it...

How it works...

See also

Weighted edit distance

How to do it...

How it works...

See also

The Jaccard distance

How to do it...

How it works...

The Tf-Idf distance

How to do it...

How it works...

There's more...

Difference between supervised and unsupervised trainings

Training on test data is OK

Using edit distance and language models for spelling correction

How to do it...

How it works...

See also

The case restoring corrector

How to do it...

How it works...

See also

Automatic phrase completion

How to do it...

How it works...

See also

Single-link and complete-link clustering using edit distance

How to do it…

There's more…

See also…

Latent Dirichlet allocation (LDA) for multitopic clustering

Getting ready

How to do it…

7. Finding Coreference Between Concepts/People

Introduction

Named entity coreference with a document

Getting ready

How to do it…

How it works…

Adding pronouns to coreference

How to do it…

How it works…

See also

Cross-document coreference

How to do it...

How it works…

The batch process life cycle

Setting up the entity universe

ProcessDocuments() and ProcessDocument()

Computing XDoc

The promote() method

The createEntitySpeculative() method

The XDocCoref.addMentionChainToEntity() entity

The XDocCoref.resolveMentionChain() entity

The resolveCandidates() method

The John Smith problem

Getting ready

How to do it...

See also

Index
