Natural Language Processing with Java (e-book)

Author: Richard M Reese

Publisher: Packt Publishing

Publication date: 2015-03-27

Word count: 2.615 million

If you are a Java programmer who wants to learn about the fundamental tasks underlying natural language processing, this book is for you. You will be able to identify and use NLP tasks for many common problems, and integrate them into your applications to solve more difficult problems. Readers should have experience with Java software development.
Natural Language Processing with Java

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Introduction to NLP

What is NLP?

Why use NLP?

Why is NLP so hard?

Survey of NLP tools

Apache OpenNLP

Stanford NLP

LingPipe

GATE

UIMA

Overview of text processing tasks

Finding parts of text

Finding sentences

Finding people and things

Detecting Parts of Speech

Classifying text and documents

Extracting relationships

Using combined approaches

Understanding NLP models

Identifying the task

Selecting a model

Building and training the model

Verifying the model

Using the model

Preparing data

Summary

2. Finding Parts of Text

Understanding the parts of text

What is tokenization?

Uses of tokenizers

Simple Java tokenizers

Using the Scanner class

Specifying the delimiter

Using the split method

Using the BreakIterator class

Using the StreamTokenizer class

Using the StringTokenizer class

Performance considerations with Java core tokenization

NLP tokenizer APIs

Using the OpenNLPTokenizer class

Using the SimpleTokenizer class

Using the WhitespaceTokenizer class

Using the TokenizerME class

Using the Stanford tokenizer

Using the PTBTokenizer class

Using the DocumentPreprocessor class

Using a pipeline

Using LingPipe tokenizers

Training a tokenizer to find parts of text

Comparing tokenizers

Understanding normalization

Converting to lowercase

Removing stopwords

Creating a StopWords class

Using LingPipe to remove stopwords

Using stemming

Using the Porter Stemmer

Stemming with LingPipe

Using lemmatization

Using the StanfordLemmatizer class

Using lemmatization in OpenNLP

Normalizing using a pipeline

Summary

3. Finding Sentences

The SBD process

What makes SBD difficult?

Understanding SBD rules of LingPipe's HeuristicSentenceModel class

Simple Java SBDs

Using regular expressions

Using the BreakIterator class

Using NLP APIs

Using OpenNLP

Using the SentenceDetectorME class

Using the sentPosDetect method

Using the Stanford API

Using the PTBTokenizer class

Using the DocumentPreprocessor class

Using the StanfordCoreNLP class

Using LingPipe

Using the IndoEuropeanSentenceModel class

Using the SentenceChunker class

Using the MedlineSentenceModel class

Training a Sentence Detector model

Using the Trained model

Evaluating the model using the SentenceDetectorEvaluator class

Summary

4. Finding People and Things

Why NER is difficult?

Techniques for name recognition

Lists and regular expressions

Statistical classifiers

Using regular expressions for NER

Using Java's regular expressions to find entities

Using LingPipe's RegExChunker class

Using NLP APIs

Using OpenNLP for NER

Determining the accuracy of the entity

Using other entity types

Processing multiple entity types

Using the Stanford API for NER

Using LingPipe for NER

Using LingPipe's name entity models

Using the ExactDictionaryChunker class

Training a model

Evaluating a model

Summary

5. Detecting Part of Speech

The tagging process

Importance of POS taggers

What makes POS difficult?

Using the NLP APIs

Using OpenNLP POS taggers

Using the OpenNLP POSTaggerME class for POS taggers

Using OpenNLP chunking

Using the POSDictionary class

Obtaining the tag dictionary for a tagger

Determining a word's tags

Changing a word's tags

Adding a new tag dictionary

Creating a dictionary from a file

Using Stanford POS taggers

Using Stanford MaxentTagger

Using the MaxentTagger class to tag textese

Using Stanford pipeline to perform tagging

Using LingPipe POS taggers

Using the HmmDecoder class with Best_First tags

Using the HmmDecoder class with NBest tags

Determining tag confidence with the HmmDecoder class

Training the OpenNLP POSModel

Summary

6. Classifying Texts and Documents

How classification is used

Understanding sentiment analysis

Text classifying techniques

Using APIs to classify text

Using OpenNLP

Training an OpenNLP classification model

Using DocumentCategorizerME to classify text

Using Stanford API

Using the ColumnDataClassifier class for classification

Using the Stanford pipeline to perform sentiment analysis

Using LingPipe to classify text

Training text using the Classified class

Using other training categories

Classifying text using LingPipe

Sentiment analysis using LingPipe

Language identification using LingPipe

Summary

7. Using Parser to Extract Relationships

Relationship types

Understanding parse trees

Using extracted relationships

Extracting relationships

Using NLP APIs

Using OpenNLP

Using the Stanford API

Using the LexicalizedParser class

Using the TreePrint class

Finding word dependencies using the GrammaticalStructure class

Finding coreference resolution entities

Extracting relationships for a question-answer system

Finding the word dependencies

Determining the question type

Searching for the answer

Summary

8. Combined Approaches

Preparing data

Using Boilerpipe to extract text from HTML

Using POI to extract text from Word documents

Using PDFBox to extract text from PDF documents

Pipelines

Using the Stanford pipeline

Using multiple cores with the Stanford pipeline

Creating a pipeline to search text

Summary

Index
