Python Text Processing with NLTK 2.0 Cookbook Update (e-book)

Author: Jacob Perkins

Publisher: Packt Publishing

Publication date: 2014-08-26

Word count: 2.867 million

This book is intended for Python programmers interested in learning how to do natural language processing. Maybe you’ve learned the limits of regular expressions the hard way, or you’ve realized that human language cannot be deterministically parsed like a computer language. Perhaps you have more text than you know what to do with and need automated ways to analyze and structure it. This cookbook shows you how to train and use statistical language models to process text in ways that are practically impossible with standard programming tools. Basic knowledge of Python and of basic text processing concepts is expected. Some experience with regular expressions will also be helpful.
Python 3 Text Processing with NLTK 3 Cookbook

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Tokenizing Text and WordNet Basics

Introduction

Tokenizing text into sentences

Getting ready

How to do it...

How it works...

There's more...

Tokenizing sentences in other languages

See also

Tokenizing sentences into words

How to do it...

How it works...

There's more...

Separating contractions

PunktWordTokenizer

WordPunctTokenizer

See also

Tokenizing sentences using regular expressions

Getting ready

How to do it...

How it works...

There's more...

Simple whitespace tokenizer

See also

Training a sentence tokenizer

Getting ready

How to do it...

How it works...

There's more...

See also

Filtering stopwords in a tokenized sentence

Getting ready

How to do it...

How it works...

There's more...

See also

Looking up Synsets for a word in WordNet

Getting ready

How to do it...

How it works...

There's more...

Working with hypernyms

Part of speech (POS)

See also

Looking up lemmas and synonyms in WordNet

How to do it...

How it works...

There's more...

All possible synonyms

Antonyms

See also

Calculating WordNet Synset similarity

How to do it...

How it works...

There's more...

Comparing verbs

Path and Leacock Chodorow (LCH) similarity

See also

Discovering word collocations

Getting ready

How to do it...

How it works...

There's more...

Scoring functions

Scoring ngrams

See also

2. Replacing and Correcting Words

Introduction

Stemming words

How to do it...

How it works...

There's more...

The LancasterStemmer class

The RegexpStemmer class

The SnowballStemmer class

See also

Lemmatizing words with WordNet

Getting ready

How to do it...

How it works...

There's more...

Combining stemming with lemmatization

See also

Replacing words matching regular expressions

Getting ready

How to do it...

How it works...

There's more...

Replacement before tokenization

See also

Removing repeating characters

Getting ready

How to do it...

How it works...

There's more...

See also

Spelling correction with Enchant

Getting ready

How to do it...

How it works...

There's more...

The en_GB dictionary

Personal word lists

See also

Replacing synonyms

Getting ready

How to do it...

How it works...

There's more...

CSV synonym replacement

YAML synonym replacement

See also

Replacing negations with antonyms

How to do it...

How it works...

There's more...

See also

3. Creating Custom Corpora

Introduction

Setting up a custom corpus

Getting ready

How to do it...

How it works...

There's more...

Loading a YAML file

See also

Creating a wordlist corpus

Getting ready

How to do it...

How it works...

There's more...

Names wordlist corpus

English words corpus

See also

Creating a part-of-speech tagged word corpus

Getting ready

How to do it...

How it works...

There's more...

Customizing the word tokenizer

Customizing the sentence tokenizer

Customizing the paragraph block reader

Customizing the tag separator

Converting tags to a universal tagset

See also

Creating a chunked phrase corpus

Getting ready

How to do it...

How it works...

There's more...

Tree leaves

Treebank chunk corpus

CoNLL2000 corpus

See also

Creating a categorized text corpus

Getting ready

How to do it...

How it works...

There's more...

Category file

Categorized tagged corpus reader

Categorized corpora

See also

Creating a categorized chunk corpus reader

Getting ready

How to do it...

How it works...

There's more...

Categorized CoNLL chunk corpus reader

See also

Lazy corpus loading

How to do it...

How it works...

There's more...

Creating a custom corpus view

How to do it...

How it works...

There's more...

Block reader functions

Pickle corpus view

Concatenated corpus view

See also

Creating a MongoDB-backed corpus reader

Getting ready

How to do it...

How it works...

There's more...

See also

Corpus editing with file locking

Getting ready

How to do it...

How it works...

4. Part-of-speech Tagging

Introduction

Default tagging

Getting ready

How to do it...

How it works...

There's more...

Evaluating accuracy

Tagging sentences

Untagging a tagged sentence

See also

Training a unigram part-of-speech tagger

How to do it...

How it works...

There's more...

Overriding the context model

Minimum frequency cutoff

See also

Combining taggers with backoff tagging

How to do it...

How it works...

There's more...

Saving and loading a trained tagger with pickle

See also

Training and combining ngram taggers

Getting ready

How to do it...

How it works...

There's more...

Quadgram tagger

See also

Creating a model of likely word tags

How to do it...

How it works...

There's more...

See also

Tagging with regular expressions

Getting ready

How to do it...

How it works...

There's more...

See also

Affix tagging

How to do it...

How it works...

There's more...

Working with min_stem_length

See also

Training a Brill tagger

How to do it...

How it works...

There's more...

Tracing

See also

Training the TnT tagger

How to do it...

How it works...

There's more...

Controlling the beam search

Significance of capitalization

See also

Using WordNet for tagging

Getting ready

How to do it...

How it works...

See also

Tagging proper names

How to do it...

How it works...

See also

Classifier-based tagging

How to do it...

How it works...

There's more...

Detecting features with a custom feature detector

Setting a cutoff probability

Using a pre-trained classifier

See also

Training a tagger with NLTK-Trainer

How to do it...

How it works...

There's more...

Saving a pickled tagger

Training on a custom corpus

Training with universal tags

Analyzing a tagger against a tagged corpus

Analyzing a tagged corpus

See also

5. Extracting Chunks

Introduction

Chunking and chinking with regular expressions

Getting ready

How to do it...

How it works...

There's more...

Parsing different chunk types

Parsing alternative patterns

Chunk rule with context

See also

Merging and splitting chunks with regular expressions

How to do it...

How it works...

There's more...

Specifying rule descriptions

See also

Expanding and removing chunks with regular expressions

How to do it...

How it works...

There's more...

See also

Partial parsing with regular expressions

How to do it...

How it works...

There's more...

The ChunkScore metrics

Looping and tracing chunk rules

See also

Training a tagger-based chunker

How to do it...

How it works...

There's more...

Using different taggers

See also

Classification-based chunking

How to do it...

How it works...

There's more...

Using a different classifier builder

See also

Extracting named entities

How to do it...

How it works...

There's more...

Binary named entity extraction

See also

Extracting proper noun chunks

How to do it...

How it works...

There's more...

See also

Extracting location chunks

How to do it...

How it works...

There's more...

See also

Training a named entity chunker

How to do it...

How it works...

There's more...

See also

Training a chunker with NLTK-Trainer

How to do it...

How it works...

There's more...

Saving a pickled chunker

Training a named entity chunker

Training on a custom corpus

Training on parse trees

Analyzing a chunker against a chunked corpus

Analyzing a chunked corpus

See also

6. Transforming Chunks and Trees

Introduction

Filtering insignificant words from a sentence

Getting ready

How to do it...

How it works...

There's more...

See also

Correcting verb forms

Getting ready

How to do it...

How it works...

See also

Swapping verb phrases

How to do it...

How it works...

There's more...

See also

Swapping noun cardinals

How to do it...

How it works...

See also

Swapping infinitive phrases

How to do it...

How it works...

There's more...

See also

Singularizing plural nouns

How to do it...

How it works...

See also

Chaining chunk transformations

How to do it...

How it works...

There's more...

See also

Converting a chunk tree to text

How to do it...

How it works...

There's more...

See also

Flattening a deep tree

Getting ready

How to do it...

How it works...

There's more...

The cess_esp and cess_cat treebank

See also

Creating a shallow tree

How to do it...

How it works...

See also

Converting tree labels

Getting ready

How to do it...

How it works...

See also

7. Text Classification

Introduction

Bag of words feature extraction

How to do it...

How it works...

There's more...

Filtering stopwords

Including significant bigrams

See also

Training a Naive Bayes classifier

Getting ready

How to do it...

How it works...

There's more...

Classification probability

Most informative features

Training estimator

Manual training

See also

Training a decision tree classifier

How to do it...

How it works...

There's more...

Controlling uncertainty with entropy_cutoff

Controlling tree depth with depth_cutoff

Controlling decisions with support_cutoff

See also

Training a maximum entropy classifier

Getting ready

How to do it...

How it works...

There's more...

Megam algorithm

See also

Training scikit-learn classifiers

Getting ready

How to do it...

How it works...

There's more...

Comparing Naive Bayes algorithms

Training with logistic regression

Training with LinearSVC

See also

Measuring precision and recall of a classifier

How to do it...

How it works...

There's more...

F-measure

See also

Calculating high information words

How to do it...

How it works...

There's more...

The MaxentClassifier class with high information words

The DecisionTreeClassifier class with high information words

The SklearnClassifier class with high information words

See also

Combining classifiers with voting

Getting ready

How to do it...

How it works...

See also

Classifying with multiple binary classifiers

Getting ready

How to do it...

How it works...

There's more...

See also

Training a classifier with NLTK-Trainer

How to do it...

How it works...

There's more...

Saving a pickled classifier

Using different training instances

The most informative features

The Maxent and LogisticRegression classifiers

SVMs

Combining classifiers

High information words and bigrams

Cross-fold validation

Analyzing a classifier

See also

8. Distributed Processing and Handling Large Datasets

Introduction

Distributed tagging with execnet

Getting ready

How to do it...

How it works...

There's more...

Creating multiple channels

Local versus remote gateways

See also

Distributed chunking with execnet

Getting ready

How to do it...

How it works...

There's more...

Python subprocesses

See also

Parallel list processing with execnet

How to do it...

How it works...

There's more...

See also

Storing a frequency distribution in Redis

Getting ready

How to do it...

How it works...

There's more...

See also

Storing a conditional frequency distribution in Redis

Getting ready

How to do it...

How it works...

There's more...

See also

Storing an ordered dictionary in Redis

Getting ready

How to do it...

How it works...

There's more...

See also

Distributed word scoring with Redis and execnet

Getting ready

How to do it...

How it works...

There's more...

See also

9. Parsing Specific Data Types

Introduction

Parsing dates and times with dateutil

Getting ready

How to do it...

How it works...

There's more...

See also

Timezone lookup and conversion

Getting ready

How to do it...

How it works...

There's more...

Local timezone

Custom offsets

See also

Extracting URLs from HTML with lxml

Getting ready

How to do it...

How it works...

There's more...

Extracting links directly

Parsing HTML from URLs or files

Extracting links with XPaths

See also

Cleaning and stripping HTML

Getting ready

How to do it...

How it works...

There's more...

See also

Converting HTML entities with BeautifulSoup

Getting ready

How to do it...

How it works...

There's more...

Extracting URLs with BeautifulSoup

See also

Detecting and converting character encodings

Getting ready

How to do it...

How it works...

There's more...

Converting to ASCII

UnicodeDammit conversion

See also

A. Penn Treebank Part-of-speech Tags

Index
