Python 3 Text Processing with NLTK 3 Cookbook
Table of Contents
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Tokenizing Text and WordNet Basics
Introduction
Tokenizing text into sentences
Getting ready
How to do it...
How it works...
There's more...
Tokenizing sentences in other languages
See also
Tokenizing sentences into words
How to do it...
How it works...
There's more...
Separating contractions
PunktWordTokenizer
WordPunctTokenizer
See also
Tokenizing sentences using regular expressions
Getting ready
How to do it...
How it works...
There's more...
Simple whitespace tokenizer
See also
Training a sentence tokenizer
Getting ready
How to do it...
How it works...
There's more...
See also
Filtering stopwords in a tokenized sentence
Getting ready
How to do it...
How it works...
There's more...
See also
Looking up Synsets for a word in WordNet
Getting ready
How to do it...
How it works...
There's more...
Working with hypernyms
Part of speech (POS)
See also
Looking up lemmas and synonyms in WordNet
How to do it...
How it works...
There's more...
All possible synonyms
Antonyms
See also
Calculating WordNet Synset similarity
How to do it...
How it works...
There's more...
Comparing verbs
Path and Leacock-Chodorow (LCH) similarity
See also
Discovering word collocations
Getting ready
How to do it...
How it works...
There's more...
Scoring functions
Scoring ngrams
See also
2. Replacing and Correcting Words
Introduction
Stemming words
How to do it...
How it works...
There's more...
The LancasterStemmer class
The RegexpStemmer class
The SnowballStemmer class
See also
Lemmatizing words with WordNet
Getting ready
How to do it...
How it works...
There's more...
Combining stemming with lemmatization
See also
Replacing words matching regular expressions
Getting ready
How to do it...
How it works...
There's more...
Replacement before tokenization
See also
Removing repeating characters
Getting ready
How to do it...
How it works...
There's more...
See also
Spelling correction with Enchant
Getting ready
How to do it...
How it works...
There's more...
The en_GB dictionary
Personal word lists
See also
Replacing synonyms
Getting ready
How to do it...
How it works...
There's more...
CSV synonym replacement
YAML synonym replacement
See also
Replacing negations with antonyms
How to do it...
How it works...
There's more...
See also
3. Creating Custom Corpora
Introduction
Setting up a custom corpus
Getting ready
How to do it...
How it works...
There's more...
Loading a YAML file
See also
Creating a wordlist corpus
Getting ready
How to do it...
How it works...
There's more...
Names wordlist corpus
English words corpus
See also
Creating a part-of-speech tagged word corpus
Getting ready
How to do it...
How it works...
There's more...
Customizing the word tokenizer
Customizing the sentence tokenizer
Customizing the paragraph block reader
Customizing the tag separator
Converting tags to a universal tagset
See also
Creating a chunked phrase corpus
Getting ready
How to do it...
How it works...
There's more...
Tree leaves
Treebank chunk corpus
CoNLL2000 corpus
See also
Creating a categorized text corpus
Getting ready
How to do it...
How it works...
There's more...
Category file
Categorized tagged corpus reader
Categorized corpora
See also
Creating a categorized chunk corpus reader
Getting ready
How to do it...
How it works...
There's more...
Categorized CoNLL chunk corpus reader
See also
Lazy corpus loading
How to do it...
How it works...
There's more...
Creating a custom corpus view
How to do it...
How it works...
There's more...
Block reader functions
Pickle corpus view
Concatenated corpus view
See also
Creating a MongoDB-backed corpus reader
Getting ready
How to do it...
How it works...
There's more...
See also
Corpus editing with file locking
Getting ready
How to do it...
How it works...
4. Part-of-speech Tagging
Introduction
Default tagging
Getting ready
How to do it...
How it works...
There's more...
Evaluating accuracy
Tagging sentences
Untagging a tagged sentence
See also
Training a unigram part-of-speech tagger
How to do it...
How it works...
There's more...
Overriding the context model
Minimum frequency cutoff
See also
Combining taggers with backoff tagging
How to do it...
How it works...
There's more...
Saving and loading a trained tagger with pickle
See also
Training and combining ngram taggers
Getting ready
How to do it...
How it works...
There's more...
Quadgram tagger
See also
Creating a model of likely word tags
How to do it...
How it works...
There's more...
See also
Tagging with regular expressions
Getting ready
How to do it...
How it works...
There's more...
See also
Affix tagging
How to do it...
How it works...
There's more...
Working with min_stem_length
See also
Training a Brill tagger
How to do it...
How it works...
There's more...
Tracing
See also
Training the TnT tagger
How to do it...
How it works...
There's more...
Controlling the beam search
Significance of capitalization
See also
Using WordNet for tagging
Getting ready
How to do it...
How it works...
See also
Tagging proper names
How to do it...
How it works...
See also
Classifier-based tagging
How to do it...
How it works...
There's more...
Detecting features with a custom feature detector
Setting a cutoff probability
Using a pre-trained classifier
See also
Training a tagger with NLTK-Trainer
How to do it...
How it works...
There's more...
Saving a pickled tagger
Training on a custom corpus
Training with universal tags
Analyzing a tagger against a tagged corpus
Analyzing a tagged corpus
See also
5. Extracting Chunks
Introduction
Chunking and chinking with regular expressions
Getting ready
How to do it...
How it works...
There's more...
Parsing different chunk types
Parsing alternative patterns
Chunk rule with context
See also
Merging and splitting chunks with regular expressions
How to do it...
How it works...
There's more...
Specifying rule descriptions
See also
Expanding and removing chunks with regular expressions
How to do it...
How it works...
There's more...
See also
Partial parsing with regular expressions
How to do it...
How it works...
There's more...
The ChunkScore metrics
Looping and tracing chunk rules
See also
Training a tagger-based chunker
How to do it...
How it works...
There's more...
Using different taggers
See also
Classification-based chunking
How to do it...
How it works...
There's more...
Using a different classifier builder
See also
Extracting named entities
How to do it...
How it works...
There's more...
Binary named entity extraction
See also
Extracting proper noun chunks
How to do it...
How it works...
There's more...
See also
Extracting location chunks
How to do it...
How it works...
There's more...
See also
Training a named entity chunker
How to do it...
How it works...
There's more...
See also
Training a chunker with NLTK-Trainer
How to do it...
How it works...
There's more...
Saving a pickled chunker
Training a named entity chunker
Training on a custom corpus
Training on parse trees
Analyzing a chunker against a chunked corpus
Analyzing a chunked corpus
See also
6. Transforming Chunks and Trees
Introduction
Filtering insignificant words from a sentence
Getting ready
How to do it...
How it works...
There's more...
See also
Correcting verb forms
Getting ready
How to do it...
How it works...
See also
Swapping verb phrases
How to do it...
How it works...
There's more...
See also
Swapping noun cardinals
How to do it...
How it works...
See also
Swapping infinitive phrases
How to do it...
How it works...
There's more...
See also
Singularizing plural nouns
How to do it...
How it works...
See also
Chaining chunk transformations
How to do it...
How it works...
There's more...
See also
Converting a chunk tree to text
How to do it...
How it works...
There's more...
See also
Flattening a deep tree
Getting ready
How to do it...
How it works...
There's more...
The cess_esp and cess_cat treebanks
See also
Creating a shallow tree
How to do it...
How it works...
See also
Converting tree labels
Getting ready
How to do it...
How it works...
See also
7. Text Classification
Introduction
Bag of words feature extraction
How to do it...
How it works...
There's more...
Filtering stopwords
Including significant bigrams
See also
Training a Naive Bayes classifier
Getting ready
How to do it...
How it works...
There's more...
Classification probability
Most informative features
Training estimator
Manual training
See also
Training a decision tree classifier
How to do it...
How it works...
There's more...
Controlling uncertainty with entropy_cutoff
Controlling tree depth with depth_cutoff
Controlling decisions with support_cutoff
See also
Training a maximum entropy classifier
Getting ready
How to do it...
How it works...
There's more...
Megam algorithm
See also
Training scikit-learn classifiers
Getting ready
How to do it...
How it works...
There's more...
Comparing Naive Bayes algorithms
Training with logistic regression
Training with LinearSVC
See also
Measuring precision and recall of a classifier
How to do it...
How it works...
There's more...
F-measure
See also
Calculating high information words
How to do it...
How it works...
There's more...
The MaxentClassifier class with high information words
The DecisionTreeClassifier class with high information words
The SklearnClassifier class with high information words
See also
Combining classifiers with voting
Getting ready
How to do it...
How it works...
See also
Classifying with multiple binary classifiers
Getting ready
How to do it...
How it works...
There's more...
See also
Training a classifier with NLTK-Trainer
How to do it...
How it works...
There's more...
Saving a pickled classifier
Using different training instances
The most informative features
The Maxent and LogisticRegression classifiers
SVMs
Combining classifiers
High information words and bigrams
Cross-fold validation
Analyzing a classifier
See also
8. Distributed Processing and Handling Large Datasets
Introduction
Distributed tagging with execnet
Getting ready
How to do it...
How it works...
There's more...
Creating multiple channels
Local versus remote gateways
See also
Distributed chunking with execnet
Getting ready
How to do it...
How it works...
There's more...
Python subprocesses
See also
Parallel list processing with execnet
How to do it...
How it works...
There's more...
See also
Storing a frequency distribution in Redis
Getting ready
How to do it...
How it works...
There's more...
See also
Storing a conditional frequency distribution in Redis
Getting ready
How to do it...
How it works...
There's more...
See also
Storing an ordered dictionary in Redis
Getting ready
How to do it...
How it works...
There's more...
See also
Distributed word scoring with Redis and execnet
Getting ready
How to do it...
How it works...
There's more...
See also
9. Parsing Specific Data Types
Introduction
Parsing dates and times with dateutil
Getting ready
How to do it...
How it works...
There's more...
See also
Timezone lookup and conversion
Getting ready
How to do it...
How it works...
There's more...
Local timezone
Custom offsets
See also
Extracting URLs from HTML with lxml
Getting ready
How to do it...
How it works...
There's more...
Extracting links directly
Parsing HTML from URLs or files
Extracting links with XPaths
See also
Cleaning and stripping HTML
Getting ready
How to do it...
How it works...
There's more...
See also
Converting HTML entities with BeautifulSoup
Getting ready
How to do it...
How it works...
There's more...
Extracting URLs with BeautifulSoup
See also
Detecting and converting character encodings
Getting ready
How to do it...
How it works...
There's more...
Converting to ASCII
UnicodeDammit conversion
See also
A. Penn Treebank Part-of-speech Tags
Index