Natural Language Processing Fundamentals (e-book)

Author: Sohom Ghosh

Publisher: Packt Publishing

Publication date: 2019-03-30

Word count: 9.459 million

Category: Imported books > Foreign-language originals > Computers/Internet

Use Python and NLTK (Natural Language Toolkit) to build your own text classifiers and solve common NLP problems.

Key Features
* Assimilate key NLP concepts and terminology
* Explore popular NLP tools and techniques
* Gain practical experience using NLP in application code

Book Description
If NLP hasn't been your forte, Natural Language Processing Fundamentals will get you off to a steady start. This comprehensive guide shows you how to use Python libraries and NLP concepts to solve a variety of problems.

You'll be introduced to natural language processing and its applications through examples and exercises. An introduction to the initial stages of solving a problem follows: defining the problem, getting text data, and preparing it for modeling. With exposure to advanced natural language processing algorithms and visualization techniques, you'll learn how to create applications that extract information from unstructured data and present it as impactful visuals. Although you will continue to learn NLP-based techniques, the focus gradually shifts to developing useful applications. In these sections, you'll learn how to apply NLP techniques to question answering of the kind used in chatbots.

By the end of this book, you'll be able to handle a varied range of tasks, from identifying the most suitable type of NLP task for a problem to using a tool such as spaCy or Gensim for sentiment analysis. The book will equip you with the knowledge you need to build applications that interpret human language.

What you will learn
* Obtain, verify, and clean data before transforming it into the correct format for use
* Perform data analysis and machine learning tasks using Python
* Understand the basics of computational linguistics
* Build models for general natural language processing tasks
* Evaluate the performance of a model with the right metrics
* Visualize, quantify, and perform exploratory analysis on any text data

Who this book is for
Natural Language Processing Fundamentals is designed for novice and mid-level data scientists and machine learning developers who want to gather and analyze text data to build an NLP-powered product. Prior experience of coding in Python (using data types, writing functions, and importing libraries) will help. Some experience with linguistics and probability is useful but not necessary.
Table of Contents

About the Book

About the Authors

Learning Objectives

Audience

Approach

Hardware Requirements

Software Requirements

Conventions

Installation and Setup

Working with the Jupyter Notebook

Importing Python Libraries

Installing the Code Bundle

Additional Resources

Chapter 1

Introduction to Natural Language Processing

Introduction

History of NLP

Text Analytics and NLP

Exercise 1: Basic Text Analytics

Various Steps in NLP

Tokenization

Exercise 2: Tokenization of a Simple Sentence

PoS Tagging

Exercise 3: PoS Tagging

Stop Word Removal

Exercise 4: Stop Word Removal

Text Normalization

Exercise 5: Text Normalization

Spelling Correction

Exercise 6: Spelling Correction of a Word and a Sentence

Stemming

Exercise 7: Stemming

Lemmatization

Exercise 8: Extracting the Base Word Using Lemmatization

NER

Exercise 9: Treating Named Entities

Word Sense Disambiguation

Exercise 10: Word Sense Disambiguation

Sentence Boundary Detection

Exercise 11: Sentence Boundary Detection

Activity 1: Preprocessing of Raw Text

Kick Starting an NLP Project

Data Collection

Data Preprocessing

Feature Extraction

Model Development

Model Assessment

Model Deployment

Summary

Chapter 2

Basic Feature Extraction Methods

Introduction

Types of Data

Categorizing Data Based on Structure

Categorization of Data Based on Content

Cleaning Text Data

Tokenization

Exercise 12: Text Cleaning and Tokenization

Exercise 13: Extracting n-grams

Exercise 14: Tokenizing Texts with Different Packages – Keras and TextBlob

Types of Tokenizers

Exercise 15: Tokenizing Text Using Various Tokenizers

Issues with Tokenization

Stemming

RegexpStemmer

Exercise 16: Converting Words in Gerund Form into Base Words Using RegexpStemmer

The Porter Stemmer

Exercise 17: The Porter Stemmer

Lemmatization

Exercise 18: Lemmatization

Exercise 19: Singularizing and Pluralizing Words

Language Translation

Exercise 20: Language Translation

Stop-Word Removal

Exercise 21: Stop-Word Removal

Feature Extraction from Texts

Extracting General Features from Raw Text

Exercise 22: Extracting General Features from Raw Text

Activity 2: Extracting General Features from Text

Bag of Words

Exercise 23: Creating a BoW

Zipf's Law

Exercise 24: Zipf's Law

TF-IDF

Exercise 25: TF-IDF Representation

Activity 3: Extracting Specific Features from Texts

Feature Engineering

Exercise 26: Feature Engineering (Text Similarity)

Word Clouds

Exercise 27: Word Clouds

Other Visualizations

Exercise 28: Other Visualizations (Dependency Parse Trees and Named Entities)

Activity 4: Text Visualization

Summary

Chapter 3

Developing a Text Classifier

Introduction

Machine Learning

Unsupervised Learning

Hierarchical Clustering

Exercise 29: Hierarchical Clustering

K-Means Clustering

Exercise 30: K-Means Clustering

Supervised Learning

Classification

Logistic Regression

Naive Bayes Classifiers

K-Nearest Neighbors

Exercise 31: Text Classification (Logistic Regression, Naive Bayes, and KNN)

Regression

Linear Regression

Exercise 32: Regression Analysis Using Textual Data

Tree Methods

Random Forest

GBM and XGBoost

Exercise 33: Tree-Based Methods (Decision Tree, Random Forest, GBM, and XGBoost)

Sampling

Exercise 34: Sampling (Simple Random, Stratified, Multi-Stage)

Developing a Text Classifier

Feature Extraction

Feature Engineering

Removing Correlated Features

Exercise 35: Removing Highly Correlated Features (Tokens)

Dimensionality Reduction

Exercise 36: Dimensionality Reduction (PCA)

Deciding on a Model Type

Evaluating the Performance of a Model

Exercise 37: Calculate the RMSE and MAPE

Activity 5: Developing End-to-End Text Classifiers

Building Pipelines for NLP Projects

Exercise 38: Building Pipelines for NLP Projects

Saving and Loading Models

Exercise 39: Saving and Loading Models

Summary

Chapter 4

Collecting Text Data from the Web

Introduction

Collecting Data by Scraping Web Pages

Exercise 40: Extraction of Tag-Based Information from HTML Files

Requesting Content from Web Pages

Exercise 41: Collecting Online Text Data

Exercise 42: Analyzing the Content of Jupyter Notebooks (in HTML Format)

Activity 6: Extracting Information from an Online HTML Page

Activity 7: Extracting and Analyzing Data Using Regular Expressions

Dealing with Semi-Structured Data

JSON

Exercise 43: Dealing with JSON Files

Activity 8: Dealing with Online JSON Files

XML

Exercise 44: Dealing with a Local XML File

Using APIs to Retrieve Real-Time Data

Exercise 45: Collecting Data Using APIs

API Creation

Activity 9: Extracting Data from Twitter

Extracting Data from Local Files

Exercise 46: Extracting Data from Local Files

Exercise 47: Performing Various Operations on Local Files

Summary

Chapter 5

Topic Modeling

Introduction

Topic Discovery

Discovering Themes

Exploratory Data Analysis

Document Clustering

Dimensionality Reduction

Historical Analysis

Bag of Words

Topic Modeling Algorithms

Latent Semantic Analysis

LSA – How It Works

Exercise 48: Analyzing Reuters News Articles with Latent Semantic Analysis

Latent Dirichlet Allocation

LDA – How It Works

Exercise 49: Topics in Airline Tweets

Topic Fingerprinting

Exercise 50: Visualizing Documents Using Topic Vectors

Activity 10: Topic Modeling Jeopardy Questions

Summary

Chapter 6

Text Summarization and Text Generation

Introduction

What Is Automated Text Summarization?

Benefits of Automated Text Summarization

High-Level View of Text Summarization

Purpose

Input

Output

Extractive Text Summarization

Abstractive Text Summarization

Sequence to Sequence

Encoder Decoder

TextRank

Exercise 51: TextRank from Scratch

Summarizing Text Using Gensim

Activity 11: Summarizing a Downloaded Page Using the Gensim Text Summarizer

Summarizing Text Using Word Frequency

Exercise 52: Word Frequency Text Summarization

Generating Text with Markov Chains

Markov Chains

Exercise 53: Generating Text Using Markov Chains

Summary

Chapter 7

Vector Representation

Introduction

Vector Definition

Why Vector Representations?

Encoding

Character-Level Encoding

Exercise 54: Character Encoding Using ASCII Values

Exercise 55: Character Encoding with the Help of NumPy Arrays

Positional Character-Level Encoding

Exercise 56: Character-Level Encoding Using Positions

One-Hot Encoding

Key Steps in One-Hot Encoding

Exercise 57: Character One-Hot Encoding – Manual

Exercise 58: Character-Level One-Hot Encoding with Keras

Word-Level One-Hot Encoding

Exercise 59: Word-Level One-Hot Encoding

Word Embeddings

Word2Vec

Exercise 60: Training Word Vectors

Using Pre-Trained Word Vectors

Exercise 61: Loading Pre-Trained Word Vectors

Document Vectors

Uses of Document Vectors

Exercise 62: From Movie Dialogue to Document Vectors

Activity 12: Finding Similar Movie Lines Using Document Vectors

Summary

Chapter 8

Sentiment Analysis

Introduction

Why Is Sentiment Analysis Required?

Growth of Sentiment Analysis

Monetization of Emotion

Types of Sentiments

Key Ideas and Terms

Applications of Sentiment Analysis

Tools Used for Sentiment Analysis

NLP Services from Major Cloud Providers

Online Marketplaces

Python NLP Libraries

Deep Learning Libraries

TextBlob

Exercise 63: Basic Sentiment Analysis Using the TextBlob Library

Activity 13: Tweet Sentiment Analysis Using the TextBlob Library

Understanding Data for Sentiment Analysis

Exercise 64: Loading Data for Sentiment Analysis

Training Sentiment Models

Exercise 65: Training a Sentiment Model Using TFIDF and Logistic Regression

Summary

Appendix

Chapter 1: Introduction to Natural Language Processing

Activity 1: Preprocessing of Raw Text

Chapter 2: Basic Feature Extraction Methods

Activity 2: Extracting General Features from Text

Activity 3: Extracting Specific Features from Texts

Activity 4: Text Visualization

Chapter 3: Developing a Text Classifier

Activity 5: Developing End-to-End Text Classifiers

Chapter 4: Collecting Text Data from the Web

Activity 6: Extracting Information from an Online HTML Page

Activity 7: Extracting and Analyzing Data Using Regular Expressions

Activity 8: Dealing with Online JSON Files

Activity 9: Extracting Data from Twitter

Chapter 5: Topic Modeling

Activity 10: Topic Modeling Jeopardy Questions

Chapter 6: Text Summarization and Text Generation

Activity 11: Summarizing a Downloaded Page Using the Gensim Text Summarizer

Chapter 7: Vector Representation

Activity 12: Finding Similar Movie Lines Using Document Vectors

Solution

Chapter 8: Sentiment Analysis

Activity 13: Tweet Sentiment Analysis Using the TextBlob Library
