Machine Learning Solutions (e-book)

Author: Jalaj Thanaki

Publisher: Packt Publishing

Publication date: 2018-04-27

Word count: 5.247 million

Category: Imported books > Foreign-language originals > Computers/Internet

Practical, hands-on solutions in Python to overcome any problem in machine learning

About This Book

  • Master the advanced concepts, methodologies, and use cases of machine learning
  • Build ML applications for the analytics, NLP, and computer vision domains
  • Solve the most common problems in building machine learning models

Who This Book Is For

This book is for intermediate users, such as machine learning engineers, data engineers, and data scientists, who want to solve simple to complex machine learning problems in their day-to-day work and build powerful and efficient machine learning models. A basic understanding of machine learning concepts and some experience with Python programming are all you need to get started with this book.

What You Will Learn

  • Select the right algorithm to derive the best solution in ML domains
  • Perform predictive analysis efficiently using ML algorithms
  • Predict stock prices using the stock index value
  • Perform customer analytics for an e-commerce platform
  • Build recommendation engines for various domains
  • Build NLP applications for the health domain
  • Build language generation applications using different NLP techniques
  • Build computer vision applications such as facial emotion recognition

In Detail

Machine learning (ML) helps you find hidden insights in your data without the need for explicit programming. This book is your key to solving any kind of ML problem you might come across in your job. You'll encounter a set of simple to complex problems while building ML models. You'll not only resolve these problems, but also learn how to build projects based on each problem, with a practical approach and easy-to-follow examples. The book covers a wide range of applications, from analytics and NLP to computer vision. Some of the applications you will work on include stock price prediction, a recommendation engine, a chatbot, and a facial expression recognition system.

The problems we cover include identifying the right algorithm for your dataset and use case, creating and labeling datasets, getting enough clean data to carry out processing, identifying outliers, overfitting, and hyperparameter tuning. You'll also learn to make more timely and accurate predictions. In addition, you'll deal with more advanced use cases, such as building a gaming bot and building an extractive summarization tool for medical documents, and you'll tackle the problems faced while building an ML model. By the end of this book, you'll be able to fine-tune your models as per your needs to deliver maximum productivity.

Style and Approach

This book is a step-by-step guide to developing machine learning applications for various domains. Each chapter contains a practical guide to building a specific machine learning application, from the baseline approach to the best possible approach. The necessary basic concepts, common mistakes for every approach, and optimization techniques are discussed for each application.
Table of Contents

Machine Learning Solutions

Mapt

Why subscribe?

PacktPub.com

Foreword

Contributors

About the author

About the reviewer

Packt is Searching for Authors Like You

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Note

Tip

Get in touch

Reviews

Chapter 1. Credit Risk Modeling

Introducing the problem statement

Understanding the dataset

Understanding attributes of the dataset

Data analysis

Data preprocessing

First change

Second change

Implementing the changes

Basic data analysis followed by data preprocessing

Listing statistical properties

Finding missing values

Replacing missing values

Correlation

Detecting outliers

Outliers detection techniques

Percentile-based outlier detection

Median Absolute Deviation (MAD)-based outlier detection

Standard Deviation (STD)-based outlier detection

Majority-vote-based outlier detection

Visualization of outliers

Handling outliers

Revolving utilization of unsecured lines

Age

Number of times 30-59 days past due not worse

Debt ratio

Monthly income

Number of open credit lines and loans

Number of times 90 days late

Number of real estate loans or lines

Number of times 60-89 days past due not worse

Number of dependents

Feature engineering for the baseline model

Finding out feature importance

Selecting machine learning algorithms

K-Nearest Neighbor (KNN)

Logistic regression

AdaBoost

GradientBoosting

RandomForest

Training the baseline model

Understanding the testing matrix

The mean accuracy of the trained models

The ROC-AUC score

ROC

AUC

Testing the baseline model

Problems with the existing approach

Optimizing the existing approach

Understanding key concepts to optimize the approach

Cross-validation

The approach of using CV

Hyperparameter tuning

Grid search parameter tuning

Random search parameter tuning

Implementing the revised approach

Implementing a cross-validation based approach

Implementing hyperparameter tuning

Implementing and testing the revised approach

Understanding problems with the revised approach

Best approach

Implementing the best approach

Log transformation of features

Voting-based ensemble ML model

Running ML models on real test data

Summary

Chapter 2. Stock Market Price Prediction

Introducing the problem statement

Collecting the dataset

Collecting DJIA index prices

Collecting news articles

Understanding the dataset

Understanding the DJIA dataset

Understanding the NYTimes news article dataset

Data preprocessing and data analysis

Preparing the DJIA training dataset

Basic data analysis for a DJIA dataset

Preparing the NYTimes news dataset

Converting publication date into the YYYY-MM-DD format

Filtering news articles by category

Implementing the filter functionality and merging the dataset

Saving the merged dataset in the pickle file format

Feature engineering

Loading the dataset

Minor preprocessing

Converting adj close price into the integer format

Tip

Removing the leftmost dot from news headlines

Tip

Feature engineering

Sentiment analysis of NYTimes news articles

Note

Selecting the Machine Learning algorithm

Training the baseline model

Splitting the training and testing dataset

Splitting prediction labels for the training and testing datasets

Converting sentiment scores into the numpy array

Note

Training of the ML model

Understanding the testing matrix

The default testing matrix

The visualization approach

Testing the baseline model

Generating and interpreting the output

Generating the accuracy score

Visualizing the output

Note

Exploring problems with the existing approach

Alignment

Smoothing

Trying a different ML algorithm

Understanding the revised approach

Understanding concepts and approaches

Alignment-based approach

Smoothing-based approach

Logistic Regression-based approach

Implementing the revised approach

Implementation

Implementing alignment

Implementing smoothing

Implementing logistic regression

Testing the revised approach

Understanding the problem with the revised approach

The best approach

Summary

Chapter 3. Customer Analytics

Introducing customer segmentation

Introducing the problem statement

Understanding the datasets

Note

Description of the dataset

Downloading the dataset

Attributes of the dataset

Building the baseline approach

Implementing the baseline approach

Data preparation

Loading the dataset

Exploratory data analysis (EDA)

Removing null data entries

Removing duplicate data entries

EDA for various data attributes

Country

Customer and products

Product categories

Analyzing the product description

Defining product categories

Characterizing the content of clusters

Silhouette intra-cluster score analysis

Note

Analysis using a word cloud

Principal component analysis (PCA)

Generating customer categories

Formatting data

Grouping products

Splitting the dataset

Grouping orders

Creating customer categories

Data encoding

Generating customer categories

PCA analysis

Analyzing the cluster using silhouette scores

Classifying customers

Defining helper functions

Splitting the data into training and testing

Implementing the Machine Learning (ML) algorithm

Understanding the testing matrix

Confusion matrix

Learning curve

Testing the result of the baseline approach

Generating the accuracy score for classifier

Generating the confusion matrix for the classifier

Generating the learning curve for the classifier

Problems with the baseline approach

Optimizing the baseline approach

Building the revised approach

Implementing the revised approach

Testing the revised approach

Problems with the revised approach

Understanding how to improve the revised approach

The best approach

Implementing the best approach

Testing the best approach

Transforming the hold-out corpus in the form of the training dataset

Converting the transformed dataset into a matrix form

Generating the predictions

Customer segmentation for various domains

Summary

Chapter 4. Recommendation Systems for E-Commerce

Introducing the problem statement

Understanding the datasets

e-commerce Item Data

The Book-Crossing dataset

BX-Book-Ratings.csv

BX-Books.csv

BX-Users.csv

Building the baseline approach

Understanding the basic concepts

Understanding the content-based approach

Implementing the baseline approach

Architecture of the recommendation system

Steps for implementing the baseline approach

Loading the dataset

Generating features using TF-IDF

Building the cosine similarity matrix

Generating the prediction

Understanding the testing matrix

Testing the result of the baseline approach

Problems with the baseline approach

Optimizing the baseline approach

Building the revised approach

Implementing the revised approach

Loading dataset

EDA of the book-rating datafile

Exploring the book datafile

EDA of the user datafile

Implementing the logic of correlation for the recommendation engine

Recommendations based on the rating of the books

Recommendations based on correlations

Note

Testing the revised approach

Problems with the revised approach

Understanding how to improve the revised approach

The best approach

Understanding the key concepts

Collaborative filtering

Memory-based CF

User-user collaborative filtering

Item-item collaborative filtering

Model-based CF

Matrix-factorization-based algorithms

Difference between memory-based CF and model-based CF

Implementing the best approach

Loading the dataset

Merging the data frames

EDA for the merged data frames

Filtering data based on geolocation

Applying the KNN algorithm

Recommendation using the KNN algorithm

Applying matrix factorization

Recommendation using matrix factorization

Summary

Chapter 5. Sentiment Analysis

Introducing problem statements

Understanding the dataset

Understanding the content of the dataset

Train folder

Test folder

imdb.vocab file

imdbEr.txt file

README

Understanding the contents of the movie review files

Building the training and testing datasets for the baseline model

Feature engineering for the baseline model

Note

Selecting the machine learning algorithm

Training the baseline model

Implementing the baseline model

Multinomial naive Bayes

C-support vector classification with kernel rbf

C-support vector classification with kernel linear

Linear support vector classification

Understanding the testing matrix

Precision

Recall

F1-Score

Support

Training accuracy

Testing the baseline model

Testing of Multinomial naive Bayes

Testing of SVM with rbf kernel

Testing SVM with the linear kernel

Testing SVM with linearSVC

Problem with the existing approach

How to optimize the existing approach

Understanding key concepts for optimizing the approach

Implementing the revised approach

Importing the dependencies

Downloading and loading the IMDb dataset

Choosing the top words and the maximum text length

Implementing word embedding

Building a convolutional neural net (CNN)

Training and obtaining the accuracy

Testing the revised approach

Understanding problems with the revised approach

The best approach

Implementing the best approach

Loading the glove model

Loading the dataset

Preprocessing

Loading precomputed ID matrix

Splitting the train and test datasets

Building a neural network

Training the neural network

Loading the trained model

Testing the trained model

Summary

Chapter 6. Job Recommendation Engine

Introducing the problem statement

Understanding the datasets

Scraped dataset

Job recommendation challenge dataset

apps.tsv

users.tsv

Jobs.zip

user_history.tsv

Building the baseline approach

Implementing the baseline approach

Defining constants

Loading the dataset

Defining the helper function

Generating TF-IDF vectors and cosine similarity

Building the training dataset

Generating TF-IDF vectors for the training dataset

Building the testing dataset

Generating the similarity score

Understanding the testing matrix

Problems with the baseline approach

Optimizing the baseline approach

Building the revised approach

Loading the dataset

Splitting the training and testing datasets

Exploratory Data Analysis

Building the recommendation engine using the jobs datafile

Testing the revised approach

Problems with the revised approach

Understanding how to improve the revised approach

The best approach

Implementing the best approach

Filtering the dataset

Preparing the training dataset

Applying the concatenation operation

Generating the TF-IDF and cosine similarity score

Generating recommendations

Summary

Chapter 7. Text Summarization

Understanding the basics of summarization

Extractive summarization

Abstractive summarization

Introducing the problem statement

Understanding datasets

Challenges in obtaining the dataset

Understanding the medical transcription dataset

Understanding Amazon's review dataset

Building the baseline approach

Implementing the baseline approach

Installing python dependencies

Note

Writing the code and generating the summary

Problems with the baseline approach

Optimizing the baseline approach

Building the revised approach

Implementing the revised approach

The get_summarized function

The reorder_sentences function

The summarize function

Generating the summary

Problems with the revised approach

Understanding how to improve the revised approach

The LSA algorithm

Note

The idea behind the best approach

The best approach

Implementing the best approach

Understanding the structure of the project

Understanding helper functions

Normalization.py

Utils.py

Generating the summary

Building the summarization application using Amazon reviews

Loading the dataset

Exploring the dataset

Preparing the dataset

Building the DL model

Training the DL model

Testing the DL model

Summary

Chapter 8. Developing Chatbots

Introducing the problem statement

Retrieval-based approach

Generative-based approach

Open domain

Closed domain

Short conversation

Long conversation

Open domain and generative-based approach

Open domain and retrieval-based approach

Closed domain and retrieval-based approach

Closed domain and generative-based approach

Understanding datasets

Cornell Movie-Dialogs dataset

Content details of movie_conversations.txt

Content details of movie_lines.txt

The bAbI dataset

The (20) QA bAbI tasks

Building the basic version of a chatbot

Why does the rule-based system work?

Understanding the rule-based system

Understanding the approach

Listing down possible questions and answers

Deciding standard messages

Understanding the architecture

Implementing the rule-based chatbot

Implementing the conversation flow

Implementing RESTful APIs using flask

Testing the rule-based chatbot

Advantages of the rule-based chatbot

Problems with the existing approach

Understanding key concepts for optimizing the approach

Understanding the seq2seq model

Implementing the revised approach

Data preparation

Generating question-answer pairs

Preprocessing the dataset

Splitting the dataset into the training dataset and the testing dataset

Building a vocabulary for the training and testing datasets

Implementing the seq2seq model

Creating the model

Training the model

Testing the revised approach

Understanding the testing metrics

Perplexity

Loss

Testing the revised version of the chatbot

Problems with the revised approach

Understanding key concepts to solve existing problems

Memory networks

Dynamic memory network (DMN)

Input module

Question module

Episodic memory

The best approach

Implementing the best approach

Note

Random testing mode

User interactive testing mode

Discussing the hybrid approach

Summary

Chapter 9. Building a Real-Time Object Recognition App

Introducing the problem statement

Understanding the dataset

The COCO dataset

The PASCAL VOC dataset

PASCAL VOC classes

Transfer Learning

What is Transfer Learning?

What is a pre-trained model?

Why should we use a pre-trained model?

How can we use a pre-trained model?

Setting up the coding environment

Setting up and installing OpenCV

Feature engineering for the baseline model

Selecting the machine learning algorithm

Architecture of the MobileNet SSD model

Building the baseline model

Understanding the testing metrics

Intersection over Union (IoU)

mean Average Precision

Testing the baseline model

Problem with existing approach

How to optimize the existing approach

Understanding the process for optimization

Implementing the revised approach

Testing the revised approach

Understanding problems with the revised approach

The best approach

Understanding YOLO

The working of YOLO

The architecture of YOLO

Implementing the best approach using YOLO

Implementation using Darknet

Environment setup for Darknet

Compiling the Darknet

Downloading the pre-trained weight

Running object detection for the image

Running the object detection on the video stream

Implementation using Darkflow

Installing Cython

Building the already provided setup file

Testing the environment

Loading the model and running object detection on images

Loading the model and running object detection on the video stream

Summary

Chapter 10. Face Recognition and Face Emotion Recognition

Introducing the problem statement

Face recognition application

Face emotion recognition application

Setting up the coding environment

Installing dlib

Installing face_recognition

Understanding the concepts of face recognition

Understanding the face recognition dataset

CAS-PEAL Face Dataset

Labeled Faces in the Wild

Note

Algorithms for face recognition

Histogram of Oriented Gradients (HOG)

Convolutional Neural Network (CNN) for FR

Simple CNN architecture

Understanding how CNN works for FR

Approaches for implementing face recognition

Implementing the HOG-based approach

Implementing the CNN-based approach

Implementing real-time face recognition

Understanding the dataset for face emotion recognition

Understanding the concepts of face emotion recognition

Understanding the convolutional layer

Understanding the ReLU layer

Understanding the pooling layer

Understanding the fully connected layer

Understanding the SoftMax layer

Updating the weight based on backpropagation

Building the face emotion recognition model

Preparing the data

Loading the data

Training the model

Loading the data using the dataset_loader script

Building the Convolutional Neural Network

Training for the FER application

Predicting and saving the trained model

Understanding the testing matrix

Testing the model

Problems with the existing approach

How to optimize the existing approach

Understanding the process for optimization

The best approach

Implementing the best approach

Summary

Chapter 11. Building Gaming Bot

Introducing the problem statement

Setting up the coding environment

Understanding Reinforcement Learning (RL)

Markov Decision Process (MDP)

Discounted Future Reward

Basic Atari gaming bot

Understanding the key concepts

Rules for the game

Understanding the Q-Learning algorithm

Implementing the basic version of the gaming bot

Building the Space Invaders gaming bot

Understanding the key concepts

Understanding a deep Q-network (DQN)

Architecture of DQN

Steps for the DQN algorithm

Understanding Experience Replay

Implementing the Space Invaders gaming bot

Building the Pong gaming bot

Understanding the key concepts

Architecture of the gaming bot

Approach for the gaming bot

Implementing the Pong gaming bot

Initialization of the parameters

Weights stored in the form of matrices

Updating weights

How to move the agent

Understanding the process using NN

Just for fun - implementing the Flappy Bird gaming bot

Summary

Appendix A. List of Cheat Sheets

Cheat sheets

Summary

Appendix B. Strategy for Winning Hackathons

Strategy for winning hackathons

Keeping up to date

Summary

Index

A

B

C

D

E

F

G

H

I

J

K

L

M

N

O

P

R

S

T

U

V

Y
