Practical, hands-on solutions in Python to overcome any problem in Machine Learning

About This Book
• Master the advanced concepts, methodologies, and use cases of machine learning
• Build ML applications for analytics, NLP, and computer vision domains
• Solve the most common problems in building machine learning models

Who This Book Is For
This book is for intermediate users, such as machine learning engineers, data engineers, and data scientists, who want to solve simple to complex machine learning problems in their day-to-day work and build powerful and efficient machine learning models. A basic understanding of machine learning concepts and some experience with Python programming are all you need to get started with this book.

What You Will Learn
• Select the right algorithm to derive the best solution in ML domains
• Perform predictive analysis efficiently using ML algorithms
• Predict stock prices using the stock index value
• Perform customer analytics for an e-commerce platform
• Build recommendation engines for various domains
• Build NLP applications for the health domain
• Build language generation applications using different NLP techniques
• Build computer vision applications such as facial emotion recognition

In Detail
Machine learning (ML) helps you find hidden insights from your data without the need for explicit programming. This book is your key to solving any kind of ML problem you might come across in your job. You'll encounter a set of simple to complex problems while building ML models, and you'll not only resolve these problems but also learn how to build projects based on each problem, with a practical approach and easy-to-follow examples. The book includes a wide range of applications, from analytics and NLP to computer vision. Some of the applications you will work on include stock price prediction, a recommendation engine, a chatbot, a facial expression recognition system, and many more.

The problems we cover include identifying the right algorithm for your dataset and use case, creating and labeling datasets, getting enough clean data to carry out processing, identifying outliers, overfitting, hyperparameter tuning, and more. Here, you'll also learn to make more timely and accurate predictions. In addition, you'll deal with more advanced use cases, such as building a gaming bot and building an extractive summarization tool for medical documents, and you'll tackle the problems faced while building an ML model. By the end of this book, you'll be able to fine-tune your models as per your needs to deliver maximum productivity.

Style and approach
This book is a step-by-step guide on how to develop machine learning applications for various domains. Each chapter contains a practical guide to building a specific machine learning application, from its baseline approach to the best possible approach. The necessary basic concepts, common mistakes for every approach, and optimization techniques are discussed for each application.

Table of Contents

Machine Learning Solutions


Why subscribe?




About the author

About the reviewer

Packt is Searching for Authors Like You


Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used



Get in touch


Chapter 1. Credit Risk Modeling

Introducing the problem statement

Understanding the dataset

Understanding attributes of the dataset

Data analysis

Data preprocessing

First change

Second change

Implementing the changes

Basic data analysis followed by data preprocessing

Listing statistical properties

Finding missing values

Replacing missing values


Detecting outliers

Outlier detection techniques

Percentile-based outlier detection

Median Absolute Deviation (MAD)-based outlier detection

Standard Deviation (STD)-based outlier detection

Majority-vote-based outlier detection

Visualization of outliers

Handling outliers

Revolving utilization of unsecured lines


Number of times 30-59 days past due not worse

Debt ratio

Monthly income

Number of open credit lines and loans

Number of times 90 days late

Number of real estate loans or lines

Number of times 60-89 days past due not worse

Number of dependents

Feature engineering for the baseline model

Finding out feature importance

Selecting machine learning algorithms

K-Nearest Neighbor (KNN)

Logistic regression




Training the baseline model

Understanding the testing matrix

The mean accuracy of the trained models

The ROC-AUC score



Testing the baseline model

Problems with the existing approach

Optimizing the existing approach

Understanding key concepts to optimize the approach


The approach of using CV

Hyperparameter tuning

Grid search parameter tuning

Random search parameter tuning

Implementing the revised approach

Implementing a cross-validation based approach

Implementing hyperparameter tuning

Implementing and testing the revised approach

Understanding problems with the revised approach

Best approach

Implementing the best approach

Log transformation of features

Voting-based ensemble ML model

Running ML models on real test data


Chapter 2. Stock Market Price Prediction

Introducing the problem statement

Collecting the dataset

Collecting DJIA index prices

Collecting news articles

Understanding the dataset

Understanding the DJIA dataset

Understanding the NYTimes news article dataset

Data preprocessing and data analysis

Preparing the DJIA training dataset

Basic data analysis for a DJIA dataset

Preparing the NYTimes news dataset

Converting publication date into the YYYY-MM-DD format

Filtering news articles by category

Implementing the filter functionality and merging the dataset

Saving the merged dataset in the pickle file format

Feature engineering

Loading the dataset

Minor preprocessing

Converting adj close price into the integer format


Removing the leftmost dot from news headlines


Feature engineering

Sentiment analysis of NYTimes news articles


Selecting the Machine Learning algorithm

Training the baseline model

Splitting the training and testing dataset

Splitting prediction labels for the training and testing datasets

Converting sentiment scores into the numpy array


Training of the ML model

Understanding the testing matrix

The default testing matrix

The visualization approach

Testing the baseline model

Generating and interpreting the output

Generating the accuracy score

Visualizing the output


Exploring problems with the existing approach



Trying a different ML algorithm

Understanding the revised approach

Understanding concepts and approaches

Alignment-based approach

Smoothing-based approach

Logistic Regression-based approach

Implementing the revised approach


Implementing alignment

Implementing smoothing

Implementing logistic regression

Testing the revised approach

Understanding the problem with the revised approach

The best approach


Chapter 3. Customer Analytics

Introducing customer segmentation

Introducing the problem statement

Understanding the datasets


Description of the dataset

Downloading the dataset

Attributes of the dataset

Building the baseline approach

Implementing the baseline approach

Data preparation

Loading the dataset

Exploratory data analysis (EDA)

Removing null data entries

Removing duplicate data entries

EDA for various data attributes


Customer and products

Product categories

Analyzing the product description

Defining product categories

Characterizing the content of clusters

Silhouette intra-cluster score analysis


Analysis using a word cloud

Principal component analysis (PCA)

Generating customer categories

Formatting data

Grouping products

Splitting the dataset

Grouping orders

Creating customer categories

Data encoding

Generating customer categories

PCA analysis

Analyzing the cluster using silhouette scores

Classifying customers

Defining helper functions

Splitting the data into training and testing

Implementing the Machine Learning (ML) algorithm

Understanding the testing matrix

Confusion matrix

Learning curve

Testing the result of the baseline approach

Generating the accuracy score for classifier

Generating the confusion matrix for the classifier

Generating the learning curve for the classifier

Problems with the baseline approach

Optimizing the baseline approach

Building the revised approach

Implementing the revised approach

Testing the revised approach

Problems with the revised approach

Understanding how to improve the revised approach

The best approach

Implementing the best approach

Testing the best approach

Transforming the hold-out corpus in the form of the training dataset

Converting the transformed dataset into a matrix form

Generating the predictions

Customer segmentation for various domains


Chapter 4. Recommendation Systems for E-Commerce

Introducing the problem statement

Understanding the datasets

e-commerce Item Data

The Book-Crossing dataset




Building the baseline approach

Understanding the basic concepts

Understanding the content-based approach

Implementing the baseline approach

Architecture of the recommendation system

Steps for implementing the baseline approach

Loading the dataset

Generating features using TF-IDF

Building the cosine similarity matrix

Generating the prediction

Understanding the testing matrix

Testing the result of the baseline approach

Problems with the baseline approach

Optimizing the baseline approach

Building the revised approach

Implementing the revised approach

Loading dataset

EDA of the book-rating datafile

Exploring the book datafile

EDA of the user datafile

Implementing the logic of correlation for the recommendation engine

Recommendations based on the rating of the books

Recommendations based on correlations


Testing the revised approach

Problems with the revised approach

Understanding how to improve the revised approach

The best approach

Understanding the key concepts

Collaborative filtering

Memory-based CF

User-user collaborative filtering

Item-item collaborative filtering

Model-based CF

Matrix-factorization-based algorithms

Difference between memory-based CF and model-based CF

Implementing the best approach

Loading the dataset

Merging the data frames

EDA for the merged data frames

Filtering data based on geolocation

Applying the KNN algorithm

Recommendation using the KNN algorithm

Applying matrix factorization

Recommendation using matrix factorization


Chapter 5. Sentiment Analysis

Introducing problem statements

Understanding the dataset

Understanding the content of the dataset

Train folder

Test folder

imdb.vocab file

imdbEr.txt file


Understanding the contents of the movie review files

Building the training and testing datasets for the baseline model

Feature engineering for the baseline model


Selecting the machine learning algorithm

Training the baseline model

Implementing the baseline model

Multinomial naive Bayes

C-support vector classification with kernel rbf

C-support vector classification with kernel linear

Linear support vector classification

Understanding the testing matrix





Training accuracy

Testing the baseline model

Testing of Multinomial naive Bayes

Testing of SVM with rbf kernel

Testing SVM with the linear kernel

Testing SVM with LinearSVC

Problem with the existing approach

How to optimize the existing approach

Understanding key concepts for optimizing the approach

Implementing the revised approach

Importing the dependencies

Downloading and loading the IMDb dataset

Choosing the top words and the maximum text length

Implementing word embedding

Building a convolutional neural net (CNN)

Training and obtaining the accuracy

Testing the revised approach

Understanding problems with the revised approach

The best approach

Implementing the best approach

Loading the GloVe model

Loading the dataset


Loading precomputed ID matrix

Splitting the train and test datasets

Building a neural network

Training the neural network

Loading the trained model

Testing the trained model


Chapter 6. Job Recommendation Engine

Introducing the problem statement

Understanding the datasets

Scraped dataset

Job recommendation challenge dataset





Building the baseline approach

Implementing the baseline approach

Defining constants

Loading the dataset

Defining the helper function

Generating TF-IDF vectors and cosine similarity

Building the training dataset

Generating TF-IDF vectors for the training dataset

Building the testing dataset

Generating the similarity score

Understanding the testing matrix

Problems with the baseline approach

Optimizing the baseline approach

Building the revised approach

Loading the dataset

Splitting the training and testing datasets

Exploratory Data Analysis

Building the recommendation engine using the jobs datafile

Testing the revised approach

Problems with the revised approach

Understanding how to improve the revised approach

The best approach

Implementing the best approach

Filtering the dataset

Preparing the training dataset

Applying the concatenation operation

Generating the TF-IDF and cosine similarity score

Generating recommendations


Chapter 7. Text Summarization

Understanding the basics of summarization

Extractive summarization

Abstractive summarization

Introducing the problem statement

Understanding datasets

Challenges in obtaining the dataset

Understanding the medical transcription dataset

Understanding Amazon's review dataset

Building the baseline approach

Implementing the baseline approach

Installing Python dependencies


Writing the code and generating the summary

Problems with the baseline approach

Optimizing the baseline approach

Building the revised approach

Implementing the revised approach

The get_summarized function

The reorder_sentences function

The summarize function

Generating the summary

Problems with the revised approach

Understanding how to improve the revised approach

The LSA algorithm


The idea behind the best approach

The best approach

Implementing the best approach

Understanding the structure of the project

Understanding helper functions



Generating the summary

Building the summarization application using Amazon reviews

Loading the dataset

Exploring the dataset

Preparing the dataset

Building the DL model

Training the DL model

Testing the DL model


Chapter 8. Developing Chatbots

Introducing the problem statement

Retrieval-based approach

Generative-based approach

Open domain

Closed domain

Short conversation

Long conversation

Open domain and generative-based approach

Open domain and retrieval-based approach

Closed domain and retrieval-based approach

Closed domain and generative-based approach

Understanding datasets

Cornell Movie-Dialogs dataset

Content details of movie_conversations.txt

Content details of movie_lines.txt

The bAbI dataset

The (20) QA bAbI tasks

Building the basic version of a chatbot

Why does the rule-based system work?

Understanding the rule-based system

Understanding the approach

Listing down possible questions and answers

Deciding standard messages

Understanding the architecture

Implementing the rule-based chatbot

Implementing the conversation flow

Implementing RESTful APIs using Flask

Testing the rule-based chatbot

Advantages of the rule-based chatbot

Problems with the existing approach

Understanding key concepts for optimizing the approach

Understanding the seq2seq model

Implementing the revised approach

Data preparation

Generating question-answer pairs

Preprocessing the dataset

Splitting the dataset into the training dataset and the testing dataset

Building a vocabulary for the training and testing datasets

Implementing the seq2seq model

Creating the model

Training the model

Testing the revised approach

Understanding the testing metrics



Testing the revised version of the chatbot

Problems with the revised approach

Understanding key concepts to solve existing problems

Memory networks

Dynamic memory network (DMN)

Input module

Question module

Episodic memory

The best approach

Implementing the best approach


Random testing mode

User interactive testing mode

Discussing the hybrid approach


Chapter 9. Building a Real-Time Object Recognition App

Introducing the problem statement

Understanding the dataset

The COCO dataset

The PASCAL VOC dataset

PASCAL VOC classes

Transfer Learning

What is Transfer Learning?

What is a pre-trained model?

Why should we use a pre-trained model?

How can we use a pre-trained model?

Setting up the coding environment

Setting up and installing OpenCV

Feature engineering for the baseline model

Selecting the machine learning algorithm

Architecture of the MobileNet SSD model

Building the baseline model

Understanding the testing metrics

Intersection over Union (IoU)

Mean Average Precision (mAP)

Testing the baseline model

Problem with the existing approach

How to optimize the existing approach

Understanding the process for optimization

Implementing the revised approach

Testing the revised approach

Understanding problems with the revised approach

The best approach

Understanding YOLO

The working of YOLO

The architecture of YOLO

Implementing the best approach using YOLO

Implementation using Darknet

Environment setup for Darknet

Compiling the Darknet

Downloading the pre-trained weight

Running object detection for the image

Running the object detection on the video stream

Implementation using Darkflow

Installing Cython

Building the already provided setup file

Testing the environment

Loading the model and running object detection on images

Loading the model and running object detection on the video stream


Chapter 10. Face Recognition and Face Emotion Recognition

Introducing the problem statement

Face recognition application

Face emotion recognition application

Setting up the coding environment

Installing dlib

Installing face_recognition

Understanding the concepts of face recognition

Understanding the face recognition dataset

CAS-PEAL Face Dataset

Labeled Faces in the Wild


Algorithms for face recognition

Histogram of Oriented Gradients (HOG)

Convolutional Neural Network (CNN) for FR

Simple CNN architecture

Understanding how CNN works for FR

Approaches for implementing face recognition

Implementing the HOG-based approach

Implementing the CNN-based approach

Implementing real-time face recognition

Understanding the dataset for face emotion recognition

Understanding the concepts of face emotion recognition

Understanding the convolutional layer

Understanding the ReLU layer

Understanding the pooling layer

Understanding the fully connected layer

Understanding the SoftMax layer

Updating the weight based on backpropagation

Building the face emotion recognition model

Preparing the data

Loading the data

Training the model

Loading the data using the dataset_loader script

Building the Convolutional Neural Network

Training for the FER application

Predicting and saving the trained model

Understanding the testing matrix

Testing the model

Problems with the existing approach

How to optimize the existing approach

Understanding the process for optimization

The best approach

Implementing the best approach


Chapter 11. Building a Gaming Bot

Introducing the problem statement

Setting up the coding environment

Understanding Reinforcement Learning (RL)

Markov Decision Process (MDP)

Discounted Future Reward

Basic Atari gaming bot

Understanding the key concepts

Rules for the game

Understanding the Q-Learning algorithm

Implementing the basic version of the gaming bot

Building the Space Invaders gaming bot

Understanding the key concepts

Understanding a deep Q-network (DQN)

Architecture of DQN

Steps for the DQN algorithm

Understanding Experience Replay

Implementing the Space Invaders gaming bot

Building the Pong gaming bot

Understanding the key concepts

Architecture of the gaming bot

Approach for the gaming bot

Implementing the Pong gaming bot

Initialization of the parameters

Weights stored in the form of matrices

Updating weights

How to move the agent

Understanding the process using NN

Just for fun - implementing the Flappy Bird gaming bot


Appendix A. List of Cheat Sheets

Cheat sheets


Appendix B. Strategy for Winning Hackathons

Strategy for winning hackathons

Keeping up to date

























