Feature Engineering Made Easy (e-book)


Author: Sinan Ozdemir, Divya Susarla

Publisher: Packt Publishing

Publication date: 2018-01-22

Word count: 372,000

Category: Imported Books > Foreign-Language Originals > Computers/Internet

Book Description
A perfect guide to speed up the predictive power of machine learning algorithms.

About This Book

• Design, discover, and create dynamic, efficient features for your machine learning application
• Understand your data in depth and derive astonishing data insights with the help of this guide
• Grasp powerful feature-engineering techniques and build machine learning systems

Who This Book Is For

If you are a data science professional or a machine learning engineer looking to strengthen your predictive analytics model, then this book is a perfect guide for you. Some basic understanding of machine learning concepts and Python scripting is enough to get started with this book.

What You Will Learn

• Identify and leverage different feature types
• Clean features in data to improve predictive power
• Understand why and how to perform feature selection and model error analysis
• Leverage domain knowledge to construct new features
• Deliver features based on mathematical insights
• Use machine learning algorithms to construct features
• Master feature engineering and optimization
• Harness feature engineering for real-world applications through a structured case study

In Detail

Feature engineering is the most important step in creating powerful machine learning systems. This book takes you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start by understanding your data; often, the success of your ML models depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features, and to deliver features driven by business needs as well as mathematical insights. You will also learn how to use machine learning itself to automatically learn features from your data. By the end of the book, you will be proficient in feature selection, feature learning, and feature optimization.

Style and Approach

This step-by-step guide, with use cases, examples, and illustrations, will help you master the concepts of feature engineering. Along with explaining the fundamentals, the book also introduces slightly more advanced concepts later on and helps you implement these techniques in the real world.
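To make the "What You Will Learn" list above concrete, here is a minimal sketch (not code from the book) of the kind of scikit-learn pipeline its chapters build toward: imputing missing values, z-score standardization, and statistical feature selection, chained in front of a classifier. The dataset and every parameter choice below (median imputation, k=10 features, logistic regression) are illustrative assumptions, not the book's own examples.

# A sketch of a feature-engineering pipeline: clean, scale, select, model.
# Assumes scikit-learn >= 0.20 (sklearn.impute replaced the older Imputer).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; the book's own examples (e.g., Pima diabetes) differ.
X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values (a no-op on this dataset)
    ("scale", StandardScaler()),                   # z-score standardization
    ("select", SelectKBest(f_classif, k=10)),      # keep the 10 strongest features by ANOVA F-test
    ("model", LogisticRegression(max_iter=1000)),  # baseline classifier
])

# Cross-validated accuracy tells us whether the engineered features help.
print(cross_val_score(pipe, X, y, cv=5).mean())

Because every step lives inside the Pipeline, imputation, scaling, and selection statistics are re-fit on each training fold rather than leaking from the test data, which is one reason the table of contents below returns to pipelines repeatedly.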
Table of Contents

Title Page

Copyright and Credits

Feature Engineering Made Easy

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the authors

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Introduction to Feature Engineering

Motivating example – AI-powered communications

Why feature engineering matters

What is feature engineering?

Understanding the basics of data and machine learning

Supervised learning

Unsupervised learning

Unsupervised learning example – marketing segments

Evaluation of machine learning algorithms and feature engineering procedures

Example of feature engineering procedures – can anyone really predict the weather?

Steps to evaluate a feature engineering procedure

Evaluating supervised learning algorithms

Evaluating unsupervised learning algorithms

Feature understanding – what’s in my dataset?

Feature improvement – cleaning datasets

Feature selection – say no to bad attributes

Feature construction – can we build it?

Feature transformation – enter math-man

Feature learning – using AI to better our AI

Summary

Feature Understanding – What's in My Dataset?

The structure, or lack thereof, of data

An example of unstructured data – server logs

Quantitative versus qualitative data

Salary ranges by job classification

The four levels of data

The nominal level

Mathematical operations allowed

The ordinal level

Mathematical operations allowed

The interval level

Mathematical operations allowed

Plotting two columns at the interval level

The ratio level

Mathematical operations allowed

Recap of the levels of data

Summary

Feature Improvement – Cleaning Datasets

Identifying missing values in data

The Pima Indian Diabetes Prediction dataset

The exploratory data analysis (EDA)

Dealing with missing values in a dataset

Removing harmful rows of data

Imputing the missing values in data

Imputing values in a machine learning pipeline

Pipelines in machine learning

Standardization and normalization

Z-score standardization

The min-max scaling method

The row normalization method

Putting it all together

Summary

Feature Construction

Examining our dataset

Imputing categorical features

Custom imputers

Custom category imputer

Custom quantitative imputer

Encoding categorical variables

Encoding at the nominal level

Encoding at the ordinal level

Bucketing continuous features into categories

Creating our pipeline

Extending numerical features

Activity recognition from the Single Chest-Mounted Accelerometer dataset

Polynomial features

Parameters

Exploratory data analysis

Text-specific feature construction

Bag of words representation

CountVectorizer

CountVectorizer parameters

The Tf-idf vectorizer

Using text in machine learning pipelines

Summary

Feature Selection

Achieving better performance in feature engineering

A case study – a credit card defaulting dataset

Creating a baseline machine learning pipeline

The types of feature selection

Statistical-based feature selection

Using Pearson correlation to select features

Feature selection using hypothesis testing

Interpreting the p-value

Ranking the p-value

Model-based feature selection

A brief refresher on natural language processing

Using machine learning to select features

Tree-based model feature selection metrics

Linear models and regularization

A brief introduction to regularization

Linear model coefficients as another feature importance metric

Choosing the right feature selection method

Summary

Feature Transformations

Dimension reduction – feature transformations versus feature selection versus feature construction

Principal Component Analysis

How PCA works

PCA with the Iris dataset – manual example

Creating the covariance matrix of the dataset

Calculating the eigenvalues of the covariance matrix

Keeping the top k eigenvalues (sorted by the descending eigenvalues)

Using the kept eigenvectors to transform new data-points

Scikit-learn's PCA

How centering and scaling data affects PCA

A deeper look into the principal components

Linear Discriminant Analysis

How LDA works

Calculating the mean vectors of each class

Calculating within-class and between-class scatter matrices

Calculating eigenvalues and eigenvectors for S_W⁻¹S_B

Keeping the top k eigenvectors by ordering them by descending eigenvalues

Using the top eigenvectors to project onto the new space

How to use LDA in scikit-learn

LDA versus PCA – Iris dataset

Summary

Feature Learning

Parametric assumptions of data

Non-parametric fallacy

The algorithms of this chapter

Restricted Boltzmann Machines

Not necessarily dimension reduction

The graph of a Restricted Boltzmann Machine

The restriction of a Boltzmann Machine

Reconstructing the data

MNIST dataset

The BernoulliRBM

Extracting PCA components from MNIST

Extracting RBM components from MNIST

Using RBMs in a machine learning pipeline

Using a linear model on raw pixel values

Using a linear model on extracted PCA components

Using a linear model on extracted RBM components

Learning text features – word vectorizations

Word embeddings

Two approaches to word embeddings – Word2vec and GloVe

Word2Vec – another shallow neural network

The gensim package for creating Word2vec embeddings

Application of word embeddings – information retrieval

Summary

Case Studies

Case study 1 – facial recognition

Applications of facial recognition

The data

Some data exploration

Applied facial recognition

Case study 2 – predicting topics of hotel reviews data

Applications of text clustering

Hotel review data

Exploration of the data

The clustering model

SVD versus PCA components

Latent semantic analysis

Summary

Other Books You May Enjoy

Leave a review – let other readers know what you think
