
Applied Unsupervised Learning with R (eBook)


Author: Alok Malik

Publisher: Packt Publishing

Publication date: 2019-03-27

Word count: 6,103,000

Category: Imported Books > Foreign-Language Originals > Computers/Networking



Book Description
Design clever algorithms that discover hidden patterns and draw responses from unstructured, unlabeled data.

Key Features

  • Build state-of-the-art algorithms that can solve your business's problems
  • Learn how to find hidden patterns in your data
  • Revise key concepts with hands-on exercises using real-world datasets

Book Description

Starting with the basics, Applied Unsupervised Learning with R explains clustering methods, distribution analysis, data encoders, and features of R that enable you to understand your data better and get answers to your most pressing business questions.

This book begins with the most important and commonly used method for unsupervised learning - clustering - and explains the three main clustering algorithms: k-means, divisive, and agglomerative. Following this, you'll study market basket analysis, kernel density estimation, principal component analysis, and anomaly detection. You'll be introduced to these methods using code written in R, with further instructions on how to work with, edit, and improve R code. To help you gain a practical understanding, the book also features useful tips on applying these methods to real business problems, including market segmentation and fraud detection. By working through interesting activities, you'll explore data encoders and latent variable models.

By the end of this book, you will have a better understanding of different anomaly detection methods, such as outlier detection, Mahalanobis distances, and contextual and collective anomaly detection.

What You Will Learn

  • Implement clustering methods such as k-means, agglomerative, and divisive
  • Write code in R to analyze market segmentation and consumer behavior
  • Estimate distribution and probabilities of different outcomes
  • Implement dimension reduction using principal component analysis
  • Apply anomaly detection methods to identify fraud
  • Design algorithms with R and learn how to edit or improve code

Who This Book Is For

Applied Unsupervised Learning with R is designed for business professionals who want to learn about methods to understand their data better, and developers who have an interest in unsupervised learning. Although the book is for beginners, it will be beneficial to have some basic, beginner-level familiarity with R. This includes an understanding of how to open the R console, how to read data, and how to create a loop. To easily understand the concepts of this book, you should also know basic mathematical concepts, including exponents, square roots, means, and medians.
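As a quick, hedged illustration of the kind of workflow described above - not code reproduced from the book - the following minimal R sketch clusters the four Iris measurements with base R's built-in kmeans() function. The choice of three clusters, the seed value, and nstart = 25 are assumptions made for this example; the book itself covers principled ways to choose the number of clusters, such as the silhouette score, the WSS/elbow method, and the gap statistic.

    # Minimal sketch (not from the book): k-means on the iris measurements
    # using base R's built-in kmeans() function.
    data(iris)
    features <- iris[, 1:4]           # drop the Species label; this is the unsupervised setting
    set.seed(123)                     # kmeans() starts from random centroids (seed chosen arbitrarily)
    fit <- kmeans(features, centers = 3, nstart = 25)   # three clusters is an assumption for this example
    print(fit$centers)                # cluster centroids in the original feature space
    table(fit$cluster, iris$Species)  # compare recovered clusters against the known species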
Table of Contents

Preface

About the Book

About the Authors

Elevator Pitch

Key Features

Description

Learning Objectives

Audience

Approach

Hardware Requirements

Software Requirements

Conventions

Installation and Setup

Installing R on Windows

Installing R on macOS

Installing R on Linux

Chapter 1

Introduction to Clustering Methods

Introduction

Introduction to Clustering

Uses of Clustering

Introduction to the Iris Dataset

Exercise 1: Exploring the Iris Dataset

Types of Clustering

Introduction to k-means Clustering

Euclidean Distance

Manhattan Distance

Cosine Distance

The Hamming Distance

k-means Clustering Algorithm

Steps to Implement k-means Clustering

Exercise 2: Implementing k-means Clustering on the Iris Dataset

Activity 1: k-means Clustering with Three Clusters

Introduction to k-means Clustering with Built-In Functions

k-means Clustering with Three Clusters

Exercise 3: k-means Clustering with R Libraries

Introduction to Market Segmentation

Exercise 4: Exploring the Wholesale Customer Dataset

Activity 2: Customer Segmentation with k-means

Introduction to k-medoids Clustering

The k-medoids Clustering Algorithm

k-medoids Clustering Code

Exercise 5: Implementing k-medoids Clustering

k-means Clustering versus k-medoids Clustering

Activity 3: Performing Customer Segmentation with k-medoids Clustering

Deciding the Optimal Number of Clusters

Types of Clustering Metrics

Silhouette Score

Exercise 6: Calculating the Silhouette Score

Exercise 7: Identifying the Optimum Number of Clusters

WSS/Elbow Method

Exercise 8: Using WSS to Determine the Number of Clusters

The Gap Statistic

Exercise 9: Calculating the Ideal Number of Clusters with the Gap Statistic

Activity 4: Finding the Ideal Number of Market Segments

Summary

Chapter 2

Advanced Clustering Methods

Introduction

Introduction to k-modes Clustering

Steps for k-Modes Clustering

Exercise 10: Implementing k-modes Clustering

Activity 5: Implementing k-modes Clustering on the Mushroom Dataset

Introduction to Density-Based Clustering (DBSCAN)

Steps for DBSCAN

Exercise 11: Implementing DBSCAN

Uses of DBSCAN

Activity 6: Implementing DBSCAN and Visualizing the Results

Introduction to Hierarchical Clustering

Types of Similarity Metrics

Steps to Perform Agglomerative Hierarchical Clustering

Exercise 12: Agglomerative Clustering with Different Similarity Measures

Divisive Clustering

Steps to Perform Divisive Clustering

Exercise 13: Performing DIANA Clustering

Activity 7: Performing Hierarchical Cluster Analysis on the Seeds Dataset

Summary

Chapter 3

Probability Distributions

Introduction

Basic Terminology of Probability Distributions

Uniform Distribution

Exercise 14: Generating and Plotting Uniform Samples in R

Normal Distribution

Exercise 15: Generating and Plotting a Normal Distribution in R

Skew and Kurtosis

Log-Normal Distributions

Exercise 16: Generating a Log-Normal Distribution from a Normal Distribution

The Binomial Distribution

Exercise 17: Generating a Binomial Distribution

The Poisson Distribution

The Pareto Distribution

Introduction to Kernel Density Estimation

KDE Algorithm

Exercise 18: Visualizing and Understanding KDE

Exercise 19: Studying the Effect of Changing Kernels on a Distribution

Activity 8: Finding the Standard Distribution Closest to the Distribution of Variables of the Iris Dataset

Introduction to the Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov Test Algorithm

Exercise 20: Performing the Kolmogorov-Smirnov Test on Two Samples

Activity 9: Calculating the CDF and Performing the Kolmogorov-Smirnov Test with the Normal Distribution

Summary

Chapter 4

Dimension Reduction

Introduction

The Idea of Dimension Reduction

Exercise 21: Examining a Dataset that Contains the Chemical Attributes of Different Wines

Importance of Dimension Reduction

Market Basket Analysis

Exercise 22: Data Preparation for the Apriori Algorithm

Exercise 23: Passing through the Data to Find the Most Common Baskets

Exercise 24: More Passes through the Data

Exercise 25: Generating Associative Rules as the Final Step of the Apriori Algorithm

Principal Component Analysis

Linear Algebra Refresher

Matrices

Variance

Covariance

Exercise 26: Examining Variance and Covariance on the Wine Dataset

Eigenvectors and Eigenvalues

The Idea of PCA

Exercise 27: Performing PCA

Exercise 28: Performing Dimension Reduction with PCA

Activity 10: Performing PCA and Market Basket Analysis on a New Dataset

Summary

Chapter 5

Data Comparison Methods

Introduction

Hash Functions

Exercise 29: Creating and Using a Hash Function

Exercise 30: Verifying Our Hash Function

Analytic Signatures

Exercise 31: Performing the Data Preparation for Creating an Analytic Signature for an Image

Exercise 32: Creating a Brightness Comparison Function

Exercise 33: Creating a Function to Compare Image Sections to All of the Neighboring Sections

Exercise 34: Creating a Function that Generates an Analytic Signature for an Image

Activity 11: Creating an Image Signature for a Photograph of a Person

Comparison of Signatures

Activity 12: Creating an Image Signature for the Watermarked Image

Applying Other Unsupervised Learning Methods to Analytic Signatures

Latent Variable Models – Factor Analysis

Exercise 35: Preparing for Factor Analysis

Linear Algebra behind Factor Analysis

Exercise 36: More Exploration with Factor Analysis

Activity 13: Performing Factor Analysis

Summary

Chapter 6

Anomaly Detection

Introduction

Univariate Outlier Detection

Exercise 37: Performing an Exploratory Visual Check for Outliers Using R's boxplot Function

Exercise 38: Transforming a Fat-Tailed Dataset to Improve Outlier Classification

Exercise 39: Finding Outliers without Using R's Built-In boxplot Function

Exercise 40: Detecting Outliers Using a Parametric Method

Multivariate Outlier Detection

Exercise 41: Calculating Mahalanobis Distance

Detecting Anomalies in Clusters

Other Methods for Multivariate Outlier Detection

Exercise 42: Classifying Outliers based on Comparisons of Mahalanobis Distances

Detecting Outliers in Seasonal Data

Exercise 43: Performing Seasonality Modeling

Exercise 44: Finding Anomalies in Seasonal Data Using a Parametric Method

Contextual and Collective Anomalies

Exercise 45: Detecting Contextual Anomalies

Exercise 46: Detecting Collective Anomalies

Kernel Density

Exercise 47: Finding Anomalies Using Kernel Density Estimation

Continuing in Your Studies of Anomaly Detection

Activity 14: Finding Univariate Anomalies Using a Parametric Method and a Non-parametric Method

Activity 15: Using Mahalanobis Distance to Find Anomalies

Summary

Appendix

Chapter 1: Introduction to Clustering Methods

Activity 1: k-means Clustering with Three Clusters

Activity 2: Customer Segmentation with k-means

Activity 3: Performing Customer Segmentation with k-medoids Clustering

Activity 4: Finding the Ideal Number of Market Segments

Chapter 2: Advanced Clustering Methods

Activity 5: Implementing k-modes Clustering on the Mushroom Dataset

Activity 6: Implementing DBSCAN and Visualizing the Results

Activity 7: Performing Hierarchical Cluster Analysis on the Seeds Dataset

Chapter 3: Probability Distributions

Activity 8: Finding the Standard Distribution Closest to the Distribution of Variables of the Iris Dataset

Activity 9: Calculating the CDF and Performing the Kolmogorov-Smirnov Test with the Normal Distribution

Chapter 4: Dimension Reduction

Activity 10: Performing PCA and Market Basket Analysis on a New Dataset

Chapter 5: Data Comparison Methods

Activity 11: Creating an Image Signature for a Photograph of a Person

Activity 12: Creating an Image Signature for the Watermarked Image

Activity 13: Performing Factor Analysis

Chapter 6: Anomaly Detection

Activity 14: Finding Univariate Anomalies Using a Parametric Method and a Non-parametric Method

Activity 15: Using Mahalanobis Distance to Find Anomalies
