
Effective Amazon Machine Learning (e-book)


Author: Alexis Perrier

Publisher: Packt Publishing

Publication date: 2017-04-25

Word count: 346,000


Book Description
Predictive analytics is a complex domain requiring coding skills, an understanding of the mathematical concepts underpinning machine learning algorithms, and the ability to create compelling data visualizations. Building on the way AWS simplifies machine learning, this book helps you bring predictive analytics projects to fruition in three steps: data preparation, model tuning, and model selection. It introduces the Amazon Machine Learning platform and puts core data science concepts into practice, such as classification, regression, regularization, overfitting, model selection, and evaluation. Furthermore, you will learn to leverage the Amazon Web Services (AWS) ecosystem for extended access to data sources, implement real-time predictions, and run Amazon Machine Learning projects via the command line and the Python SDK. Towards the end of the book, you will also learn how to apply these services to other problems, such as text mining, and to more complex datasets.

What you will learn:

  • Use the Amazon Machine Learning service from scratch for predictive analytics
  • Gain hands-on experience with key data science concepts
  • Solve classic regression and classification problems
  • Run projects programmatically via the command line and the Python SDK
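The "command line and Python SDK" workflow the description refers to maps onto boto3's `machinelearning` client. The sketch below assembles the three calls of a minimal datasource → model → evaluation sequence; the bucket path, resource IDs, and CSV schema are illustrative placeholders, not values from the book, and the actual AWS calls are left commented out since they require credentials and a live account:

```python
import json

# Hypothetical S3 location and data schema for a binary-classification
# datasource (adapt the bucket, file, and attributes to your own CSV).
data_spec = {
    "DataLocationS3": "s3://my-ml-bucket/titanic/train.csv",
    "DataSchema": json.dumps({
        "version": "1.0",
        "targetAttributeName": "survived",
        "dataFormat": "CSV",
        "attributes": [
            {"attributeName": "age", "attributeType": "NUMERIC"},
            {"attributeName": "sex", "attributeType": "CATEGORICAL"},
            {"attributeName": "survived", "attributeType": "BINARY"},
        ],
    }),
}

def build_requests(data_spec):
    """Assemble the (method name, kwargs) pairs for a minimal
    train-and-evaluate workflow: datasource -> model -> evaluation."""
    return [
        ("create_data_source_from_s3",
         {"DataSourceId": "ds-train",
          "DataSpec": data_spec,
          "ComputeStatistics": True}),
        ("create_ml_model",
         {"MLModelId": "ml-titanic",
          "MLModelType": "BINARY",
          "TrainingDataSourceId": "ds-train"}),
        ("create_evaluation",
         {"EvaluationId": "ev-titanic",
          "MLModelId": "ml-titanic",
          "EvaluationDataSourceId": "ds-train"}),
    ]

# To run against AWS (requires configured credentials):
# import boto3
# client = boto3.client("machinelearning")
# for method, kwargs in build_requests(data_spec):
#     getattr(client, method)(**kwargs)
```

In practice you would point `EvaluationDataSourceId` at a held-out datasource rather than the training one; the single-ID version above only keeps the sketch short.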
Table of Contents

Title Page

Credits

About the Author

About the Reviewer

www.PacktPub.com

Customer Feedback

Dedication

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Introduction to Machine Learning and Predictive Analytics

Introducing Amazon Machine Learning

Machine Learning as a Service

Leveraging full AWS integration

Comparing performances

Engineering data versus model variety

Amazon's expertise and the gradient descent algorithm

Pricing

Understanding predictive analytics

Building the simplest predictive analytics algorithm

Regression versus classification

Expanding regression to classification with logistic regression

Extracting features to predict outcomes

Diving further into linear modeling for prediction

Validating the dataset

Missing from Amazon ML

The statistical approach versus the machine learning approach

Summary

Machine Learning Definitions and Concepts

What's an algorithm? What's a model?

Dealing with messy data

Classic datasets versus real-world datasets

Assumptions for multiclass linear models

Missing values

Normalization

Imbalanced datasets

Addressing multicollinearity

Detecting outliers

Accepting non-linear patterns

Adding features?

Preprocessing recapitulation

The predictive analytics workflow

Training and evaluation in Amazon ML

Identifying and correcting poor performances

Underfitting

Overfitting

Regularization on linear models

L2 regularization and Ridge

L1 regularization and Lasso

Evaluating the performance of your model

Summary

Overview of an Amazon Machine Learning Workflow

Opening an Amazon Web Services Account

Security

Setting up the account

Creating a user

Defining policies

Creating login credentials

Choosing a region

Overview of a standard Amazon Machine Learning workflow

The dataset

Loading the data on S3

Declaring a datasource

Creating the datasource

The model

The evaluation of the model

Comparing with a baseline

Making batch predictions

Summary

Loading and Preparing the Dataset

Working with datasets

Finding open datasets

Introducing the Titanic dataset

Preparing the data

Splitting the data

Loading data on S3

Creating a bucket

Loading the data

Granting permissions

Formatting the data

Creating the datasource

Verifying the data schema

Reusing the schema

Examining data statistics

Feature engineering with Athena

Introducing Athena

A brief tour of AWS Athena

Creating a titanic database

Using the wizard

Creating the database and table directly in SQL

Data munging in SQL

Missing values

Handling outliers in the fare

Extracting the title from the name

Inferring the deck from the cabin

Calculating family size

Wrapping up

Creating an improved datasource

Summary

Model Creation

Transforming data with recipes

Managing variables

Grouping variables

Naming variables with assignments

Specifying outputs

Data processing through seven transformations

Using simple transformations

Text mining

Coupling variables

Binning numeric values

Creating a model

Editing the suggested recipe

Applying recipes to the Titanic dataset

Choosing between recipes and data preprocessing

Parametrizing the model

Setting model memory

Setting the number of data passes

Choosing regularization

Creating an evaluation

Evaluating the model

Evaluating binary classification

Exploring the model performances

Evaluating linear regression

Evaluating multiclass classification

Analyzing the logs

Optimizing the learning rate

Visualizing convergence

Impact of regularization

Comparing different recipes on the Titanic dataset

Keeping variables as numeric or applying quantile binning?

Parsing the model logs

Summary

Predictions and Performances

Making batch predictions

Creating the batch prediction job

Interpreting prediction outputs

Reading the manifest file

Reading the results file

Assessing our predictions

Evaluating the held-out dataset

Finding out who will survive

Multiplying trials

Making real-time predictions

Manually exploring variable influence

Setting up real-time predictions

AWS SDK

Setting up AWS credentials

AWS access keys

Setting up AWS CLI

Python SDK

Summary

Command Line and SDK

Getting started and setting up

Using the CLI versus SDK

Installing AWS CLI

Picking up CLI syntax

Passing parameters using JSON files

Introducing the Ames Housing dataset

Splitting the dataset with shell commands

A simple project using the CLI

An overview of Amazon ML CLI commands

Creating the datasource

Creating the model

Evaluating our model with create-evaluation

What is cross-validation?

Implementing Monte Carlo cross-validation

Generating the shuffled datasets

Generating the datasources template

Generating the models template

Generating the evaluations template

The results

Conclusion

Boto3, the Python SDK

Working with the Python SDK for Amazon Machine Learning

Waiting on operation completion

Wrapping up the Python-based workflow

Implementing recursive feature selection with Boto3

Managing schema and recipe

Summary

Creating Datasources from Redshift

Choosing between RDS and Redshift

Creating a Redshift instance

Connecting through the command line

Executing Redshift queries using Psql

Creating our own non-linear dataset

Uploading the non-linear data to Redshift

Introducing polynomial regression

Establishing a baseline

Polynomial regression in Amazon ML

Driving the trials in Python

Interpreting the results

Summary

Building a Streaming Data Analysis Pipeline

Streaming Twitter sentiment analysis

Popularity contest on Twitter

The training dataset and the model

Kinesis

Kinesis Stream

Kinesis Analytics

Setting up Kinesis Firehose

Producing tweets

The Redshift database

Adding Redshift to the Kinesis Firehose

Setting up the roles and policies

Dependencies and debugging

Data format synchronization

Debugging

Preprocessing with Lambda

Analyzing the results

Downloading the dataset from Redshift

Sentiment analysis with TextBlob

Removing duplicate tweets

And what is the most popular vegetable?

Going beyond classification and regression

Summary
