Learning Data Mining with Python (e-book)

Author: Robert Layton

Publisher: Packt Publishing

Publication date: 2015-07-29

Word count: 2.04 million

Category: Imported Books > Foreign-Language Originals > Computers/Internet


If you are a programmer who wants to get started with data mining, then this book is for you.

Learning Data Mining with Python

Table of Contents

Learning Data Mining with Python

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Data Mining

Introducing data mining

Using Python and the IPython Notebook

Installing Python

Installing IPython

Installing scikit-learn

A simple affinity analysis example

What is affinity analysis?

Product recommendations

Loading the dataset with NumPy

Implementing a simple ranking of rules

Ranking to find the best rules

A simple classification example

What is classification?

Loading and preparing the dataset

Implementing the OneR algorithm

Testing the algorithm

Summary

2. Classifying with scikit-learn Estimators

scikit-learn estimators

Nearest neighbors

Distance metrics

Loading the dataset

Moving towards a standard workflow

Running the algorithm

Setting parameters

Preprocessing using pipelines

An example

Standard preprocessing

Putting it all together

Pipelines

Summary

3. Predicting Sports Winners with Decision Trees

Loading the dataset

Collecting the data

Using pandas to load the dataset

Cleaning up the dataset

Extracting new features

Decision trees

Parameters in decision trees

Using decision trees

Sports outcome prediction

Putting it all together

Random forests

How do ensembles work?

Parameters in Random forests

Applying Random forests

Engineering new features

Summary

4. Recommending Movies Using Affinity Analysis

Affinity analysis

Algorithms for affinity analysis

Choosing parameters

The movie recommendation problem

Obtaining the dataset

Loading with pandas

Sparse data formats

The Apriori implementation

The Apriori algorithm

Implementation

Extracting association rules

Evaluation

Summary

5. Extracting Features with Transformers

Feature extraction

Representing reality in models

Common feature patterns

Creating good features

Feature selection

Selecting the best individual features

Feature creation

Principal Component Analysis

Creating your own transformer

The transformer API

Implementation details

Unit testing

Putting it all together

Summary

6. Social Media Insight Using Naive Bayes

Disambiguation

Downloading data from a social network

Loading and classifying the dataset

Creating a replicable dataset from Twitter

Text transformers

Bag-of-words

N-grams

Other features

Naive Bayes

Bayes' theorem

Naive Bayes algorithm

How it works

Application

Extracting word counts

Converting dictionaries to a matrix

Training the Naive Bayes classifier

Putting it all together

Evaluation using the F1-score

Getting useful features from models

Summary

7. Discovering Accounts to Follow Using Graph Mining

Loading the dataset

Classifying with an existing model

Getting follower information from Twitter

Building the network

Creating a graph

Creating a similarity graph

Finding subgraphs

Connected components

Optimizing criteria

Summary

8. Beating CAPTCHAs with Neural Networks

Artificial neural networks

An introduction to neural networks

Creating the dataset

Drawing basic CAPTCHAs

Splitting the image into individual letters

Creating a training dataset

Adjusting our training dataset to our methodology

Training and classifying

Back propagation

Predicting words

Improving accuracy using a dictionary

Ranking mechanisms for words

Putting it all together

Summary

9. Authorship Attribution

Attributing documents to authors

Applications and use cases

Attributing authorship

Getting the data

Function words

Counting function words

Classifying with function words

Support vector machines

Classifying with SVMs

Kernels

Character n-grams

Extracting character n-grams

Using the Enron dataset

Accessing the Enron dataset

Creating a dataset loader

Putting it all together

Evaluation

Summary

10. Clustering News Articles

Obtaining news articles

Using a Web API to get data

Reddit as a data source

Getting the data

Extracting text from arbitrary websites

Finding the stories in arbitrary websites

Putting it all together

Grouping news articles

The k-means algorithm

Evaluating the results

Extracting topic information from clusters

Using clustering algorithms as transformers

Clustering ensembles

Evidence accumulation

How it works

Implementation

Online learning

An introduction to online learning

Implementation

Summary

11. Classifying Objects in Images Using Deep Learning

Object classification

Application scenario and goals

Use cases

Deep neural networks

Intuition

Implementation

An introduction to Theano

An introduction to Lasagne

Implementing neural networks with nolearn

GPU optimization

When to use GPUs for computation

Running our code on a GPU

Setting up the environment

Application

Getting the data

Creating the neural network

Putting it all together

Summary

12. Working with Big Data

Big data

Application scenario and goals

MapReduce

Intuition

A word count example

Hadoop MapReduce

Application

Getting the data

Naive Bayes prediction

The mrjob package

Extracting the blog posts

Training Naive Bayes

Putting it all together

Training on Amazon's EMR infrastructure

Summary

A. Next Steps…

Chapter 1 – Getting Started with Data Mining

Scikit-learn tutorials

Extending the IPython Notebook

Chapter 2 – Classifying with scikit-learn Estimators

Scalability with the nearest neighbor

More complex pipelines

Comparing classifiers

Chapter 3 – Predicting Sports Winners with Decision Trees
