Learning Data Mining with Python (e-book)

Author: Robert Layton

Publisher: Packt Publishing

Publication date: 2015-07-29

Word count: 2.04 million

If you are a programmer who wants to get started with data mining, then this book is for you.
Learning Data Mining with Python

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Data Mining

Introducing data mining

Using Python and the IPython Notebook

Installing Python

Installing IPython

Installing scikit-learn

A simple affinity analysis example

What is affinity analysis?

Product recommendations

Loading the dataset with NumPy

Implementing a simple ranking of rules

Ranking to find the best rules

A simple classification example

What is classification?

Loading and preparing the dataset

Implementing the OneR algorithm

Testing the algorithm

Summary

2. Classifying with scikit-learn Estimators

scikit-learn estimators

Nearest neighbors

Distance metrics

Loading the dataset

Moving towards a standard workflow

Running the algorithm

Setting parameters

Preprocessing using pipelines

An example

Standard preprocessing

Putting it all together

Pipelines

Summary

3. Predicting Sports Winners with Decision Trees

Loading the dataset

Collecting the data

Using pandas to load the dataset

Cleaning up the dataset

Extracting new features

Decision trees

Parameters in decision trees

Using decision trees

Sports outcome prediction

Putting it all together

Random forests

How do ensembles work?

Parameters in Random forests

Applying Random forests

Engineering new features

Summary

4. Recommending Movies Using Affinity Analysis

Affinity analysis

Algorithms for affinity analysis

Choosing parameters

The movie recommendation problem

Obtaining the dataset

Loading with pandas

Sparse data formats

The Apriori implementation

The Apriori algorithm

Implementation

Extracting association rules

Evaluation

Summary

5. Extracting Features with Transformers

Feature extraction

Representing reality in models

Common feature patterns

Creating good features

Feature selection

Selecting the best individual features

Feature creation

Principal Component Analysis

Creating your own transformer

The transformer API

Implementation details

Unit testing

Putting it all together

Summary

6. Social Media Insight Using Naive Bayes

Disambiguation

Downloading data from a social network

Loading and classifying the dataset

Creating a replicable dataset from Twitter

Text transformers

Bag-of-words

N-grams

Other features

Naive Bayes

Bayes' theorem

Naive Bayes algorithm

How it works

Application

Extracting word counts

Converting dictionaries to a matrix

Training the Naive Bayes classifier

Putting it all together

Evaluation using the F1-score

Getting useful features from models

Summary

7. Discovering Accounts to Follow Using Graph Mining

Loading the dataset

Classifying with an existing model

Getting follower information from Twitter

Building the network

Creating a graph

Creating a similarity graph

Finding subgraphs

Connected components

Optimizing criteria

Summary

8. Beating CAPTCHAs with Neural Networks

Artificial neural networks

An introduction to neural networks

Creating the dataset

Drawing basic CAPTCHAs

Splitting the image into individual letters

Creating a training dataset

Adjusting our training dataset to our methodology

Training and classifying

Back propagation

Predicting words

Improving accuracy using a dictionary

Ranking mechanisms for words

Putting it all together

Summary

9. Authorship Attribution

Attributing documents to authors

Applications and use cases

Attributing authorship

Getting the data

Function words

Counting function words

Classifying with function words

Support vector machines

Classifying with SVMs

Kernels

Character n-grams

Extracting character n-grams

Using the Enron dataset

Accessing the Enron dataset

Creating a dataset loader

Putting it all together

Evaluation

Summary

10. Clustering News Articles

Obtaining news articles

Using a Web API to get data

Reddit as a data source

Getting the data

Extracting text from arbitrary websites

Finding the stories in arbitrary websites

Putting it all together

Grouping news articles

The k-means algorithm

Evaluating the results

Extracting topic information from clusters

Using clustering algorithms as transformers

Clustering ensembles

Evidence accumulation

How it works

Implementation

Online learning

An introduction to online learning

Implementation

Summary

11. Classifying Objects in Images Using Deep Learning

Object classification

Application scenario and goals

Use cases

Deep neural networks

Intuition

Implementation

An introduction to Theano

An introduction to Lasagne

Implementing neural networks with nolearn

GPU optimization

When to use GPUs for computation

Running our code on a GPU

Setting up the environment

Application

Getting the data

Creating the neural network

Putting it all together

Summary

12. Working with Big Data

Big data

Application scenario and goals

MapReduce

Intuition

A word count example

Hadoop MapReduce

Application

Getting the data

Naive Bayes prediction

The mrjob package

Extracting the blog posts

Training Naive Bayes

Putting it all together

Training on Amazon's EMR infrastructure

Summary

A. Next Steps…

Chapter 1 – Getting Started with Data Mining

Scikit-learn tutorials

Extending the IPython Notebook

Chapter 2 – Classifying with scikit-learn Estimators

Scalability with the nearest neighbor

More complex pipelines

Comparing classifiers

Chapter 3 – Predicting Sports Winners with Decision Trees

More on pandas

More complex features

Chapter 4 – Recommending Movies Using Affinity Analysis

New datasets

The Eclat algorithm

Chapter 5 – Extracting Features with Transformers

Adding noise

Vowpal Wabbit

Chapter 6 – Social Media Insight Using Naive Bayes

Spam detection

Natural language processing and part-of-speech tagging

Chapter 7 – Discovering Accounts to Follow Using Graph Mining

More complex algorithms

NetworkX

Chapter 8 – Beating CAPTCHAs with Neural Networks

Better (worse?) CAPTCHAs

Deeper networks

Reinforcement learning

Chapter 9 – Authorship Attribution

Increasing the sample size

Blogs dataset

Local n-grams

Chapter 10 – Clustering News Articles

Evaluation

Temporal analysis

Real-time clusterings

Chapter 11 – Classifying Objects in Images Using Deep Learning

Keras and Pylearn2

Mahotas

Chapter 12 – Working with Big Data

Courses on Hadoop

Pydoop

Recommendation engine

More resources

Index
