Hands-On Data Science and Python Machine Learning (e-book)

1 person reading | 0 reviews | Rating: 9.8

Author: Frank Kane

Publisher: Packt Publishing

Publication date: 2017-07-31

Word count: 523,000

Category: Imported Books > Foreign-Language Originals > Computers/Internet

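Note: Digital products cannot be returned or exchanged. Source files are not provided, and exporting or printing is not supported.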

Book Description

This book covers the fundamentals of machine learning with Python in a concise and dynamic manner, including data mining and large-scale machine learning with Apache Spark.

About This Book

  • Take your first steps in the world of data science by understanding the tools and techniques of data analysis
  • Train efficient machine learning models in Python using supervised and unsupervised learning methods
  • Learn how to use Apache Spark to process big data efficiently

Who This Book Is For

If you are a budding data scientist or a data analyst who wants to analyze data and gain actionable insights from it using Python, this book is for you. Programmers with some Python experience who want to enter the lucrative world of data science will also find this book very useful, but you don't need to be an expert Python coder or a mathematician to get the most from it.

What You Will Learn

  • Clean your data and prepare it for analysis
  • Implement popular clustering and regression methods in Python
  • Train efficient machine learning models using decision trees and random forests
  • Visualize the results of your analysis using Python's Matplotlib library
  • Use Apache Spark's MLlib package to perform machine learning on large datasets

In Detail

Join Frank Kane, who worked on machine learning algorithms at Amazon and IMDb, as he guides you through your first steps into the world of data science. Hands-On Data Science and Python Machine Learning gives you the tools you need to understand and explore the core topics in the field, along with the confidence and practice to build and analyze your own machine learning models. Through interesting, easy-to-follow practical examples, Frank Kane explains potentially complex topics such as Bayesian methods and K-means clustering in a way that anybody can understand.

Based on Frank's successful data science course, Hands-On Data Science and Python Machine Learning shows you how to conduct data analysis and perform efficient machine learning using Python. Frank helps you unearth the value in your data with the data mining and data analysis techniques available in Python, and shows you how to develop predictive models that forecast future results. You will also learn how to perform large-scale machine learning on big data using Apache Spark. The book covers preparing your data for analysis, training machine learning models, and visualizing the results of your analysis.

Style and approach

This comprehensive book blends theory with hands-on Python code examples that you can refer back to at any time.

Table of Contents

Title Page

Copyright

Hands-On Data Science and Python Machine Learning

Credits

About the Author

www.PacktPub.com

Why subscribe?

Customer Feedback

Preface

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Getting Started

Installing Enthought Canopy

Giving the installation a test run

If you occasionally have problems opening your IPYNB files

Using and understanding IPython (Jupyter) Notebooks

Python basics - Part 1

Understanding Python code

Importing modules

Data structures

Experimenting with lists

Pre colon

Post colon

Negative syntax

Adding list to list

The append function

Complex data structures

Dereferencing a single element

The sort function

Reverse sort

Tuples

Dereferencing an element

List of tuples

Dictionaries

Iterating through entries

Python basics - Part 2

Functions in Python

Lambda functions - functional programming

Understanding boolean expressions

The if statement

The if-else statement

Looping

The while loop

Exploring activity

Running Python scripts

More options than just the IPython/Jupyter Notebook

Running Python scripts in a command prompt

Using the Canopy IDE

Summary

Statistics and Probability Refresher, and Python Practice

Types of data

Numerical data

Discrete data

Continuous data

Categorical data

Ordinal data

Mean, median, and mode

Mean

Median

The factor of outliers

Mode

Using mean, median, and mode in Python

Calculating mean using the NumPy package

Visualizing data using matplotlib

Calculating median using the NumPy package

Analyzing the effect of outliers

Calculating mode using the SciPy package

Some exercises

Standard deviation and variance

Variance

Measuring variance

Standard deviation

Identifying outliers with standard deviation

Population variance versus sample variance

The mathematical explanation

Analyzing standard deviation and variance on a histogram

Using Python to compute standard deviation and variance

Try it yourself

Probability density function and probability mass function

The probability density function and probability mass functions

Probability density functions

Probability mass functions

Types of data distributions

Uniform distribution

Normal or Gaussian distribution

The exponential probability distribution or Power law

Binomial probability mass function

Poisson probability mass function

Percentiles and moments

Percentiles

Quartiles

Computing percentiles in Python

Moments

Computing moments in Python

Summary

Matplotlib and Advanced Probability Concepts

A crash course in Matplotlib

Generating multiple plots on one graph

Saving graphs as images

Adjusting the axes

Adding a grid

Changing line types and colors

Labeling axes and adding a legend

A fun example

Generating pie charts

Generating bar charts

Generating scatter plots

Generating histograms

Generating box-and-whisker plots

Try it yourself

Covariance and correlation

Defining the concepts

Measuring covariance

Correlation

Computing covariance and correlation in Python

Computing correlation – The hard way

Computing correlation – The NumPy way

Correlation activity

Conditional probability

Conditional probability exercises in Python

Conditional probability assignment

My assignment solution

Bayes' theorem

Summary

Predictive Models

Linear regression

The ordinary least squares technique

The gradient descent technique

The coefficient of determination, or r-squared

Computing r-squared

Interpreting r-squared

Computing linear regression and r-squared using Python

Activity for linear regression

Polynomial regression

Implementing polynomial regression using NumPy

Computing the r-squared error

Activity for polynomial regression

Multivariate regression and predicting car prices

Multivariate regression using Python

Activity for multivariate regression

Multi-level models

Summary

Machine Learning with Python

Machine learning and train/test

Unsupervised learning

Supervised learning

Evaluating supervised learning

K-fold cross validation

Using train/test to prevent overfitting of a polynomial regression

Activity

Bayesian methods - Concepts

Implementing a spam classifier with Naïve Bayes

Activity

K-Means clustering

Limitations to k-means clustering

Clustering people based on income and age

Activity

Measuring entropy

Decision trees - Concepts

Decision tree example

Walking through a decision tree

Random forests technique

Decision trees - Predicting hiring decisions using Python

Ensemble learning – Using a random forest

Activity

Ensemble learning

Support vector machine overview

Using SVM to cluster people by using scikit-learn

Activity

Summary

Recommender Systems

What are recommender systems?

User-based collaborative filtering

Limitations of user-based collaborative filtering

Item-based collaborative filtering

Understanding item-based collaborative filtering

How does item-based collaborative filtering work?

Collaborative filtering using Python

Finding movie similarities

Understanding the code

The corrwith function

Improving the results of movie similarities

Making movie recommendations to people

Understanding movie recommendations with an example

Using the groupby command to combine rows

Removing entries with the drop command

Improving the recommendation results

Summary

More Data Mining and Machine Learning Techniques

K-nearest neighbors - concepts

Using KNN to predict a rating for a movie

Activity

Dimensionality reduction and principal component analysis

Dimensionality reduction

Principal component analysis

A PCA example with the Iris dataset

Activity

Data warehousing overview

ETL versus ELT

Reinforcement learning

Q-learning

The exploration problem

The simple approach

The better way

Fancy words

Markov decision process

Dynamic programming

Summary

Dealing with Real-World Data

Bias/variance trade-off

K-fold cross-validation to avoid overfitting

Example of k-fold cross-validation using scikit-learn

Data cleaning and normalisation

Cleaning web log data

Applying a regular expression on the web log

Modification one - filtering the request field

Modification two - filtering POST requests

Modification three - checking the user agents

Filtering the activity of spiders/robots

Modification four - applying website-specific filters

Activity for web log data

Normalizing numerical data

Detecting outliers

Dealing with outliers

Activity for outliers

Summary

Apache Spark - Machine Learning on Big Data

Installing Spark

Installing Spark on Windows

Installing Spark on other operating systems

Installing the Java Development Kit

Installing Spark

Spark introduction

It's scalable

It's fast

It's young

It's not difficult

Components of Spark

Python versus Scala for Spark

Spark and Resilient Distributed Datasets (RDD)

The SparkContext object

Creating RDDs

Creating an RDD using a Python list

Loading an RDD from a text file

More ways to create RDDs

RDD operations

Transformations

Using map()

Actions

Introducing MLlib

Some MLlib capabilities

Special MLlib data types

The vector data type

LabeledPoint data type

Rating data type

Decision Trees in Spark with MLlib

Exploring decision trees code

Creating the SparkContext

Importing and cleaning our data

Creating a test candidate and building our decision tree

Running the script

K-Means Clustering in Spark

Within set sum of squared errors (WSSSE)

Running the code

TF-IDF

TF-IDF in practice

Using TF-IDF

Searching Wikipedia with Spark MLlib

Import statements

Creating the initial RDD

Creating and transforming a HashingTF object

Computing the TF-IDF score

Using the Wikipedia search engine algorithm

Running the algorithm

Using the Spark 2.0 DataFrame API for MLlib

How Spark 2.0 MLlib works

Implementing linear regression

Summary

Testing and Experimental Design

A/B testing concepts

A/B tests

Measuring conversion for A/B testing

How to attribute conversions

Variance is your enemy

T-test and p-value

The t-statistic or t-test

The p-value

Measuring t-statistics and p-values using Python

Running an A/B test on some experimental data

When there's no real difference between the two groups

Does the sample size make a difference?

Sample size increased to six digits

Sample size increased to seven digits

A/A testing

Determining how long to run an experiment for

A/B test gotchas

Novelty effects

Seasonal effects

Selection bias

Auditing selection bias issues

Data pollution

Attribution errors

Summary
