Practical Data Analysis - Second Edition (e-book)

Authors: Hector Cuesta, Dr. Sampath Kumar

Publisher: Packt Publishing

Publication date: 2016-09-01

Word count: 1.069 million

Category: Imported Books > Foreign-Language Originals > Computers/Internet

A practical guide to obtaining, transforming, exploring, and analyzing data using Python, MongoDB, and Apache Spark

About This Book

  • Learn to use various data analysis tools and algorithms to classify, cluster, visualize, simulate, and forecast your data
  • Apply machine learning algorithms to different kinds of data such as social networks, time series, and images
  • A hands-on guide to understanding the nature of data and how to turn it into insight

Who This Book Is For

This book is for developers who want to implement data analysis and data-driven algorithms in a practical way. It is also suitable for those without a background in data analysis or data processing. Basic knowledge of Python programming, statistics, and linear algebra is assumed.

What You Will Learn

  • Acquire, format, and visualize your data
  • Build an image-similarity search engine
  • Generate meaningful visualizations anyone can understand
  • Get started with analyzing social network graphs
  • Find out how to implement sentiment text analysis
  • Install data analysis tools such as Pandas, MongoDB, and Apache Spark
  • Get to grips with Apache Spark
  • Implement machine learning algorithms such as classification or forecasting

In Detail

Beyond buzzwords like Big Data or Data Science, there are great opportunities to innovate in many businesses by using data analysis to build data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service. This book explains the basic data algorithms without the theoretical jargon, and you'll get hands-on experience turning data into insights using machine learning techniques. We will perform data-driven innovation processing for several types of data such as text, images, social network graphs, documents, and time series, showing you how to implement large data processing with MongoDB and Apache Spark.

Style and Approach

This is a hands-on guide to data analysis and data processing. The concrete examples are explained with simple code and accessible data.
Table of Contents

Practical Data Analysis - Second Edition

Credits

About the Authors

About the Reviewers

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started

Computer science

Artificial intelligence

Machine learning

Statistics

Mathematics

Knowledge domain

Data, information, and knowledge

Inter-relationship between data, information, and knowledge

The nature of data

The data analysis process

The problem

Data preparation

Data exploration

Predictive modeling

Visualization of results

Quantitative versus qualitative data analysis

Importance of data visualization

What about big data?

Quantified self

Sensors and cameras

Social network analysis

Tools and toys for this book

Why Python?

Why mlpy?

Why D3.js?

Why MongoDB?

Summary

2. Preprocessing Data

Data sources

Open data

Text files

Excel files

SQL databases

NoSQL databases

Multimedia

Web scraping

Data scrubbing

Statistical methods

Text parsing

Data transformation

Data formats

Parsing a CSV file with the CSV module

Parsing CSV file using NumPy

JSON

Parsing JSON file using the JSON module

XML

Parsing XML in Python using the XML module

YAML

Data reduction methods

Filtering and sampling

Binned algorithm

Dimensionality reduction

Getting started with OpenRefine

Text facet

Clustering

Text filters

Numeric facets

Transforming data

Exporting data

Operation history

Summary

3. Getting to Grips with Visualization

What is visualization?

Working with web-based visualization

Exploring scientific visualization

Visualization in art

The visualization life cycle

Visualizing different types of data

HTML

DOM

CSS

JavaScript

SVG

Getting started with D3.js

Bar chart

Pie chart

Scatter plots

Single line chart

Multiple line chart

Interaction and animation

Data from social networks

An overview of visual analytics

Summary

4. Text Classification

Learning and classification

Bayesian classification

Naïve Bayes

E-mail subject line tester

The data

The algorithm

Classifier accuracy

Summary

5. Similarity-Based Image Retrieval

Image similarity search

Dynamic time warping

Processing the image dataset

Implementing DTW

Analyzing the results

Summary

6. Simulation of Stock Prices

Financial time series

Random Walk simulation

Monte Carlo methods

Generating random numbers

Implementation in D3.js

Quantitative analyst

Summary

7. Predicting Gold Prices

Working with time series data

Components of a time series

Smoothing time series

Linear regression

The data - historical gold prices

Nonlinear regressions

Kernel Ridge Regressions

Smoothing the gold prices time series

Predicting in the smoothed time series

Contrasting the predicted value

Summary

8. Working with Support Vector Machines

Understanding the multivariate dataset

Dimensionality reduction

Linear Discriminant Analysis (LDA)

Principal Component Analysis (PCA)

Getting started with SVM

Kernel functions

The double spiral problem

SVM implemented on mlpy

Summary

9. Modeling Infectious Diseases with Cellular Automata

Introduction to epidemiology

The epidemiology triangle

The epidemic models

The SIR model

Solving the ordinary differential equation for the SIR model with SciPy

The SIRS model

Modeling with Cellular Automaton

Cell, state, grid, neighborhood

Global stochastic contact model

Simulation of the SIRS model in CA with D3.js

Summary

10. Working with Social Graphs

Structure of a graph

Undirected graph

Directed graph

Social networks analysis

Acquiring the Facebook graph

Working with graphs using Gephi

Statistical analysis

Male to female ratio

Degree distribution

Histogram of a graph

Centrality

Transforming GDF to JSON

Graph visualization with D3.js

Summary

11. Working with Twitter Data

The anatomy of Twitter data

Tweet

Followers

Trending topics

Using OAuth to access Twitter API

Getting started with Twython

Simple search using Twython

Working with timelines

Working with followers

Working with places and trends

Working with user data

Streaming API

Summary

12. Data Processing and Aggregation with MongoDB

Getting started with MongoDB

Database

Collection

Document

Mongo shell

Insert/Update/Delete

Queries

Data preparation

Data transformation with OpenRefine

Inserting documents with PyMongo

Group

Aggregation framework

Pipelines

Expressions

Summary

13. Working with MapReduce

An overview of MapReduce

Programming model

Using MapReduce with MongoDB

Map function

Reduce function

Using mongo shell

Using Jupyter

Using PyMongo

Filtering the input collection

Grouping and aggregation

Counting the most common words in tweets

Summary

14. Online Data Analysis with Jupyter and Wakari

Getting started with Wakari

Creating an account in Wakari

Getting started with IPython notebook

Data visualization

Introduction to image processing with PIL

Opening an image

Working with an image histogram

Filtering

Operations

Transformations

Getting started with pandas

Working with Time Series

Working with multivariate datasets with DataFrame

Grouping, Aggregation, and Correlation

Sharing your Notebook

The data

Summary

15. Understanding Data Processing using Apache Spark

Platform for data processing

The Cloudera platform

Installing Cloudera VM

An introduction to the distributed file system

First steps with Hadoop Distributed File System - HDFS

File management with HUE - web interface

An introduction to Apache Spark

The Spark ecosystem

The Spark programming model

An introductory working example of Apache Spark

Summary
