Mastering Clojure Data Analysis (e-book)

Author: Eric Rochester

Publisher: Packt Publishing

Publication date: 2014-05-26

Word count: 3.821 million

This book takes a practical, example-oriented approach to help you learn how to use Clojure for data analysis quickly and efficiently. It is aimed at readers who have experience with Clojure and need to use it for data analysis. It will also benefit readers with basic experience in data analysis and statistics.
Mastering Clojure Data Analysis

Table of Contents

Mastering Clojure Data Analysis

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Network Analysis – The Six Degrees of Kevin Bacon

Analyzing social networks

Getting the data

Understanding graphs

Implementing the graphs

Loading the data

Measuring social network graphs

Density

Degrees

Paths

Average path length

Network diameter

Clustering coefficient

Centrality

Degrees of separation

Visualizing the graph

Setting up ClojureScript

A force-directed layout

A hive plot

A pie chart

Summary

2. GIS Analysis – Mapping Climate Change

Understanding GIS

Mapping the climate change

Downloading and extracting the data

Downloading the files

Extracting the files

Transforming the data – filtering

Rolling averages

Reading the data

Interpolating sample points and generating heat maps using inverse distance weighting (IDW)

Working with map projections

Finding a base map

Working with ArcGIS

Summary

3. Topic Modeling – Changing Concerns in the State of the Union Addresses

Understanding data in the State of the Union addresses

Understanding topic modeling

Preparing for visualizations

Setting up the project

Getting the data

Loading the data into MALLET

Visualizing with D3 and ClojureScript

Exploring the topics

Exploring topic 43

Exploring topic 26

Exploring topic 42

Summary

4. Classifying UFO Sightings

Getting the data

Extracting the data

Dealing with messy data

Visualizing UFO data

Description

Topic modeling descriptions

Hoaxes

Preparing the data

Reading the data into a sequence of data records

Splitting the NUFORC comments

Categorizing the documents based on the comments

Partitioning the documents into directories based on the categories

Dividing them into training and test sets

Classifying the data

Coding the classifier interface

Setting up the Pipe and InstanceList

Training

Classifying

Validating

Tying it all together

Running the classifier and examining the results

Summary

5. Benford's Law – Detecting Natural Progressions of Numbers

Learning about Benford's Law

Applying Benford's law to compound interest

Looking at the world population data

Failing Benford's Law

Case studies

Summary

6. Sentiment Analysis – Categorizing Hotel Reviews

Understanding sentiment analysis

Getting hotel review data

Exploring the data

Preparing the data

Tokenizing

Creating feature vectors

Creating feature vector functions and POS tagging

Cross-validating the results

Calculating error rates

Using the Weka machine learning library

Connecting Weka and cross-validation

Understanding maximum entropy classifiers

Understanding naive Bayesian classifiers

Running the experiment

Examining the results

Combining the error rates

Improving the results

Summary

7. Null Hypothesis Tests – Analyzing Crime Data

Introducing confirmatory data analysis

Understanding null hypothesis testing

Understanding the process

Formulating an initial hypothesis

Stating the null and alternative hypotheses

Determining appropriate tests

Selecting the significance level

Determining the critical region

Calculating the test statistic and its probability

Deciding whether to reject the null hypothesis or not

Flipping coins

Formulating an initial hypothesis

Stating the null and alternative hypotheses

Identifying the statistical assumptions in the sample

Determining appropriate tests

Selecting the significance level

Determining the critical region

Calculating the test statistic and its probability

Deciding whether to reject the null hypothesis or not

Understanding burglary rates

Getting the data

Parsing the Excel files

Pulling out raw data

Growing a data tree

Cutting down the data tree

Putting it all together

Transforming the data

Joining the data sources

Pivoting the data

Filtering the missing data

Putting it all together

Exploring the data

Generating summary statistics

Summarizing UNODC crime data

Summarizing World Bank land area and GNI data

Generating more charts and graphs

Conducting the experiment

Formulating an initial hypothesis

Stating the null and alternative hypotheses

Identifying the statistical assumptions in the sample

Determining which tests are appropriate

Understanding Spearman's rank correlation coefficient

Selecting the significance level

Determining the critical region

Calculating the test statistic and its probability

Deciding whether to reject the null hypothesis or not

Interpreting the results

Summary

8. A/B Testing – Statistical Experiments for the Web

Defining A/B testing

Conducting an A/B test

Planning the experiment

Framing the statistics

Building the experiment

Looking at options to build the site

Implementing A/B testing on the server

Understanding the scaffolded site

Building the test site

Implementing A/B testing

Viewing the results

Looking at A/B testing as a user

Analyzing the results

Understanding the t-test

Testing coin tosses

Testing the results

Summary

9. Analyzing Social Data Participation

Setting up the project

Understanding the analyses

Understanding social network data

Understanding knowledge-based social networks

Introducing the 80/20 rule

Getting the data

Looking at the amount of data

Looking at the data format

Defining and loading the data

Counting frequencies

Sorting and ranking

Finding the patterns of participation

Matching the 80/20 rule

Looking for the 20 percent of questioners

Looking for the 20 percent of respondents

Combining ranks

Looking at those who only post questions

Looking at those who only post answers

Looking at those who post both questions and answers

Finding the up-voted answers

Processing the answers

Predicting the accepted answer

Setting up

Creating the InstanceList object

Training sets and Test sets

Training

Testing

Evaluating the outcome

Summary

10. Modeling Stock Data

Learning about financial data analysis

Setting up the basics

Setting up the library

Getting the data

Getting prepared with data

Working with news articles

Working with stock data

Analyzing the text

Analyzing vocabulary

Stop lists

Hapax and Dis Legomena

TF-IDF

Inspecting the stock prices

Merging text and stock features

Analyzing both text and stock features together with neural nets

Understanding neural nets

Setting up the neural net

Training the neural net

Running the neural net

Validating the neural net

Finding the best parameters

Predicting the future

Loading stock prices

Loading news articles

Creating training and test sets

Finding the best parameters for the neural network

Training and validating the neural network

Running the network on new data

Taking it with a grain of salt

Related to this project

Related to machine learning and market modeling in general

Summary

Index
