Big Data Analytics with R (e-book)

Author: Simon Walkowiak

Publisher: Packt Publishing

Publication date: 2016-07-01

Word count: 4,047,000

Utilize R to uncover hidden patterns in your Big Data

About This Book
  • Perform computational analyses on Big Data to generate meaningful results
  • Gain practical knowledge of the R programming language while working on Big Data platforms such as Hadoop, Spark, H2O, and SQL/NoSQL databases
  • Explore fast, streaming, and scalable data analysis with the most cutting-edge technologies on the market

Who This Book Is For
This book is intended for data analysts and scientists, data engineers, statisticians, and researchers who want to integrate R with their current or future Big Data workflows. Readers are assumed to have some experience in data analysis and an understanding of data management and algorithmic processing of large quantities of data, although they may lack specific skills related to R.

What You Will Learn
  • Learn about the current state of Big Data processing using the R programming language and its powerful statistical capabilities
  • Deploy Big Data analytics platforms with selected Big Data tools supported by R in a cost-effective and time-saving manner
  • Apply the R language to real-world Big Data problems on a multi-node Hadoop cluster, e.g. electricity consumption across various socio-demographic indicators and bike share scheme usage
  • Explore the compatibility of R with Hadoop, Spark, SQL and NoSQL databases, and the H2O platform

In Detail
Big Data analytics is the process of examining large and complex data sets that often exceed available computational capabilities. R is a leading programming language of data science, offering powerful functions for tackling problems related to Big Data processing.

The book begins with a brief introduction to the Big Data world and its current industry standards, followed by an introduction to the R language that presents its development, structure, real-world applications, and shortcomings. It then progresses to a revision of major R functions for data management and transformations.

Readers will be introduced to cloud-based Big Data solutions (e.g. Amazon EC2 instances and Amazon RDS, Microsoft Azure and its HDInsight clusters) and given guidance on R connectivity with relational and non-relational databases such as MongoDB and HBase. The book then expands to Big Data tools such as the Apache Hadoop ecosystem, HDFS, and the MapReduce framework, as well as other R-compatible tools such as Apache Spark, its machine learning library Spark MLlib, and H2O.

Style and approach
This book serves as a practical guide to tackling Big Data problems using the R programming language and its statistical environment. Each section presents concise and easy-to-follow steps on how to process, transform, and analyse large data sets.
Table of Contents

Big Data Analytics with R

Credits

About the Author

Acknowledgement

About the Reviewers

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. The Era of Big Data

Big Data – The monster re-defined

Big Data toolbox - dealing with the giant

Hadoop - the elephant in the room

Databases

Hadoop Spark-ed up

R – The unsung Big Data hero

Summary

2. Introduction to R Programming Language and Statistical Environment

Learning R

Revisiting R basics

Getting R and RStudio ready

Setting the URLs to R repositories

R data structures

Vectors

Scalars

Matrices

Arrays

Data frames

Lists

Exporting R data objects

Applied data science with R

Importing data from different formats

Exploratory Data Analysis

Data aggregations and contingency tables

Hypothesis testing and statistical inference

Tests of differences

Independent t-test example (with power and effect size estimates)

ANOVA example

Tests of relationships

An example of Pearson's r correlations

Multiple regression example

Data visualization packages

Summary

3. Unleashing the Power of R from Within

Traditional limitations of R

Out-of-memory data

Processing speed

To the memory limits and beyond

Data transformations and aggregations with the ff and ffbase packages

Generalized linear models with the ff and ffbase packages

Logistic regression example with ffbase and biglm

Expanding memory with the bigmemory package

Parallel R

From bigmemory to faster computations

An apply() example with the big.matrix object

A for() loop example with the ffdf object

Using apply() and for() loop examples on a data.frame

A parallel package example

A foreach package example

The future of parallel processing in R

Utilizing Graphics Processing Units with R

Multi-threading with Microsoft R Open distribution

Parallel machine learning with H2O and R

Boosting R performance with the data.table package and other tools

Fast data import and manipulation with the data.table package

Data import with data.table

Lightning-fast subsets and aggregations on data.table

Chaining, more complex aggregations, and pivot tables with data.table

Writing better R code

Summary

4. Hadoop and MapReduce Framework for R

Hadoop architecture

Hadoop Distributed File System

MapReduce framework

A simple MapReduce word count example

Other Hadoop native tools

Learning Hadoop

A single-node Hadoop in Cloud

Deploying Hortonworks Sandbox on Azure

A word count example in Hadoop using Java

A word count example in Hadoop using the R language

RStudio Server on a Linux RedHat/CentOS virtual machine

Installing and configuring RHadoop packages

HDFS management and MapReduce in R - a word count example

HDInsight - a multi-node Hadoop cluster on Azure

Creating your first HDInsight cluster

Creating a new Resource Group

Deploying a Virtual Network

Creating a Network Security Group

Setting up and configuring an HDInsight cluster

Starting the cluster and exploring Ambari

Connecting to the HDInsight cluster and installing RStudio Server

Adding a new inbound security rule for port 8787

Editing the Virtual Network's public IP address for the head node

Smart energy meter readings analysis example – using R on HDInsight cluster

Summary

5. R with Relational Database Management Systems (RDBMSs)

Relational Database Management Systems (RDBMSs)

A short overview of used RDBMSs

Structured Query Language (SQL)

SQLite with R

Preparing and importing data into a local SQLite database

Connecting to SQLite from RStudio

MariaDB with R on an Amazon EC2 instance

Preparing the EC2 instance and RStudio Server for use

Preparing MariaDB and data for use

Working with MariaDB from RStudio

PostgreSQL with R on Amazon RDS

Launching an Amazon RDS database instance

Preparing and uploading data to Amazon RDS

Remotely querying PostgreSQL on Amazon RDS from RStudio

Summary

6. R with Non-Relational (NoSQL) Databases

Introduction to NoSQL databases

Review of leading non-relational databases

MongoDB with R

Introduction to MongoDB

MongoDB data models

Installing MongoDB with R on Amazon EC2

Processing Big Data using MongoDB with R

Importing data into MongoDB and basic MongoDB commands

MongoDB with R using the rmongodb package

MongoDB with R using the RMongo package

MongoDB with R using the mongolite package

HBase with R

Azure HDInsight with HBase and RStudio Server

Importing the data to HDFS and HBase

Reading and querying HBase using the rhbase package

Summary

7. Faster than Hadoop - Spark with R

Spark for Big Data analytics

Spark with R on a multi-node HDInsight cluster

Launching HDInsight with Spark and R/RStudio

Reading the data into HDFS and Hive

Getting the data into HDFS

Importing data from HDFS to Hive

Bay Area Bike Share analysis using SparkR

Summary

8. Machine Learning Methods for Big Data in R

What is machine learning?

Machine learning algorithms

Supervised and unsupervised machine learning methods

Classification and clustering algorithms

Machine learning methods with R

Big Data machine learning tools

GLM example with Spark and R on the HDInsight cluster

Preparing the Spark cluster and reading the data from HDFS

Logistic regression in Spark with R

Naive Bayes with H2O on Hadoop with R

Running an H2O instance on Hadoop with R

Reading and exploring the data in H2O

Naive Bayes on H2O with R

Neural Networks with H2O on Hadoop with R

How do Neural Networks work?

Running Deep Learning models on H2O

Summary

9. The Future of R - Big, Fast, and Smart Data

The current state of Big Data analytics with R

Out-of-memory data on a single machine

Faster data processing with R

Hadoop with R

Spark with R

R with databases

Machine learning with R

The future of R

Big Data

Fast data

Smart data

Where to go next

Summary
