Getting Started with Greenplum for Big Data Analytics (e-book)

Author: Sunila Gollapudi

Publisher: Packt Publishing

Publication date: 2013-10-23

Word count: 1.571 million

Category: Imported books > Foreign-language originals > Computers/Internet

This book takes a standard tutorial-based approach. "Getting Started with Greenplum for Big Data Analytics" is for data scientists and data analysts with a basic knowledge of data warehousing and business intelligence platforms who are new to Big Data and want a good grounding in how to use the Greenplum platform. It assumes some experience with database design and programming, as well as familiarity with analytics tools such as R and Weka.

Table of Contents

Getting Started with Greenplum for Big Data Analytics

Credits

Foreword

About the Author

Acknowledgement

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Instant Updates on New Packt Books

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Errata

Piracy

Questions

1. Big Data, Analytics, and Data Science Life Cycle

Enterprise data

Classification

Features

Big Data

So, what is Big Data?

Multi-structured data

Data analytics

Data science

Data science life cycle

Phase 1 – state business problem

Phase 2 – set up data

Phase 3 – explore/transform data

Phase 4 – model

Phase 5 – publish insights

Phase 6 – measure effectiveness

References/Further reading

Summary

2. Greenplum Unified Analytics Platform (UAP)

Big Data analytics – platform requirements

Greenplum Unified Analytics Platform (UAP)

Core components

Greenplum Database

Hadoop (HD)

Chorus

Command Center

Modules

Database modules

HD modules

Data Integration Accelerator (DIA) modules

Core architecture concepts

Data warehousing

Column-oriented databases

Parallel versus distributed computing/processing

Shared nothing, massive parallel processing (MPP) systems, and elastic scalability

Shared disk data architecture

Shared memory data architecture

Shared nothing data architecture

Data loading patterns

Greenplum UAP components

Greenplum Database

The Greenplum Database physical architecture

The Greenplum high-availability architecture

High-speed data loading using external tables

External table types

Polymorphic data storage and historic data management

Data distribution

Hadoop (HD)

Hadoop Distributed File System (HDFS)

Hadoop MapReduce

Chorus

Greenplum Data Computing Appliance (DCA)

Greenplum Data Integration Accelerator (DIA)

References/Further reading

Summary

3. Advanced Analytics – Paradigms, Tools, and Techniques

Analytic paradigms

Descriptive analytics

Predictive analytics

Prescriptive analytics

Analytics classified

Classification

Forecasting or prediction or regression

Clustering

Optimization

Simulations

Modeling methods

Decision trees

Association rules

The Apriori algorithm

Linear regression

Logistic regression

The Naive Bayesian classifier

K-means clustering

Text analysis

R programming

Weka

In-database analytics using MADlib

References/Further reading

Summary

4. Implementing Analytics with Greenplum UAP

Data loading for Greenplum Database and HD

Greenplum data loading options

External tables

gpfdist

gpload

Hadoop (HD) data loading options

Sqoop 2

Greenplum BulkLoader for Hadoop

Using external ETL to load data into Greenplum

Extraction, Load, and Transformation (ELT) and Extraction, Transformation, Load, and Transformation (ETLT)

Greenplum target configuration

Sourcing large volumes of data from Greenplum

Unsupported Greenplum data types

Push Down Optimization (PDO)

Greenplum table distribution and partitioning

Distribution

Data skew and performance

Optimizing the broadcast or redistribution motion for data co-location

Partitioning

Querying Greenplum Database and HD

Querying Greenplum Database

Analyzing and optimizing queries

The ANALYZE function

The EXPLAIN function

Dynamic Pipelining in Greenplum

Querying HDFS

Hive

Pig

Data communication between Greenplum Database and Hadoop (using external tables)

Data Computing Appliance (DCA)

Storage design, disk protection, and fault tolerance

Master server RAID configurations

Segment server RAID configurations

Monitoring DCA

Greenplum Database management

In-database analytics options (Greenplum-specific)

Window functions

The PARTITION BY clause

The ORDER BY clause

The OVER (ORDER BY…) clause

Creating, modifying, and dropping functions

User-defined aggregates

Using R with Greenplum

DBI Connector for R

PL/R

Using Weka with Greenplum

Using MADlib with Greenplum

Using Greenplum Chorus

Pivotal

References/Further reading

Summary

Index
