
Getting Started with Greenplum for Big Data Analytics (e-book)

Price: ¥

19 people reading | 0 reviews

9.8

Author: Sunila Gollapudi

Publisher: Packt Publishing

Publication date: 2013-10-23

Word count: 1,571,000

Category: Imported Books > Foreign-Language Original Editions > Computers/Internet

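Note: Digital products cannot be returned or exchanged. Source files are not provided, and exporting and printing are not supported.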

Book Description

Standard tutorial-based approach. "Getting Started with Greenplum for Big Data Analytics" is well suited to data scientists and data analysts who have a basic knowledge of data warehousing and business intelligence platforms, are new to Big Data, and want a solid grounding in how to use the Greenplum platform. It assumes some experience with database design and programming, as well as familiarity with analytics tools such as R and Weka.


Getting Started with Greenplum for Big Data Analytics

Table of Contents

Getting Started with Greenplum for Big Data Analytics

Credits

Foreword

About the Author

Acknowledgement

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Instant Updates on New Packt Books

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Errata

Piracy

Questions

1. Big Data, Analytics, and Data Science Life Cycle

Enterprise data

Classification

Features

Big Data

So, what is Big Data?

Multi-structured data

Data analytics

Data science

Data science life cycle

Phase 1 – state business problem

Phase 2 – set up data

Phase 3 – explore/transform data

Phase 4 – model

Phase 5 – publish insights

Phase 6 – measure effectiveness

References/Further reading

Summary

2. Greenplum Unified Analytics Platform (UAP)

Big Data analytics – platform requirements

Greenplum Unified Analytics Platform (UAP)

Core components

Greenplum Database

Hadoop (HD)

Chorus

Command Center

Modules

Database modules

HD modules

Data Integration Accelerator (DIA) modules

Core architecture concepts

Data warehousing

Column-oriented databases

Parallel versus distributed computing/processing

Shared nothing, massive parallel processing (MPP) systems, and elastic scalability

Shared disk data architecture

Shared memory data architecture

Shared nothing data architecture

Data loading patterns

Greenplum UAP components

Greenplum Database

The Greenplum Database physical architecture

The Greenplum high-availability architecture

High-speed data loading using external tables

External table types

Polymorphic data storage and historic data management

Data distribution

Hadoop (HD)

Hadoop Distributed File System (HDFS)

Hadoop MapReduce

Chorus

Greenplum Data Computing Appliance (DCA)

Greenplum Data Integration Accelerator (DIA)

References/Further reading

Summary

3. Advanced Analytics – Paradigms, Tools, and Techniques

Analytic paradigms

Descriptive analytics

Predictive analytics

Prescriptive analytics

Analytics classified

Classification

Forecasting or prediction or regression

Clustering

Optimization

Simulations

Modeling methods

Decision trees

Association rules

The Apriori algorithm

Linear regression

Logistic regression

The Naive Bayesian classifier

K-means clustering

Text analysis

R programming

Weka

In-database analytics using MADlib

References/Further reading

Summary

4. Implementing Analytics with Greenplum UAP

Data loading for Greenplum Database and HD

Greenplum data loading options

External tables

gpfdist

gpload

Hadoop (HD) data loading options

Sqoop 2

Greenplum BulkLoader for Hadoop

Using external ETL to load data into Greenplum

Extraction, Load, and Transformation (ELT) and Extraction, Transformation, Load, and Transformation (ETLT)

Greenplum target configuration

Sourcing large volumes of data from Greenplum

Unsupported Greenplum data types

Push Down Optimization (PDO)

Greenplum table distribution and partitioning

Distribution

Data skew and performance

Optimizing the broadcast or redistribution motion for data co-location

Partitioning

Querying Greenplum Database and HD

Querying Greenplum Database

Analyzing and optimizing queries

The ANALYZE function

The EXPLAIN function

Dynamic Pipelining in Greenplum

Querying HDFS

Hive

Pig

Data communication between Greenplum Database and Hadoop (using external tables)

Data Computing Appliance (DCA)

Storage design, disk protection, and fault tolerance

Master server RAID configurations

Segment server RAID configurations

Monitoring DCA

Greenplum Database management

In-database analytics options (Greenplum-specific)

Window functions

The PARTITION BY clause

The ORDER BY clause

The OVER (ORDER BY…) clause

Creating, modifying, and dropping functions

User-defined aggregates

Using R with Greenplum

DBI Connector for R

PL/R

Using Weka with Greenplum

Using MADlib with Greenplum

Using Greenplum Chorus

Pivotal

References/Further reading

Summary

Index


Customers who bought this book also bought

Business Process Management with JBoss jBPM, Matt Cumberlidge, ￥90.46

Instant PLC Programming with RSLogix 5000, Austin Scott, ￥50.13

Tkinter GUI Programming by Example, David Love, ￥90.46

OpenCV 4 Computer Vision Application Programming Cookbook, David Millán Escrivá, ￥70.84

Supervised Machine Learning with Python, Taylor Smith, ￥44.68

Python for Secret Agents, Steven F. Lott, ￥50.13

Matplotlib for Python Developers, Sandro Tosi, ￥80.65

Ceph Cookbook, Karan Singh, ￥80.65

Java: Advanced Guide to Programming Code with Java, Charlie Masterson, ￥24.44

Algorithms To Live By: The Computer Science of Human Decisions, Brian Christian, ￥76.91
