Pig Design Patterns (eBook)

Author: Pradeep Pasupuleti

Publisher: Packt Publishing

Publication date: 2014-04-17

Word count: 4.74 million

A comprehensive, practical guide that walks you through the multiple stages of data management in the enterprise and provides numerous design patterns, with code examples, for solving frequent problems in each stage. The chapters are organized to mimic the sequential data flow found in analytics platforms, but they can also be read independently to solve a particular group of problems in the Big Data life cycle. This book is for experienced developers who are already familiar with Pig and are looking for a use-case perspective that relates to the problems of data ingestion, profiling, cleansing, transformation, and egress encountered in enterprises. Knowledge of Hadoop and Pig is necessary to grasp the intricacies of Pig design patterns.
Pig Design Patterns

Table of Contents

Pig Design Patterns

Credits

Foreword

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

Motivation for this book

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Third-party libraries

Datasets

Errata

Piracy

Questions

1. Setting the Context for Design Patterns in Pig

Understanding design patterns

The scope of design patterns in Pig

Hadoop demystified – a quick reckoner

The enterprise context

Common challenges of distributed systems

The advent of Hadoop

Hadoop under the covers

Understanding the Hadoop Distributed File System

HDFS design goals

Working of HDFS

Understanding MapReduce

Understanding how MapReduce works

The MapReduce internals

Pig – a quick intro

Understanding the rationale of Pig

Understanding the relevance of Pig in the enterprise

Working of Pig – an overview

Firing up Pig

The use case

Code listing

The dataset

Understanding Pig through the code

Pig's extensibility

Operators used in code

The EXPLAIN operator

Understanding Pig's data model

Primitive types

Complex types

The relevance of schemas

Summary

2. Data Ingest and Egress Patterns

The context of data ingest and egress

Types of data in the enterprise

Ingest and egress patterns for multistructured data

Considerations for log ingestion

The Apache log ingestion pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Code for the CommonLogLoader class

Code for the CombinedLogLoader class

Results

Additional information

The Custom log ingestion pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The image ingress and egress pattern

Background

Motivation

Use cases

Pattern implementation

The image Ingress Implementation

The image egress implementation

Code snippets

The image ingress

Pig script

Image to a sequence UDF snippet

The image egress

Pig script

Sequence to an image UDF

Results

Additional information

The ingress and egress patterns for the NoSQL data

MongoDB ingress and egress patterns

Background

Motivation

Use cases

Pattern implementation

The ingress implementation

The egress implementation

Code snippets

The ingress code

The egress code

Results

Additional information

The HBase ingress and egress pattern

Background

Motivation

Use cases

Pattern implementation

The ingress implementation

The egress implementation

Code snippets

The ingress code

The egress code

Results

Additional information

The ingress and egress patterns for structured data

The Hive ingress and egress patterns

Background

Motivation

Use cases

Pattern implementation

The ingress implementation

The egress implementation

Code snippets

The ingress Code

Importing data using RCFile

Importing data using HCatalog

The egress code

Results

Additional information

The ingress and egress patterns for semi-structured data

The mainframe ingestion pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

XML ingest and egress patterns

Background

Motivation

Motivation for ingesting raw XML

Motivation for ingesting binary XML

Motivation for egression of XML

Use cases

Pattern implementation

The implementation of the XML raw ingestion

The implementation of the XML binary ingestion

Code snippets

The XML raw ingestion code

The XML binary ingestion code

The XML egress code

Pig script

The XML storage

Results

Additional information

JSON ingress and egress patterns

Background

Motivation

Use cases

Pattern implementation

The ingress implementation

The egress implementation

Code snippets

The ingress code

The code for simple JSON

The code for nested JSON

The egress code

Results

Additional information

Summary

3. Data Profiling Patterns

Data profiling for Big Data

Big Data profiling dimensions

Sampling considerations for profiling Big Data

Sampling support in Pig

Rationale for using Pig in data profiling

The data type inference pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Pig script

Java UDF

Results

Additional information

The basic statistical profiling pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Pig script

Macro

Results

Additional information

The pattern-matching pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Pig script

Macro

Results

Additional information

The string profiling pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Pig script

Macro

Results

Additional information

The unstructured text profiling pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Pig script

Java UDF for stemming

Java UDF for generating TF-IDF

Results

Additional information

Summary

4. Data Validation and Cleansing Patterns

Data validation and cleansing for Big Data

Choosing Pig for validation and cleansing

The constraint validation and cleansing design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The regex validation and cleansing design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The corrupt data validation and cleansing design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The unstructured text data validation and cleansing design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Summary

5. Data Transformation Patterns

Data transformation processes

The structured-to-hierarchical transformation pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The data normalization pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The data integration pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The aggregation pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The data generalization pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Summary

6. Understanding Data Reduction Patterns

Data reduction – a quick introduction

Data reduction considerations for Big Data

Dimensionality reduction – the Principal Component Analysis design pattern

Background

Motivation

Use cases

Pattern implementation

Limitations of PCA implementation

Code snippets

Results

Additional information

Numerosity reduction – the histogram design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Numerosity reduction – sampling design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Numerosity reduction – clustering design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Summary

7. Advanced Patterns and Future Work

The clustering pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The topic discovery pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The natural language processing pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The classification pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Future trends

Emergence of data-driven patterns

The emergence of solution-driven patterns

Patterns addressing programmability constraints

Summary

Index
