Pig Design Patterns
Table of Contents
Pig Design Patterns
Credits
Foreword
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
Motivation for this book
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Third-party libraries
Datasets
Errata
Piracy
Questions
1. Setting the Context for Design Patterns in Pig
Understanding design patterns
The scope of design patterns in Pig
Hadoop demystified – a quick reckoner
The enterprise context
Common challenges of distributed systems
The advent of Hadoop
Hadoop under the covers
Understanding the Hadoop Distributed File System
HDFS design goals
Working of HDFS
Understanding MapReduce
Understanding how MapReduce works
The MapReduce internals
Pig – a quick intro
Understanding the rationale of Pig
Understanding the relevance of Pig in the enterprise
Working of Pig – an overview
Firing up Pig
The use case
Code listing
The dataset
Understanding Pig through the code
Pig's extensibility
Operators used in code
The EXPLAIN operator
Understanding Pig's data model
Primitive types
Complex types
The relevance of schemas
Summary
2. Data Ingest and Egress Patterns
The context of data ingest and egress
Types of data in the enterprise
Ingest and egress patterns for multistructured data
Considerations for log ingestion
The Apache log ingestion pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Code for the CommonLogLoader class
Code for the CombinedLogLoader class
Results
Additional information
The custom log ingestion pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
The image ingress and egress pattern
Background
Motivation
Use cases
Pattern implementation
The image ingress implementation
The image egress implementation
Code snippets
The image ingress
Pig script
Image to a sequence UDF snippet
The image egress
Pig script
Sequence to an image UDF
Results
Additional information
The ingress and egress patterns for the NoSQL data
MongoDB ingress and egress patterns
Background
Motivation
Use cases
Pattern implementation
The ingress implementation
The egress implementation
Code snippets
The ingress code
The egress code
Results
Additional information
The HBase ingress and egress pattern
Background
Motivation
Use cases
Pattern implementation
The ingress implementation
The egress implementation
Code snippets
The ingress code
The egress code
Results
Additional information
The ingress and egress patterns for structured data
The Hive ingress and egress patterns
Background
Motivation
Use cases
Pattern implementation
The ingress implementation
The egress implementation
Code snippets
The ingress code
Importing data using RCFile
Importing data using HCatalog
The egress code
Results
Additional information
The ingress and egress patterns for semi-structured data
The mainframe ingestion pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
XML ingest and egress patterns
Background
Motivation
Motivation for ingesting raw XML
Motivation for ingesting binary XML
Motivation for egression of XML
Use cases
Pattern implementation
The implementation of the XML raw ingestion
The implementation of the XML binary ingestion
Code snippets
The XML raw ingestion code
The XML binary ingestion code
The XML egress code
Pig script
The XML storage
Results
Additional information
JSON ingress and egress patterns
Background
Motivation
Use cases
Pattern implementation
The ingress implementation
The egress implementation
Code snippets
The ingress code
The code for simple JSON
The code for nested JSON
The egress code
Results
Additional information
Summary
3. Data Profiling Patterns
Data profiling for Big Data
Big Data profiling dimensions
Sampling considerations for profiling Big Data
Sampling support in Pig
Rationale for using Pig in data profiling
The data type inference pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Pig script
Java UDF
Results
Additional information
The basic statistical profiling pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Pig script
Macro
Results
Additional information
The pattern-matching pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Pig script
Macro
Results
Additional information
The string profiling pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Pig script
Macro
Results
Additional information
The unstructured text profiling pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Pig script
Java UDF for stemming
Java UDF for generating TF-IDF
Results
Additional information
Summary
4. Data Validation and Cleansing Patterns
Data validation and cleansing for Big Data
Choosing Pig for validation and cleansing
The constraint validation and cleansing design pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
The regex validation and cleansing design pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
The corrupt data validation and cleansing design pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
The unstructured text data validation and cleansing design pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
Summary
5. Data Transformation Patterns
Data transformation processes
The structured-to-hierarchical transformation pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
The data normalization pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
The data integration pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
The aggregation pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
The data generalization pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
Summary
6. Understanding Data Reduction Patterns
Data reduction – a quick introduction
Data reduction considerations for Big Data
Dimensionality reduction – the Principal Component Analysis design pattern
Background
Motivation
Use cases
Pattern implementation
Limitations of PCA implementation
Code snippets
Results
Additional information
Numerosity reduction – the histogram design pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
Numerosity reduction – sampling design pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
Numerosity reduction – clustering design pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
Summary
7. Advanced Patterns and Future Work
The clustering pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
The topic discovery pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
The natural language processing pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
The classification pattern
Background
Motivation
Use cases
Pattern implementation
Code snippets
Results
Additional information
Future trends
The emergence of data-driven patterns
The emergence of solution-driven patterns
Patterns addressing programmability constraints
Summary
Index