Programming MapReduce with Scalding
Table of Contents
Programming MapReduce with Scalding
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction to MapReduce
The Hadoop platform
MapReduce
A MapReduce example
MapReduce abstractions
Introducing Cascading
What happens inside a pipe
Pipe assemblies
Cascading extensions
Summary
2. Get Ready for Scalding
Why Scala?
Scala basics
Scala build tools
Hello World in Scala
Development editors
Installing Hadoop in five minutes
Running our first Scalding job
Submitting a Scalding job in Hadoop
Summary
3. Scalding by Example
Reading and writing files
Best practices to read and write files
TextLine parsing
Executing in the local and Hadoop modes
Understanding the core capabilities of Scalding
Map-like operations
Join operations
Pipe operations
Grouping/reducing functions
Operations on groups
Composite operations
A simple example
Typed API
Summary
4. Intermediate Examples
Logfile analysis
Completing the implementation
Exploring ad targeting
Calculating daily points
Calculating historic points
Generating targeted ads
Summary
5. Scalding Design Patterns
The external operations pattern
The dependency injection pattern
The late bound dependency pattern
Summary
6. Testing and TDD
Introduction to testing
MapReduce testing challenges
Development lifecycle with testing strategy
TDD for Scalding developers
Implementing the TDD methodology
Decomposing the algorithm
Defining acceptance tests
Implementing integration tests
Implementing unit tests
Implementing the MapReduce logic
Defining and performing system tests
Black box testing
Summary
7. Running Scalding in Production
Executing Scalding in a Hadoop cluster
Scheduling execution
Coordinating job execution
Configuring using a property file
Configuring using Hadoop parameters
Monitoring Scalding jobs
Using slim JAR files
Scalding execution throttling
Summary
8. Using External Data Stores
Interacting with external systems
SQL databases
NoSQL databases
Understanding HBase
Reading from HBase
Writing in HBase
Using advanced HBase features
Search platforms
Elastic search
Summary
9. Matrix Calculations and Machine Learning
Text similarity using TF-IDF
Setting a similarity using the Jaccard index
K-Means using Mahout
Other libraries
Summary
Index