Hadoop Real-World Solutions Cookbook - Second Edition (e-book)

Author: Tanmay Deshpande

Publisher: Packt Publishing

Publication date: 2016-03-31

Word count: 1.428 million

Over 90 hands-on recipes to help you learn and master the intricacies of Apache Hadoop 2.X, YARN, Hive, Pig, Oozie, Flume, Sqoop, Apache Spark, and Mahout

About This Book

  • Implement outstanding Machine Learning use cases on your own analytics models and processes.
  • Solutions to common problems when working with the Hadoop ecosystem.
  • Step-by-step implementation of end-to-end big data use cases.

Who This Book Is For

Readers who have a basic knowledge of big data systems and want to advance their knowledge with hands-on recipes.

What You Will Learn

  • Install and maintain a Hadoop 2.X cluster and its ecosystem.
  • Write advanced Map Reduce programs and understand design patterns.
  • Perform advanced data analysis using Hive, Pig, and Map Reduce programs.
  • Import and export data from various sources using Sqoop and Flume.
  • Store data in various file formats such as Text, Sequential, Parquet, ORC, and RC Files.
  • Apply machine learning principles with libraries such as Mahout.
  • Process batch and stream data using Apache Spark.

In Detail

Big data is the current requirement. Most organizations produce huge amounts of data every day. With the arrival of Hadoop-like tools, it has become easier for everyone to solve big data problems with great efficiency and at minimal cost. Grasping Machine Learning techniques will help you greatly in building predictive models and using this data to make the right decisions for your organization.

Hadoop Real World Solutions Cookbook gives readers insights into learning and mastering big data via recipes. The book not only clarifies most big data tools on the market but also provides best practices for using them. The book provides recipes that are based on the latest versions of Apache Hadoop 2.X, YARN, Hive, Pig, Sqoop, Flume, Apache Spark, Mahout, and many more such ecosystem tools. This real-world-solution cookbook is packed with handy recipes you can apply to your own everyday issues. Each chapter provides in-depth recipes that can be referenced easily.

This book provides detailed practices on the latest technologies such as YARN and Apache Spark. Readers will be able to consider themselves big data experts on completion of this book. This guide is an invaluable tutorial if you are planning to implement a big data warehouse for your business.

Style and approach

An easy-to-follow guide that walks you through the world of big data. Each tool in the Hadoop ecosystem is explained in detail, and the recipes are arranged so that readers can implement them sequentially. Plenty of reference links are provided for advanced reading.
Hadoop Real-World Solutions Cookbook Second Edition

Table of Contents

Credits

About the Author

Acknowledgements

About the Reviewer

www.PacktPub.com

eBooks, discount offers, and more

Why Subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Hadoop 2.X

Introduction

Installing a single-node Hadoop Cluster

Getting ready

How to do it...

How it works...

Hadoop Distributed File System (HDFS)

Yet Another Resource Negotiator (YARN)

There's more...

Installing a multi-node Hadoop cluster

Getting ready

How to do it...

How it works...

Adding new nodes to existing Hadoop clusters

Getting ready

How to do it...

How it works...

Executing the balancer command for uniform data distribution

Getting ready

How to do it...

How it works...

There's more...

Entering and exiting from the safe mode in a Hadoop cluster

How to do it...

How it works...

Decommissioning DataNodes

Getting ready

How to do it...

How it works...

Performing benchmarking on a Hadoop cluster

Getting ready

How to do it...

TestDFSIO

NNBench

MRBench

How it works...

2. Exploring HDFS

Introduction

Loading data from a local machine to HDFS

Getting ready

How to do it...

How it works...

Exporting HDFS data to a local machine

Getting ready

How to do it...

How it works...

Changing the replication factor of an existing file in HDFS

Getting ready

How to do it...

How it works...

Setting the HDFS block size for all the files in a cluster

Getting ready

How to do it...

How it works...

Setting the HDFS block size for a specific file in a cluster

Getting ready

How to do it...

How it works...

Enabling transparent encryption for HDFS

Getting ready

How to do it...

How it works...

Importing data from another Hadoop cluster

Getting ready

How to do it...

How it works...

Recycling deleted data from trash to HDFS

Getting ready

How to do it...

How it works...

Saving compressed data in HDFS

Getting ready

How to do it...

How it works...

3. Mastering Map Reduce Programs

Introduction

Writing the Map Reduce program in Java to analyze web log data

Getting ready

How to do it...

How it works...

Executing the Map Reduce program in a Hadoop cluster

Getting ready

How to do it...

How it works...

Adding support for a new writable data type in Hadoop

Getting ready

How to do it...

How it works...

Implementing a user-defined counter in a Map Reduce program

Getting ready

How to do it...

How it works...

Map Reduce program to find the top X

Getting ready

How to do it...

How it works...

Map Reduce program to find distinct values

Getting ready

How to do it...

How it works...

Map Reduce program to partition data using a custom partitioner

Getting ready

How to do it...

How it works...

Writing Map Reduce results to multiple output files

Getting ready

How to do it...

How it works...

Performing Reduce side Joins using Map Reduce

Getting ready

How to do it...

How it works...

Unit testing the Map Reduce code using MRUnit

Getting ready

How to do it...

How it works...

4. Data Analysis Using Hive, Pig, and HBase

Introduction

Storing and processing Hive data in a sequential file format

Getting ready

How to do it...

How it works...

Storing and processing Hive data in the RC file format

Getting ready

How to do it...

How it works...

Storing and processing Hive data in the ORC file format

Getting ready

How to do it...

How it works...

Storing and processing Hive data in the Parquet file format

Getting ready

How to do it...

How it works...

Performing FILTER By queries in Pig

Getting ready

How to do it...

How it works...

Performing Group By queries in Pig

Getting ready

How to do it...

How it works...

Performing Order By queries in Pig

Getting ready

How to do it...

How it works...

Performing JOINS in Pig

Getting ready

How to do it...

How it works...

Replicated Joins

Skewed Joins

Merge Joins

Writing a user-defined function in Pig

Getting ready

How to do it...

How it works...

There's more...

Analyzing web log data using Pig

Getting ready

How to do it...

How it works...

Performing HBase operations in the CLI

Getting ready

How to do it...

How it works...

Performing HBase operations in Java

Getting ready

How to do it...

How it works...

Executing a MapReduce program with an HBase table

Getting ready

How to do it...

How it works...

5. Advanced Data Analysis Using Hive

Introduction

Processing JSON data in Hive using JSON SerDe

Getting ready

How to do it...

How it works...

Processing XML data in Hive using XML SerDe

Getting ready

How to do it...

How it works...

Processing Hive data in the Avro format

Getting ready

How to do it...

How it works...

Writing a user-defined function in Hive

Getting ready

How to do it...

How it works...

Performing table joins in Hive

Getting ready

How to do it...

Left outer join

Right outer join

Full outer join

Left semi join

How it works...

Executing map side joins in Hive

Getting ready

How to do it...

How it works...

Performing context Ngram in Hive

Getting ready

How to do it...

How it works...

Call Data Record Analytics using Hive

Getting ready

How to do it...

How it works...

Twitter sentiment analysis using Hive

Getting ready

How to do it...

How it works...

Implementing Change Data Capture using Hive

Getting ready

How to do it...

How it works...

Multiple table inserting using Hive

Getting ready

How to do it...

How it works...

6. Data Import/Export Using Sqoop and Flume

Introduction

Importing data from RDBMS to HDFS using Sqoop

Getting ready

How to do it...

How it works...

Exporting data from HDFS to RDBMS

Getting ready

How to do it...

How it works...

Using query operator in Sqoop import

Getting ready

How to do it...

How it works...

Importing data using Sqoop in compressed format

Getting ready

How to do it...

How it works...

Performing Atomic export using Sqoop

Getting ready

How to do it...

How it works...

Importing data into Hive tables using Sqoop

Getting ready

How to do it...

How it works...

Importing data into HDFS from Mainframes

Getting ready

How to do it...

How it works...

Incremental import using Sqoop

Getting ready

How to do it...

How it works...

Creating and executing Sqoop job

Getting ready

How to do it...

How it works...

Importing data from RDBMS to HBase using Sqoop

Getting ready

How to do it...

How it works...

Importing Twitter data into HDFS using Flume

Getting ready

How to do it...

How it works...

Importing data from Kafka into HDFS using Flume

Getting ready

How to do it...

How it works...

Importing web logs data into HDFS using Flume

Getting ready

How to do it...

How it works...

7. Automation of Hadoop Tasks Using Oozie

Introduction

Implementing a Sqoop action job using Oozie

Getting ready

How to do it...

How it works...

Implementing a Map Reduce action job using Oozie

Getting ready

How to do it...

How it works...

Implementing a Java action job using Oozie

Getting ready

How to do it...

How it works...

Implementing a Hive action job using Oozie

Getting ready

How to do it...

How it works...

Implementing a Pig action job using Oozie

Getting ready

How to do it...

How it works...

Implementing an e-mail action job using Oozie

Getting ready

How to do it...

How it works...

Executing parallel jobs using Oozie (fork)

Getting ready

How to do it...

How it works...

Scheduling a job in Oozie

Getting ready

How to do it...

How it works...

8. Machine Learning and Predictive Analytics Using Mahout and R

Introduction

Setting up the Mahout development environment

Getting ready

How to do it...

How it works...

Creating an item-based recommendation engine using Mahout

Getting ready

How to do it...

How it works...

Creating a user-based recommendation engine using Mahout

Getting ready

How to do it...

How it works...

Predictive analytics on Bank Data using Mahout

Getting ready

How to do it...

How it works...

Text data clustering using K-Means using Mahout

Getting ready

How to do it...

How it works...

Population Data Analytics using R

Getting ready

How to do it...

How it works...

Twitter Sentiment Analytics using R

Getting ready

How to do it...

How it works...

Performing Predictive Analytics using R

Getting ready

How to do it...

How it works...

9. Integration with Apache Spark

Introduction

Running Spark standalone

Getting ready

How to do it...

How it works...

Running Spark on YARN

Getting ready

How to do it...

How it works...

Performing Olympics Athletes analytics using the Spark Shell

Getting ready

How to do it...

How it works...

Creating Twitter trending topics using Spark Streaming

Getting ready

How to do it...

How it works...

Twitter trending topics using Spark streaming

Getting ready

How to do it...

How it works...

Analyzing Parquet files using Spark

Getting ready

How to do it...

How it works...

Analyzing JSON data using Spark

Getting ready

How to do it...

How it works...

Processing graphs using GraphX

Getting ready

How to do it...

How it works...

Conducting predictive analytics using Spark MLlib

Getting ready

How to do it...

How it works...

10. Hadoop Use Cases

Introduction

Call Data Record analytics

Getting ready

How to do it...

Problem statement

Solution

How it works...

Web log analytics

Getting ready

How to do it...

Problem statement

Solution

How it works...

Sensitive data masking and encryption using Hadoop

Getting ready

How to do it...

Problem statement

Solution

How it works...

Index
