售 价:¥
温馨提示:数字商品不支持退换货,不提供源文件,不支持导出打印
为你推荐
HDInsight Essentials Second Edition
Table of Contents
HDInsight Essentials Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Instant updates on new Packt books
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Hadoop and HDInsight in a Heartbeat
Data is everywhere
Business value of big data
Hadoop concepts
Brief history of Hadoop
Core components
Hadoop cluster layout
HDFS overview
Writing a file to HDFS
Reading a file from HDFS
HDFS basic commands
YARN overview
YARN application life cycle
YARN workloads
Hadoop distributions
HDInsight overview
HDInsight and Hadoop relationship
Hadoop on Windows deployment options
Microsoft Azure HDInsight Service
HDInsight Emulator
Hortonworks Data Platform (HDP) for Windows
Summary
2. Enterprise Data Lake using HDInsight
Enterprise Data Warehouse architecture
Source systems
Data warehouse
Storage
Processing
User access
Provisioning and monitoring
Data governance and security
Pain points of EDW
The next generation Hadoop-based Enterprise data architecture
Source systems
Data Lake
Storage
Processing
User access
Provisioning and monitoring
Data governance, security, and metadata
Journey to your Data Lake dream
Ingestion and organization
Transformation (rules driven)
Access, analyze, and report
Tools and technology for Hadoop ecosystem
Use case powered by Microsoft HDInsight
Problem statement
Solution
Source systems
Storage
Processing
User access
Benefits
Summary
3. HDInsight Service on Azure
Registering for an Azure account
Azure storage
Provisioning an HDInsight cluster
Cluster topology
Provisioning using Azure PowerShell
Creating a storage container
Provisioning a new HDInsight cluster
HDInsight management dashboard
Dashboard
Monitor
Configuration
Exploring clusters using the remote desktop
Running a sample MapReduce
Deleting the cluster
HDInsight Emulator for the development
Installing HDInsight Emulator
Installation verification
Using HDInsight Emulator
Summary
4. Administering Your HDInsight Cluster
Monitoring cluster health
Name Node status
The Name Node Overview page
Datanode Status
Utilities and logs
Hadoop Service Availability
YARN Application Status
Azure storage management
Configuring your storage account
Monitoring your storage account
Managing access keys
Deleting your storage account
Azure PowerShell
Access Azure Blob storage using Azure PowerShell
Summary
5. Ingest and Organize Data Lake
End-to-end Data Lake solution
Ingesting to Data Lake using HDFS command
Connecting to a Hadoop client
Getting your files on the local storage
Transferring to HDFS
Loading data to Azure Blob storage using Azure PowerShell
Loading files to Data Lake using GUI tools
Storage access keys
Storage tools
CloudXplorer
Key benefits
Registering your storage account
Uploading files to your Blob storage
Using Sqoop to move data from RDBMS to Data Lake
Key benefits
Two modes of using Sqoop
Using Sqoop to import data (SQL to Hadoop)
Organizing your Data Lake in HDFS
Managing file metadata using HCatalog
Key benefits
Using HCatalog Command Line to create tables
Summary
6. Transform Data in the Data Lake
Transformation overview
Tools for transforming data in Data Lake
HCatalog
Persisting HCatalog metastore in a SQL database
Apache Hive
Hive architecture
Starting Hive in HDInsight
Basic Hive commands
Apache Pig
Pig architecture
Starting Pig in HDInsight node
Basic Pig commands
Pig or Hive
MapReduce
The mapper code
The reducer code
The driver code
Executing MapReduce on HDInsight
Azure PowerShell for execution of Hadoop jobs
Transformation for the OTP project
Cleaning data using Pig
Executing Pig script
Registering a refined and aggregate table using Hive
Executing Hive script
Reviewing results
Other tools used for transformation
Oozie
Spark
Summary
7. Analyze and Report from Data Lake
Data access overview
Analysis using Excel and Microsoft Hive ODBC driver
Prerequisites
Step 1 – installing the Microsoft Hive ODBC driver
Step 2 – creating Hive ODBC Data Source
Step 3 – importing data to Excel
Analysis using Excel Power Query
Prerequisites
Step 1 – installing the Microsoft Power Query for Excel
Step 2 – importing Azure Blob storage data into Excel
Step 3 – analyzing data using Excel
Other BI features in Excel
PowerPivot
Power View and Power Map
Step 1 – importing Azure Blob storage data into Excel
Step 2 – launch map view
Step 3 – configure the map
Power BI Catalog
Ad hoc analysis using Hive
Other alternatives for analysis
RHadoop
Apache Giraph
Apache Mahout
Azure Machine Learning
Summary
8. HDInsight 3.1 New Features
HBase
HBase positioning in Data Lake and use cases
Provisioning HDInsight HBase cluster
Creating a sample HBase schema
Designing the airline on-time performance table
Connecting to HBase using the HBase shell
Creating an HBase table
Loading data to the HBase table
Querying data from the HBase table
HBase additional information
Storm
Storm positioning in Data Lake
Storm key concepts
Provisioning HDInsight Storm cluster
Running a sample Storm topology
Connecting to Storm using Storm shell
Running the Storm Wordcount topology
Monitoring status of the Wordcount topology
Additional information on Storm
Apache Tez
Summary
9. Strategy for a Successful Data Lake Implementation
Challenges on building a production Data Lake
The success path for a production Data Lake
Identifying the big data problem
Proof of technology for Data Lake
Form a Data Lake Center of Excellence
Executive sponsors
Data Lake consumers
Development
Operations and infrastructure
Architectural considerations
Extensible and modular
Metadata-driven solution
Integration strategy
Security
Online resources
Summary
Index
买过这本书的人还买过
读了这本书的人还在读
同类图书排行榜