Apache Hive Essentials (e-book)

Author: Dayong Du

Publisher: Packt Publishing

Publication date: 2015-02-26

Word count: 546,000

If you are a data analyst, a developer, or simply someone who wants to use Hive to explore and analyze data in Hadoop, this is the book for you. Whether you are new to big data or already an expert, this book will help you master both the basic and the advanced features of Hive. Because Hive uses an SQL-like query language, some prior experience with SQL and databases will help you get more out of this book.

Apache Hive Essentials

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Overview of Big Data and Hive

A short history

Introducing big data

Relational and NoSQL database versus Hadoop

Batch, real-time, and stream processing

Overview of the Hadoop ecosystem

Hive overview

Summary

2. Setting Up the Hive Environment

Installing Hive from Apache

Installing Hive from vendor packages

Starting Hive in the cloud

Using the Hive command line and Beeline

The Hive-integrated development environment

Summary

3. Data Definition and Description

Understanding Hive data types

Data type conversions

Hive Data Definition Language

Hive database

Hive internal and external tables

Hive partitions

Hive buckets

Hive views

Summary

4. Data Selection and Scope

The SELECT statement

The INNER JOIN statement

The OUTER JOIN and CROSS JOIN statements

Special JOIN – MAPJOIN

Set operation – UNION ALL

Summary

5. Data Manipulation

Data exchange – LOAD

Data exchange – INSERT

Data exchange – EXPORT and IMPORT

ORDER and SORT

Operators and functions

Transactions

Summary

6. Data Aggregation and Sampling

Basic aggregation – GROUP BY

Advanced aggregation – GROUPING SETS

Advanced aggregation – ROLLUP and CUBE

Aggregation condition – HAVING

Analytic functions

Sampling

Summary

7. Performance Considerations

Performance utilities

The EXPLAIN statement

The ANALYZE statement

Design optimization

Partition tables

Bucket tables

Index

Data file optimization

File format

Compression

Storage optimization

Job and query optimization

Local mode

JVM reuse

Parallel execution

Join optimization

Common join

Map join

Bucket map join

Sort merge bucket (SMB) join

Sort merge bucket map (SMBM) join

Skew join

Summary

8. Extensibility Considerations

User-defined functions

The UDF code template

The UDAF code template

The UDTF code template

Development and deployment

Streaming

SerDe

Summary

9. Security Considerations

Authentication

Metastore server authentication

HiveServer2 authentication

Authorization

Legacy mode

Storage-based mode

SQL standard-based mode

Encryption

Summary

10. Working with Other Tools

JDBC / ODBC connector

HBase

Hue

HCatalog

ZooKeeper

Oozie

Hive roadmap

Summary

Index
