Apache Hive Essentials (e-book)

Author: Dayong Du

Publisher: Packt Publishing

Publication date: 2018-06-30

Word count: approx. 265,000

Category: Imported Books > Foreign-Language Originals > Computers/Internet

This book takes you on a fantastic journey to discover the attributes of big data using Apache Hive.

About This Book

  • Grasp the skills needed to write efficient Hive queries to analyze big data
  • Discover how Hive can coexist and work with other tools within the Hadoop ecosystem
  • Use practical, example-oriented scenarios to cover all the newly released features of Apache Hive 2.3.3

Who This Book Is For

If you are a data analyst, developer, or simply someone who wants to quickly get started with Hive to explore and analyze big data in Hadoop, this is the book for you. Since Hive uses an SQL-like query language, some previous experience with SQL will be useful to get the most out of this book.

What You Will Learn

  • Create and set up the Hive environment
  • Discover how to use Hive's definition language to describe data
  • Discover interesting data by joining and filtering datasets in Hive
  • Transform data by using Hive sorting, ordering, and functions
  • Aggregate and sample data in different ways
  • Boost Hive query performance and enhance data security in Hive
  • Customize Hive to your needs by using user-defined functions and integrate it with other tools

In Detail

In this book, we prepare you for your journey into big data by first introducing you to the background of the big data domain, along with the process of setting up and getting familiar with your Hive working environment. Next, the book guides you through discovering and transforming the values of big data with the help of examples. It also hones your skills in using the Hive language in an efficient manner. Toward the end, the book focuses on advanced topics, such as performance, security, and extensions in Hive, which will guide you on exciting adventures on this worthwhile big data journey. By the end of the book, you will be familiar with Hive and able to work efficiently to find solutions to big data problems.

Style and approach

This book takes a practical approach that will familiarize you with Apache Hive and show you how to use it efficiently to find solutions to your big data problems. It covers crucial topics such as performance and data security to help you make the most of the Hive working environment.
Table of Contents

Title Page

Copyright and Credits

Apache Hive Essentials Second Edition

Dedication

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Overview of Big Data and Hive

A short history

Introducing big data

The relational and NoSQL databases versus Hadoop

Batch, real-time, and stream processing

Overview of the Hadoop ecosystem

Hive overview

Summary

Setting Up the Hive Environment

Installing Hive from Apache

Installing Hive from vendors

Using Hive in the cloud

Using the Hive command

Using the Hive IDE

Summary

Data Definition and Description

Understanding data types

Data type conversions

Data Definition Language

Database

Tables

Table creation

Table description

Table cleaning

Table alteration

Partitions

Buckets

Views

Summary

Data Correlation and Scope

Project data with SELECT

Filtering data with conditions

Linking data with JOIN

INNER JOIN

OUTER JOIN

Special joins

Combining data with UNION

Summary

Data Manipulation

Data exchanging with LOAD

Data exchange with INSERT

Data exchange with [EX|IM]PORT

Data sorting

Functions

Function tips for collections

Function tips for date and string

Virtual column functions

Transactions and locks

Transactions

UPDATE statement

DELETE statement

MERGE statement

Locks

Summary

Data Aggregation and Sampling

Basic aggregation

Enhanced aggregation

Grouping sets

Rollup and Cube

Aggregation condition

Window functions

Window aggregate functions

Window sort functions

Window analytics functions

Window expression

Sampling

Random sampling

Bucket table sampling

Block sampling

Summary

Performance Considerations

Performance utilities

EXPLAIN statement

ANALYZE statement

Logs

Design optimization

Partition table design

Bucket table design

Index design

Use skewed/temporary tables

Data optimization

File format

Compression

Storage optimization

Job optimization

Local mode

JVM reuse

Parallel execution

Join optimization

Common join

Map join

Bucket map join

Sort merge bucket (SMB) join

Sort merge bucket map (SMBM) join

Skew join

Job engine

Optimizer

Vectorization optimization

Cost-based optimization

Summary

Extensibility Considerations

User-defined functions

UDF code template

UDAF code template

UDTF code template

Development and deployment

HPL/SQL

Streaming

SerDe

Summary

Security Considerations

Authentication

Metastore authentication

Hiveserver2 authentication

Authorization

Legacy mode

Storage-based mode

SQL standard-based mode

Mask and encryption

The data-hashing function

The data-masking function

The data-encryption function

Other methods

Summary

Working with Other Tools

The JDBC/ODBC connector

NoSQL

The Hue/Ambari Hive view

HCatalog

Oozie

Spark

Hivemall

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think
