Spark Cookbook (e-book)

Author: Rishi Yadav

Publisher: Packt Publishing

Publication date: 2015-07-27

Word count: 655,000

If you are a data engineer, an application developer, or a data scientist who would like to leverage the power of Apache Spark to get better insights from big data, then this is the book for you.
Spark Cookbook

Table of Contents

Spark Cookbook

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Conventions

Reader feedback

Customer support

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Apache Spark

Introduction

Installing Spark from binaries

Getting ready

How to do it...

Building the Spark source code with Maven

Getting ready

How to do it...

Launching Spark on Amazon EC2

Getting ready

How to do it...

See also

Deploying on a cluster in standalone mode

Getting ready

How to do it...

How it works...

See also

Deploying on a cluster with Mesos

How to do it...

Deploying on a cluster with YARN

Getting ready

How to do it...

How it works…

Using Tachyon as an off-heap storage layer

How to do it...

See also

2. Developing Applications with Spark

Introduction

Exploring the Spark shell

How to do it...

Developing Spark applications in Eclipse with Maven

Getting ready

How to do it...

Developing Spark applications in Eclipse with SBT

How to do it...

Developing a Spark application in IntelliJ IDEA with Maven

How to do it...

Developing a Spark application in IntelliJ IDEA with SBT

How to do it...

3. External Data Sources

Introduction

Loading data from the local filesystem

How to do it...

Loading data from HDFS

How to do it...

There's more…

Loading data from HDFS using a custom InputFormat

How to do it...

Loading data from Amazon S3

How to do it...

Loading data from Apache Cassandra

How to do it...

There's more...

Merge strategies in sbt-assembly

Loading data from relational databases

Getting ready

How to do it...

How it works…

4. Spark SQL

Introduction

Understanding the Catalyst optimizer

How it works…

Analysis

Logical plan optimization

Physical planning

Code generation

Creating HiveContext

Getting ready

How to do it...

Inferring schema using case classes

How to do it...

Programmatically specifying the schema

How to do it...

How it works…

Loading and saving data using the Parquet format

How to do it...

How it works…

There's more…

Loading and saving data using the JSON format

How to do it...

How it works…

There's more…

Loading and saving data from relational databases

Getting ready

How to do it...

Loading and saving data from an arbitrary source

How to do it...

There's more…

5. Spark Streaming

Introduction

Word count using Streaming

How to do it...

Streaming Twitter data

How to do it...

Streaming using Kafka

Getting ready

How to do it...

There's more…

6. Getting Started with Machine Learning Using MLlib

Introduction

Creating vectors

How to do it…

How it works...

Creating a labeled point

How to do it…

Creating matrices

How to do it…

Calculating summary statistics

How to do it…

Calculating correlation

Getting ready

How to do it…

Doing hypothesis testing

How to do it…

Creating machine learning pipelines using ML

Getting ready

How to do it…

7. Supervised Learning with MLlib – Regression

Introduction

Using linear regression

Getting ready

How to do it…

Understanding cost function

Doing linear regression with lasso

How to do it…

Doing ridge regression

How to do it…

8. Supervised Learning with MLlib – Classification

Introduction

Doing classification using logistic regression

Getting ready

How to do it…

Doing binary classification using SVM

How to do it…

Doing classification using decision trees

Getting ready

How to do it…

How it works…

Doing classification using Random Forests

Getting ready

How to do it…

How it works…

Doing classification using Gradient Boosted Trees

Getting ready

How to do it…

Doing classification with Naïve Bayes

Getting ready

How to do it…

9. Unsupervised Learning with MLlib

Introduction

Clustering using k-means

Getting ready

How to do it…

Dimensionality reduction with principal component analysis

Getting ready

How to do it…

Dimensionality reduction with singular value decomposition

Getting ready

How to do it…

10. Recommender Systems

Introduction

Collaborative filtering using explicit feedback

Getting ready

How to do it…

Collaborative filtering using implicit feedback

Getting ready

How to do it…

How it works…

There's more…

11. Graph Processing Using GraphX

Introduction

Fundamental operations on graphs

Getting ready

How to do it…

Using PageRank

Getting ready

How to do it…

Finding connected components

Getting ready

How to do it…

Performing neighborhood aggregation

Getting ready

How to do it…

12. Optimizations and Performance Tuning

Introduction

Optimizing memory

Using compression to improve performance

Using serialization to improve performance

How to do it…

Optimizing garbage collection

How to do it…

Optimizing the level of parallelism

How to do it…

Understanding the future of optimization – project Tungsten

Manual memory management by leveraging application semantics

Using algorithms and data structures

Code generation

Index
