Title Page
Copyright and Credits
Apache Spark Quick Start Guide
About Packt
Why subscribe?
Packt.com
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introduction to Apache Spark
What is Spark?
Spark architecture overview
Spark language APIs
Scala
Java
Python
R
SQL
Spark components
Spark Core
Spark SQL
Spark Streaming
Spark machine learning
Spark graph processing
Cluster manager
Standalone scheduler
YARN
Mesos
Kubernetes
Making the most of Hadoop and Spark
Summary
Apache Spark Installation
AWS Elastic Compute Cloud (EC2)
Creating a free account on AWS
Connecting to your Linux instance
Configuring Spark
Prerequisites
Installing Java
Installing Scala
Installing Python
Installing Spark
Using Spark components
Different modes of execution
Spark sandbox
Summary
Spark RDD
What is an RDD?
Resilient metadata
Programming using RDDs
Transformations and actions
Transformation
Narrow transformations
map()
flatMap()
filter()
union()
mapPartitions()
Wide transformations
distinct()
sortBy()
intersection()
subtract()
cartesian()
Action
collect()
count()
take()
top()
takeOrdered()
first()
countByValue()
reduce()
saveAsTextFile()
foreach()
Types of RDDs
Pair RDDs
groupByKey()
reduceByKey()
sortByKey()
join()
Caching and checkpointing
Caching
Checkpointing
Understanding partitions
repartition() versus coalesce()
partitionBy()
Drawbacks of using RDDs
Summary
Spark DataFrame and Dataset
DataFrames
Creating DataFrames
Data sources
DataFrame operations and associated functions
Running SQL on DataFrames
Temporary views on DataFrames
Global temporary views on DataFrames
Datasets
Encoders
Internal row
Creating custom encoders
Summary
Spark Architecture and Application Execution Flow
A sample application
DAG constructor
Stage
Tasks
Task scheduler
FIFO
FAIR
Application execution modes
Local mode
Client mode
Cluster mode
Application monitoring
Spark UI
Application logs
External monitoring solution
Summary
Spark SQL
Spark SQL
Spark metastore
Using the Hive metastore in Spark SQL
Hive configuration with Spark
SQL language manual
Database
Table and view
Load data
Creating UDFs
SQL database using JDBC
Summary
Spark Streaming, Machine Learning, and Graph Analysis
Spark Streaming
Use cases
Data sources
Stream processing
Microbatch
DStreams
Streaming architecture
Streaming example
Machine learning
MLlib
ML
Graph processing
GraphX
mapVertices
mapEdges
subgraph
GraphFrames
degrees
subgraphs
Graph algorithms
PageRank
Summary
Spark Optimizations
Cluster-level optimizations
Memory
Disk
CPU cores
Project Tungsten
Application optimizations
Language choice
Structured versus unstructured APIs
File format choice
RDD optimizations
Choosing the right transformations
Serializing and compressing
Broadcast variables
DataFrame and Dataset optimizations
Catalyst optimizer
Storage
Parallelism
Join performance
Code generation
Speculative execution
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think