Title Page
Copyright and Credits
PySpark Cookbook
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Sections
Getting ready
How to do it...
How it works...
There's more...
See also
Get in touch
Reviews
Installing and Configuring Spark
Introduction
Installing Spark requirements
Getting ready
How to do it...
How it works...
There's more...
Installing Java
Installing Python
Installing R
Installing Scala
Installing Maven
Updating PATH
Installing Spark from sources
Getting ready
How to do it...
How it works...
There's more...
See also
Installing Spark from binaries
Getting ready
How to do it...
How it works...
There's more...
Configuring a local instance of Spark
Getting ready
How to do it...
How it works...
See also
Configuring a multi-node instance of Spark
Getting ready
How to do it...
How it works...
See also
Installing Jupyter
Getting ready
How to do it...
How it works...
There's more...
See also
Configuring a session in Jupyter
Getting ready
How to do it...
How it works...
There's more...
See also
Working with Cloudera Spark images
Getting ready
How to do it...
How it works...
Abstracting Data with RDDs
Introduction
Creating RDDs
Getting ready
How to do it...
How it works...
Spark context parallelize method
.take(...) method
Reading data from files
Getting ready
How to do it...
How it works...
.textFile(...) method
.map(...) method
Partitions and performance
Overview of RDD transformations
Getting ready
How to do it...
.map(...) transformation
.filter(...) transformation
.flatMap(...) transformation
.distinct() transformation
.sample(...) transformation
.join(...) transformation
.repartition(...) transformation
.zipWithIndex() transformation
.reduceByKey(...) transformation
.sortByKey(...) transformation
.union(...) transformation
.mapPartitionsWithIndex(...) transformation
How it works...
Overview of RDD actions
Getting ready
How to do it...
.take(...) action
.collect() action
.reduce(...) action
.count() action
.saveAsTextFile(...) action
How it works...
Pitfalls of using RDDs
Getting ready
How to do it...
How it works...
Abstracting Data with DataFrames
Introduction
Creating DataFrames
Getting ready
How to do it...
How it works...
There's more...
From JSON
From CSV
See also
Accessing underlying RDDs
Getting ready
How to do it...
How it works...
Performance optimizations
Getting ready
How to do it...
How it works...
There's more...
See also
Inferring the schema using reflection
Getting ready
How to do it...
How it works...
See also
Specifying the schema programmatically
Getting ready
How to do it...
How it works...
See also
Creating a temporary table
Getting ready
How to do it...
How it works...
There's more...
Using SQL to interact with DataFrames
Getting ready
How to do it...
How it works...
There's more...
Overview of DataFrame transformations
Getting ready
How to do it...
The .select(...) transformation
The .filter(...) transformation
The .groupBy(...) transformation
The .orderBy(...) transformation
The .withColumn(...) transformation
The .join(...) transformation
The .unionAll(...) transformation
The .distinct(...) transformation
The .repartition(...) transformation
The .fillna(...) transformation
The .dropna(...) transformation
The .dropDuplicates(...) transformation
The .summary() and .describe() transformations
The .freqItems(...) transformation
See also
Overview of DataFrame actions
Getting ready
How to do it...
The .show(...) action
The .collect() action
The .take(...) action
The .toPandas() action
See also
Preparing Data for Modeling
Introduction
Handling duplicates
Getting ready
How to do it...
How it works...
There's more...
Only IDs differ
ID collisions
Handling missing observations
Getting ready
How to do it...
How it works...
Missing observations per row
Missing observations per column
There's more...
See also
Handling outliers
Getting ready
How to do it...
How it works...
See also
Exploring descriptive statistics
Getting ready
How to do it...
How it works...
There's more...
Descriptive statistics for aggregated columns
See also
Computing correlations
Getting ready
How to do it...
How it works...
There's more...
Drawing histograms
Getting ready
How to do it...
How it works...
There's more...
See also
Visualizing interactions between features
Getting ready
How to do it...
How it works...
There's more...
Machine Learning with MLlib
Loading the data
Getting ready
How to do it...
How it works...
There's more...
Exploring the data
Getting ready
How to do it...
How it works...
Numerical features
Categorical features
There's more...
See also
Testing the data
Getting ready
How to do it...
How it works...
See also
Transforming the data
Getting ready
How to do it...
How it works...
There's more...
See also
Standardizing the data
Getting ready
How to do it...
How it works...
Creating an RDD for training
Getting ready
How to do it...
Classification
Regression
How it works...
There's more...
See also
Predicting hours of work for census respondents
Getting ready
How to do it...
How it works...
Forecasting the income levels of census respondents
Getting ready
How to do it...
How it works...
There's more...
Building clustering models
Getting ready
How to do it...
How it works...
There's more...
See also
Computing performance statistics
Getting ready
How to do it...
How it works...
Regression metrics
Classification metrics
See also
Machine Learning with the ML Module
Introducing Transformers
Getting ready
How to do it...
How it works...
There's more...
See also
Introducing Estimators
Getting ready
How to do it...
How it works...
There's more...
Introducing Pipelines
Getting ready
How to do it...
How it works...
See also
Selecting the most predictable features
Getting ready
How to do it...
How it works...
There's more...
See also
Predicting forest coverage types
Getting ready
How to do it...
How it works...
There's more...
Estimating forest elevation
Getting ready
How to do it...
How it works...
There's more...
Clustering forest cover types
Getting ready
How to do it...
How it works...
See also
Tuning hyperparameters
Getting ready
How to do it...
How it works...
There's more...
Extracting features from text
Getting ready
How to do it...
How it works...
There's more...
See also
Discretizing continuous variables
Getting ready
How to do it...
How it works...
Standardizing continuous variables
Getting ready
How to do it...
How it works...
Topic mining
Getting ready
How to do it...
How it works...
Structured Streaming with PySpark
Introduction
Understanding Spark Streaming
Understanding DStreams
Getting ready
How to do it...
Terminal 1 – Netcat window
Terminal 2 – Spark Streaming window
How it works...
There's more...
Understanding global aggregations
Getting ready
How to do it...
Terminal 1 – Netcat window
Terminal 2 – Spark Streaming window
How it works...
Continuous aggregation with structured streaming
Getting ready
How to do it...
Terminal 1 – Netcat window
Terminal 2 – Spark Streaming window
How it works...
GraphFrames – Graph Theory with PySpark
Introduction
Installing GraphFrames
Getting ready
How to do it...
How it works...
Preparing the data
Getting ready
How to do it...
How it works...
There's more...
Building the graph
How to do it...
How it works...
Running queries against the graph
Getting ready
How to do it...
How it works...
Understanding the graph
Getting ready
How to do it...
How it works...
Using PageRank to determine airport ranking
Getting ready
How to do it...
How it works...
Finding the fewest number of connections
Getting ready
How to do it...
How it works...
There's more...
See also
Visualizing the graph
Getting ready
How to do it...
How it works...