Apache Spark Graph Processing eBook

Author: Rindra Ramamonjison

Publisher: Packt Publishing

Publication date: 2015-09-10

Word count: 965,000

Category: Imported books > Foreign-language originals > Computers/Internet

Build, process, and analyze large-scale graph data effectively with Spark

About This Book

  • Find solutions for every stage of data processing, from loading and transforming graph data to improving the scalability of your graphs, with a variety of real-world applications and complete Scala code
  • A concise guide to processing large-scale networks with Apache Spark

Who This Book Is For

This book is for data scientists and big data developers who want to learn how to process and analyze graph datasets at scale. Basic programming experience with Scala and basic knowledge of Spark are assumed.

What You Will Learn

  • Write, build, and deploy Spark applications with the Scala Build Tool
  • Build and analyze large-scale network datasets
  • Analyze and transform graphs using RDD and graph-specific operations
  • Implement new custom graph operations tailored to specific needs
  • Develop iterative and efficient graph algorithms using message aggregation and the Pregel abstraction
  • Extract subgraphs and use them to discover common clusters
  • Analyze graph data and solve various data science problems using real-world datasets

In Detail

Apache Spark is the next standard open-source cluster-computing engine for processing big data. Many practical computing problems concern large graphs, such as the Web graph and various social networks. The scale of these graphs (in some cases billions of vertices and trillions of edges) makes them challenging to process efficiently. The Apache Spark GraphX API combines the advantages of data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework.

This book teaches you graph programming in Apache Spark and explains the entire process of graph data analysis. You will journey through the creation of graphs, their uses, and their exploration and analysis, and finally through the transformation of graph elements and graph structures.
The book begins with an introduction to the Spark system, its libraries, and the Scala Build Tool. Using a hands-on approach, it quickly teaches you how to install Spark and use it interactively on the command line and in a standalone Scala program. It then presents all the methods for building Spark graphs using illustrative network datasets. Next, it walks you through the process of exploring, visualizing, and analyzing different network characteristics. You will also learn how to transform raw datasets into a usable form, and you will learn powerful operations for transforming graph elements and graph structures. The book also teaches you how to create custom graph operations that are tailored to specific needs, with efficiency in mind. The later chapters cover more advanced topics such as clustering graphs, implementing graph-parallel iterative algorithms, and learning from graph data.

Style and approach

A step-by-step guide that walks you through the key ideas and techniques for processing big graph data at scale, with practical examples that build an overall understanding of Spark concepts.
Apache Spark Graph Processing

Table of Contents

Apache Spark Graph Processing

Credits

Foreword

About the Author

About the Reviewer

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

Distinctive features

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Getting Started with Spark and GraphX

Downloading and installing Spark 1.4.1

Experimenting with the Spark shell

Getting started with GraphX

Building a tiny social network

Loading the data

The property graph

Transforming RDDs to VertexRDD and EdgeRDD

Introducing graph operations

Building and submitting a standalone application

Writing and configuring a Spark program

Building the program with the Scala Build Tool

Deploying and running with spark-submit

Summary

2. Building and Exploring Graphs

Network datasets

The communication network

Flavor networks

Social ego networks

Graph builders

The Graph factory method

edgeListFile

fromEdges

fromEdgeTuples

Building graphs

Building directed graphs

Building a bipartite graph

Building a weighted social ego network

Computing the degrees of the network nodes

In-degree and out-degree of the Enron email network

Degrees in the bipartite food network

Degree histogram of the social ego networks

Summary

3. Graph Analysis and Visualization

Network datasets

The graph visualization

Installing the GraphStream and BreezeViz libraries

Visualizing the graph data

Plotting the degree distribution

The analysis of network connectedness

Finding the connected components

Counting triangles and computing clustering coefficients

The network centrality and PageRank

How PageRank works

Ranking web pages

Scala Build Tool revisited

Organizing build definitions

Managing library dependencies

A preview of the steps

Step 1 – Enable the sbt-assembly plugin

Step 2 – Create a build.sbt file

Step 3 – Declare library dependencies and resolvers

Step 4 – Set up the sbt-assembly plugin

Step 5 – Create the uber JAR

Running tasks with SBT commands

Summary

4. Transforming and Shaping Up Graphs to Your Needs

Transforming the vertex and edge attributes

mapVertices

mapEdges

mapTriplets

Modifying graph structures

The reverse operator

The subgraph operator

The mask operator

The groupEdges operator

Joining graph datasets

joinVertices

outerJoinVertices

Example – Hollywood movie graph

Data operations on VertexRDD and EdgeRDD

Mapping VertexRDD and EdgeRDD

Filtering VertexRDDs

Joining VertexRDDs

Joining EdgeRDDs

Reversing edge directions

Collecting neighboring information

Example – from food network to flavor pairing

Summary

5. Creating Custom Graph Aggregation Operators

NCAA College Basketball datasets

The aggregateMessages operator

EdgeContext

Abstracting out the aggregation

Keeping things DRY

Coach wants more numbers

Calculating average points per game

Defense stats – D matters as in direction

Joining average stats into a graph

Performance optimization

The MapReduceTriplets operator

Summary

6. Iterative Graph-Parallel Processing with Pregel

The Pregel computational model

Example – iterating towards the social equality

The Pregel API in GraphX

Community detection through label propagation

The Pregel implementation of PageRank

Summary

7. Learning Graph Structures

Community clustering in graphs

Spectral clustering

Power iteration clustering

Applications – music fan community detection

Step 1 – load the data into a Spark graph property

Step 2 – extract the features of nodes

Step 3 – define a similarity measure between two nodes

Step 4 – create an affinity matrix

Step 5 – run k-means clustering on the affinity matrix

Exercise – collaborative clustering through playlists

Summary

A. References

Chapter 2, Building and Exploring Graphs

Chapter 3, Graph Analysis and Visualization

Chapter 7, Learning Graph Structures

Index
