
Fast Data Processing Systems with SMACK Stack (eBook)

Author: Raúl Estrada

Publisher: Packt Publishing

Publication date: 2016-12-01

Length: 5,322,000 characters

Category: Imported Books > Foreign-Language Originals > Computers/Internet

Book Description
Combine the powers of Spark, Mesos, Akka, Cassandra, and Kafka to build data processing platforms that can take on even the hardest of your data problems.

About This Book

  • A highly practical guide that shows you how to use the best big data technologies to solve your response-critical problems
  • Learn the art of building cheap yet effective big data architectures without resorting to complex Greek-letter architectures
  • An easy-to-follow guide to building fast data processing systems for your organization

Who This Book Is For

If you are a developer, data architect, or data scientist looking for information on how to integrate the big data stack architecture and how to choose the correct technology at every layer, this book is what you are looking for.

What You Will Learn

  • Design and implement a fast data pipeline architecture
  • Think about and solve programming challenges in a functional way with Scala
  • Learn to use Akka, the Actor Model implementation for the JVM
  • Perform in-memory processing and data analysis with Spark to meet modern business demands
  • Build a powerful and effective cluster infrastructure with Mesos and Docker
  • Manage and consume unstructured and NoSQL data sources with Cassandra
  • Consume and produce messages at massive scale with Kafka

In Detail

SMACK is an open source full stack for big data architecture: a combination of Spark, Mesos, Akka, Cassandra, and Kafka. It is the newest stack developers have begun to use to tackle critical real-time analytics for big data. This highly practical guide teaches you how to integrate these technologies to create a highly efficient data analysis system for fast data processing.

We start with an introduction to SMACK and when to use it. First you will get to grips with functional thinking and problem solving using Scala. Next you will come to understand the Akka architecture. Then you will learn how to improve the data structure architecture and optimize resources using Apache Spark. Moving forward, you will learn how to achieve linear scalability in databases with Apache Cassandra, grasp high-throughput distributed messaging with Apache Kafka, and build a cheap but effective cluster infrastructure with Apache Mesos. Finally, you will take a deep dive into the different aspects of SMACK through a few case studies. By the end of the book, you will be able to integrate all the components of the SMACK stack and use them together to achieve highly effective, fast data processing.

Style and Approach

With the help of various industry examples, you will learn about the full big data architecture stack, covering the important aspects of every technology. You will learn how to integrate the technologies to build effective systems, rather than getting incomplete information on single technologies, and you will see how various open source technologies can be combined to build cheap and fast data processing systems.
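As a small taste of the functional collection style that Chapter 2 drills through its katas (map, filter, merging, sorting), here is a minimal, standard-library-only Scala sketch. The object name and sample data are illustrative, not taken from the book:

```scala
// Illustrative sketch of functional collection operations in plain Scala
// (stdlib only; names and data are examples, not the book's own katas).
object CollectionsDemo {
  val words = Seq("spark", "mesos", "akka", "cassandra", "kafka")

  // Transforming with map: uppercase every element
  val upper = words.map(_.toUpperCase)

  // Filtering: keep only the names longer than 4 characters
  val longNames = words.filter(_.length > 4)

  // Merging and extracting unique elements
  val merged = (words ++ Seq("kafka", "spark")).distinct

  // Sorting lexicographically
  val sorted = words.sorted

  def main(args: Array[String]): Unit = {
    println(upper)     // List(SPARK, MESOS, AKKA, CASSANDRA, KAFKA)
    println(sorted)    // List(akka, cassandra, kafka, mesos, spark)
  }
}
```

Each operation returns a new immutable collection rather than mutating in place, which is the core habit the book's Scala chapter builds toward before introducing Akka and Spark.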
Table of Contents

Fast Data Processing Systems with SMACK Stack

Credits

About the Author

About the Reviewers

www.PacktPub.com

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. An Introduction to SMACK

Modern data-processing challenges

The data-processing pipeline architecture

The NoETL manifesto

Lambda architecture

Hadoop

SMACK technologies

Apache Spark

Akka

Apache Cassandra

Apache Kafka

Apache Mesos

Changing the data center operations

From scale-up to scale-out

The open-source predominance

Data store diversification

Data gravity and data locality

DevOps rules

Data expert profiles

Data architects

Data engineers

Data analysts

Data scientists

Is SMACK for me?

Summary

2. The Model - Scala and Akka

The language - Scala

Kata 1 - The collections hierarchy

Sequence

Map

Set

Kata 2 - Choosing the right collection

Sequence

Map

Set

Kata 3 - Iterating with foreach

Kata 4 - Iterating with for

Kata 5 - Iterators

Kata 6 - Transforming with map

Kata 7 - Flattening

Kata 8 - Filtering

Kata 9 - Subsequences

Kata 10 - Splitting

Kata 11 - Extracting unique elements

Kata 12 - Merging

Kata 13 - Lazy views

Kata 14 - Sorting

Kata 15 - Streams

Kata 16 - Arrays

Kata 17 - ArrayBuffer

Kata 18 - Queues

Kata 19 - Stacks

Kata 20 - Ranges

The model - Akka

The Actor Model in a nutshell

Kata 21 - Actors

The actor system

Actor reference

Kata 22 - Actor communication

Kata 23 - Actor life cycle

Kata 24 - Starting actors

Kata 25 - Stopping actors

Kata 26 - Killing actors

Kata 27 - Shutting down the actor system

Kata 28 - Actor monitoring

Kata 29 - Looking up actors

Summary

3. The Engine - Apache Spark

Spark in single mode

Downloading Apache Spark

Testing Apache Spark

Spark core concepts

Resilient distributed datasets

Running Spark applications

Initializing the Spark context

Spark applications

Running programs

RDD operation

Transformations

Actions

Persistence (caching)

Spark in cluster mode

Runtime architecture

Driver

Dividing a program into tasks

Scheduling tasks on executors

Executor

Cluster manager

Program execution

Application deployment

Standalone cluster manager

Launching the standalone manager

Submitting our application

Configuring resources

Working in the cluster

Spark Streaming

Spark Streaming architecture

Transformations

Stateless transformations

Stateful transformations

Windowed operations

Update state by key

Output operations

Fault-tolerant Spark Streaming

Checkpointing

Spark Streaming performance

Parallelism level

Window size and batch size

Garbage collector

Summary

4. The Storage - Apache Cassandra

A bit of history

NoSQL

NoSQL or SQL?

CAP Brewer's theorem

Apache Cassandra installation

Data model

Data storage

Installation

DataStax OpsCenter

Creating a key space

Authentication and authorization (roles)

Setting up a simple authentication and authorization

Backup

Compression

Recovery

Restart node

Printing schema

Logs

Configuring log4j

Log file rotation

User activity log

Transaction log

SQL dump

CQL

CQL commands

DBMS Cluster

Deleting the database

CLI delete commands

CQL shell delete commands

DB and DBMS optimization

Bloom filter

Data cache

Java heap tune up

Java garbage collection tune up

Views, triggers, and stored procedures

Client-server architecture

Drivers

Spark-Cassandra connector

Installing the connector

Establishing the connection

Using the connector

Summary

5. The Broker - Apache Kafka

Introducing Kafka

Features of Apache Kafka

Born to be fast data

Use cases

Installation

Installing Java

Installing Kafka

Importing Kafka

Cluster

Single node - single broker cluster

Starting Zookeeper

Starting the broker

Creating a topic

Starting a producer

Starting a consumer

Single node - Multiple broker cluster

Starting the brokers

Creating a topic

Starting a producer

Starting a consumer

Multiple node - multiple broker cluster

Broker properties

Architecture

Segment files

Offset

Leaders

Groups

Log compaction

Kafka design

Message compression

Replication

Asynchronous replication

Synchronous replication

Producers

Producer API

Scala producers

Step 1: Import classes

Step 2: Define properties

Step 3: Build and send the message

Step 4: Create the topic

Step 5: Compile the producer

Step 6: Run the producer

Step 7: Run a consumer

Producers with custom partitioning

Step 1: Import classes

Step 2: Define properties

Step 3: Implement the partitioner class

Step 4: Build and send the message

Step 5: Create the topic

Step 6: Compile the programs

Step 7: Run the producer

Step 8: Run a consumer

Producer properties

Consumers

Consumer API

Simple Scala consumers

Step 1: Import classes

Step 2: Define properties

Step 3: Code the SimpleConsumer

Step 4: Create the topic

Step 5: Compile the program

Step 6: Run the producer

Step 7: Run the consumer

Multithread Scala consumers

Step 1: Import classes

Step 2: Define properties

Step 3: Code the MultiThreadConsumer

Step 4: Create the topic

Step 5: Compile the program

Step 6: Run the producer

Step 7: Run the consumer

Consumer properties

Integration

Integration with Apache Spark

Administration

Cluster tools

Adding servers

Kafka topic tools

Cluster mirroring

Summary

6. The Manager - Apache Mesos

The Apache Mesos architecture

Frameworks

Existing Mesos frameworks

Frameworks for long running applications

Frameworks for scheduling

Frameworks for storage

Attributes and resources

Attributes

Resources

The Apache Mesos API

Messages

The Executor API

Executor Driver API

The Scheduler API

The Scheduler Driver API

Resource allocation

The DRF algorithm

Weighted DRF algorithm

Resource configuration

Resource reservation

Static reservation

Defining roles

Assigning frameworks to roles

Setting policies

Dynamic reservation

The reserve operation

The unreserve operation

HTTP reserve

HTTP unreserve

Running a Mesos cluster on AWS

AWS instance types

AWS instances launching

Installing Mesos on AWS

Downloading Mesos

Building Mesos

Launching several instances

Running a Mesos cluster on a private data center

Mesos installation

Setting up the environment

Start the master

Start the slaves

Process automation

Common Mesos issues

Missing library dependencies

Directory permissions

Missing library

Debugging

Directory structure

Slaves not connecting with masters

Multiple slaves on the same machine

Scheduling and management frameworks

Marathon

Marathon installation

Installing Apache Zookeeper

Running Marathon in local mode

Multi-node Marathon installation

Running a test application from the web UI

Application scaling

Terminating the application

Chronos

Chronos installation

Job scheduling

Chronos and Marathon

Chronos REST API

Listing running jobs

Starting a job manually

Adding a job

Deleting a job

Deleting all the job tasks

Marathon REST API

Listing the running applications

Adding an application

Changing the application configuration

Deleting the application

Apache Aurora

Installing Aurora

Singularity

Singularity installation

The Singularity configuration file

Apache Spark on Apache Mesos

Submitting jobs in client mode

Submitting jobs in cluster mode

Advanced configuration

Apache Cassandra on Apache Mesos

Advanced configuration

Apache Kafka on Apache Mesos

Kafka log management

Summary

7. Study Case 1 - Spark and Cassandra

Spark Cassandra connector

Requisites

Preparing Cassandra

SparkContext setup

Cassandra and Spark Streaming

Spark Streaming setup

Cassandra setup

Streaming context creation

Stream creation

Kafka Streams

Akka Streams

Enabling Cassandra

Write the Stream to Cassandra

Read the Stream from Cassandra

Saving datasets to Cassandra

Saving a collection of tuples to Cassandra

Saving collections to Cassandra

Modifying collections

Saving objects of Cassandra (user defined types)

Scala options to Cassandra options conversion

Saving RDDs as new tables

Cluster deployment

Spark Cassandra use cases

Study case: The Calliope project

Installing Calliope

CQL3

Read from Cassandra with CQL3

Write to Cassandra with CQL3

Thrift

Read from Cassandra with Thrift

Write to Cassandra with Thrift

Calliope SQL context creation

Calliope SQL Configuration

Loading Cassandra tables programmatically

Summary

8. Study Case 2 - Connectors

Akka and Cassandra

Writing to Cassandra

Reading from Cassandra

Connecting to Cassandra

Scanning tweets

Testing the scanner

Akka and Spark

Kafka and Akka

Kafka and Cassandra

Summary

9. Study Case 3 - Mesos and Docker

Mesos frameworks API

Authentication, authorization, and access control

Framework authentication

Authentication configuration

Framework authorization

Access control lists

Spark Mesos run modes

Coarse-grained

Fine-grained

Apache Mesos API

Scheduler HTTP API

Requests

SUBSCRIBE

TEARDOWN

ACCEPT

DECLINE

REVIVE

KILL

SHUTDOWN

ACKNOWLEDGE

RECONCILE

MESSAGE

REQUEST

Responses

SUBSCRIBED

OFFERS

RESCIND

UPDATE

MESSAGE

FAILURE

ERROR

HEARTBEAT

Mesos containerizers

Containers

Docker containerizers

Containers and containerizers

Types of containerizers

Creating containerizers

Mesos containerizer

Launching Mesos containerizer

Architecture of Mesos containerizer

Shared filesystem

PID namespace

Posix disk

Docker containerizers

Docker containerizer setup

Launching the Docker containerizers

Composing containerizers

Summary
