Big Data Architect’s Handbook (e-book)

Author: Syed Muhammad Fahad Akhtar

Publisher: Packt Publishing

Publication date: 2018-06-21

Word count: 446,000

Category: Imported books > Foreign-language originals > Computers/Internet

A comprehensive end-to-end guide that gives hands-on practice in big data and artificial intelligence.

About This Book

  • Learn to build and run a big data application with sample code
  • Explore examples to implement activities that a big data architect performs
  • Use machine learning and AI for structured and unstructured data

Who This Book Is For

Big Data Architect’s Handbook is for you if you are an aspiring data professional, developer, or IT enthusiast who aims to be an all-round architect in big data. This book is your one-stop solution to enhance your knowledge and carry out the activities, from easy to complex, required to become a big data architect.

What You Will Learn

  • Learn the Hadoop ecosystem and Apache projects
  • Understand and compare NoSQL databases and essential software architecture
  • Weigh cloud infrastructure design considerations for big data
  • Explore application scenarios of big data tools for daily activities
  • Learn to analyze and visualize results to uncover valuable insights
  • Build and run a big data application with sample code from end to end
  • Apply machine learning and AI to perform big data intelligence
  • Practice the daily activities performed by big data architects

In Detail

Big data architects are the “masters” of data and hold high value in today’s market. Handling big data, be it of good or bad quality, is not an easy task. The prime job of any big data architect is to build an end-to-end big data solution that integrates data from different sources and analyzes it to find useful, hidden insights.

Big Data Architect’s Handbook takes you through developing a complete, end-to-end big data pipeline, which lays the foundation and provides the knowledge required to be an architect in big data. From understanding the design considerations to implementing a solid, efficient, and scalable data pipeline, this book walks you through all the essential aspects of big data.

It also gives you an overview of how you can bring together various big data tools, such as Apache Hadoop and Elasticsearch, to build an efficient big data solution. By the end of this book, you will be able to build your own design system that integrates, maintains, visualizes, and monitors your data. In addition, you will have a smooth design flow in each process, putting insights into action.

Style and approach

A comprehensive guide with a blend of theory, examples, and implementation of real-world use cases.
Table of contents

Title Page

Copyright and Credits

Big Data Architect's Handbook

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Why Big Data?

What is big data?

Characteristics of big data

Volume

Velocity

Variety

Veracity

Variability

Value

Solution-based approach for data

Data – the most valuable asset

Traditional approaches to data storage

Clustered computing

High availability

Resource pooling

Easy scalability

Big data – how does it make a difference?

Big data solutions – cloud versus on-premises infrastructure

Cost

Security

Current capabilities

Scalability

Big data glossary

Big data

Batch processing

Cluster computing

Data warehouse

Data lake

Data mining

ETL

Hadoop

In-memory computing

Machine learning

MapReduce

NoSQL

Stream processing

Summary

Big Data Environment Setup

Oracle VM VirtualBox installation

Ubuntu installation

Hadoop prerequisite installation

Java installation

SSH installation and configuration

Hadoop system user

Apache Hadoop installation

Hadoop configuration

Path configuration for Hadoop commands

Hadoop server start and stop

Summary

Hadoop Ecosystem

Apache Hadoop

Hadoop Distributed File System

HDFS hands-on

Creating a directory in HDFS

Copying files from a local file system to HDFS

Copying files from HDFS to a local file system

Deleting files and folders in HDFS

Hadoop MapReduce

Job Tracker and Task Tracker

The execution flow of MapReduce

Mapper

Shuffle and Sort

Reducer

Example program

Preparing the data file for analysis

Program code

Driver program

Mapper program

Reducer program

Observations and results

YARN

Resource Manager

Node Manager

Container

Application Master

Apache Projects related to big data

Apache Zookeeper

Apache Kafka

Apache Flume

Apache Cassandra

Apache HBase

Apache Spark

Summary

NoSQL Database

What is NoSQL?

Benefits of NoSQL databases

NoSQL versus RDBMS

The CAP theorem

The ACID properties

Data models in NoSQL

Key-value data stores

Document store

Column stores

Graph stores

Apache Cassandra

Installation

Starting Cassandra

The Cassandra Query Language – CQL

The help command

Basic commands

Data manipulation

Creating, altering, and deleting a keyspace

Creating, altering, and deleting tables

Inserting, updating, and deleting data

The MongoDB database

Installing MongoDB

Starting MongoDB

Working on MongoDB

The help command

Basic commands

Data manipulation

Creating and deleting databases

Creating and deleting collections

The create, retrieve, update, delete operations

Neo4j database

Installing Neo4j

Starting Neo4j

The cypher query language

Help

Basic operations in Cypher

Creating nodes, relationships, and properties

Updating nodes, relationships, and properties

Deleting nodes, relationships, and properties

Reading nodes, relationships, and properties

Summary

Off-the-Shelf Commercial Tools

Microsoft Azure

Building a practical application

Microsoft Azure account

The Azure Event Hub

IoT simulation application

Setting up an Azure Stream Analytics job

Input

Query

Output

Dashboard in Power BI

Summary

Containerization

Virtualization

Hypervisors

Hardware-based hypervisors

Software-based hypervisors

What is containerization?

Benefits of containers

Docker

Docker workflow

Installation

Basic commands

Docker images

Building a Docker image

Running and verifying Docker images

Importing and exporting Docker images

Docker Swarm

Setting up Docker Swarm

Creating service containers

Replicating containers

Removing container services

Kubernetes

Key components

Pods

ReplicaSets

Deployments

PetSets

Installation

Deployment

Kubernetes Dashboard

Summary

Network Infrastructure

Network

Local area networks

Metropolitan area networks

Wide area networks

Network connectivity

Wired

Wireless

Network visualization

Gephi

Installation

Java installation

First run

Practical example

Summary

Cloud Infrastructure

Companies moving to cloud

Driving factors

Infrastructure

Locality of data

Requirements

Design considerations

Open source versus commercial

Commodity hardware versus purpose build

Cloud versus on-premises

Scale up and down

Application architecture

Cost decision

Summary

Security and Monitoring

Simple Network Management Protocol

Benefits of SNMP

Security

Agents and Traps

Netflow

Nagios

Key benefits

Security Onion

Deployment scenarios

The Standalone model

The Server-Sensor model

Hybrid model

Preconfigured tools

Wireshark

Key features

Summary

Frontend Architecture

React JS

Key concepts

Node.js

JSX

Unidirectional dataflow

Getting started with ReactJS

Single page application

React application project

React app directory structure

Components

Properties

Event handling

State

Redux

Architecture of Redux

Key concepts

Single store

Action

Reducers

Guestbook application

Installation

Create a store

Setting up Reducer

Setting up Dispatcher

Connect function

Setting up Subscribers

Final output

Summary

Backend Architecture

API

RESTful API

HTTP request methods

GET

POST

PUT

DELETE

Authentication

Basic authentication

JSON Web Token

Header

Payload

Signature

Practical

RESTful web service

Java client

Redis

Installation

Redis server

Redis client

Working with Redis

Redis data types and structures

String

HashMap

List

Set

Redis Publish/Subscribe

Common key operations

Summary

Machine Learning

Machine learning

Types of algorithms

Parametric algorithms

Non-parametric algorithms

Supervised learning

The classification model

Binary classification

Multi-class classification

The regression model

Linear regression

Polynomial regression

Unsupervised learning

Clustering, k-means

Neural networks

Feedforward neural network

Recurrent neural network

Symmetrically connected neural network

Deep neural networks

Decision tree classifiers

Summary

Artificial Intelligence

Artificial intelligence

Convolutional neural networks

Deep learning using TensorFlow

TensorFlow

Installation

TensorFlow program

Uninstalling TensorFlow

TensorBoard

Program

Launching TensorBoard

TensorBoard graph

Object detection using YOLO

Installation

Compiling YOLO library

Trained weights

Detecting objects in an image

Summary

Elasticsearch

Installing Elasticsearch

Starting the Elasticsearch server

Auto starting the Elasticsearch service

Stopping the Elasticsearch server

Uninstalling Elasticsearch

Kibana

Installation

Starting Kibana

Uninstalling Kibana

Security

Securing Elasticsearch

Securing Kibana

Understanding queries – CRUD commands

Creating

Reading

Updating

Deleting

Summary

Structured Data

Data analysis

Installing MySQL

Importing data

Analyzing the data model

HBase

Installation

Starting an HBase instance

Stopping a HBase instance

Preparing an HBase for migration

Sqoop

Installation

Verifying the installation

MySQL JDBC driver

Importing data

Verifying the imported data

Summary

Unstructured Data

Moving data into Hadoop

Downloading Flume

Environment configuration

Configuring agent and sink

Running Apache Flume

Transferring a log file

Converting images into text for analysis

Tesseract OCR

Installing Tesseract

Practical example

Complete code

Program execution

Summary

Data Visualization

Matplotlib

Installing Matplotlib

Line chart

Bar charts

Stack charts

Scatter charts

Pie charts

Geographic projections

D3.js

Installation

Practical example

Output

Summary

Financial Trading System

What is algorithmic trading?

Benefits of algorithmic trading

Big data in the financial market

Algorithmic trading strategies

Building an Expert Advisor

MetaTrader

Downloading and setting up MetaTrader

MetaQuotes language

Trading bot objective

Practical

Trading pattern – moving average

Decision time: buy or sell

Complete program

Backtesting in MetaTrader 4

Summary

Retail Recommendation System

Types of recommendation system

Collaborative filtering

Content-based filtering

Demographic-based system

Utility-based system

Knowledge-based system

Hybrid model

Commercial tools

Barilliance

Softcube

Strands

Monetate

Nosto

Book recommendation system

Dataset

Directory structure

Code

Reading the dataset

Verifying the dataset

Data analysis

Age group

Commutative rating

Algorithms

Top-rated books

Popular books

Demographic-based recommendation

Useful resources

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think
