Practical Big Data Analytics (ebook)

Author: Nataraj Dasgupta

Publisher: Packt Publishing

Publication date: 2018-01-15

Word count: 413,000

Book description
Get command of your organization's Big Data using the power of data science and analytics.

About This Book

• A perfect companion to boost your skills in storing, processing, and analyzing Big Data so that you can make informed business decisions
• Work with the best tools, such as Apache Hadoop, R, Python, and Spark, for NoSQL platforms to perform massive online analyses
• Get expert tips on statistical inference, machine learning, mathematical modeling, and data visualization for Big Data

Who This Book Is For

The book is intended for existing and aspiring Big Data professionals who wish to become the go-to person in their organization for Big Data architecture, analytics, and governance. No prior knowledge of Big Data or related technologies is assumed, but some programming experience will be helpful.

What You Will Learn

• Get a 360-degree view of the world of Big Data, data science, and machine learning
• Explore a broad range of technical and business Big Data analytics topics that cater to technical experts as well as corporate IT executives
• Get hands-on experience with industry-standard Big Data and machine learning tools such as Hadoop, Spark, MongoDB, kdb+, and R
• Create production-grade machine learning BI dashboards using R and R Shiny with step-by-step instructions
• Learn how to combine open-source Big Data, machine learning, and BI tools to create low-cost business analytics applications
• Understand corporate strategies for successful Big Data and data science projects
• Go beyond general-purpose analytics to develop cutting-edge Big Data applications using emerging technologies

In Detail

Big Data analytics refers to the strategies organizations use to collect, organize, and analyze large amounts of data in order to uncover valuable business insights from data that traditional systems cannot analyze.

Crafting an enterprise-scale, cost-efficient Big Data and machine learning solution to uncover insights and value from your organization's data is a challenge. Today, with hundreds of new Big Data systems, machine learning packages, and BI tools available, selecting the right combination of technologies is an even greater challenge. This book will help you make that choice.

With the help of this guide, you will be able to bridge the gap between the theoretical world of technology and the practical reality of building corporate Big Data and data science platforms. You will get hands-on exposure to Hadoop and Spark, build machine learning dashboards using R and R Shiny, create web-based apps using NoSQL databases such as MongoDB, and even learn how to write R code for neural networks. By the end of the book, you will have a clear and concrete understanding of what Big Data analytics means, how it drives revenue for organizations, and how you can develop your own Big Data analytics solution using the tools and methods described in this book.

Style and approach

This book equips you with knowledge of various NoSQL tools, R and Python programming, cloud platforms, and techniques that you can use to store, analyze, and deliver meaningful insights from your data.
Table of Contents

Title Page

Copyright and Credits

Practical Big Data Analytics

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Too Big or Not Too Big

What is big data?

A brief history of data

Dawn of the information age

Dr. Alan Turing and modern computing

The advent of the stored-program computer

From magnetic devices to SSDs

Why we are talking about big data now if data has always existed

Definition of big data

Building blocks of big data analytics

Types of Big Data

Structured

Unstructured

Semi-structured

Sources of big data

The 4Vs of big data

When do you know you have a big data problem and where do you start your search for the big data solution?

Summary

Big Data Mining for the Masses

What is big data mining?

Big data mining in the enterprise

Building the case for a Big Data strategy

Implementation life cycle

Stakeholders of the solution

Implementing the solution

Technical elements of the big data platform

Selection of the hardware stack

Selection of the software stack

Summary

The Analytics Toolkit

Components of the Analytics Toolkit

System recommendations

Installing on a laptop or workstation

Installing on the cloud

Installing Hadoop

Installing Oracle VirtualBox

Installing CDH in other environments

Installing Packt Data Science Box

Installing Spark

Installing R

Steps for downloading and installing Microsoft R Open

Installing RStudio

Installing Python

Summary

Big Data With Hadoop

The fundamentals of Hadoop

The fundamental premise of Hadoop

The core modules of Hadoop

Hadoop Distributed File System - HDFS

Data storage process in HDFS

Hadoop MapReduce

An intuitive introduction to MapReduce

A technical understanding of MapReduce

Block size and number of mappers and reducers

Hadoop YARN

Job scheduling in YARN

Other topics in Hadoop

Encryption

User authentication

Hadoop data storage formats

New features expected in Hadoop 3

The Hadoop ecosystem

Hands-on with CDH

WordCount using Hadoop MapReduce

Analyzing oil import prices with Hive

Joining tables in Hive

Summary

Big Data Mining with NoSQL

Why NoSQL?

The ACID, BASE, and CAP properties

ACID and SQL

The BASE property of NoSQL

The CAP theorem

The need for NoSQL technologies

Google Bigtable

Amazon Dynamo

NoSQL databases

In-memory databases

Columnar databases

Document-oriented databases

Key-value databases

Graph databases

Other NoSQL types and summary of other types of databases

Analyzing Nobel Laureates data with MongoDB

JSON format

Installing and using MongoDB

Tracking physician payments with real-world data

Installing kdb+, R, and RStudio

Installing kdb+

Installing R

Installing RStudio

The CMS Open Payments Portal

Downloading the CMS Open Payments data

Creating the Q application

Loading the data

The backend code

Creating the frontend web portal

R Shiny platform for developers

Putting it all together - The CMS Open Payments application

Applications

Summary

Spark for Big Data Analytics

The advent of Spark

Limitations of Hadoop

Overcoming the limitations of Hadoop

Theoretical concepts in Spark

Resilient distributed datasets

Directed acyclic graphs

SparkContext

Spark DataFrames

Actions and transformations

Spark deployment options

Spark APIs

Core components in Spark

Spark Core

Spark SQL

Spark Streaming

GraphX

MLlib

The architecture of Spark

Spark solutions

Spark practicals

Signing up for Databricks Community Edition

Spark exercise - hands-on with Spark (Databricks)

Summary

An Introduction to Machine Learning Concepts

What is machine learning?

The evolution of machine learning

Factors that led to the success of machine learning

Machine learning, statistics, and AI

Categories of machine learning

Supervised and unsupervised machine learning

Supervised machine learning

Vehicle Mileage, Number Recognition and other examples

Unsupervised machine learning

Subdividing supervised machine learning

Common terminologies in machine learning

The core concepts in machine learning

Data management steps in machine learning

Pre-processing and feature selection techniques

Centering and scaling

The near-zero variance function

Removing correlated variables

Other common data transformations

Data sampling

Data imputation

The importance of variables

The train, test splits, and cross-validation concepts

Splitting the data into train and test sets

The cross-validation parameter

Creating the model

Leveraging multicore processing in the model

Summary

Machine Learning Deep Dive

The bias, variance, and regularization properties

The gradient descent and VC Dimension theories

Popular machine learning algorithms

Regression models

Association rules

Confidence

Support

Lift

Decision trees

The Random forest extension

Boosting algorithms

Support vector machines

The K-Means machine learning technique

The neural networks related algorithms

Tutorial - associative rules mining with CMS data

Downloading the data

Writing the R code for Apriori

Shiny (R Code)

Using custom CSS and fonts for the application

Running the application

Summary

Enterprise Data Science

Enterprise data science overview

A roadmap to enterprise analytics success

Data science solutions in the enterprise

Enterprise data warehouse and data mining

Traditional data warehouse systems

Oracle Exadata, Exalytics, and TimesTen

HP Vertica

Teradata

IBM data warehouse systems (formerly Netezza appliances)

PostgreSQL

Greenplum

SAP Hana

Enterprise and open source NoSQL Databases

Kdb+

MongoDB

Cassandra

Neo4j

Cloud databases

Amazon Redshift, Redshift Spectrum, and Athena databases

Google BigQuery and other cloud services

Azure CosmosDB

GPU databases

Brytlyt

MapD

Other common databases

Enterprise data science – machine learning and AI

The R programming language

Python

OpenCV, Caffe, and others

Spark

Deep learning

H2O and Driverless AI

Datarobot

Command-line tools

Apache MADlib

Machine learning as a service

Enterprise infrastructure solutions

Cloud computing

Virtualization

Containers – Docker, Kubernetes, and Mesos

On-premises hardware

Enterprise Big Data

Tutorial – using RStudio in the cloud

Summary

Closing Thoughts on Big Data

Corporate big data and data science strategy

Ethical considerations

Silicon Valley and data science

The human factor

Characteristics of successful projects

Summary

External Data Science Resources

Big data resources

NoSQL products

Languages and tools

Creating dashboards

Notebooks

Visualization libraries

Courses on R

Courses on machine learning

Machine learning and deep learning links

Web-based machine learning services

Movies

Machine learning books from Packt

Books for leisure reading

Other Books You May Enjoy

Leave a review - let other readers know what you think
