Hands-On Data Analysis with Scala (e-book)

Author: Rajesh Gupta

Publisher: Packt Publishing

Publication date: 2019-05-03

Word count: 362,000

Master Scala's advanced techniques to solve real-world problems in data analysis and gain valuable insights from your data.

Key Features

* A beginner's guide to performing data analysis, loaded with rich, practical examples
* Access to popular Scala libraries such as Breeze and Saddle for efficient data manipulation and exploratory analysis
* Develop applications in Scala for real-time analysis and machine learning in Apache Spark

Book Description

Efficient business decisions based on an accurate sense of business data help deliver better performance across products and services. This book helps you leverage popular Scala libraries and tools to perform core data analysis tasks with ease.

The book begins with a quick overview of the building blocks of a standard data analysis process. You will learn to perform basic tasks such as extraction, staging, validation, cleaning, and shaping of datasets. You will then dive deep into the data exploration and visualization areas of the data analysis life cycle. You will use popular Scala libraries such as Saddle, Breeze, Vegas, and PredictionIO to process your datasets, and you will learn statistical methods for deriving meaningful insights from data. You will also learn to create Apache Spark 2.x applications for complex, real-time data analysis. You will discover traditional machine learning techniques for data analysis, and you will be introduced to neural networks and deep learning from a data analysis standpoint.

By the end of this book, you will be capable of handling large sets of structured and unstructured data, performing exploratory analysis, and building efficient Scala applications for discovering and delivering insights.

What you will learn

* Use techniques to determine the validity and confidence level of data
* Apply quartiles and n-tiles to datasets to see how data is distributed into many buckets
* Create data pipelines that combine multiple data life cycle steps
* Use built-in features to gain a deeper understanding of the data
* Apply the lasso regression analysis method to your data
* Compare Apache Spark API with traditional Apache Spark data analysis

Who this book is for

If you are a data scientist or a data analyst who wants to learn how to perform data analysis using Scala, this book is for you. All you need is knowledge of the fundamentals of Scala programming.

Table of Contents

Dedication

About Packt

Why subscribe?

Packt.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Section 1: Scala and Data Analysis Life Cycle

Scala Overview

Getting started with Scala

Running Scala code online

Scastie

ScalaFiddle

Installing Scala on your computer

Installing command-line tools

Installing IDE

Overview of object-oriented and functional programming

Object-oriented programming using Scala

Functional programming using Scala

Scala case classes and the collection API

Scala case classes

Scala collection API

Array

List

Map

Overview of Scala libraries for data analysis

Apache Spark

Breeze

Breeze-viz

DeepLearning

Epic

Saddle

Scalalab

Smile

Vegas

Summary

Data Analysis Life Cycle

Data journey

Sourcing data

Data formats

XML

JSON

CSV

Understanding data

Using statistical methods for data exploration

Using Scala

Other Scala tools

Using data visualization for data exploration

Using the vegas-viz library for data visualization

Other libraries for data visualization

Using ML to learn from data

Setting up Smile

Running Smile

Creating a data pipeline

Summary

Data Ingestion

Data extraction

Pull-oriented data extraction

Push-oriented data delivery

Data staging

Why is the staging important?

Cleaning and normalizing

Enriching

Organizing and storing

Summary

Data Exploration and Visualization

Sampling data

Selecting the sample

Selecting samples using Saddle

Performing ad hoc analysis

Finding a relationship between data elements

Visualizing data

Vegas viz for data visualization

Spark Notebook for data visualization

Downloading and installing Spark Notebook

Creating a Spark Notebook with simple visuals

More charts with Spark Notebook

Box plot

Histogram

Bubble chart

Summary

Applying Statistics and Hypothesis Testing

Basics of statistics

Summary level statistics

Correlation statistics

Vector level statistics

Random data generation

Pseudorandom numbers

Random numbers with normal distribution

Random numbers with Poisson distribution

Hypothesis testing

Summary

Section 2: Advanced Data Analysis and Machine Learning

Introduction to Spark for Distributed Data Analysis

Spark setup and overview

Spark core concepts

Spark Datasets and DataFrames

Sourcing data using Spark

Parquet file format

Avro file format

Spark JDBC integration

Using Spark to explore data

Summary

Traditional Machine Learning for Data Analysis

ML overview

Characteristics of ML

Categories or types of ML

Decision trees

Implementing decision trees

Decision tree algorithms

Implementing decision tree algorithms in our example

Evaluating the results

Using our model with a decision tree

Random forest

Random forest algorithms

Ridge and lasso regression

Characteristics of ridge regression

Characteristics of lasso regression

k-means cluster analysis

Natural language processing for data analysis

Algorithm selections

Summary

Section 3: Real-Time Data Analysis and Scalability

Near Real-Time Data Analysis Using Streaming

Overview of streaming

Spark Streaming overview

Word count using pure Scala

Word count using Scala and Spark

Word count using Scala and Spark Streaming

Deep dive into the Spark Streaming solution

Streaming a k-means clustering algorithm using Spark

Streaming linear regression using Spark

Summary

Working with Data at Scale

Working with data at scale

Cost considerations

Data storage

Data governance

Reliability considerations

Input data errors

Processing failures

Summary

Another Book You May Enjoy

Leave a review - let other readers know what you think
