Apache Spark for Data Science Cookbook (e-book)

Author: Padma Priya Chitturi

Publisher: Packt Publishing

Publication date: 2016-12-01

Word count: 2.737 million

Over 90 insightful recipes to get lightning-fast analytics with Apache Spark

About This Book

  • Use Apache Spark for data processing with these hands-on recipes
  • Implement end-to-end, large-scale data analysis better than ever before
  • Work with powerful libraries such as MLlib, SciPy, NumPy, and Pandas to gain insights from your data

Who This Book Is For

This book is for novice and intermediate-level data science professionals and data analysts who want to solve data science problems with a distributed computing framework. Basic experience with data science implementation tasks is expected. Data science professionals looking to skill up and gain an edge in the field will find this book helpful.

What You Will Learn

  • Explore data mining, text mining, natural language processing, information retrieval, and machine learning
  • Solve real-world analytical problems with large datasets
  • Address data science challenges with analytical tools on a distributed system like Spark (well suited to iterative algorithms), which offers in-memory processing and more flexibility for data analysis at scale
  • Get hands-on experience with algorithms such as classification, regression, and recommendation on real datasets using the Spark MLlib package
  • Learn about numerical and scientific computing using NumPy and SciPy on Spark
  • Use Predictive Model Markup Language (PMML) in Spark for statistical data mining models

In Detail

Spark has emerged as the most promising big data analytics engine for data science professionals. The true power and value of Apache Spark lie in its ability to execute data science tasks with speed and accuracy. Spark's selling point is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. It lets you tackle with ease the complexities that come with raw, unstructured datasets. This guide will get you comfortable and confident performing data science tasks with Spark.

You will learn about implementations including distributed deep learning, numerical computing, and scalable machine learning. You will be shown effective solutions to problematic concepts in data science using Spark's data science libraries such as MLlib, Pandas, NumPy, SciPy, and more. These simple and efficient recipes will show you how to implement algorithms and optimize your work.

Style and Approach

This book contains a comprehensive range of recipes designed to help you learn the fundamentals and tackle the difficulties of data science. It outlines practical steps to produce powerful insights into big data through a recipe-based approach.
Table of Contents

Apache Spark for Data Science Cookbook

Credits

About the Author

About the Reviewer

www.PacktPub.com

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Big Data Analytics with Spark

Introduction

Initializing SparkContext

Getting ready

How to do it…

How it works…

There's more…

See also

Working with Spark's Python and Scala shells

How to do it…

How it works…

There's more…

See also

Building standalone applications

Getting ready

How to do it…

How it works…

There's more…

See also

Working with the Spark programming model

How to do it…

How it works…

There's more…

See also

Working with pair RDDs

Getting ready

How to do it…

How it works…

There's more…

See also

Persisting RDDs

Getting ready

How to do it…

How it works…

There's more…

See also

Loading and saving data

Getting ready

How to do it…

How it works…

There's more…

See also

Creating broadcast variables and accumulators

Getting ready

How to do it…

How it works…

There's more…

See also

Submitting applications to a cluster

Getting ready

How to do it…

How it works…

There's more…

See also

Working with DataFrames

Getting ready

How to do it…

How it works…

There's more…

See also

Working with Spark Streaming

Getting ready

How to do it…

How it works…

There's more…

See also

2. Tricky Statistics with Spark

Introduction

Working with Pandas

Variable identification

Getting ready

How to do it…

How it works…

There's more…

See also

Sampling data

Getting ready

How to do it…

How it works…

There's more…

See also

Summary and descriptive statistics

Getting ready

How to do it…

How it works…

There's more…

See also

Generating frequency tables

Getting ready

How to do it…

How it works…

There's more…

See also

Installing Pandas on Linux

Getting ready

How to do it…

How it works…

There's more…

See also

Installing Pandas from source

Getting ready

How to do it…

How it works…

There's more…

See also

Using IPython with PySpark

Getting ready

How to do it…

How it works…

There's more…

See also

Creating Pandas DataFrames over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

Splitting, slicing, sorting, filtering, and grouping DataFrames over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

Implementing covariance and correlation using Pandas

Getting ready

How to do it…

How it works…

There's more…

See also

Concatenating and merging operations over DataFrames

Getting ready

How to do it…

How it works…

There's more…

See also

Complex operations over DataFrames

Getting ready

How to do it…

How it works…

There's more…

See also

Sparkling Pandas

Getting ready

How to do it…

How it works…

There's more…

See also

3. Data Analysis with Spark

Introduction

Univariate analysis

Getting ready

How to do it…

How it works…

There's more…

See also

Bivariate analysis

Getting ready

How to do it…

How it works…

There's more…

See also

Missing value treatment

Getting ready

How to do it…

How it works…

There's more…

See also

Outlier detection

Getting ready

How to do it…

How it works…

There's more…

See also

Use case - analyzing the MovieLens dataset

Getting ready

How to do it…

How it works…

There's more…

See also

Use case - analyzing the Uber dataset

Getting ready

How to do it…

How it works…

There's more…

See also

4. Clustering, Classification, and Regression

Introduction

Supervised learning

Unsupervised learning

Applying regression analysis for sales data

Variable identification

Getting ready

How to do it…

How it works…

There's more…

See also

Data exploration

Getting ready

How to do it…

How it works…

There's more…

See also

Feature engineering

Getting ready

How to do it…

How it works…

There's more…

See also

Applying linear regression

Getting ready

How to do it…

How it works…

There's more…

See also

Applying logistic regression on bank marketing data

Variable identification

Getting ready

How to do it…

How it works…

There's more…

See also

Data exploration

Getting ready

How to do it…

How it works…

There's more…

See also

Feature engineering

Getting ready

How to do it…

How it works…

There's more…

See also

Applying logistic regression

Getting ready

How to do it…

How it works…

There's more…

See also

Real-time intrusion detection using streaming k-means

Variable identification

Getting ready

How to do it…

How it works…

There's more…

See also

Simulating real-time data

Getting ready

How to do it…

How it works…

There's more…

See also

Applying streaming k-means

Getting ready

How to do it…

How it works…

There's more…

See also

5. Working with Spark MLlib

Introduction

Working with Spark ML pipelines

Implementing Naive Bayes classification

Getting ready

How to do it…

How it works…

There's more…

See also

Implementing decision trees

Getting ready

How to do it…

How it works…

There's more…

See also

Building a recommendation system

Getting ready

How to do it…

How it works…

There's more…

See also

Implementing logistic regression using Spark ML pipelines

Getting ready

How to do it…

How it works…

There's more…

See also

6. NLP with Spark

Introduction

Installing NLTK on Linux

Getting ready

How to do it…

How it works…

There's more…

See also

Installing Anaconda on Linux

Getting ready

How to do it…

How it works…

There's more…

See also

Anaconda for cluster management

Getting ready

How to do it…

How it works…

There's more…

See also

POS tagging with PySpark on an Anaconda cluster

Getting ready

How to do it…

How it works…

There's more…

See also

NER with IPython over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

Implementing OpenNLP - chunker over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

Implementing OpenNLP - sentence detector over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

Implementing Stanford NLP - lemmatization over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

Implementing sentiment analysis using Stanford NLP over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

7. Working with Sparkling Water - H2O

Introduction

Features

Working with H2O on Spark

Getting ready

How to do it…

How it works…

There's more…

See also

Implementing k-means using H2O over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

Implementing spam detection with Sparkling Water

Getting ready

How to do it…

How it works…

There's more…

See also

Deep learning with airlines and weather data

Getting ready

How to do it…

How it works…

There's more…

See also

Implementing a crime detection application

Getting ready

How to do it…

How it works…

There's more…

See also

Running SVM with H2O over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

8. Data Visualization with Spark

Introduction

Visualization using Zeppelin

Getting ready

How to do it…

Installing Zeppelin

Customizing Zeppelin's server and websocket port

Visualizing data on HDFS - parameterizing inputs

Running custom functions

Adding external dependencies to Zeppelin

Pointing to an external Spark Cluster

How to do it…

How it works…

There's more…

See also

Creating scatter plots with Bokeh-Scala

Getting ready

How to do it…

How it works…

There's more…

See also

Creating a time series MultiPlot with Bokeh-Scala

Getting ready

How to do it…

How it works…

There's more…

See also

Creating plots with the lightning visualization server

Getting ready

How to do it…

How it works…

There's more…

See also

Visualize machine learning models with Databricks notebook

Getting ready

How to do it…

How it works…

There's more…

See also

9. Deep Learning on Spark

Introduction

Installing CaffeOnSpark

Getting ready

How to do it…

How it works…

There's more…

See also

Working with CaffeOnSpark

Getting ready

How to do it…

How it works…

There's more…

See also

Running a feed-forward neural network with DeepLearning4j over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

Running an RBM with DeepLearning4j over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

Running a CNN for learning MNIST with DeepLearning4j over Spark

Getting ready

How to do it…

How it works…

There's more…

See also

Installing TensorFlow

Getting ready

How to do it…

How it works…

There's more…

See also

Working with Spark TensorFlow

Getting ready

How to do it…

How it works…

There's more…

See also

10. Working with SparkR

Introduction

Installing R

Getting ready

How to do it…

How it works…

There's more…

See also

Interactive analysis with the SparkR shell

Getting ready

How to do it…

How it works…

There's more…

See also

Creating a SparkR standalone application from RStudio

Getting ready

How to do it…

How it works…

There's more…

See also

Creating SparkR DataFrames

Getting ready

How to do it…

How it works…

There's more…

See also

SparkR DataFrame operations

Getting ready

How to do it…

How it works…

There's more…

See also

Applying user-defined functions in SparkR

Getting ready

How to do it…

How it works…

There's more…

See also

Running SQL queries from SparkR and caching DataFrames

Getting ready

How to do it…

How it works…

There's more…

See also

Machine learning with SparkR

Getting ready

How to do it…

How it works…

There's more…

See also
