Mastering Java for Data Science (e-book)

Author: Alexey Grigorev

Publisher: Packt Publishing

Publication date: 2017-04-27

Word count: 492,000

Category: Imported books > Foreign-language originals > Computers/Internet

Java is the most popular programming language, according to the TIOBE index, and it is a typical choice for running production systems in many companies, both in the startup world and among large enterprises. Not surprisingly, it is also a common choice for creating data science applications: it is fast and has a great set of data processing tools, both built-in and external. What is more, choosing Java for data science allows you to easily integrate solutions with existing software and bring data science into production with less effort.

This book will teach you how to create data science applications with Java. First, we will review the most important things to consider when starting a data science application, and then brush up on the basics of Java and machine learning before diving into more advanced topics. We start by going over the existing libraries for data processing and libraries with machine learning algorithms. After that, we cover topics such as classification and regression, dimensionality reduction and clustering, information retrieval and natural language processing, and deep learning and big data. Finally, we finish the book by talking about ways to deploy the model and evaluate it in production settings.

What you will learn

  • Get a solid understanding of the data processing toolbox available in Java
  • Explore the data science ecosystem available in Java
Table of contents

Title Page

Copyright

Credits

About the Author

About the Reviewers

www.PacktPub.com

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Data Science Using Java

Data science

Machine learning

Supervised learning

Unsupervised learning

Clustering

Dimensionality reduction

Natural Language Processing

Data science process models

CRISP-DM

A running example

Data science in Java

Data science libraries

Data processing libraries

Math and stats libraries

Machine learning and data mining libraries

Text processing

Summary

Data Processing Toolbox

Standard Java library

Collections

Input/Output

Reading input data

Writing output data

Streaming API

Extensions to the standard library

Apache Commons

Commons Lang

Commons IO

Commons Collections

Other commons modules

Google Guava

AOL Cyclops React

Accessing data

Text data and CSV

Web and HTML

JSON

Databases

DataFrames

Search engine - preparing data

Summary

Exploratory Data Analysis

Exploratory data analysis in Java

Search engine datasets

Apache Commons Math

Joinery

Interactive Exploratory Data Analysis in Java

JVM languages

Interactive Java

Joinery shell

Summary

Supervised Learning - Classification and Regression

Classification

Binary classification models

Smile

JSAT

LIBSVM and LIBLINEAR

Encog

Evaluation

Accuracy

Precision, recall, and F1

ROC and AU ROC (AUC)

Result validation

K-fold cross-validation

Training, validation, and testing

Case study - page prediction

Regression

Machine learning libraries for regression

Smile

JSAT

Other libraries

Evaluation

MSE

MAE

Case study - hardware performance

Summary

Unsupervised Learning - Clustering and Dimensionality Reduction

Dimensionality reduction

Unsupervised dimensionality reduction

Principal Component Analysis

Truncated SVD

Truncated SVD for categorical and sparse data

Random projection

Cluster analysis

Hierarchical methods

K-means

Choosing K in K-Means

DBSCAN

Clustering for supervised learning

Clusters as features

Clustering as dimensionality reduction

Supervised learning via clustering

Evaluation

Manual evaluation

Supervised evaluation

Unsupervised Evaluation

Summary

Working with Text - Natural Language Processing and Information Retrieval

Natural Language Processing and information retrieval

Vector Space Model - Bag of Words and TF-IDF

Vector space model implementation

Indexing and Apache Lucene

Natural Language Processing tools

Stanford CoreNLP

Customizing Apache Lucene

Machine learning for texts

Unsupervised learning for texts

Latent Semantic Analysis

Text clustering

Word embeddings

Supervised learning for texts

Text classification

Learning to rank for information retrieval

Reranking with Lucene

Summary

Extreme Gradient Boosting

Gradient Boosting Machines and XGBoost

Installing XGBoost

XGBoost in practice

XGBoost for classification

Parameter tuning

Text features

Feature importance

XGBoost for regression

XGBoost for learning to rank

Summary

Deep Learning with DeepLearning4J

Neural Networks and DeepLearning4J

ND4J - N-dimensional arrays for Java

Neural networks in DeepLearning4J

Convolutional Neural Networks

Deep learning for cats versus dogs

Reading the data

Creating the model

Monitoring the performance

Data augmentation

Running DeepLearning4J on GPU

Summary

Scaling Data Science

Apache Hadoop

Hadoop MapReduce

Common Crawl

Apache Spark

Link prediction

Reading the DBLP graph

Extracting features from the graph

Node features

Negative sampling

Edge features

Link Prediction with MLlib and XGBoost

Link suggestion

Summary

Deploying Data Science Models

Microservices

Spring Boot

Search engine service

Online evaluation

A/B testing

Multi-armed bandits

Summary
