Java for Data Science (eBook)

Author: Richard M. Reese

Publisher: Packt Publishing

Publication date: 2017-01-01

Word count: 3.149 million

Examine the techniques and Java tools supporting the growing field of data science

About This Book

  • Your entry ticket to the world of data science with the stability and power of Java
  • Explore, analyse, and visualize your data effectively using easy-to-follow examples
  • Make your Java applications more capable using machine learning

Who This Book Is For

This book is for Java developers who are comfortable developing applications in Java. Those who now want to enter the world of data science or wish to build intelligent applications will find this book ideal. Aspiring data scientists will also find this book very helpful.

What You Will Learn

  • Understand the nature and key concepts used in the field of data science
  • Grasp how data is collected, cleaned, and processed
  • Become comfortable with key data analysis techniques
  • See specialized analysis techniques centered on machine learning
  • Master the effective visualization of your data
  • Work with the Java APIs and techniques used to perform data analysis

In Detail

Data science is concerned with extracting knowledge and insights from a wide variety of data sources to analyse patterns or predict future behaviour. It draws from a wide array of disciplines including statistics, computer science, mathematics, machine learning, and data mining. In this book, we cover the important data science concepts and how they are supported by Java, as well as the often statistically challenging techniques, to provide you with an understanding of their purpose and application.

The book starts with an introduction of data science, followed by the basic data science tasks of data collection, data cleaning, data analysis, and data visualization. This is followed by a discussion of statistical techniques and more advanced topics including machine learning, neural networks, and deep learning. The next section examines the major categories of data analysis including text, visual, and audio data, followed by a discussion of resources that support parallel implementation. The final chapter illustrates an in-depth data science problem and provides a comprehensive, Java-based solution.

Due to the nature of the topic, simple examples of techniques are presented early followed by a more detailed treatment later in the book. This permits a more natural introduction to the techniques and concepts presented in the book.

Style and approach

This book follows a tutorial approach, providing examples of each of the major concepts covered. With a step-by-step instructional style, this book covers various facets of data science and will get you up and running quickly.
Table of Contents

Java for Data Science

Credits

About the Authors

About the Reviewers

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Getting Started with Data Science

Problems solved using data science

Understanding the data science problem-solving approach

Using Java to support data science

Acquiring data for an application

The importance and process of cleaning data

Visualizing data to enhance understanding

The use of statistical methods in data science

Machine learning applied to data science

Using neural networks in data science

Deep learning approaches

Performing text analysis

Visual and audio analysis

Improving application performance using parallel techniques

Assembling the pieces

Summary

2. Data Acquisition

Understanding the data formats used in data science applications

Overview of CSV data

Overview of spreadsheets

Overview of databases

Overview of PDF files

Overview of JSON

Overview of XML

Overview of streaming data

Overview of audio/video/images in Java

Data acquisition techniques

Using the HttpURLConnection class

Web crawlers in Java

Creating your own web crawler

Using the crawler4j web crawler

Web scraping in Java

Using API calls to access common social media sites

Using OAuth to authenticate users

Handling Twitter

Handling Wikipedia

Handling Flickr

Handling YouTube

Searching by keyword

Summary

3. Data Cleaning

Handling data formats

Handling CSV data

Handling spreadsheets

Handling Excel spreadsheets

Handling PDF files

Handling JSON

Using the JSON streaming API

Using the JSON tree API

The nitty gritty of cleaning text

Using Java tokenizers to extract words

Java core tokenizers

Third-party tokenizers and libraries

Transforming data into a usable form

Simple text cleaning

Removing stop words

Finding words in text

Finding and replacing text

Data imputation

Subsetting data

Sorting text

Data validation

Validating data types

Validating dates

Validating e-mail addresses

Validating ZIP codes

Validating names

Cleaning images

Changing the contrast of an image

Smoothing an image

Brightening an image

Resizing an image

Converting images to different formats

Summary

4. Data Visualization

Understanding plots and graphs

Visual analysis goals

Creating index charts

Creating bar charts

Using country as the category

Using decade as the category

Creating stacked graphs

Creating pie charts

Creating scatter charts

Creating histograms

Creating donut charts

Creating bubble charts

Summary

5. Statistical Data Analysis Techniques

Working with mean, mode, and median

Calculating the mean

Using simple Java techniques to find mean

Using Java 8 techniques to find mean

Using Google Guava to find mean

Using Apache Commons to find mean

Calculating the median

Using simple Java techniques to find median

Using Apache Commons to find the median

Calculating the mode

Using ArrayLists to find multiple modes

Using a HashMap to find multiple modes

Using Apache Commons to find multiple modes

Standard deviation

Sample size determination

Hypothesis testing

Regression analysis

Using simple linear regression

Using multiple regression

Summary

6. Machine Learning

Supervised learning techniques

Decision trees

Decision tree types

Decision tree libraries

Using a decision tree with a book dataset

Testing the book decision tree

Support vector machines

Using an SVM for camping data

Testing individual instances

Bayesian networks

Using a Bayesian network

Unsupervised machine learning

Association rule learning

Using association rule learning to find buying relationships

Reinforcement learning

Summary

7. Neural Networks

Training a neural network

Getting started with neural network architectures

Understanding static neural networks

A basic Java example

Understanding dynamic neural networks

Multilayer perceptron networks

Building the model

Evaluating the model

Predicting other values

Saving and retrieving the model

Learning vector quantization

Self-Organizing Maps

Using a SOM

Displaying the SOM results

Additional network architectures and algorithms

The k-Nearest Neighbors algorithm

Instantaneously trained networks

Spiking neural networks

Cascading neural networks

Holographic associative memory

Backpropagation and neural networks

Summary

8. Deep Learning

Deeplearning4j architecture

Acquiring and manipulating data

Reading in a CSV file

Configuring and building a model

Using hyperparameters in ND4J

Instantiating the network model

Training a model

Testing a model

Deep learning and regression analysis

Preparing the data

Setting up the class

Reading and preparing the data

Building the model

Evaluating the model

Restricted Boltzmann Machines

Reconstruction in an RBM

Configuring an RBM

Deep autoencoders

Building an autoencoder in DL4J

Configuring the network

Building and training the network

Saving and retrieving a network

Specialized autoencoders

Convolutional networks

Building the model

Evaluating the model

Recurrent Neural Networks

Summary

9. Text Analysis

Implementing named entity recognition

Using OpenNLP to perform NER

Identifying location entities

Classifying text

Word2Vec and Doc2Vec

Classifying text by labels

Classifying text by similarity

Understanding tagging and POS

Using OpenNLP to identify POS

Understanding POS tags

Extracting relationships from sentences

Using OpenNLP to extract relationships

Sentiment analysis

Downloading and extracting the Word2Vec model

Building our model and classifying text

Summary

10. Visual and Audio Analysis

Text-to-speech

Using FreeTTS

Getting information about voices

Gathering voice information

Understanding speech recognition

Using CMUSphinx to convert speech to text

Obtaining more detail about the words

Extracting text from an image

Using Tess4j to extract text

Identifying faces

Using OpenCV to detect faces

Classifying visual data

Creating a Neuroph Studio project for classifying visual images

Training the model

Summary

11. Mathematical and Parallel Techniques for Data Analysis

Implementing basic matrix operations

Using GPUs with Deeplearning4j

Using map-reduce

Using Apache's Hadoop to perform map-reduce

Writing the map method

Writing the reduce method

Creating and executing a new Hadoop job

Various mathematical libraries

Using the jblas API

Using the Apache Commons math API

Using the ND4J API

Using OpenCL

Using Aparapi

Creating an Aparapi application

Using Aparapi for matrix multiplication

Using Java 8 streams

Understanding Java 8 lambda expressions and streams

Using Java 8 to perform matrix multiplication

Using Java 8 to perform map-reduce

Summary

12. Bringing It All Together

Defining the purpose and scope of our application

Understanding the application's architecture

Data acquisition using Twitter

Understanding the TweetHandler class

Extracting data for a sentiment analysis model

Building the sentiment model

Processing the JSON input

Cleaning data to improve our results

Removing stop words

Performing sentiment analysis

Analysing the results

Other optional enhancements

Summary
