Scala for Data Science
Table of Contents
Scala for Data Science
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Installing the JDK
Installing and using SBT
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
eBooks, discount offers, and more
Questions
1. Scala and Data Science
Data science
Programming in data science
Why Scala?
Static typing and type inference
Scala encourages immutability
Scala and functional programs
Null pointer uncertainty
Easier parallelism
Interoperability with Java
When not to use Scala
Summary
References
2. Manipulating Data with Breeze
Code examples
Installing Breeze
Getting help on Breeze
Basic Breeze data types
Vectors
Dense and sparse vectors and the vector trait
Matrices
Building vectors and matrices
Advanced indexing and slicing
Mutating vectors and matrices
Matrix multiplication, transposition, and the orientation of vectors
Data preprocessing and feature engineering
Breeze – function optimization
Numerical derivatives
Regularization
An example – logistic regression
Towards re-usable code
Alternatives to Breeze
Summary
References
3. Plotting with breeze-viz
Diving into Breeze
Customizing plots
Customizing the line type
More advanced scatter plots
Multi-plot example – scatterplot matrix plots
Managing without documentation
Breeze-viz reference
Data visualization beyond breeze-viz
Summary
4. Parallel Collections and Futures
Parallel collections
Limitations of parallel collections
Error handling
Setting the parallelism level
An example – cross-validation with parallel collections
Futures
Future composition – using a future's result
Blocking until completion
Controlling parallel execution with execution contexts
Futures example – stock price fetcher
Summary
References
5. Scala and SQL through JDBC
Interacting with JDBC
First steps with JDBC
Connecting to a database server
Creating tables
Inserting data
Reading data
JDBC summary
Functional wrappers for JDBC
Safer JDBC connections with the loan pattern
Enriching JDBC statements with the "pimp my library" pattern
Wrapping result sets in a stream
Looser coupling with type classes
Type classes
Coding against type classes
When to use type classes
Benefits of type classes
Creating a data access layer
Summary
References
6. Slick – A Functional Interface for SQL
FEC data
Importing Slick
Defining the schema
Connecting to the database
Creating tables
Inserting data
Querying data
Invokers
Operations on columns
Aggregations with "Group by"
Accessing database metadata
Slick versus JDBC
Summary
References
7. Web APIs
A whirlwind tour of JSON
Querying web APIs
JSON in Scala – an exercise in pattern matching
JSON4S types
Extracting fields using XPath
Extraction using case classes
Concurrency and exception handling with futures
Authentication – adding HTTP headers
HTTP – a whirlwind overview
Adding headers to HTTP requests in Scala
Summary
References
8. Scala and MongoDB
MongoDB
Connecting to MongoDB with Casbah
Connecting with authentication
Inserting documents
Extracting objects from the database
Complex queries
Casbah query DSL
Custom type serialization
Beyond Casbah
Summary
References
9. Concurrency with Akka
GitHub follower graph
Actors as people
Hello world with Akka
Case classes as messages
Actor construction
Anatomy of an actor
Follower network crawler
Fetcher actors
Routing
Message passing between actors
Queue control and the pull pattern
Accessing the sender of a message
Stateful actors
Follower network crawler
Fault tolerance
Custom supervisor strategies
Life-cycle hooks
What we have not talked about
Summary
References
10. Distributed Batch Processing with Spark
Installing Spark
Acquiring the example data
Resilient distributed datasets
RDDs are immutable
RDDs are lazy
RDDs know their lineage
RDDs are resilient
RDDs are distributed
Transformations and actions on RDDs
Persisting RDDs
Key-value RDDs
Double RDDs
Building and running standalone programs
Running Spark applications locally
Reducing logging output and Spark configuration
Running Spark applications on EC2
Spam filtering
Lifting the hood
Data shuffling and partitions
Summary
Reference
11. Spark SQL and DataFrames
DataFrames – a whirlwind introduction
Aggregation operations
Joining DataFrames together
Custom functions on DataFrames
DataFrame immutability and persistence
SQL statements on DataFrames
Complex data types – arrays, maps, and structs
Structs
Arrays
Maps
Interacting with data sources
JSON files
Parquet files
Standalone programs
Summary
References
12. Distributed Machine Learning with MLlib
Introducing MLlib – Spam classification
Pipeline components
Transformers
Estimators
Evaluation
Regularization in logistic regression
Cross-validation and model selection
Beyond logistic regression
Summary
References
13. Web APIs with Play
Client-server applications
Introduction to web frameworks
Model-View-Controller architecture
Single page applications
Building an application
The Play framework
Dynamic routing
Actions
Composing the response
Understanding and parsing the request
Interacting with JSON
Querying external APIs and consuming JSON
Calling external web services
Parsing JSON
Asynchronous actions
Creating APIs with Play: a summary
Rest APIs: best practice
Summary
References
14. Visualization with D3 and the Play Framework
GitHub user data
Do I need a backend?
JavaScript dependencies through web-jars
Towards a web application: HTML templates
Modular JavaScript through RequireJS
Bootstrapping the applications
Client-side program architecture
Designing the model
The event bus
AJAX calls through JQuery
Response views
Drawing plots with NVD3
Summary
References
A. Pattern Matching and Extractors
Pattern matching in for comprehensions
Pattern matching internals
Extracting sequences
Summary
Reference
Index