Title Page
Copyright and Credits
Practical Big Data Analytics
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Too Big or Not Too Big
What is big data?
A brief history of data
Dawn of the information age
Dr. Alan Turing and modern computing
The advent of the stored-program computer
From magnetic devices to SSDs
Why we are talking about big data now if data has always existed
Definition of big data
Building blocks of big data analytics
Types of Big Data
Structured
Unstructured
Semi-structured
Sources of big data
The 4Vs of big data
When do you know you have a big data problem and where do you start your search for the big data solution?
Summary
Big Data Mining for the Masses
What is big data mining?
Big data mining in the enterprise
Building the case for a Big Data strategy
Implementation life cycle
Stakeholders of the solution
Implementing the solution
Technical elements of the big data platform
Selection of the hardware stack
Selection of the software stack
Summary
The Analytics Toolkit
Components of the Analytics Toolkit
System recommendations
Installing on a laptop or workstation
Installing on the cloud
Installing Hadoop
Installing Oracle VirtualBox
Installing CDH in other environments
Installing Packt Data Science Box
Installing Spark
Installing R
Steps for downloading and installing Microsoft R Open
Installing RStudio
Installing Python
Summary
Big Data With Hadoop
The fundamentals of Hadoop
The fundamental premise of Hadoop
The core modules of Hadoop
Hadoop Distributed File System - HDFS
Data storage process in HDFS
Hadoop MapReduce
An intuitive introduction to MapReduce
A technical understanding of MapReduce
Block size and number of mappers and reducers
Hadoop YARN
Job scheduling in YARN
Other topics in Hadoop
Encryption
User authentication
Hadoop data storage formats
New features expected in Hadoop 3
The Hadoop ecosystem
Hands-on with CDH
WordCount using Hadoop MapReduce
Analyzing oil import prices with Hive
Joining tables in Hive
Summary
Big Data Mining with NoSQL
Why NoSQL?
The ACID, BASE, and CAP properties
ACID and SQL
The BASE property of NoSQL
The CAP theorem
The need for NoSQL technologies
Google Bigtable
Amazon Dynamo
NoSQL databases
In-memory databases
Columnar databases
Document-oriented databases
Key-value databases
Graph databases
Other NoSQL types and summary of other types of databases
Analyzing Nobel Laureates data with MongoDB
JSON format
Installing and using MongoDB
Tracking physician payments with real-world data
Installing kdb+, R, and RStudio
Installing kdb+
Installing R
Installing RStudio
The CMS Open Payments Portal
Downloading the CMS Open Payments data
Creating the Q application
Loading the data
The backend code
Creating the frontend web portal
R Shiny platform for developers
Putting it all together - The CMS Open Payments application
Applications
Summary
Spark for Big Data Analytics
The advent of Spark
Limitations of Hadoop
Overcoming the limitations of Hadoop
Theoretical concepts in Spark
Resilient distributed datasets
Directed acyclic graphs
SparkContext
Spark DataFrames
Actions and transformations
Spark deployment options
Spark APIs
Core components in Spark
Spark Core
Spark SQL
Spark Streaming
GraphX
MLlib
The architecture of Spark
Spark solutions
Spark practicals
Signing up for Databricks Community Edition
Spark exercise - hands-on with Spark (Databricks)
Summary
An Introduction to Machine Learning Concepts
What is machine learning?
The evolution of machine learning
Factors that led to the success of machine learning
Machine learning, statistics, and AI
Categories of machine learning
Supervised and unsupervised machine learning
Supervised machine learning
Vehicle Mileage, Number Recognition and other examples
Unsupervised machine learning
Subdividing supervised machine learning
Common terminologies in machine learning
The core concepts in machine learning
Data management steps in machine learning
Pre-processing and feature selection techniques
Centering and scaling
The near-zero variance function
Removing correlated variables
Other common data transformations
Data sampling
Data imputation
The importance of variables
The train, test splits, and cross-validation concepts
Splitting the data into train and test sets
The cross-validation parameter
Creating the model
Leveraging multicore processing in the model
Summary
Machine Learning Deep Dive
The bias, variance, and regularization properties
The gradient descent and VC Dimension theories
Popular machine learning algorithms
Regression models
Association rules
Confidence
Support
Lift
Decision trees
The Random forest extension
Boosting algorithms
Support vector machines
The K-Means machine learning technique
The neural networks related algorithms
Tutorial - associative rules mining with CMS data
Downloading the data
Writing the R code for Apriori
Shiny (R Code)
Using custom CSS and fonts for the application
Running the application
Summary
Enterprise Data Science
Enterprise data science overview
A roadmap to enterprise analytics success
Data science solutions in the enterprise
Enterprise data warehouse and data mining
Traditional data warehouse systems
Oracle Exadata, Exalytics, and TimesTen
HP Vertica
Teradata
IBM data warehouse systems (formerly Netezza appliances)
PostgreSQL
Greenplum
SAP HANA
Enterprise and open source NoSQL Databases
Kdb+
MongoDB
Cassandra
Neo4j
Cloud databases
Amazon Redshift, Redshift Spectrum, and Athena databases
Google BigQuery and other cloud services
Azure CosmosDB
GPU databases
Brytlyt
MapD
Other common databases
Enterprise data science – machine learning and AI
The R programming language
Python
OpenCV, Caffe, and others
Spark
Deep learning
H2O and Driverless AI
DataRobot
Command-line tools
Apache MADlib
Machine learning as a service
Enterprise infrastructure solutions
Cloud computing
Virtualization
Containers – Docker, Kubernetes, and Mesos
On-premises hardware
Enterprise Big Data
Tutorial – using RStudio in the cloud
Summary
Closing Thoughts on Big Data
Corporate big data and data science strategy
Ethical considerations
Silicon Valley and data science
The human factor
Characteristics of successful projects
Summary
External Data Science Resources
Big data resources
NoSQL products
Languages and tools
Creating dashboards
Notebooks
Visualization libraries
Courses on R
Courses on machine learning
Machine learning and deep learning links
Web-based machine learning services
Movies
Machine learning books from Packt
Books for leisure reading
Other Books You May Enjoy
Leave a review - let other readers know what you think