Data Analysis with Python
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Why am I writing this book?
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
1. Programming and Data Science – A New Toolset
What is data science?
Is data science here to stay?
Why is data science on the rise?
What does that have to do with developers?
Putting these concepts into practice
Deep diving into a concrete example
Data pipeline blueprint
What kind of skills are required to become a data scientist?
IBM Watson DeepQA
Back to our sentiment analysis of Twitter hashtags project
Lessons learned from building our first enterprise-ready data pipeline
Data science strategy
Jupyter Notebooks at the center of our strategy
Why are Notebooks so popular?
Summary
2. Python and Jupyter Notebooks to Power your Data Analysis
Why choose Python?
Introducing PixieDust
SampleData – a simple API for loading data
Wrangling data with pixiedust_rosie
Display – a simple interactive API for data visualization
Filtering
Bridging the gap between developers and data scientists with PixieApps
Architecture for operationalizing data science analytics
Summary
3. Accelerate your Data Analysis with Python Libraries
Anatomy of a PixieApp
Routes
Generating requests to routes
A GitHub project tracking sample application
Displaying the search results in a table
Invoking the PixieDust display() API using pd_entity attribute
Invoking arbitrary Python code with pd_script
Making the application more responsive with pd_refresh
Creating reusable widgets
Summary
4. Publish your Data Analysis to the Web - the PixieApp Tool
Overview of Kubernetes
Installing and configuring the PixieGateway server
PixieGateway server configuration
PixieGateway architecture
Publishing an application
Encoding state in the PixieApp URL
Sharing charts by publishing them as web pages
PixieGateway admin console
Python Console
Displaying warmup and run code for a PixieApp
Summary
5. Python and PixieDust Best Practices and Advanced Concepts
Use @captureOutput decorator to integrate the output of third-party Python libraries
Create a word cloud image with @captureOutput
Increase modularity and code reuse
Creating a widget with pd_widget
PixieDust support of streaming data
Adding streaming capabilities to your PixieApp
Adding dashboard drill-downs with PixieApp events
Extending PixieDust visualizations
Debugging
Debugging on the Jupyter Notebook using pdb
Visual debugging with PixieDebugger
Debugging PixieApp routes with PixieDebugger
Troubleshooting issues using PixieDust logging
Client-side debugging
Run Node.js inside a Python Notebook
Summary
6. Analytics Study: AI and Image Recognition with TensorFlow
What is machine learning?
What is deep learning?
Getting started with TensorFlow
Simple classification with DNNClassifier
Image recognition sample application
Part 1 – Load the pretrained MobileNet model
Part 2 – Create a PixieApp for our image recognition sample application
Part 3 – Integrate the TensorBoard graph visualization
Part 4 – Retrain the model with custom training data
Summary
7. Analytics Study: NLP and Big Data with Twitter Sentiment Analysis
Getting started with Apache Spark
Apache Spark architecture
Configuring Notebooks to work with Spark
Twitter sentiment analysis application
Part 1 – Acquiring the data with Spark Structured Streaming
Architecture diagram for the data pipeline
Authentication with Twitter
Creating the Twitter stream
Creating a Spark Streaming DataFrame
Creating and running a structured query
Monitoring active streaming queries
Creating a batch DataFrame from the Parquet files
Part 2 – Enriching the data with sentiment and most relevant extracted entity
Getting started with the IBM Watson Natural Language Understanding service
Part 3 – Creating a real-time dashboard PixieApp
Refactoring the analytics into their own methods
Creating the PixieApp
Part 4 – Adding scalability with Apache Kafka and IBM Streams Designer
Streaming the raw tweets to Kafka
Enriching the tweets data with the Streaming Analytics service
Creating a Spark Streaming DataFrame with a Kafka input source
Summary
8. Analytics Study: Prediction - Financial Time Series Analysis and Forecasting
Getting started with NumPy
Creating a NumPy array
Operations on ndarray
Selections on NumPy arrays
Broadcasting
Statistical exploration of time series
Hypothetical investment
Autocorrelation function (ACF) and partial autocorrelation function (PACF)
Putting it all together with the StockExplorer PixieApp
BaseSubApp – base class for all the child PixieApps
StockExploreSubApp – first child PixieApp
MovingAverageSubApp – second child PixieApp
AutoCorrelationSubApp – third child PixieApp
Time series forecasting using the ARIMA model
Build an ARIMA model for the MSFT stock time series
StockExplorer PixieApp Part 2 – add time series forecasting using the ARIMA model
Summary
9. Analytics Study: Graph Algorithms - US Domestic Flight Data Analysis
Introduction to graphs
Graph representations
Graph algorithms
Graph and big data
Getting started with the networkx graph library
Creating a graph
Visualizing a graph
Part 1 – Loading the US domestic flight data into a graph
Graph centrality
Part 2 – Creating the USFlightsAnalysis PixieApp
Part 3 – Adding data exploration to the USFlightsAnalysis PixieApp
Part 4 – Creating an ARIMA model for predicting flight delays
Summary
10. The Future of Data Analysis and Where to Develop your Skills
Forward thinking – what to expect for AI and data science
References
A. PixieApp Quick-Reference
Annotations
Custom HTML attributes
Methods
Other Books You May Enjoy
Leave a review – let other readers know what you think
Index