Preface
About the Book
About the Authors
Learning Objectives
Approach
Audience
Minimum Hardware Requirements
Software Requirements
Conventions
Installation and Setup
Installing the Code Bundle
Additional Resources
Chapter 1
The Python Data Science Stack
Introduction
Python Libraries and Packages
IPython: A Powerful Interactive Shell
Exercise 1: Interacting with the Python Shell Using the IPython Commands
The Jupyter Notebook
Exercise 2: Getting Started with the Jupyter Notebook
IPython or Jupyter?
Activity 1: IPython and Jupyter
NumPy
SciPy
Matplotlib
Pandas
Using Pandas
Reading Data
Exercise 3: Reading Data with Pandas
Data Manipulation
Selection and Filtering
Selecting Rows Using Slicing
Exercise 4: Data Selection and the .loc Method
Applying a Function to a Column
Activity 2: Working with Data Problems
Data Type Conversion
Exercise 5: Exploring Data Types
Aggregation and Grouping
Exercise 6: Aggregation and Grouping Data
NumPy on Pandas
Exporting Data from Pandas
Exercise 7: Exporting Data in Different Formats
Visualization with Pandas
Activity 3: Plotting Data with Pandas
Summary
Chapter 2
Statistical Visualizations
Introduction
Types of Graphs and When to Use Them
Exercise 8: Plotting an Analytical Function
Components of a Graph
Exercise 9: Creating a Graph
Exercise 10: Creating a Graph for a Mathematical Function
Seaborn
Which Tool Should Be Used?
Types of Graphs
Line Graphs
Time Series Plots
Exercise 11: Creating Line Graphs Using Different Libraries
Pandas DataFrames and Grouped Data
Activity 4: Line Graphs with the Object-Oriented API and Pandas DataFrames
Scatter Plots
Activity 5: Understanding Relationships of Variables Using Scatter Plots
Histograms
Exercise 12: Creating a Histogram of Horsepower Distribution
Boxplots
Exercise 13: Analyzing the Behavior of the Number of Cylinders and Horsepower Using a Boxplot
Changing Plot Design: Modifying Graph Components
Title and Label Configuration for Axis Objects
Exercise 14: Configuring a Title and Labels for Axis Objects
Line Styles and Color
Figure Size
Exercise 15: Working with Matplotlib Style Sheets
Exporting Graphs
Activity 6: Exporting a Graph to a File on Disk
Activity 7: Complete Plot Design
Summary
Chapter 3
Working with Big Data Frameworks
Introduction
Hadoop
Manipulating Data with the HDFS
Exercise 16: Manipulating Files in the HDFS
Spark
Spark SQL and Pandas DataFrames
Exercise 17: Performing DataFrame Operations in Spark
Exercise 18: Accessing Data with Spark
Exercise 19: Reading Data from the Local Filesystem and the HDFS
Exercise 20: Writing Data Back to the HDFS and PostgreSQL
Writing Parquet Files
Exercise 21: Writing Parquet Files
Increasing Analysis Performance with Parquet and Partitions
Exercise 22: Creating a Partitioned Dataset
Handling Unstructured Data
Exercise 23: Parsing Text and Cleaning
Activity 8: Removing Stop Words from Text
Summary
Chapter 4
Diving Deeper with Spark
Introduction
Getting Started with Spark DataFrames
Exercise 24: Specifying the Schema of a DataFrame
Exercise 25: Creating a DataFrame from an Existing RDD
Exercise 26: Creating a DataFrame Using a CSV File
Writing Output from Spark DataFrames
Exercise 27: Converting a Spark DataFrame to a Pandas DataFrame
Exploring Spark DataFrames
Exercise 28: Displaying Basic DataFrame Statistics
Activity 9: Getting Started with Spark DataFrames
Data Manipulation with Spark DataFrames
Exercise 29: Selecting and Renaming Columns from the DataFrame
Exercise 30: Adding and Removing a Column from the DataFrame
Exercise 31: Displaying and Counting Distinct Values in a DataFrame
Exercise 32: Removing Duplicate Rows and Filtering Rows of a DataFrame
Exercise 33: Ordering Rows in a DataFrame
Exercise 34: Aggregating Values in a DataFrame
Activity 10: Data Manipulation with Spark DataFrames
Graphs in Spark
Exercise 35: Creating a Bar Chart
Exercise 36: Creating a Linear Model Plot
Exercise 37: Creating a KDE Plot and a Boxplot
Activity 11: Graphs in Spark
Summary
Chapter 5
Handling Missing Values and Correlation Analysis
Introduction
Setting up the Jupyter Notebook
Missing Values
Exercise 38: Counting Missing Values in a DataFrame
Exercise 39: Counting Missing Values in All DataFrame Columns
Fetching Missing Value Records from the DataFrame
Handling Missing Values in Spark DataFrames
Exercise 40: Removing Records with Missing Values from a DataFrame
Exercise 41: Filling Missing Values with a Constant in a DataFrame Column
Correlation
Exercise 42: Computing Correlation
Activity 12: Missing Value Handling and Correlation Analysis with PySpark DataFrames
Summary
Chapter 6
Exploratory Data Analysis
Introduction
Defining a Business Problem
Problem Identification
Requirement Gathering
Data Pipeline and Workflow
Identifying Measurable Metrics
Documentation and Presentation
Translating a Business Problem into Measurable Metrics and Exploratory Data Analysis (EDA)
Data Gathering
Analysis of Data Generation
KPI Visualization
Feature Importance
Exercise 43: Identify the Target Variable and Related KPIs from the Given Data for the Business Problem
Exercise 44: Generate the Feature Importance of the Target Variable and Carry Out EDA
Structured Approach to the Data Science Project Life Cycle
Data Science Project Life Cycle Phases
Phase 1: Understanding and Defining the Business Problem
Phase 2: Data Access and Discovery
Phase 3: Data Engineering and Pre-processing
Activity 13: Carry Out Mapping to Gaussian Distribution of Numeric Features from the Given Data
Phase 4: Model Development
Summary
Chapter 7
Reproducibility in Big Data Analysis
Introduction
Reproducibility with Jupyter Notebooks
Introduction to the Business Problem
Documenting the Approach and Workflows
Explaining the Data Pipeline
Explain the Dependencies
Using Source Code Version Control
Modularizing the Process
Gathering Data in a Reproducible Way
Functionalities in Markdown and Code Cells
Explaining the Business Problem in the Markdown
Providing a Detailed Introduction to the Data Source
Explain the Data Attributes in the Markdown
Exercise 45: Performing Data Reproducibility
Code Practices and Standards
Environment Documentation
Writing Readable Code with Comments
Effective Segmentation of Workflows
Workflow Documentation
Exercise 46: Missing Value Preprocessing with High Reproducibility
Avoiding Repetition
Using Functions and Loops for Optimizing Code
Developing Libraries/Packages for Code/Algorithm Reuse
Activity 14: Carry Out Normalization of Data
Summary
Chapter 8
Creating a Full Analysis Report
Introduction
Reading Data in Spark from Different Data Sources
Exercise 47: Reading Data from a CSV File Using the PySpark Object
Reading JSON Data Using the PySpark Object
SQL Operations on a Spark DataFrame
Exercise 48: Reading Data in PySpark and Carrying Out SQL Operations
Exercise 49: Creating and Merging Two DataFrames
Exercise 50: Subsetting the DataFrame
Generating Statistical Measurements
Activity 15: Generating Visualization Using Plotly
Summary
Appendix
Chapter 01: The Python Data Science Stack
Activity 1: IPython and Jupyter
Activity 2: Working with Data Problems
Activity 3: Plotting Data with Pandas
Chapter 02: Statistical Visualizations Using Matplotlib and Seaborn
Activity 4: Line Graphs with the Object-Oriented API and Pandas DataFrames
Activity 5: Understanding Relationships of Variables Using Scatter Plots
Activity 6: Exporting a Graph to a File on Disk
Activity 7: Complete Plot Design
Chapter 03: Working with Big Data Frameworks
Activity 8: Parsing Text
Chapter 04: Diving Deeper with Spark
Activity 9: Getting Started with Spark DataFrames
Activity 10: Data Manipulation with Spark DataFrames
Activity 11: Graphs in Spark
Chapter 05: Missing Value Handling and Correlation Analysis in Spark
Activity 12: Missing Value Handling and Correlation Analysis with PySpark DataFrames
Chapter 06: Business Process Definition and Exploratory Data Analysis
Activity 13: Carry Out Mapping to Gaussian Distribution of Numeric Features from the Given Data
Chapter 07: Reproducibility in Big Data Analysis
Activity 14: Test Normality of Data Attributes (Columns) and Carry Out Gaussian Normalization of Non-Normally Distributed Attributes
Chapter 08: Creating a Full Analysis Report
Activity 15: Generating Visualization Using Plotly