Preface
About the Book
About the Authors
Learning Objectives
Approach
Audience
Minimum Hardware Requirements
Software Requirements
Conventions
Installation and Setup
Installing the Code Bundle
Additional Resources
Chapter 1
Introduction to Data Wrangling with Python
Introduction
Importance of Data Wrangling
Python for Data Wrangling
Lists, Sets, Strings, Tuples, and Dictionaries
Lists
Exercise 1: Accessing the List Members
Exercise 2: Generating a List
Exercise 3: Iterating over a List and Checking Membership
Exercise 4: Sorting a List
Exercise 5: Generating a Random List
Activity 1: Handling Lists
Sets
Introduction to Sets
Union and Intersection of Sets
Creating Null Sets
Dictionary
Exercise 6: Accessing and Setting Values in a Dictionary
Exercise 7: Iterating Over a Dictionary
Exercise 8: Revisiting the Unique Valued List Problem
Exercise 9: Deleting a Value from a Dict
Exercise 10: Dictionary Comprehension
Tuples
Creating a Tuple with Different Cardinalities
Unpacking a Tuple
Exercise 11: Handling Tuples
Strings
Exercise 12: Accessing Strings
Exercise 13: String Slices
String Functions
Exercise 14: Split and Join
Activity 2: Analyze a Multiline String and Generate the Unique Word Count
Summary
Chapter 2
Advanced Data Structures and File Handling
Introduction
Advanced Data Structures
Iterator
Exercise 15: Introduction to the Iterator
Stacks
Exercise 16: Implementing a Stack in Python
Exercise 17: Implementing a Stack Using User-Defined Methods
Exercise 18: Lambda Expression
Exercise 19: Lambda Expression for Sorting
Exercise 20: Multi-Element Membership Checking
Queue
Exercise 21: Implementing a Queue in Python
Activity 3: Permutation, Iterator, Lambda, List
Basic File Operations in Python
Exercise 22: File Operations
File Handling
Exercise 23: Opening and Closing a File
The with Statement
Opening a File Using the with Statement
Exercise 24: Reading a File Line by Line
Exercise 25: Write to a File
Activity 4: Design Your Own CSV Parser
Summary
Chapter 3
Introduction to NumPy, Pandas, and Matplotlib
Introduction
NumPy Arrays
NumPy Array and Features
Exercise 26: Creating a NumPy Array (from a List)
Exercise 27: Adding Two NumPy Arrays
Exercise 28: Mathematical Operations on NumPy Arrays
Exercise 29: Advanced Mathematical Operations on NumPy Arrays
Exercise 30: Generating Arrays Using arange and linspace
Exercise 31: Creating Multi-Dimensional Arrays
Exercise 32: The Dimension, Shape, Size, and Data Type of the Two-dimensional Array
Exercise 33: Zeros, Ones, Random, Identity Matrices, and Vectors
Exercise 34: Reshaping, Ravel, Min, Max, and Sorting
Exercise 35: Indexing and Slicing
Conditional Subsetting
Exercise 36: Array Operations (array-array, array-scalar, and universal functions)
Stacking Arrays
Pandas DataFrames
Exercise 37: Creating a Pandas Series
Exercise 38: Pandas Series and Data Handling
Exercise 39: Creating Pandas DataFrames
Exercise 40: Viewing a DataFrame Partially
Indexing and Slicing Columns
Indexing and Slicing Rows
Exercise 41: Creating and Deleting a New Column or Row
Statistics and Visualization with NumPy and Pandas
Refresher of Basic Descriptive Statistics (and the Matplotlib Library for Visualization)
Exercise 42: Introduction to Matplotlib Through a Scatter Plot
Definition of Statistical Measures – Central Tendency and Spread
Random Variables and Probability Distribution
What Is a Probability Distribution?
Discrete Distributions
Continuous Distributions
Data Wrangling in Statistics and Visualization
Using NumPy and Pandas to Calculate Basic Descriptive Statistics on the DataFrame
Random Number Generation Using NumPy
Exercise 43: Generating Random Numbers from a Uniform Distribution
Exercise 44: Generating Random Numbers from a Binomial Distribution and Bar Plot
Exercise 45: Generating Random Numbers from Normal Distribution and Histograms
Exercise 46: Calculation of Descriptive Statistics from a DataFrame
Exercise 47: Built-in Plotting Utilities
Activity 5: Generating Statistics from a CSV File
Summary
Chapter 4
A Deep Dive into Data Wrangling with Python
Introduction
Subsetting, Filtering, and Grouping
Exercise 48: Loading and Examining a Superstore's Sales Data from an Excel File
Subsetting the DataFrame
An Example Use Case: Determining Statistics on Sales and Profit
Exercise 49: The unique Function
Conditional Selection and Boolean Filtering
Exercise 50: Setting and Resetting the Index
Exercise 51: The GroupBy Method
Detecting Outliers and Handling Missing Values
Missing Values in Pandas
Exercise 52: Filling in the Missing Values with fillna
Exercise 53: Dropping Missing Values with dropna
Outlier Detection Using a Simple Statistical Test
Concatenating, Merging, and Joining
Exercise 54: Concatenation
Exercise 55: Merging by a Common Key
Exercise 56: The join Method
Useful Methods of Pandas
Exercise 57: Randomized Sampling
The value_counts Method
Pivot Table Functionality
Exercise 58: Sorting by Column Values – the sort_values Method
Exercise 59: Flexibility for User-Defined Functions with the apply Method
Activity 6: Working with the Adult Income Dataset (UCI)
Summary
Chapter 5
Getting Comfortable with Different Kinds of Data Sources
Introduction
Reading Data from Different Text-Based (and Non-Text-Based) Sources
Data Files Provided with This Chapter
Libraries to Install for This Chapter
Exercise 60: Reading Data from a CSV File Where Headers Are Missing
Exercise 61: Reading from a CSV File Where Delimiters Are Not Commas
Exercise 62: Bypassing the Headers of a CSV File
Exercise 63: Skipping Initial Rows and Footers when Reading a CSV File
Reading Only the First N Rows (Especially Useful for Large Files)
Exercise 64: Combining skiprows and nrows to Read Data in Small Chunks
Setting the skip_blank_lines Option
Reading a CSV File from a Zip File
Reading from an Excel File Using sheet_name and Handling a Distinct sheet_name
Exercise 65: Reading a General Delimited Text File
Reading HTML Tables Directly from a URL
Exercise 66: Further Wrangling to Get the Desired Data
Exercise 67: Reading from a JSON File
Reading a Stata File
Exercise 68: Reading Tabular Data from a PDF File
Introduction to Beautiful Soup 4 and Web Page Parsing
Structure of HTML
Exercise 69: Reading an HTML File and Extracting Its Contents Using BeautifulSoup
Exercise 70: DataFrames and BeautifulSoup
Exercise 71: Exporting a DataFrame as an Excel File
Exercise 72: Stacking URLs from a Document Using bs4
Activity 7: Reading Tabular Data from a Web Page and Creating DataFrames
Summary
Chapter 6
Learning the Hidden Secrets of Data Wrangling
Introduction
Additional Software Required for This Section
Advanced List Comprehension and the zip Function
Introduction to Generator Expressions
Exercise 73: Generator Expressions
Exercise 74: One-Liner Generator Expression
Exercise 75: Extracting a List with Single Words
Exercise 76: The zip Function
Exercise 77: Handling Messy Data
Data Formatting
The % Operator
Using the format Function
Exercise 78: Data Representation Using {}
Identify and Clean Outliers
Exercise 79: Outliers in Numerical Data
Z-Score
Exercise 80: The Z-Score Value to Remove Outliers
Exercise 81: Fuzzy Matching of Strings
Activity 8: Handling Outliers and Missing Data
Summary
Chapter 7
Advanced Web Scraping and Data Gathering
Introduction
The Basics of Web Scraping and the Beautiful Soup Library
Libraries in Python
Exercise 81: Using the Requests Library to Get a Response from the Wikipedia Home Page
Exercise 82: Checking the Status of the Web Request
Checking the Encoding of the Web Page
Exercise 83: Creating a Function to Decode the Contents of the Response and Check its Length
Exercise 84: Extracting Human-Readable Text From a BeautifulSoup Object
Extracting Text from a Section
Extracting Important Historical Events that Happened on Today's Date
Exercise 85: Using Advanced BS4 Techniques to Extract Relevant Text
Exercise 86: Creating a Compact Function to Extract the "On this Day" Text from the Wikipedia Home Page
Reading Data from XML
Exercise 87: Creating an XML File and Reading XML Element Objects
Exercise 88: Finding Various Elements of Data within a Tree (Element)
Reading from a Local XML File into an ElementTree Object
Exercise 89: Traversing the Tree, Finding the Root, and Exploring all Child Nodes and their Tags and Attributes
Exercise 90: Using the text Method to Extract Meaningful Data
Extracting and Printing the GDP/Per Capita Information Using a Loop
Exercise 91: Finding All the Neighboring Countries for each Country and Printing Them
Exercise 92: A Simple Demo of Using XML Data Obtained by Web Scraping
Reading Data from an API
Defining the Base URL (or API Endpoint)
Exercise 93: Defining and Testing a Function to Pull Country Data from an API
Using the Built-In JSON Library to Read and Examine Data
Printing All the Data Elements
Using a Function that Extracts a DataFrame Containing Key Information
Exercise 94: Testing the Function by Building a Small Database of Countries' Information
Fundamentals of Regular Expressions (RegEx)
Regex in the Context of Web Scraping
Exercise 95: Using the match Method to Check Whether a Pattern Matches a String/Sequence
Using the compile Method to Create a Regex Program
Exercise 96: Compiling Programs to Match Objects
Exercise 97: Using Additional Parameters in match to Check for Positional Matching
Finding the Number of Words in a List That End with "ing"
Exercise 98: The search Method in Regex
Exercise 99: Using the span Method of the Match Object to Locate the Position of the Matched Pattern
Exercise 100: Examples of Single Character Pattern Matching with search
Exercise 101: Examples of Pattern Matching at the Start or End of a String
Exercise 102: Examples of Pattern Matching with Multiple Characters
Exercise 103: Greedy versus Non-Greedy Matching
Exercise 104: Controlling Repetitions to Match
Exercise 105: Sets of Matching Characters
Exercise 106: Using the OR Operator in Regex
The findall Method
Activity 9: Extracting the Top 100 eBooks from Gutenberg
Activity 10: Building Your Own Movie Database by Reading an API
Summary
Chapter 8
RDBMS and SQL
Introduction
Refresher of RDBMS and SQL
How is an RDBMS Structured?
SQL
Using an RDBMS (MySQL/PostgreSQL/SQLite)
Exercise 107: Connecting to Database in SQLite
Exercise 108: DDL and DML Commands in SQLite
Reading Data from a Database in SQLite
Exercise 109: Sorting Values that are Present in the Database
Exercise 110: Altering the Structure of a Table and Updating the New Fields
Exercise 111: Grouping Values in Tables
Relation Mapping in Databases
Adding Rows in the comments Table
Joins
Retrieving Specific Columns from a JOIN Query
Exercise 112: Deleting Rows
Updating Specific Values in a Table
Exercise 113: RDBMS and DataFrames
Activity 11: Retrieving Data Correctly from Databases
Summary
Chapter 9
Application of Data Wrangling in Real Life
Introduction
Applying Your Knowledge to a Real-life Data Wrangling Task
Activity 12: Data Wrangling Task – Fixing UN Data
Activity 13: Data Wrangling Task – Cleaning GDP Data
Activity 14: Data Wrangling Task – Merging UN Data and GDP Data
Activity 15: Data Wrangling Task – Connecting the New Data to the Database
An Extension to Data Wrangling
Additional Skills Required to Become a Data Scientist
Basic Familiarity with Big Data and Cloud Technologies
What Goes with Data Wrangling?
Tips and Tricks for Mastering Machine Learning
Summary
Appendix
Solution of Activity 1: Handling Lists
Solution of Activity 2: Analyze a Multiline String and Generate the Unique Word Count
Solution of Activity 3: Permutation, Iterator, Lambda, List
Solution of Activity 4: Design Your Own CSV Parser
Solution of Activity 5: Generating Statistics from a CSV File
Solution of Activity 6: Working with the Adult Income Dataset (UCI)
Solution of Activity 7: Reading Tabular Data from a Web Page and Creating DataFrames
Solution of Activity 8: Handling Outliers and Missing Data
Solution of Activity 9: Extracting the Top 100 eBooks from Gutenberg
Solution of Activity 10: Building Your Own Movie Database by Reading an API
Solution of Activity 11: Retrieving Data Correctly from Databases
Solution of Activity 12: Data Wrangling Task – Fixing UN Data
Solution of Activity 13: Data Wrangling Task – Cleaning GDP Data
Solution of Activity 14: Data Wrangling Task – Merging UN Data and GDP Data
Solution of Activity 15: Data Wrangling Task – Connecting the New Data to a Database