Practical Data Wrangling (e-book)

Author: Allan Visochek

Publisher: Packt Publishing

Publication date: 2017-11-15

Word count: 221,000

Turn your noisy data into relevant, insight-ready information by leveraging the data wrangling techniques in Python and R

About This Book

· This easy-to-follow guide takes you through every step of the data wrangling process
· Work with different types of datasets, and reshape the layout of your data to make it easier to analyze
· Get simple examples and real-life data wrangling solutions for data pre-processing

Who This Book Is For

If you are a data scientist, data analyst, or statistician who wants to learn how to wrangle your data for analysis, this book is for you. As this book covers both R and Python, some understanding of them will be beneficial.

What You Will Learn

· Read a CSV file into Python and R, and print out some statistics on the data
· Gain knowledge of the data formats and programming structures involved in retrieving API data
· Make effective use of regular expressions in the data wrangling process
· Explore the tools and packages available to prepare numerical data for analysis
· Find out how to have better control over manipulating the structure of the data
· Develop the dexterity to programmatically read, audit, correct, and shape data
· Write and complete programs to take in, format, and output data sets

In Detail

Around 80% of the time in data analysis is spent on cleaning and preparing data. This is an important task, and a prerequisite to the rest of the data analysis workflow, including visualization, analysis, and reporting. Python and R are popular tools for data analysis, and both have packages for manipulating different kinds of data according to your requirements. This book shows you different data wrangling techniques, and how you can leverage Python and R packages to implement them.

You'll start by understanding the data wrangling process and get a solid foundation for working with different types of data. You'll work with different data structures, and acquire and parse data from various locations. You'll also see how to reshape the layout of data and how to manipulate, summarize, and join data sets. Finally, the book concludes with a quick primer on accessing and processing data from databases, conducting data exploration, and storing and retrieving data quickly using databases.

The book includes practical examples on each of these points, using simple and real-world data sets to make them easier to understand. By the end of the book, you'll have a thorough understanding of the data wrangling concepts covered and how to implement them.

Style and approach

This is a practical book designed to give you insight into the application of data wrangling. It takes you through complex concepts and tasks in an accessible way, featuring information on a wide range of data wrangling techniques with Python and R.
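As a rough illustration of the first learning goal listed in the description (reading CSV data into Python and printing some statistics), the sketch below uses only the standard library. It is not taken from the book: the sample data, the column name road_length_km, and the helper summarize_column are hypothetical and chosen only for this example.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a CSV file on disk.
# The column names are assumptions made for this illustration.
SAMPLE_CSV = """region,road_length_km
north,120.5
south,98.0
east,
west,143.2
"""


def summarize_column(csv_text, column):
    """Return count, mean, min and max of a numeric column, skipping blanks."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        raw = (row.get(column) or "").strip()
        if raw:  # skip missing values rather than guessing them
            values.append(float(raw))
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize_column(SAMPLE_CSV, "road_length_km")
print(summary)
```

To run the same sketch against a real file, the io.StringIO wrapper could be replaced by a file object opened with open(path, newline=""). Skipping blank cells is only one way to treat missing values; replacing or imputing them are alternatives.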
Table of Contents

Title Page

Credits

About the Author

About the Reviewer

www.PacktPub.com

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Programming with Data

Understanding data wrangling

Getting and reading data

Cleaning data

Shaping and structuring data

Storing data

The tools for data wrangling

Python

R

Summary

Introduction to Programming in Python

External resources

Logistical overview

Installation requirements

Using other learning resources

Python 2 versus Python 3

Running programs in Python

Using text editors to write and manage programs

Writing the hello world program

Using the terminal to run programs

Running the Hello World program

What if it didn't work?

Data types, variables, and the Python shell

Numbers - integers and floats

Why integers?

Strings

Booleans

The print function

Variables

Adding to a variable

Subtracting from a variable

Multiplication

Division

Naming variables

Arrays (lists, if you ask Python)

Dictionaries

Compound statements

Compound statement syntax and indentation level

For statements and iterables

If statements

Else and elif clauses

Functions

Passing arguments to a function

Returning values from a function

Making annotations within programs

A programmer's resources

Documentation

Online forums and mailing lists

Summary

Reading, Exploring, and Modifying Data - Part I

External resources

Logistical overview

Installation requirements

Data

File system setup

Introducing a basic data wrangling work flow

Introducing the JSON file format

Opening and closing a file in Python using file I/O

The open function and file objects

File structure - best practices to store your data

Opening a file

Reading the contents of a file

Modules in Python

Parsing a JSON file using the json module

Exploring the contents of a data file

Extracting the core content of the data

Listing out all of the variables in the data

Modifying a dataset

Extracting data variables from the original dataset

Using a for loop to iterate over the data

Using a nested for loop to iterate over the data variables

Outputting the modified data to a new file

Specifying input and output file names in the Terminal

Specifying the filenames from the Terminal

Summary

Reading, Exploring, and Modifying Data - Part II

Logistical overview

File system setup

Data

Installing pandas

Understanding the CSV format

Introducing the CSV module

Using the CSV module to read CSV data

Using the CSV module to write CSV data

Using the pandas module to read and process data

Counting the total road length in 2011 revisited

Handling non-standard CSV encoding and dialect

Understanding XML

XML versus JSON

Using the XML module to parse XML data

XPath

Summary

Manipulating Text Data - An Introduction to Regular Expressions

Logistical overview

Data

File structure setup

Understanding the need for pattern recognition

Introducing regular expressions

Writing and using a regular expression

Special characters

Matching whitespace

Matching the start of string

Matching the end of a string

Matching a range of characters

Matching any one of several patterns

Matching a sequence instead of just one character

Putting patterns together

Extracting a pattern from a string

The regex split() function

Python regex documentation

Looking for patterns

Quantifying the existence of patterns

Creating a regular expression to match the street address

Counting the number of matches

Verifying the correctness of the matches

Extracting patterns

Outputting the data to a new file

Summary

Cleaning Numerical Data - An Introduction to R and RStudio

Logistical overview

Data

Directory structure

Installing R and RStudio

Introducing R and RStudio

Familiarizing yourself with RStudio

Running R commands

Setting the working directory

Reading data

The R dataframe

R vectors

Indexing R dataframes

Finding the 2011 total in R

Conducting basic outlier detection and removal

Handling NA values

Deleting missing values

Replacing missing values with a constant

Imputation of missing values

Variable names and contents

Summary

Simplifying Data Manipulation with dplyr

Logistical overview

Data

File system setup

Installing the dplyr and tibble packages

Introducing dplyr

Getting started with dplyr

Chaining operations together

Filtering the rows of a dataframe

Summarizing data by category

Rewriting code using dplyr

Summary

Getting Data from the Web

Logistical overview

Filesystem setup

Installing the requests module

Internet connection

Introducing APIs

Using Python to retrieve data from APIs

Using URL parameters to filter the results

Summary

Working with Large Datasets

Logistical overview

System requirements

Data

File system setup

Installing MongoDB

Planning out your time

Cleaning up

Understanding computer memory

Understanding databases

Introducing MongoDB

Interfacing with MongoDB from Python

Summary
