Getting Started with Beautiful Soup (e-book)

Author: Vineeth G. Nair

Publisher: Packt Publishing

Publication date: 2014-01-24

Word count: 1.18 million

Category: Imported Books > Foreign-Language Originals > Computers/Internet

This book is a practical, hands-on guide to web scraping techniques using Beautiful Soup. Getting Started with Beautiful Soup suits anyone interested in scraping websites and extracting information from them. A basic knowledge of Python, HTML tags, and CSS is required to follow the material.

Getting Started with Beautiful Soup

Table of Contents

Getting Started with Beautiful Soup

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Installing Beautiful Soup

Installing Beautiful Soup

Installing Beautiful Soup in Linux

Installing Beautiful Soup using package manager

Installing Beautiful Soup using pip or easy_install

Installing Beautiful Soup using pip

Installing Beautiful Soup using easy_install

Installing Beautiful Soup in Windows

Verifying Python path in Windows

Installing Beautiful Soup using setup.py

Using Beautiful Soup without installation

Verifying the installation

Quick reference

Summary

2. Creating a BeautifulSoup Object

Creating a BeautifulSoup object

Creating a BeautifulSoup object from a string

Creating a BeautifulSoup object from a file-like object

Creating a BeautifulSoup object for XML parsing

Understanding the features argument

Tag

Accessing the Tag object from BeautifulSoup

Name of the Tag object

Attributes of a Tag object

The NavigableString object

Quick reference

Summary

3. Search Using Beautiful Soup

Searching in Beautiful Soup

Searching with find()

Finding the first producer

Explaining find()

Searching for tags

Searching for text

Searching based on regular expressions

Searching based on attribute values of a tag

Finding the first primary consumer

Searching based on custom attributes

Searching based on the CSS class

Searching using functions defined

Applying searching methods in combination

Searching with find_all()

Finding all tertiary consumers

Understanding parameters used with find_all()

Searching for Tags in relation

Searching for the parent tags

Searching for siblings

Searching for next

Searching for previous

Using search methods to scrape information from a web page

Quick reference

Summary

4. Navigation Using Beautiful Soup

Navigation using Beautiful Soup

Navigating down

Using the name of the child tag

Using predefined attributes

The .contents attribute

The .children attribute

The .descendants attribute

Special attributes for navigating down

The .string attribute

The .strings attribute

Navigating up

The .parent attribute

The .parents attribute

Navigating sideways to the siblings

The .next_sibling attribute

The .previous_sibling attribute

Navigating to the previous and next objects parsed

Quick reference

Summary

5. Modifying Content Using Beautiful Soup

Modifying Tag using Beautiful Soup

Modifying the name property of Tag

Modifying the attribute values of Tag

Updating the existing attribute value of Tag

Adding new attribute values to Tag

Deleting the tag attributes

Adding a new tag

Adding a new producer using new_tag() and append()

Creating a new tag using new_tag()

Adding a new tag using append()

Adding a new div tag to the li tag using insert()

Modifying string contents

Using .string to modify the string content

Adding strings using .append(), insert(), and new_string()

Deleting tags from the HTML document

Deleting the producer using decompose()

Deleting the producer using extract()

Deleting the contents of a tag using Beautiful Soup

Special functions to modify content

Quick reference

Summary

6. Encoding Support in Beautiful Soup

Encoding in Beautiful Soup

Understanding the original encoding of the HTML document

Specifying the encoding of the HTML document

Output encoding

Quick reference

Summary

7. Output in Beautiful Soup

Formatted printing

Unformatted printing

Output formatters in Beautiful Soup

The minimal formatter

The html formatter

The None formatter

The function formatter

Using get_text()

Quick reference

Summary

8. Creating a Web Scraper

Getting book details from PacktPub.com

Finding pages with a list of books

Finding book details

Getting selling prices from Amazon

Getting the selling price from Barnes and Noble

Summary

Index
