INST 326 Module 13: Web Scraping

This repository has been deprecated, but is being kept online to preserve course links.

For the latest content, please see the repository at:
https://umd-ischool-inst326.github.io/inst326/

Overview

This module introduces the core concepts of web scraping, that is, extracting data from unstructured or semi-structured sources online. We learn to use the BeautifulSoup module and work through a number of examples.

Lecture Videos

Video Lecture: Web Scraping with Python
Video Lecture: Hands-on HTML Analysis for Web Scraping
Video Lecture: Coding Python for Web Scraping

Exercise

UFO Data Exercise

Learning Outcomes

After completing this module, students should be able to:

Analyze HTML pages to identify repeating structural elements that support web scraping
Use the BeautifulSoup module to parse HTML and navigate the resulting document tree
Extract data from web pages with web scraping techniques
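
As a rough illustration of these outcomes, the sketch below parses a small HTML table and pulls one record out of each repeating row. The HTML snippet, the "sighting" class name, the sample values, and the extract_sightings helper are all made up for this example and are not taken from the course exercise; a real page would need its own inspection to find the repeating element. It assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, invented for illustration. A real page would be
# downloaded first and inspected to find its repeating structure.
HTML = """
<table id="sightings">
  <tr><th>Date</th><th>City</th><th>Shape</th></tr>
  <tr class="sighting"><td>2019-01-04</td><td>College Park</td><td>Disk</td></tr>
  <tr class="sighting"><td>2019-02-11</td><td>Baltimore</td><td>Triangle</td></tr>
</table>
"""


def extract_sightings(html):
    """Return one dict per <tr class="sighting"> row in the given HTML."""
    # "html.parser" is the parser bundled with Python, so no extra
    # parser library is needed.
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # The repeating structural element here is the table row with
    # class "sighting"; the header row has no class and is skipped.
    for row in soup.find_all("tr", class_="sighting"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        date, city, shape = cells
        records.append({"date": date, "city": city, "shape": shape})
    return records


if __name__ == "__main__":
    for record in extract_sightings(HTML):
        print(record)
```

Selecting rows by a shared class (or tag) is what makes the scrape repeatable: once the pattern is identified, the same loop handles any number of rows.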

Additional Readings

How to Web Scrape with Python in 4 Minutes
BeautifulSoup Documentation