This repository has been deprecated, but is being kept online to preserve course links.

For the latest content please see the repository at:
https://umd-ischool-inst326.github.io/inst326/

Overview

This module introduces the core concepts of web scraping, i.e., extracting data from unstructured or semi-structured sources online. We learn to use the BeautifulSoup module and experiment with a number of examples.

Learning Outcomes

After completing this module, students should be able to:
  • Analyze HTML pages to identify repeating structural elements that support web scraping
  • Use the BeautifulSoup module
  • Extract data from web pages with web scraping techniques
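
As a rough illustration of these outcomes, the sketch below parses a small HTML fragment with BeautifulSoup and pulls one record out of each repeating table row. The HTML, the class names (course, code, title), and the function name extract_courses are all made up for this example and are not taken from the course materials; it assumes the beautifulsoup4 package is installed and uses Python's built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, invented for illustration only.
# Each <tr class="course"> is a repeating structural element.
HTML = """
<html><body>
<table id="courses">
  <tr class="course"><td class="code">DEMO 101</td><td class="title">Example Course A</td></tr>
  <tr class="course"><td class="code">DEMO 102</td><td class="title">Example Course B</td></tr>
  <tr class="course"><td class="code">DEMO 201</td><td class="title">Example Course C</td></tr>
</table>
</body></html>
"""


def extract_courses(html):
    """Return a list of {'code', 'title'} dicts, one per course row."""
    soup = BeautifulSoup(html, "html.parser")
    courses = []
    # find_all returns every tag matching the name and CSS class.
    for row in soup.find_all("tr", class_="course"):
        code = row.find("td", class_="code").get_text(strip=True)
        title = row.find("td", class_="title").get_text(strip=True)
        courses.append({"code": code, "title": title})
    return courses


courses = extract_courses(HTML)
for course in courses:
    print(course["code"], "-", course["title"])
```

On a real page, the same pattern would typically start by inspecting the page source to find which tag and attribute combination repeats once per record, then substituting those names into the find_all and find calls.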

Additional Readings