Overview

The purpose of this module is to introduce students to data analysis in Python. Many Python modules exist for working with data, and this module examines four of the most widely used open source ones: pandas, seaborn, scipy and statsmodels. Together they cover tasks ranging from data wrangling to visualizing data trends to building linear regression models. The module introduces these concepts and includes examples of how to apply them to real-world data.

Learning Outcomes

After completing this module, students should:

  • understand what the pandas, seaborn, scipy and statsmodels modules are used for

  • know how to read the documentation for each of these modules

  • know how to create pandas dataframes and perform data manipulation including:

    • creating columns based on values found in other columns

    • inspecting and analyzing dataframes

    • subsetting dataframes

  • know how to create data visualizations using the seaborn module

  • know how to use scipy to perform statistical analyses and tests including:

    • correlation

    • z-tests

    • t-tests

    • ANOVA tests

  • know how to use the statsmodels module to create linear regression models
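
As a rough preview of the outcomes listed above, the following minimal sketch shows the pandas and scipy items in a few lines. The dataset, the column names (group, height_cm, weight_kg, bmi) and the variable names are all invented for illustration and are not taken from the module's materials. The sketch leaves out seaborn, statsmodels and z-tests.

```python
import pandas as pd
from scipy import stats

# Hypothetical data, made up purely for illustration.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "height_cm": [150.0, 160.0, 170.0, 165.0, 175.0, 185.0],
    "weight_kg": [50.0, 58.0, 69.0, 63.0, 74.0, 88.0],
})

# Create a column based on values found in other columns.
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2

# Inspect the dataframe: summary statistics for the numeric columns.
summary = df.describe()

# Subset the dataframe with boolean masks.
group_a = df[df["group"] == "a"]
group_b = df[df["group"] == "b"]

# Pearson correlation between two columns.
r, r_pvalue = stats.pearsonr(df["height_cm"], df["weight_kg"])

# Independent two-sample t-test comparing heights across the groups.
t_stat, t_pvalue = stats.ttest_ind(group_a["height_cm"], group_b["height_cm"])

# One-way ANOVA comparing weights across the groups.
f_stat, f_pvalue = stats.f_oneway(group_a["weight_kg"], group_b["weight_kg"])

print(summary)
print(f"correlation r = {r:.3f}")
print(f"t statistic = {t_stat:.3f}, F statistic = {f_stat:.3f}")
```

With only six invented rows the p-values carry no real meaning. The sketch is meant to show the shape of each call, not a finished analysis.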

Supplementary Files

Readings

Additional Readings (optional, as needed)