Container Data Types

Part 2 Focus of this module

Sorting
Comprehensions and generators

One of those is sorting, which is the process of taking a collection of values and putting them in order according to one or more criteria. Sorted data is useful for solving a wide range of problems. Python has the most elegant sorting interface of any programming language I’ve ever used. It’s simple and yet very powerful. I’m excited to introduce you to it.

We’re also going to look at a family of expressions that derive sequences from other sequences, which turns out to be a really useful thing to do. We call these comprehensions and generators. These are some of the most powerful features of Python, and if you take the time to wrap your brain around them, they’ll pay big dividends throughout your programming career.

Before we dive into the main focus of our lecture, we’re going to need to go over some background information. In this background information, you’re going to hear things you’ve heard before, but try to focus on understanding these ideas at a deeper level, because you’ll need that understanding in order to tackle the new material we’re introducing today.

Expressions

Expressions are chunks of code that evaluate to a value
Literal values, variables, and function calls are examples of simple expressions
More complex expressions can be created with the help of operators
- (+, -, >, <, ==, in, is, [], etc.)

Most Python statements contain expressions

varname = expression
return expression
if condition:
function(expression[, expression…])
- Yes, even the function is an expression

How Python compares things (`<`, `>`)

Numbers are compared as you’d expect
True > False
Ordered sequences (strings, lists, tuples) are compared one position at a time
- (1, 1000, 7) < (2, 3, 4)
- (1, 2, 6) < (1, 3, 0)
- (3, 17, 7) < (3, 17, 7, -2)

How Python compares strings (`<`, `>`)

String comparison is based on Unicode code points
- Code points are documented at https://www.unicode.org/charts/
- "duck" < "fish"
- "Horse" < "bird"
- "moose" < "älg"
- "cat" < "ко́шка"
- "貓" < "고양이"

Python functions are first-class objects

def reverse_string(s):
    """ Reverses the order of characters in a string """
    return s[::-1]

do_stuff = reverse_string
print(do_stuff("drawer"))

def use_another_func(fnc, value_list):
    """ Applies fnc to each item in value_list and returns the new list """
    new_list = []
    for v in value_list:
        new_list.append(fnc(v))
    return new_list

Summary

Expressions are chunks of code that evaluate to a value
Python compares sequences one index at a time until it finds a difference
Functions in Python are first-class objects

Sorting

list.sort()
- sorts lists in-place

sorted(collection)
- sorts any collection of comparable values and returns a new list

>>> names = ["Reggie", "Susan", "Michael", "Gabby"]
>>> names.sort()

>>> print(names)
['Gabby', 'Michael', 'Reggie', 'Susan']

>>> names = ["Reggie", "Susan", "Michael", "Gabby"]
>>> names2 = sorted(names)

>>> print(names)
['Reggie', 'Susan', 'Michael', 'Gabby']

>>> print(names2)
['Gabby', 'Michael', 'Reggie', 'Susan']

>>> wages = {"Reggie" :27,
...          "Susan"  :21,
...          "Michael":19,
...          "Gabby"  :32}
>>> names = sorted(wages)

>>> print(names)
["Gabby", "Michael", "Reggie", "Susan"]

Sorting in descending order

Both sort() and sorted() have optional keyword argument reverse (default is False)

>>> ordered_scores = sorted(scores, reverse=True)
>>> print(ordered_scores)
[22, 20, 18, 16, 14, 12, 8]

Custom sort criteria

Both sort() and sorted() have optional keyword argument key, which takes a function as a value
- Remember, functions are first-class objects in Python
The function must
- take one argument
- return a value
Python sorts based on the output of the function
- but the actual items in the list don’t change

Custom sort criteria: examples

Sort a list of strings based on their length

>>> words = ["class", "environment", "function", "call"]
>>> words.sort(key=len)

>>> print(words)
['call', 'class', 'function', 'environment']

Custom sort criteria: examples

Sort a list of numbers based on their absolute value

>>> results = [5, -2, 3, 1, -4]
>>> results.sort(key=abs)

>>> print(results)
[1, -2, 3, -4, 5]

Custom sort criteria: examples

Sort keys in a dictionary by their associated values in the dictionary

>>> wages = {"Reggie" :27,
...          "Susan"  :21,
...          "Michael":19,
...          "Gabby"  :32}
>>> names = sorted(wages, key=wages.get)

>>> print(names)
['Michael', 'Susan', 'Reggie', 'Gabby']

Summary

Two ways to sort things in Python
- list.sort() (lists only, sorts in-place)
- sorted(collection) (any collection, creates new list object)
Sorting is based on < operation
To reverse the order of a sort, add reverse=True
To change the sort criteria, you can specify a key function
- Function should take one argument and return a value

Writing our own key function

The function should take exactly one argument and should return a value
We’ll pass the name of the function to the key argument
Example: sort strings as if the strings were reversed
- (causes similar endings to group together)

def reverse_string(s):
    """ Reverses the order of characters in a string """
    return s[::-1]

>>> words = ["string", "strings", "stringing", "float", "floats", "floating"]
>>> words_by_ending = sorted(words, key=reverse_string)

>>> print(words_by_ending)
['stringing', 'string', 'floating', 'strings', 'floats', 'float']

Anonymous functions

An anonymous function is a short, simple function that can take arguments (or not) and return the result of a single expression
- For now, we’ll only consider anonymous functions that take one argument
We create anonymous functions using lambda expressions
- A lambda expression evaluates to a function object

lambda param: expr

# lambda equivalent of the reverse_string() method from the previous slide
lambda s: s[::-1]

Anonymous functions and sorting

Sort strings as if they were reversed

lambda s: s[::-1]

>>> words = ["string", "strings", "stringing", "float", "floats", "floating"]
>>> words_by_ending = sorted(words, key=lambda s: s[::-1])

>>> print(words_by_ending)
['stringing', 'string', 'floating', 'strings', 'floats', 'float']

Anonymous functions and sorting

Sort a list of tuples of the form (name, location) based on location

lambda t: t[1]

>>> employees = [("Trevon", "Omaha"), ("Beverly", "Dallas"), ("Dustin", "Omaha")]
>>> for name, location in sorted(employees, key=lambda t: t[1]):
...     print(f"{name} is based in {location}")

Beverly is based in Dallas
Trevon is based in Omaha
Dustin is based in Omaha

Anonymous functions and sorting

Imagine a contest where people guess the number of jelly beans in a jar
- Spoiler alert: it’s 3842
You have a dictionary guesses of contestants and their guesses
You want a list of contestants based on how close the guesses are to the right answer

answer = 3842
lambda name: abs(guesses[name]-answer)

>>> guesses = {"Aliyah": 4000, "Andrew": 3600, "Alexandra": 4250}
>>> results = sorted(guesses, key=lambda name: abs(guesses[name]-answer))

>>> print(results)
['Aliyah', 'Andrew', 'Alexandra']

Multiple sort criteria

Have your key function return a tuple
First item in tuple relates to most significant criterion, etc.

Example: sort list of names by last name, then first name

>>> students = ["Ron Jones", "Miryung Kim", "Jaehyun Kim", "Derold Jones",
...             "Sandra Jones-Patterson"]

>>> def split_name(name):
...     parts = name.split()
...     return parts[-1], parts[0]

>>> students.sort(key=split_name)
>>> print(students)
['Derold Jones', 'Ron Jones', 'Sandra Jones-Patterson', 'Jaehyun Kim', 'Miryung Kim']

Summary

We can write our own key functions for sorting
- Function should take one argument and return a value
One way to write simple key functions is to use anonymous functions
- Lambda expressions create anonymous functions
To use multiple sort criteria, our key function can return a tuple
- Most significant value comes first

`min()` and `max()`

Finding a minimum or maximum value is a special case of sorting
min() and max() can either take a collection of values, or multiple arguments

>>> scores = [88, 73, 92, 96, 85, 79, 91, 94]
>>> max_score = max(scores)
>>> print(max_score)
96
>>> min_score = min(scores)
>>> print(min_score)
73

# see how many widgets still have to be made to meet our quota;
# if we made extra, then remaining should be 0
remaining = max(0, quota-widgets)

`min()` and `max()` can take a `key` argument

>>> wages = {"Reggie" :27,
...          "Susan"  :21,
...          "Michael":19,
...          "Gabby"  :32}
>>> lowest_paid_employee = min(wages, key=wages.get)
>>> highest_paid_employee = max(wages, key=wages.get)

>>> print(lowest_paid_employee)
Michael

>>> print(highest_paid_employee)
Gabby

Summary

min() and max() find the minimum and maximum value of a sequence or list of arguments
- Special case of sorting
key argument to min() and max() lets us specify how to compare values

Comprehensions and generator expressions

Common pattern: derive one list from another

\$s=\sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{N-1}}\$

sample = [74, 46, 68, 72, 71, 57, 56, 58, 69, 67]
mean = sum(sample) / len(sample)

msd = [] # mean squared difference
for n in sample:
    msd.append((n - mean)**2)

stdev = (sum(msd) / (len(sample) - 1))**0.5

This pattern is so common that Python provides shorthand notation for it

Basic list comprehensions

[expr for itervar in iterable]

Python iterates once for each item in iterable, setting itervar each time
Python evaluates expr, which usually involves itervar in some way
Python builds a new list with the results of expr for each item in iterable

>>> squares = [i**2 for i in range(10)]
>>> print(squares)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

>>> squares = []
>>> for i in range(10):
...     squares.append(i**2)
...
>>> print(squares)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Standard deviation using a list comprehension

\$s=\sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{N-1}}\$

Assume we already have sample and mean

With for loop (for reference)

msd = []
for n in sample:
    msd.append((n - mean)**2)
stdev = (sum(msd) / (len(sample) - 1))**0.5

With list comprehension

msd = [(n - mean)**2 for n in sample]
stdev = (sum(msd) / (len(sample) - 1))**0.5

More examples

Generate ten random numbers

from random import randint

numbers = [randint(1, 100) for i in range(10)]

Convert strings to integers

string_list = ["10", "14", "12", "7", "11"]
int_list = [int(s) for s in string_list]

Process lines from a file

def process_line(line):
    """ Convert a line consisting of a string, a comma, and an integer
    into a list of a string and an integer """
    values = line.strip().split(",")
    values[1] = int(values[1])
    return values

with open(filename, "r", encoding="utf-8") as f:
    data = [process_line(line) for line in f]

Summary

A list comprehension is an expression that takes the place of a for loop
List comprehensions evaluate to lists
The basic syntax is [expr for itervar in iterable]

Filtering in list comprehensions

Sometimes we want to extract list items that satisfy some criterion
List comprehensions can contain an if element with a condition
Both the expression and the condition can use the iteration variable
Keep all items for which the condition evaluates to True

[expr for itervar in iterable if condition]

sample = [74, 46, 68, 72, 71, 57, 56, 58, 69, 67]
above_70 = [s for s in sample if s > 70]

More examples

games = [{"Johns Hopkins": 13, "Drexel"      : 11},
         {"Maryland"     : 19, "George Mason":  6},
         {"Michigan"     : 10, "Jacksonville":  8},
         {"Florida"      : 15, "Maryland"    : 14},
         # etc.
         ]
umd_games = [g for g in games if "Maryland" in g]

reps = [("Ralph Abraham"  , "LA"),
        ("Alma Adams"     , "NC"),
        ("Robert Aderholt", "AL"),
        ("Anthony Brown"  , "MD"),
        # etc.
        ]
md_reps = [rep[0] for rep in reps if rep[1] == "MD"]

Summary

You can use a list comprehension to filter values from a list, possibly also applying a transformation to the values
The syntax for a list comprehension with filtering is
[expr for itervar in iterable if condition]

Generator expressions

Sometimes we’d rather not store a whole new list in memory
Instead of using a list comprehension, we can use a generator expression
- With a generator, only one new item is in memory at any one time

(expr for itervar in iterable)
(expr for itervar in iterable if condition)

nums = [15, 10, 7, 9, 14, 6]
sum(n**2 for n in nums)

Set comprehensions

Similar to list comprehensions, but the result is a set
Useful for
- removing duplicates
- cases where you plan to do set operations on the result

{expr for itervar in iterable}
{expr for itervar in iterable if condition}

with open("filename", "r", encoding="utf-8") as f:
    hashtags = {line.strip() for line in f if "#" in line}

Dictionary comprehensions

Similar to list comprehensions, but the result is a dict

{keyexpr: valueexpr for itervar in iterable}
{keyexpr: valueexpr for itervar in iterable if condition}

items_sold = ["gear", "sprocket", "sprocket", "hub", "gear", "widget"]
sales_count = {item: items_sold.count(item) for item in set(items_sold)}

Summary

Comprehensions/generator expressions transform and filter collections of data
- Often, we can use them in place of a for loop
Comprehensions exist for lists, sets, and dictionaries
- The name of the comprehension refers to the result of the expression
Generator expressions conserve memory
- Only one item at a time is stored in memory

Container Data Types

Part 2

Focus of this module

Expressions

Most Python statements contain expressions

How Python compares things (<, >)

How Python compares strings (<, >)

Python functions are first-class objects

Summary

Sorting

Sorting in descending order

Custom sort criteria

Custom sort criteria: examples

Custom sort criteria: examples

Custom sort criteria: examples

Summary

Writing our own key function

Anonymous functions

Anonymous functions and sorting

Anonymous functions and sorting

Anonymous functions and sorting

Multiple sort criteria

Summary

min() and max()

min() and max() can take a key argument

Summary

Comprehensions and generator expressions

Basic list comprehensions

Standard deviation using a list comprehension

More examples

Summary

Filtering in list comprehensions

More examples

Summary

Generator expressions

Set comprehensions

Dictionary comprehensions

Summary

How Python compares things (`<`, `>`)

How Python compares strings (`<`, `>`)

`min()` and `max()`

`min()` and `max()` can take a `key` argument