Container Data Types

Part 2

Focus of this module

  • Sorting

  • Comprehensions and generators

Expressions

  • Expressions are chunks of code that evaluate to a value

  • Literal values, variables, and function calls are examples of simple expressions

  • More complex expressions can be created with the help of operators

    • (+, -, >, <, ==, in, is, [], etc.)

Most Python statements contain expressions

  • varname = expression

  • return expression

  • if condition:

  • function(expression[, expression…​])

    • Yes, even the function is an expression

How Python compares things (<, >)

  • Numbers are compared as you’d expect

  • True > False

  • Ordered sequences (strings, lists, tuples) are compared one position at a time

    • (1, 1000, 7) < (2, 3, 4)

    • (1, 2, 6) < (1, 3, 0)

    • (3, 17, 7) < (3, 17, 7, -2)

How Python compares strings (<, >)

  • String comparison is based on Unicode code points

Python functions are first-class objects

def reverse_string(s):
    """ Reverses the order of characters in a string """
    return s[::-1]
do_stuff = reverse_string
print(do_stuff("drawer"))
def use_another_func(fnc, value_list):
    """ Applies fnc to each item in value_list and returns the new list """
    new_list = []
    for v in value_list:
        new_list.append(fnc(v))
    return new_list

Summary

  • Expressions are chunks of code that evaluate to a value

  • Python compares sequences one index at a time until it finds a difference

  • Functions in Python are first-class objects

Sorting

  • list.sort()

    • sorts lists in-place

  • sorted(collection)

    • sorts any collection of comparable values and returns a new list

>>> names = ["Reggie", "Susan", "Michael", "Gabby"]
>>> names.sort()
>>> print(names)
['Gabby', 'Michael', 'Reggie', 'Susan']
>>> names = ["Reggie", "Susan", "Michael", "Gabby"]
>>> names2 = sorted(names)
>>> print(names)
['Reggie', 'Susan', 'Michael', 'Gabby']
>>> print(names2)
['Gabby', 'Michael', 'Reggie', 'Susan']
>>> wages = {"Reggie" :27,
...          "Susan"  :21,
...          "Michael":19,
...          "Gabby"  :32}
>>> names = sorted(wages)
>>> print(names)
["Gabby", "Michael", "Reggie", "Susan"]

Sorting in descending order

  • Both sort() and sorted() have optional keyword argument reverse (default is False)

>>> ordered_scores = sorted(scores, reverse=True)
>>> print(ordered_scores)
[22, 20, 18, 16, 14, 12, 8]

Custom sort criteria

  • Both sort() and sorted() have optional keyword argument key, which takes a function as a value

    • Remember, functions are first-class objects in Python

  • The function must

    • take one argument

    • return a value

  • Python sorts based on the output of the function

    • but the actual items in the list don’t change

Custom sort criteria: examples

  • Sort a list of strings based on their length

>>> words = ["class", "environment", "function", "call"]
>>> words.sort(key=len)
>>> print(words)
['call', 'class', 'function', 'environment']

Custom sort criteria: examples

  • Sort a list of numbers based on their absolute value

>>> results = [5, -2, 3, 1, -4]
>>> results.sort(key=abs)
>>> print(results)
[1, -2, 3, -4, 5]

Custom sort criteria: examples

  • Sort keys in a dictionary by their associated values in the dictionary

>>> wages = {"Reggie" :27,
...          "Susan"  :21,
...          "Michael":19,
...          "Gabby"  :32}
>>> names = sorted(wages, key=wages.get)
>>> print(names)
['Michael', 'Susan', 'Reggie', 'Gabby']

Summary

  • Two ways to sort things in Python

    • list.sort() (lists only, sorts in-place)

    • sorted(collection) (any collection, creates new list object)

  • Sorting is based on < operation

  • To reverse the order of a sort, add reverse=True

  • To change the sort criteria, you can specify a key function

    • Function should take one argument and return a value

Writing our own key function

  • The function should take exactly one argument and should return a value

  • We’ll pass the name of the function to the key argument

  • Example: sort strings as if the strings were reversed

    • (causes similar endings to group together)

def reverse_string(s):
    """ Reverses the order of characters in a string """
    return s[::-1]
>>> words = ["string", "strings", "stringing", "float", "floats", "floating"]
>>> words_by_ending = sorted(words, key=reverse_string)
>>> print(words_by_ending)
['stringing', 'string', 'floating', 'strings', 'floats', 'float']

Anonymous functions

  • An anonymous function is a short, simple function that can take arguments (or not) and return the result of a single expression

    • For now, we’ll only consider anonymous functions that take one argument

  • We create anonymous functions using lambda expressions

    • A lambda expression evaluates to a function object

lambda param: expr

# lambda equivalent of the reverse_string() method from the previous slide
lambda s: s[::-1]

Anonymous functions and sorting

  • Sort strings as if they were reversed

lambda s: s[::-1]
>>> words = ["string", "strings", "stringing", "float", "floats", "floating"]
>>> words_by_ending = sorted(words, key=lambda s: s[::-1])
>>> print(words_by_ending)
['stringing', 'string', 'floating', 'strings', 'floats', 'float']

Anonymous functions and sorting

  • Sort a list of tuples of the form (name, location) based on location

lambda t: t[1]
>>> employees = [("Trevon", "Omaha"), ("Beverly", "Dallas"), ("Dustin", "Omaha")]
>>> for name, location in sorted(employees, key=lambda t: t[1]):
...     print(f"{name} is based in {location}")
Beverly is based in Dallas
Trevon is based in Omaha
Dustin is based in Omaha

Anonymous functions and sorting

  • Imagine a contest where people guess the number of jelly beans in a jar

    • Spoiler alert: it’s 3842

  • You have a dictionary guesses of contestants and their guesses

  • You want a list of contestants based on how close the guesses are to the right answer

answer = 3842
lambda name: abs(guesses[name]-answer)
>>> guesses = {"Aliyah": 4000, "Andrew": 3600, "Alexandra": 4250}
>>> results = sorted(guesses, key=lambda name: abs(guesses[name]-answer))
>>> print(results)
['Aliyah', 'Andrew', 'Alexandra']

Multiple sort criteria

  • Have your key function return a tuple

  • First item in tuple relates to most significant criterion, etc.

Example: sort list of names by last name, then first name

>>> students = ["Ron Jones", "Miryung Kim", "Jaehyun Kim", "Derold Jones",
...             "Sandra Jones-Patterson"]
>>> def split_name(name):
...     parts = name.split()
...     return parts[-1], parts[0]
>>> students.sort(key=split_name)
>>> print(students)
['Derold Jones', 'Ron Jones', 'Sandra Jones-Patterson', 'Jaehyun Kim', 'Miryung Kim']

Summary

  • We can write our own key functions for sorting

    • Function should take one argument and return a value

  • One way to write simple key functions is to use anonymous functions

    • Lambda expressions create anonymous functions

  • To use multiple sort criteria, our key function can return a tuple

    • Most significant value comes first

min() and max()

  • Finding a minimum or maximum value is a special case of sorting

  • min() and max() can either take a collection of values, or multiple arguments

>>> scores = [88, 73, 92, 96, 85, 79, 91, 94]
>>> max_score = max(scores)
>>> print(max_score)
96
>>> min_score = min(scores)
>>> print(min_score)
73
# see how many widgets still have to be made to meet our quota;
# if we made extra, then remaining should be 0
remaining = max(0, quota-widgets)

min() and max() can take a key argument

>>> wages = {"Reggie" :27,
...          "Susan"  :21,
...          "Michael":19,
...          "Gabby"  :32}
>>> lowest_paid_employee = min(wages, key=wages.get)
>>> highest_paid_employee = max(wages, key=wages.get)
>>> print(lowest_paid_employee)
Michael
>>> print(highest_paid_employee)
Gabby

Summary

  • min() and max() find the minimum and maximum value of a sequence or list of arguments

    • Special case of sorting

  • key argument to min() and max() lets us specify how to compare values

Comprehensions and generator expressions

  • Common pattern: derive one list from another

\$s=\sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{N-1}}\$
sample = [74, 46, 68, 72, 71, 57, 56, 58, 69, 67]
mean = sum(sample) / len(sample)
msd = [] # mean squared difference
for n in sample:
    msd.append((n - mean)**2)
stdev = (sum(msd) / (len(sample) - 1))**0.5
  • This pattern is so common that Python provides shorthand notation for it

Basic list comprehensions

[expr for itervar in iterable]

  • Python iterates once for each item in iterable, setting itervar each time

  • Python evaluates expr, which usually involves itervar in some way

  • Python builds a new list with the results of expr for each item in iterable

>>> squares = [i**2 for i in range(10)]
>>> print(squares)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> squares = []
>>> for i in range(10):
...     squares.append(i**2)
...
>>> print(squares)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Standard deviation using a list comprehension

\$s=\sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{N-1}}\$

Assume we already have sample and mean

With for loop (for reference)

msd = []
for n in sample:
    msd.append((n - mean)**2)
stdev = (sum(msd) / (len(sample) - 1))**0.5

With list comprehension

msd = [(n - mean)**2 for n in sample]
stdev = (sum(msd) / (len(sample) - 1))**0.5

More examples

Generate ten random numbers

from random import randint

numbers = [randint(1, 100) for i in range(10)]

Convert strings to integers

string_list = ["10", "14", "12", "7", "11"]
int_list = [int(s) for s in string_list]

Process lines from a file

def process_line(line):
    """ Convert a line consisting of a string, a comma, and an integer
    into a list of a string and an integer """
    values = line.strip().split(",")
    values[1] = int(values[1])
    return values

with open(filename, "r", encoding="utf-8") as f:
    data = [process_line(line) for line in f]

Summary

  • A list comprehension is an expression that takes the place of a for loop

  • List comprehensions evaluate to lists

  • The basic syntax is [expr for itervar in iterable]

Filtering in list comprehensions

  • Sometimes we want to extract list items that satisfy some criterion

  • List comprehensions can contain an if element with a condition

  • Both the expression and the condition can use the iteration variable

  • Keep all items for which the condition evaluates to True

[expr for itervar in iterable if condition]

sample = [74, 46, 68, 72, 71, 57, 56, 58, 69, 67]
above_70 = [s for s in sample if s > 70]

More examples

games = [{"Johns Hopkins": 13, "Drexel"      : 11},
         {"Maryland"     : 19, "George Mason":  6},
         {"Michigan"     : 10, "Jacksonville":  8},
         {"Florida"      : 15, "Maryland"    : 14},
         # etc.
         ]
umd_games = [g for g in games if "Maryland" in g]
reps = [("Ralph Abraham"  , "LA"),
        ("Alma Adams"     , "NC"),
        ("Robert Aderholt", "AL"),
        ("Anthony Brown"  , "MD"),
        # etc.
        ]
md_reps = [rep[0] for rep in reps if rep[1] == "MD"]

Summary

  • You can use a list comprehension to filter values from a list, possibly also applying a transformation to the values

  • The syntax for a list comprehension with filtering is
    [expr for itervar in iterable if condition]

Generator expressions

  • Sometimes we’d rather not store a whole new list in memory

  • Instead of using a list comprehension, we can use a generator expression

    • With a generator, only one new item is in memory at any one time

(expr for itervar in iterable)
(expr for itervar in iterable if condition)

nums = [15, 10, 7, 9, 14, 6]
sum(n**2 for n in nums)

Set comprehensions

  • Similar to list comprehensions, but the result is a set

  • Useful for

    • removing duplicates

    • cases where you plan to do set operations on the result

{expr for itervar in iterable}
{expr for itervar in iterable if condition}

with open("filename", "r", encoding="utf-8") as f:
    hashtags = {line.strip() for line in f if "#" in line}

Dictionary comprehensions

  • Similar to list comprehensions, but the result is a dict

{keyexpr: valueexpr for itervar in iterable}
{keyexpr: valueexpr for itervar in iterable if condition}

items_sold = ["gear", "sprocket", "sprocket", "hub", "gear", "widget"]
sales_count = {item: items_sold.count(item) for item in set(items_sold)}

Summary

  • Comprehensions/generator expressions transform and filter collections of data

    • Often, we can use them in place of a for loop

  • Comprehensions exist for lists, sets, and dictionaries

    • The name of the comprehension refers to the result of the expression

  • Generator expressions conserve memory

    • Only one item at a time is stored in memory