Strings, lists, files

Python fundamentals 2

Strings

  • Strings contain text

  • Strings are a container data type

    • A string may contain any number of characters; each character is itself a string of length 1

  • Strings are ordered sequences

    • The characters in a string have a fixed order

  • Python strings use Unicode

String literals

  • A literal is a fixed value in a program

  • Ways to create a string literal in Python:

    • Single quotes: 'this is a string literal'

    • Double quotes: "this is a string literal"

    • Triple single quotes: '''this is a string literal'''

    • Triple double quotes: """this is a string literal"""

  • Triple-quoted string literals can contain literal line breaks:

"""This string
spans two lines."""

String concatenation

The + operator concatenates strings

>>> "Hello" + " " + "world!"
'Hello world!'
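The + operator only joins strings to other strings. A minimal sketch of what happens with a number, assuming a made-up variable named count:

```python
count = 3  # hypothetical value, for illustration only

# "You have " + count would raise TypeError, so convert with str() first.
message = "You have " + str(count) + " new messages"
print(message)

try:
    "Total: " + count
except TypeError as err:
    failed = True
    print("TypeError:", err)
```

Converting with str() (or using an f-string) is the usual way around this.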

Accessing individual characters

  • Each character in a string has an index

  • The index of the first character is 0

  • We can access individual characters using square brackets [] after the string

    • We call this the index operator

name = "Chelsea"
initial = name[0]

  • An index can be negative

    • -1 means the last character

    • -2 means the next-to-last character, etc.

name = "Chelsea"
last_letter = name[-1]
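Because indexes start at 0, the last valid index is one less than the length of the string. The sketch below uses the built-in len() function, which the text above does not cover, to show where the valid range ends:

```python
name = "Chelsea"
length = len(name)          # 7 characters, so valid indexes are 0 through 6

first = name[0]             # "C"
last = name[length - 1]     # "a", the same character as name[-1]

try:
    name[length]            # index 7 is one past the end
except IndexError as err:
    error_message = str(err)
    print("IndexError:", error_message)
```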

Accessing ranges of characters (slices)

  • We can also use the index operator to "slice" strings

  • Two flavors of slice:

    • str[start:stop]

    • str[start:stop:step]

  • start is the index of the first character in the slice

  • stop is the index of the character after the last character in the slice

    • (in other words, the slice includes start but excludes stop)

  • step is how far to move between successive characters in the slice (default value is 1); a negative step walks through the string backwards

Slice examples

course = "INST 326"
dept = course[0:4]       # evaluates to "INST"
# equivalent:
dept = course[:4]
num = course[5:8]        # evaluates to "326"
# equivalent:
num = course[5:]
# equivalent:
num = course[-3:]
nt3 = course[1:6:2]      # evaluates to "NT3"
backwards = course[::-1] # evaluates to "623 TSNI"
copy = course[:]         # evaluates to "INST 326"
# equivalent:
copy = course[::]
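One way to think about start, stop, and step is that a slice visits the same indexes as range(start, stop, step). The helper below is hypothetical and only illustrates that idea; it assumes non-negative indexes that fall inside the string:

```python
def slice_by_hand(text, start, stop, step=1):
    """Hypothetical helper: rebuild text[start:stop:step] one index at a time."""
    return "".join(text[i] for i in range(start, stop, step))


course = "INST 326"
print(slice_by_hand(course, 1, 6, 2))    # same as course[1:6:2]
print(slice_by_hand(course, 7, 4, -1))   # same as course[7:4:-1]
```

With a negative step, start must be larger than stop, and stop is still excluded.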

Formatted string literals (f-strings)

  • A way to plug values into string literals

>>> name = "Angela"
>>> job = "pilot"
>>> f"{name} is a {job}"
'Angela is a pilot'

  • F-strings always start with f (or F) immediately before the opening quote

  • Expressions in curly braces {} are evaluated and converted to strings

>>> x = 5
>>> y = 4
>>> print(f'{x} + {y} = {x + y}')
5 + 4 = 9
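An expression in curly braces can be followed by a colon and a format specifier, which controls how the value is converted to text. A short sketch with made-up prices:

```python
price = 10.9      # hypothetical values, for illustration only
quantity = 3

# .2f means "fixed-point notation with two digits after the decimal point".
formatted = f"{quantity} x ${price:.2f} = ${quantity * price:.2f}"
print(formatted)  # 3 x $10.90 = $32.70
```

This is handy for money, where 10.9 should be displayed as 10.90.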

Some important string methods (1 of 3)

string.find(substring)

string.index(substring)

>>> s = "aardvark"
>>> s.find("rdv")
2
>>> s.index("rdv")
2
>>> s.find("xyz")
-1
>>> s.index("xyz")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found

string.startswith(prefix)

string.endswith(suffix)

>>> s = "corner"
>>> s.startswith("corn")
True
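Since find() reports failure with -1 and -1 is also a valid index, it is worth checking the result before using it. A minimal sketch of both styles of error handling, plus endswith():

```python
s = "aardvark"

# find(): test for -1 before using the position.
pos = s.find("xyz")
found = pos != -1
# Skipping the check silently gives the LAST character, because s[-1] is valid.
surprise = s[pos]

# index(): handle the exception instead.
try:
    pos2 = s.index("xyz")
except ValueError:
    pos2 = None

ends_with_ner = "corner".endswith("ner")
print(found, surprise, pos2, ends_with_ner)
```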

Some important string methods (2 of 3)

string.splitlines()

>>> data = """Line 1
... Line 2
... Line 3
... """
>>> data.splitlines()
['Line 1', 'Line 2', 'Line 3']

string.split(sep)

>>> pet_str = "dog/cat/parakeet/gerbil"
>>> pet_str.split("/")
['dog', 'cat', 'parakeet', 'gerbil']

string.join(list_of_strings)

>>> pets = ["dog", "cat", "parakeet", "gerbil"]
>>> "/".join(pets)
'dog/cat/parakeet/gerbil'
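split() and join() undo each other when they use the same separator. The sketch below also shows two details not covered above: calling split() with no argument, and the fact that join() only accepts strings.

```python
pet_str = "dog/cat/parakeet/gerbil"
pets = pet_str.split("/")
rejoined = "/".join(pets)             # back to the original string

# With no argument, split() breaks on any run of whitespace.
words = "  the quick   brown fox ".split()

# join() requires strings, so convert other values first.
numbers = [1, 2, 3]
number_str = ", ".join(str(n) for n in numbers)
print(rejoined, words, number_str)
```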

Some important string methods (3 of 3)

string.strip([chars])

>>> s = "\thi!\n"
>>> s.strip()
'hi!'
>>> t = "Mississippi"
>>> t.strip("Mip")
'ssiss'

string.lower()

string.upper()

>>> s = "MiXed CaSe StRiNg"
>>> s.lower()
'mixed case string'
>>> s.upper()
'MIXED CASE STRING'

string.replace(old, new)

>>> s = "bassoon"
>>> s.replace("s", "l")
'balloon'
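All of these methods return a new string and leave the original untouched, so the result has to be stored or used. A small sketch (the optional third argument to replace() limits the number of replacements):

```python
raw = "  MiXed CaSe StRiNg \n"

# Methods can be chained because each one returns a string.
cleaned = raw.strip().lower()

# raw itself has not changed.
still_raw = raw == "  MiXed CaSe StRiNg \n"

# Replace only the first occurrence.
first_only = "bassoon".replace("s", "l", 1)
print(cleaned, still_raw, first_only)
```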

Summary

  • Strings are ordered sequences of Unicode characters

  • String literals are made with quotation marks (single, double, triple)

  • The + operator concatenates strings

  • We can access parts of strings by index or slice

  • F-strings let us plug values into string literals

  • Strings have a lot of useful methods

Lists

  • Lists are a container data type

    • Lists contain objects

  • Lists are ordered sequences

    • The items in a list have a fixed order

  • Items in lists can be accessed by index

    • mylist[0]

  • Slice notation also works for lists

    • mylist[1:3]

  • Unlike strings, lists are mutable
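To illustrate the last point, the sketch below assigns to an index of a list and then tries the same thing with a string. The variable names are made up for the example.

```python
letters = ["a", "b", "c"]
letters[0] = "z"              # allowed: lists are mutable

word = "abc"
try:
    word[0] = "z"             # not allowed: strings are immutable
except TypeError as err:
    string_error = str(err)

# To "change" a string, build a new one instead.
word = "z" + word[1:]
print(letters, word)
```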

Building lists

  • Empty list:

    • list() or []

  • List with items in it:

    • [item[, item…]]

  • Convert another container into a list:

    • list(container)

  • Concatenate two lists:

    • list1 + list2

  • Add an item to the end:

    • list.append(item)

  • Add several items to the end:

    • list.extend(otherlist)

Example code: basic list operations

l1 = list()
l1.append("pencil")
l1.extend(["chalk", "marker"])
l2 = ["crayon", "pen"]
l3 = l1 + l2
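Here is the same sequence of operations with the resulting lists printed, followed by a common mix-up: passing a list to append() adds it as a single nested item rather than adding its contents.

```python
l1 = list()
l1.append("pencil")
l1.extend(["chalk", "marker"])
l2 = ["crayon", "pen"]
l3 = l1 + l2                 # a new list; l1 and l2 are unchanged

print(l1)   # ['pencil', 'chalk', 'marker']
print(l3)   # ['pencil', 'chalk', 'marker', 'crayon', 'pen']

nested = ["pencil"]
nested.append(["chalk", "marker"])
print(nested)   # ['pencil', ['chalk', 'marker']]
```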

Some important list methods/techniques (1 of 3)

list.insert(index, item)

>>> l = ["hello"]
>>> l.insert(0, "oh")
>>> print(l)
['oh', 'hello']

list[index] = item

>>> l = ["a", "b", "c"]
>>> l[1] = "x"
>>> print(l)
['a', 'x', 'c']

list[slice] = list2

>>> l = ["one", "two", "three"]
>>> l[1:2] = ["eight", "six", "four"]
>>> print(l)
['one', 'eight', 'six', 'four', 'three']

Some important list methods/techniques (2 of 3)

list.remove(item)

>>> l = [1, 3, 5, 7]
>>> l.remove(3)
>>> print(l)
[1, 5, 7]

del list[index]

>>> l = ["apple", "basket", "chair"]
>>> del l[1]
>>> print(l)
['apple', 'chair']

list.pop([index])

>>> l = ["Jackson", "Rhonda", "Alfred"]
>>> l.pop()
'Alfred'
>>> print(l)
['Jackson', 'Rhonda']
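A few details of these three techniques are easy to miss: pop() accepts an index, remove() deletes only the first match and raises ValueError when there is none, and del also works with slices. A minimal sketch:

```python
queue = ["Jackson", "Rhonda", "Alfred"]
first = queue.pop(0)          # removes and returns the item at index 0

nums = [1, 3, 5, 3]
nums.remove(3)                # only the first 3 is removed
after_remove = nums.copy()

try:
    nums.remove(99)           # 99 is not in the list
except ValueError:
    missing = True

del nums[0:2]                 # delete a whole slice
print(first, queue, after_remove, nums)
```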

Some important list methods/techniques (3 of 3)

list.index(item)

>>> l = ["red", "blue", "green", "pink"]
>>> l.index("green")
2
>>> l.index("purple")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'purple' is not in list

Mutability

  • Mutable objects can change

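  • This isn’t mutability (x is rebound to a new integer object; the object 5 itself never changes):

x = 5
x += 1

  • But this is (the same list object is modified in place):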

l = ["alpha", "beta", "gamma"]
l.append("delta")

  • Mutability is useful but dangerous: a change made through one name is visible through every other name that refers to the same object
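The difference between rebinding a name and mutating an object can be observed with the built-in id() function, which returns an identifier for the object a name currently refers to. This is only a sketch for illustration; id() is not something the examples above rely on.

```python
x = 5
id_before = id(x)
x += 1
x_rebound = id(x) != id_before          # x now names a different int object

l = ["alpha", "beta", "gamma"]
list_id_before = id(l)
l.append("delta")
same_list = id(l) == list_id_before     # still the very same list object
print(x_rebound, same_list)
```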

Aliasing

  • Aliasing occurs when there are multiple references to a single object

>>> people = ["Linda", "Daryl", "Stacey"]
>>> persons = people
>>> persons.append("Amanda")
>>> print(persons)
['Linda', 'Daryl', 'Stacey', 'Amanda']
>>> print(people)
['Linda', 'Daryl', 'Stacey', 'Amanda']

  • Be careful when passing mutable objects to functions: the function receives a reference to the same object, not a copy

  • If you need two different lists, copy your data:

    • l2 = l1.copy()

    • l2 = l1[:]
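The sketch below shows aliasing through a function call. add_guest() is a hypothetical function written for this example; it mutates the list it is given, so the caller's list changes unless a copy is passed in.

```python
def add_guest(guests, name):
    """Hypothetical function that modifies its argument in place."""
    guests.append(name)
    return guests


people = ["Linda", "Daryl", "Stacey"]

invited = add_guest(people, "Amanda")       # people is modified too
safe = add_guest(people.copy(), "Omar")     # only the copy is modified

print(people)   # ['Linda', 'Daryl', 'Stacey', 'Amanda']
print(safe)     # ['Linda', 'Daryl', 'Stacey', 'Amanda', 'Omar']
```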

Summary

  • Lists are ordered sequences of objects

  • List literals are surrounded by square brackets []

  • Parts of lists can be accessed by index or slice

  • There are several ways to add and remove elements

  • Lists are mutable

  • Watch out for aliasing; avoid it by making a copy of your list

for loops

  • Useful when you need to perform the same operation on a sequence of values

for varname in iterable:
    body

>>> polygon_sides = [5, 3, 4, 6, 2]
>>> perimeter = 0
>>> for side in polygon_sides:
...     perimeter += side
...
>>> print(f"The perimeter of a polygon with sides {polygon_sides} is {perimeter}")
The perimeter of a polygon with sides [5, 3, 4, 6, 2] is 20
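A for loop works with any iterable, not just lists. As a small sketch, the loops below iterate over the characters of a string and over the numbers produced by the built-in range() function:

```python
word = "Chelsea"
vowel_count = 0
for letter in word:                  # one character per iteration
    if letter.lower() in "aeiou":
        vowel_count += 1

squares = []
for n in range(4):                   # 0, 1, 2, 3
    squares.append(n * n)

print(vowel_count, squares)
```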

Files

  • Files allow us to write data to persistent storage and read that data back in at a later time

  • All files can be put into one of two categories: binary or text

    • We will mostly work with text files

Encoding

  • Computers don’t understand text, only numbers

  • Today, the standard used to represent text in a computer is called Unicode

  • There are multiple ways to encode Unicode to disk

    • The de facto standard is called UTF-8
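The relationship between characters, Unicode numbers, and encoded bytes can be explored with ord(), str.encode(), and bytes.decode(). A minimal sketch using a sample word chosen for this example:

```python
text = "caf\u00e9"                    # "café"; \u00e9 is the code point for é

code_point = ord(text[3])             # the Unicode number for é
utf8_bytes = text.encode("utf-8")     # é takes two bytes in UTF-8
utf16_bytes = text.encode("utf-16")   # a different encoding, different bytes
round_trip = utf8_bytes.decode("utf-8")

print(code_point, len(text), len(utf8_bytes))   # 233 4 5
```

Decoding with the wrong encoding produces garbled text or an error, which is why the examples in this section pass encoding="utf-8" to open().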

Paths

  • Files exist in a file system structured like a tree

    • The root of the tree is "/" on Mac/Unix and a drive name (e.g., "C:/") on Windows

    • Non-terminal nodes are called directories; terminal nodes are called files

  • Every file has a path; there are two kinds of paths

    • Absolute: /home/aric/INST326/myscript.py

    • Relative: myscript.py, INST326/myscript.py

      • Relative paths are relative to the current working directory

  • Python accepts forward slashes (/) in paths on every platform, including Windows; a backslash in an ordinary string literal must be escaped (\\)
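The standard library's os module can show how a relative path is resolved against the current working directory. This is a sketch only; the path is the example from the list above and does not need to exist.

```python
import os

cwd = os.getcwd()                          # the current working directory
relative = "INST326/myscript.py"
absolute = os.path.abspath(relative)       # cwd joined with the relative path

print(cwd)
print(absolute)
print(os.path.isabs(relative), os.path.isabs(absolute))
```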

File objects

  • All interactions with files in Python are mediated through file objects

  • File objects are created by the open() function

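  • File objects follow the principle of least privilege: the mode passed to open() ("r" to read, "w" to write, "a" to append) determines which operations the file object allows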

  • File objects have to be closed when you are done with them

f = open("myfile.txt", "w", encoding="utf-8")

# do things with the file here...

f.close()
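The sketch below tries out three modes: "w" (write, replacing any existing content), "r" (read only), and "a" (append). It works in a temporary directory created with the standard tempfile module so that no real files are touched; that setup is an assumption made for the example.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:   # scratch folder, removed afterwards
    path = os.path.join(folder, "myfile.txt")

    f = open(path, "w", encoding="utf-8")
    f.write("first line\n")
    f.close()

    f = open(path, "r", encoding="utf-8")
    try:
        f.write("not allowed\n")                # the file was opened for reading only
    except io.UnsupportedOperation:
        refused = True
    f.close()

    f = open(path, "a", encoding="utf-8")       # append keeps the existing content
    f.write("second line\n")
    f.close()

    f = open(path, "r", encoding="utf-8")
    contents = f.read()
    f.close()

print(contents)
```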

Writing to files

Option 1: print() function

f = open("myfile.txt", "w", encoding="utf-8")
print("Hello, world!", file=f)
print("Created by Python", file=f)
f.close()

  • Very similar to how print() works with the console; adds newlines to the end of each line

  • (By default, print() writes to sys.stdout, a file object connected to the console)

  • Specify file object as a keyword argument to print()

Writing to files

Option 2: write() method

f = open("myfile.txt", "w", encoding="utf-8")
f.write("Hello, world!\n")
f.write("Created by Python\n")
f.close()

  • write() is a method of the file object whereas print() is a function

  • Doesn’t add newlines to the end of lines; you have to do that yourself
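To check that the two options produce the same file contents, the sketch below writes the same two lines both ways and reads the results back. The temporary directory and the file names printed.txt and written.txt are assumptions made for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:   # scratch folder, removed afterwards
    path1 = os.path.join(folder, "printed.txt")
    path2 = os.path.join(folder, "written.txt")

    f = open(path1, "w", encoding="utf-8")
    print("Hello, world!", file=f)              # print() adds the newline
    print("Created by Python", file=f)
    f.close()

    f = open(path2, "w", encoding="utf-8")
    f.write("Hello, world!\n")                  # write() needs an explicit \n
    f.write("Created by Python\n")
    f.close()

    f = open(path1, "r", encoding="utf-8")
    printed = f.read()
    f.close()
    f = open(path2, "r", encoding="utf-8")
    written = f.read()
    f.close()

print(printed == written)   # True
```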

Reading from files

Option 1: Read the whole file into a single string

f = open("myfile.txt", "r", encoding="utf-8")
contents = f.read()
f.close()

Reading from files

Option 2: Read one line

f = open("myfile.txt", "r", encoding="utf-8")
line = f.readline()
f.close()

Reading from files

Option 3: Read all lines, one at a time

  • Uses file object as an iterator

  • Very Pythonic

f = open("myfile.txt", "r", encoding="utf-8")
for line in f:
    print(line, end="")  # placeholder: do something with line here...
f.close()
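The three reading options can be compared side by side on the same file. In this sketch the file is created in a temporary directory first, which is an assumption made so the example can run anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:   # scratch folder, removed afterwards
    path = os.path.join(folder, "myfile.txt")
    f = open(path, "w", encoding="utf-8")
    f.write("Line 1\nLine 2\nLine 3\n")
    f.close()

    # Option 1: everything as one string
    f = open(path, "r", encoding="utf-8")
    whole = f.read()
    f.close()

    # Option 2: one line per call; each call continues where the last stopped
    f = open(path, "r", encoding="utf-8")
    first = f.readline()
    second = f.readline()
    f.close()

    # Option 3: loop over the file object
    lines = []
    f = open(path, "r", encoding="utf-8")
    for line in f:
        lines.append(line)
    f.close()

print(lines)   # ['Line 1\n', 'Line 2\n', 'Line 3\n']
```

Note that readline() and the loop both keep the trailing newline on each line.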

Working with data read from a text file

  • Lines read from a text file usually end in newline characters ("\n")

    • You can remove these with the strip() or rstrip() methods of strings

for line in f:
    line = line.rstrip()

  • Data in text files is often delimited (e.g., value1,value2,value3)

    • You can break up delimited strings with the split() method of strings

line = "name,age,profession"
values = line.split(",")
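Putting rstrip() and split() together, the sketch below turns raw lines into lists of values and converts the numeric field with int(), since everything read from a text file starts out as a string. The sample lines are invented for the example and stand in for lines read from a file.

```python
# Hypothetical lines, as they would look when read from a file
raw_lines = ["Ana,34,pilot\n", "Raj,51,chef\n"]

records = []
for line in raw_lines:
    values = line.rstrip("\n").split(",")
    values[1] = int(values[1])       # "34" (a string) becomes 34 (an int)
    records.append(values)

print(records)   # [['Ana', 34, 'pilot'], ['Raj', 51, 'chef']]
```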

with statements and context managers

  • with statements let you use context managers

  • File objects can be used as context managers

    • File objects used as context managers will close themselves

with context_expr as varname:
    body

with open("myfile.txt", "r", encoding="utf-8") as f:
    for line in f:
        print(line, end="")  # placeholder: do something with line here...
# file closes itself at the
# end of the with statement

  • Use with statements whenever you open files
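File objects have a closed attribute that makes the automatic closing visible. The sketch below checks it inside and after a with statement, and also after an error occurs in the body. The temporary directory is an assumption made so the example can run anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:   # scratch folder, removed afterwards
    path = os.path.join(folder, "myfile.txt")

    with open(path, "w", encoding="utf-8") as f:
        f.write("Hello\n")
        closed_inside = f.closed       # still open inside the body
    closed_after = f.closed            # closed as soon as the body ends

    try:
        with open(path, "r", encoding="utf-8") as f2:
            int(f2.readline())         # "Hello\n" is not a number: ValueError
    except ValueError:
        closed_after_error = f2.closed # the file was closed despite the error

print(closed_inside, closed_after, closed_after_error)   # False True True
```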

Two files at once in a with statement

with open("myfile.txt", "r", encoding="utf-8") as f_in, \
    open("myfile2.txt", "w", encoding="utf-8") as f_out:
    for line in f_in:
        if " " in line:
            word1 = line.split(" ")[0]
            f_out.write(word1 + "\n")  # one word per output line

Summary

  • Two kinds of file: text and binary

  • Two kinds of path: absolute and relative

  • Reading from/writing to a file requires a file object

    • open() creates file objects

  • There are several ways to write to and read from a file object

  • When reading text files, strip() and split() come in handy

  • When used with file objects, with statements close files for us

Example script

  • You run a store called Sprocket Emporium

  • Customer orders are stored in files where each line contains

    • part name

    • quantity

    • unit price

Example data:

medium widget,1,5.25
bevel gear,3,10.90
titanium axle,4,43.08
#3 oblong cog,2,0.76

  • Goal: write a function to calculate the total cost of the order

Example script

""" Read orders from a file and calculate the total cost of an order.
Each line in an order file will consist of a product name, a quantity, and a
unit price, separated by commas. """


def process_line(line):
    """ Parse a line from an order file; print information about the line;
    return the cost of this line of the order.
    
    Args:
        line (str): one line from the order file; contains product name,
            quantity ordered, and unit price, separated by commas.
    
    Returns:
        float: the cost of this line (quantity * unit price).
    
    Side effects:
        Writes information about this line to stdout.
    """
    values = line.strip().split(",")
    product = values[0]
    quantity = int(values[1])
    unit_price = float(values[2])
    ending = "" if quantity == 1 else "s"
    line_cost = quantity * unit_price
    print(f"{quantity} {product}{ending} @ ${unit_price:.2f}: ${line_cost:.2f}")
    return line_cost


def total_cost(filepath):
    """ Read an order and calculate the total cost.
    
    Args:
        filepath (str): order file containing one item ordered per line
            (product name, quantity, unit cost).
    
    Side effects:
        Writes the total cost to stdout.
    """
    total = 0
    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:
            total += process_line(line)
    print(f"Total cost: ${total:.2f}")


if __name__ == "__main__":
    total_cost("order.txt")
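One possible variation, sketched here under stated assumptions rather than as part of the original script: order_total() is a hypothetical function that returns the total instead of printing it, and skips blank lines so that a trailing empty line in the order file does not cause an IndexError. The example writes the sample data to a temporary file so it can run anywhere.

```python
import os
import tempfile


def order_total(filepath):
    """Hypothetical variant of total_cost() that returns the total as a float."""
    total = 0.0
    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line == "":
                continue               # skip blank lines
            values = line.split(",")
            total += int(values[1]) * float(values[2])
    return total


with tempfile.TemporaryDirectory() as folder:   # scratch folder, removed afterwards
    path = os.path.join(folder, "order.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("medium widget,1,5.25\n")
        f.write("bevel gear,3,10.90\n")
        f.write("titanium axle,4,43.08\n")
        f.write("#3 oblong cog,2,0.76\n")
        f.write("\n")                           # trailing blank line
    total = order_total(path)

print(f"Total cost: ${total:.2f}")   # Total cost: $211.79
```

Returning the value makes the function easier to reuse and to test, because the caller decides whether and how to print it.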