For our work with databases, we’ll be using three tools:
SQLite (database)
sqlite3 (Python module)
DB Browser (GUI application)
lightweight relational database management system (RDBMS)
runs locally and stores data in files
no separate database server required
already installed on your system (as part of Python)
project website: https://www.sqlite.org
module in the standard library
facilitates interaction with the database by:
managing connections
passing queries to the database system
making results of queries accessible to Python
graphical application (GUI)
allows you to view the contents of a SQLite database
an optional convenience for this course
downloadable from https://sqlitebrowser.org
Invented by E. F. Codd in the 1970s
Data stored in tables (relations)
Relations consist of rows (tuples)
Tuples have columns (attributes) that hold values
The term "database" now often connotes this type of relational database
Database design is a vast topic. In fact, it is an entire course (INST 327)! Here are just a few database systems you may encounter:
Relational:
MySQL/MariaDB
PostgreSQL
Non-relational ("NoSQL"):
Document: MongoDB (JSON), BaseX (XML)
Graph: Neo4j, Fuseki
Through a design process, data are "normalized"
Normalization means applying a set of rules (called "normal forms") to increase efficiency and reduce redundancy
Many specifics of database design and normalization are outside the scope of this course
a standardized language is used to interact with a relational database
this is known as the Structured Query Language (SQL)
SQL has different flavors but generally works the same way across database systems
Basic database operations are called "CRUD":
CREATE: add data to the database
READ: retrieve data from one or more tables
UPDATE: make changes to the data
DELETE: remove rows from a table (potentially with a cascade effect)
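As a preview, the four CRUD operations can be sketched with Python's sqlite3 module (the books table and its sample row here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# CREATE: define a table and add a row to it
cur.execute('CREATE TABLE books (title TEXT, author TEXT, year INTEGER)')
cur.execute("INSERT INTO books VALUES ('Dune', 'Frank Herbert', 1965)")

# READ: retrieve data with a SELECT query
rows = cur.execute('SELECT * FROM books').fetchall()

# UPDATE: change an existing value (a hypothetical correction)
cur.execute("UPDATE books SET year = 1966 WHERE title = 'Dune'")
updated = cur.execute('SELECT year FROM books').fetchone()

# DELETE: remove matching rows
cur.execute("DELETE FROM books WHERE author = 'Frank Herbert'")
remaining = cur.execute('SELECT COUNT(*) FROM books').fetchone()[0]

conn.close()
```

Each of these operations is covered in more detail below.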
Import sqlite3
Connect to an in-memory database
>>> import sqlite3
>>> conn = sqlite3.connect(':memory:')
>>> conn
<sqlite3.Connection object at 0x10507b110>
For persistent data, use a database file
>>> conn2 = sqlite3.connect('test.sqlite')
>>> conn2
<sqlite3.Connection object at 0x10507b030>
This object manages the database connection
In addition, we need a cursor to manage state
The cursor sends queries and contains results
conn = sqlite3.connect('biblio.sqlite')
cursor = conn.cursor()
Use a CREATE statement to set up a table
cq = '''CREATE TABLE books (
title TEXT, author TEXT, year INTEGER
)'''
cursor.execute(cq)
Next add a row to the table with INSERT
iq = '''INSERT INTO books VALUES (
'2001: A Space Odyssey',
'Arthur C. Clarke',
1968
)'''
cursor.execute(iq)
This works fine, but can be impractical at scale
Larger numbers of records can be created with executemany()
data = [
("I, Robot", "Isaac Asimov", 1950),
("The Martian", "Andy Weir", 2012),
("The Left Hand of Darkness", "Ursula K. Le Guin", 1969)
]
Map your data tuples onto a query using the ? placeholder
imq = '''INSERT INTO books VALUES (?,?,?)'''
cursor.executemany(imq, data)
To view data in the database, use READ
.fetchall() can be used to get all results as a list
sq = '''SELECT title FROM books'''
books = cursor.execute(sq).fetchall()
print(books)
[('2001: A Space Odyssey',), ('I, Robot',), ('The Martian',), ('The Left Hand of Darkness',)]
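Besides .fetchall(), a cursor offers .fetchone() and can be iterated over directly. A small sketch (the table is set up fresh here for illustration):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE books (title TEXT, author TEXT, year INTEGER)')
cur.executemany('INSERT INTO books VALUES (?,?,?)', [
    ('I, Robot', 'Isaac Asimov', 1950),
    ('The Martian', 'Andy Weir', 2012),
])

cur.execute('SELECT title FROM books ORDER BY year')

# fetchone() returns the next result row, or None when exhausted
first = cur.fetchone()       # ('I, Robot',)

# the cursor itself is an iterator over the remaining rows
rest = [row for row in cur]  # [('The Martian',)]

conn.close()
```

Iterating the cursor avoids loading a large result set into memory all at once.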
Now, imagine we need to update some data.
The Martian was in fact published in 2011, so let’s make that correction.
uq = '''UPDATE books
SET year=2011
WHERE title = 'The Martian' '''
cursor.execute(uq)
vq = '''SELECT *
FROM books
WHERE title = 'The Martian' '''
cursor.execute(vq)
print(cursor.fetchall())
[('The Martian', 'Andy Weir', 2011)]
Now, let’s look at deleting data
This works much as you would expect
dq = '''DELETE
FROM books
WHERE author = 'Isaac Asimov' '''
cursor.execute(dq)
So far, our changes exist only in the open transaction
To persist them to the database file, they must be committed (saved)
In practice, you should commit frequently
conn.commit()
conn.close()
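One convenient pattern: a Connection used as a context manager commits automatically when the block succeeds and rolls back if an exception occurs (note that it does not close the connection). A minimal sketch:

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# the with-block wraps a transaction: commit on success, rollback on error
with conn:
    conn.execute('CREATE TABLE notes (body TEXT)')
    conn.execute("INSERT INTO notes VALUES ('remember to commit')")

# the insert was committed, so the row is visible afterwards
count = conn.execute('SELECT COUNT(*) FROM notes').fetchone()[0]

conn.close()
```

Here conn.execute() is a shortcut that creates a cursor implicitly.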
We have just covered a lot of ground
Connections, cursors, queries, CRUD
There is also a lot we have not touched
Next time we will look at normalization
Specifically: primary keys, foreign keys, and joins
There are many reasons related to optimization
But the simplest way to think about it is this:
Consider that one author can write many books
Conversely one book can have many authors
To model such relationships effectively, author data should be stored apart from book data
The first requirement for modeling relationships between tables is to have unambiguous identifiers
These identifiers, called keys, allow data to be looked up
The unique id for a particular row in a table is called a primary key
Rows can also reference rows in other tables — this cross-reference is called a foreign key
For example, the row "Hamlet" in the plays table might reference "William Shakespeare" in the authors table
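In SQLite, keys are declared in the table schema. A sketch using hypothetical authors and plays tables:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# INTEGER PRIMARY KEY gives each row a unique, auto-assigned id
cur.execute('''CREATE TABLE authors (
    id INTEGER PRIMARY KEY,
    name TEXT
)''')

# author_id is a foreign key referencing the authors table
cur.execute('''CREATE TABLE plays (
    id INTEGER PRIMARY KEY,
    title TEXT,
    author_id INTEGER,
    FOREIGN KEY (author_id) REFERENCES authors (id)
)''')

cur.execute("INSERT INTO authors (name) VALUES ('William Shakespeare')")
author_id = cur.lastrowid  # id assigned to the new author row
cur.execute("INSERT INTO plays (title, author_id) VALUES ('Hamlet', ?)",
            (author_id,))

play = cur.execute('SELECT title, author_id FROM plays').fetchone()

conn.close()
```

The cursor's lastrowid attribute holds the id generated by the most recent INSERT.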
When designing a database, tables and their relationships should be mapped out before any code is written
The diagram created during this mapping process is called an ERD
This stands for Entity-Relationship Diagram
In addition to mapping out relationships, you need to create code to analyze the data and write it to the correct locations
For example, in working with our list of books and authors, you might:
Store the authors' names in a separate table
As you read the data file, look up the author
If the author is present already, get the id
If the author is not present, add the author and get the id
Add the book to the books table, referencing the author’s id
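The lookup-or-insert steps above can be sketched as a small helper function (table and function names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)')
cur.execute('''CREATE TABLE books (
    id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER, year INTEGER
)''')

def get_author_id(name):
    """Return the author's id, inserting a new row if the author is absent."""
    row = cur.execute('SELECT id FROM authors WHERE name = ?',
                      (name,)).fetchone()
    if row is not None:
        return row[0]
    cur.execute('INSERT INTO authors (name) VALUES (?)', (name,))
    return cur.lastrowid

for title, author, year in [
    ('Chimera', 'John Barth', 1972),
    ('The Sot-Weed Factor', 'John Barth', 1960),
]:
    cur.execute('INSERT INTO books (title, author_id, year) VALUES (?,?,?)',
                (title, get_author_id(author), year))

# both books reference the single 'John Barth' row
authors = cur.execute('SELECT * FROM authors').fetchall()
author_ids = [r[0] for r in cur.execute('SELECT author_id FROM books')]
```

Note that the author is inserted only once, even though two books reference it.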
| title | author | year |
|---|---|---|
| Things Fall Apart | Chinua Achebe | 1958 |
| Chimera | John Barth | 1972 |
| The Sot-Weed Factor | John Barth | 1960 |
| Under the Volcano | Malcolm Lowry | 1947 |
| id | title | author | year |
|---|---|---|---|
| 1 | Things Fall Apart | Chinua Achebe | 1958 |
| 2 | Chimera | John Barth | 1972 |
| 3 | The Sot-Weed Factor | John Barth | 1960 |
| 4 | Under the Volcano | Malcolm Lowry | 1947 |
| id | title | author_id | year |
|---|---|---|---|
| 1 | Things Fall Apart | 1 | 1958 |
| 2 | Chimera | 2 | 1972 |
| 3 | The Sot-Weed Factor | 2 | 1960 |
| 4 | Under the Volcano | 3 | 1947 |
| id | name |
|---|---|
| 1 | Chinua Achebe |
| 2 | John Barth |
| 3 | Malcolm Lowry |
To look up normalized data, you can use SQL’s JOIN syntax
You specify the fields to match on (linking foreign key to primary key)
jq = '''SELECT authors.name, books.title, books.year
FROM books JOIN authors
ON books.author_id=authors.id'''
books = cursor.execute(jq).fetchall()
Normalizing data introduces some additional complications
Consider our authors and books examples
If you remove a row from the authors table, what happens to the author’s books?
There is a danger that orphaned rows will clutter the database
In order to control the creation of bad data, SQL allows you to specify constraints in your database schema
One such constraint is ON DELETE CASCADE
In essence, by specifying this constraint, you tell SQLite to automatically remove an author’s books when the author is deleted
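A sketch of the cascade behavior (note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# SQLite ignores foreign key constraints unless this pragma is enabled
cur.execute('PRAGMA foreign_keys = ON')

cur.execute('CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)')
cur.execute('''CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT,
    author_id INTEGER REFERENCES authors (id) ON DELETE CASCADE
)''')

cur.execute("INSERT INTO authors (name) VALUES ('Isaac Asimov')")
cur.execute("INSERT INTO books (title, author_id) VALUES ('I, Robot', 1)")

# deleting the author removes the author's books as well
cur.execute("DELETE FROM authors WHERE name = 'Isaac Asimov'")
remaining = cur.execute('SELECT COUNT(*) FROM books').fetchone()[0]

conn.close()
```

Without the pragma, the DELETE would leave an orphaned row in books.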