Relational Databases

Why Normalize?

Normalizing reduces duplicated data, which saves space and keeps updates consistent
But the simplest way to think about it is this:
1. Consider that one author can write many books
2. Conversely, one book can have many authors
3. To model such relationships effectively, author data should be stored apart from book data

Identifiers (Keys)

The first requirement for modeling relationships between tables is to have unambiguous identifiers
These identifiers, called keys, allow data to be looked up
The unique id for a particular row in a table is called a primary key

Using Keys to Create Joins

Rows can also reference rows in other tables; the column that holds this cross-reference is called a foreign key
For example, the row "Hamlet" in the plays table might reference "William Shakespeare" in the authors table by storing that author's primary key
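
As a minimal sketch of the two kinds of key, the snippet below builds the plays and authors tables mentioned above in an in-memory SQLite database. The column names (id, name, title, author_id) are assumptions chosen for illustration, not a schema given in these slides.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Each author row gets a unique integer id: the primary key
cursor.execute('''CREATE TABLE authors (
    id INTEGER PRIMARY KEY,
    name TEXT
)''')

# plays.author_id is a foreign key pointing at authors.id
cursor.execute('''CREATE TABLE plays (
    id INTEGER PRIMARY KEY,
    title TEXT,
    author_id INTEGER REFERENCES authors(id)
)''')

cursor.execute('INSERT INTO authors (name) VALUES (?)',
               ("William Shakespeare",))
author_id = cursor.lastrowid  # id SQLite assigned to the new author row

cursor.execute('INSERT INTO plays (title, author_id) VALUES (?, ?)',
               ("Hamlet", author_id))

play = cursor.execute('SELECT title, author_id FROM plays').fetchone()
conn.close()
```

The play row stores only the author's id, so the author's name lives in exactly one place.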

Creating Normalized Data

When designing a database, map out the tables and their relationships before doing any coding
The diagram created during this mapping process is called an ERD
This stands for Entity-Relationship Diagram
In addition to mapping out relationships, you need to write code that parses the data and writes each piece to the correct table

Creating Normalized Data (continued)

For example, in working with our list of books and authors, you might:
1. Store the authors' names in a separate table
2. As you read the data file, look up the author
3. If the author is present already, get the id
4. If the author is not present, add the author and get the id
5. Add the book to the books table, referencing the author’s id
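
The steps above can be sketched roughly as follows, using the four sample books from this section. The helper name get_author_id, the flat_rows list, and the exact table layout are hypothetical choices for illustration; a real loader would read the rows from the data file instead.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)')
cursor.execute('''CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT,
    author_id INTEGER REFERENCES authors(id),
    year INTEGER
)''')


def get_author_id(cursor, name):
    """Return the id for name, adding the author first if needed.

    Hypothetical helper covering steps 2 to 4.
    """
    row = cursor.execute('SELECT id FROM authors WHERE name = ?',
                         (name,)).fetchone()
    if row is not None:
        return row[0]              # author already present
    cursor.execute('INSERT INTO authors (name) VALUES (?)', (name,))
    return cursor.lastrowid        # id of the author just added


# Stand-in for the rows read from the flat data file
flat_rows = [
    ("Things Fall Apart", "Chinua Achebe", 1958),
    ("Chimera", "John Barth", 1972),
    ("The Sot-Weed Factor", "John Barth", 1960),
    ("Under the Volcano", "Malcolm Lowry", 1947),
]

for title, author, year in flat_rows:
    author_id = get_author_id(cursor, author)
    # Step 5: the book row stores the author's id, not the name
    cursor.execute(
        'INSERT INTO books (title, author_id, year) VALUES (?, ?, ?)',
        (title, author_id, year))

conn.commit()
authors = cursor.execute('SELECT id, name FROM authors ORDER BY id').fetchall()
book_refs = cursor.execute(
    'SELECT title, author_id FROM books ORDER BY id').fetchall()
```

John Barth appears twice in the flat rows but is stored once; both of his books point at the same author id.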

Flat Data

title	author	year
Things Fall Apart	Chinua Achebe	1958
Chimera	John Barth	1972
The Sot-Weed Factor	John Barth	1960
Under the Volcano	Malcolm Lowry	1947

Add Primary Keys

id	title	author	year
1	Things Fall Apart	Chinua Achebe	1958
2	Chimera	John Barth	1972
3	The Sot-Weed Factor	John Barth	1960
4	Under the Volcano	Malcolm Lowry	1947

Move Authors to Own Table

id	title	author_id	year
1	Things Fall Apart	1	1958
2	Chimera	2	1972
3	The Sot-Weed Factor	2	1960
4	Under the Volcano	3	1947

id	name
1	Chinua Achebe
2	John Barth
3	Malcolm Lowry

Selecting Normalized Data

To look up normalized data, you can use SQL’s JOIN syntax
You specify the fields to match on (linking foreign key to primary key)

jq = '''SELECT authors.name, books.title, books.year
        FROM books JOIN authors
        ON books.author_id=authors.id'''
books = cursor.execute(jq).fetchall()
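
For a version that runs on its own, the sketch below loads the normalized sample tables into an in-memory database and adds a WHERE clause with a bound parameter to filter by author. The filter and the ORDER BY are additions for illustration, not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)')
cursor.execute('''CREATE TABLE books (
    id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER, year INTEGER
)''')

# Same rows as the normalized tables shown in this section
cursor.executemany('INSERT INTO authors VALUES (?, ?)', [
    (1, "Chinua Achebe"),
    (2, "John Barth"),
    (3, "Malcolm Lowry"),
])
cursor.executemany('INSERT INTO books VALUES (?, ?, ?, ?)', [
    (1, "Things Fall Apart", 1, 1958),
    (2, "Chimera", 2, 1972),
    (3, "The Sot-Weed Factor", 2, 1960),
    (4, "Under the Volcano", 3, 1947),
])

# Match each book's foreign key to the author's primary key,
# then keep only the rows for one author
jq = '''SELECT authors.name, books.title, books.year
        FROM books JOIN authors
        ON books.author_id = authors.id
        WHERE authors.name = ?
        ORDER BY books.year'''
barth_books = cursor.execute(jq, ("John Barth",)).fetchall()
print(barth_books)
# [('John Barth', 'The Sot-Weed Factor', 1960), ('John Barth', 'Chimera', 1972)]
```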

Deleting Normalized Data

Normalizing data introduces some additional complications
Consider our authors and books examples
If you remove a row from the authors table, what happens to the author’s books?
There is a danger that orphaned rows will clutter the database

Deleting Normalized Data (continued)

To control the creation of bad data, SQL allows you to specify constraints in your database schema
Among these is a foreign key option written ON DELETE CASCADE
In essence, specifying it tells SQLite to delete an author's books automatically when the author is deleted
SQLite enforces foreign key constraints only when PRAGMA foreign_keys = ON has been set for the connection
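
A minimal sketch of a cascading delete on the sample data follows. The schema is an assumption for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection, so the sketch does that first.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Foreign key enforcement is off by default in SQLite; turn it on
# for this connection before any data is written
cursor.execute('PRAGMA foreign_keys = ON')

cursor.execute('CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)')
cursor.execute('''CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT,
    author_id INTEGER REFERENCES authors(id) ON DELETE CASCADE,
    year INTEGER
)''')

cursor.executemany('INSERT INTO authors VALUES (?, ?)', [
    (1, "Chinua Achebe"),
    (2, "John Barth"),
    (3, "Malcolm Lowry"),
])
cursor.executemany('INSERT INTO books VALUES (?, ?, ?, ?)', [
    (1, "Things Fall Apart", 1, 1958),
    (2, "Chimera", 2, 1972),
    (3, "The Sot-Weed Factor", 2, 1960),
    (4, "Under the Volcano", 3, 1947),
])

# Deleting the author also deletes every book that references him
cursor.execute('DELETE FROM authors WHERE name = ?', ("John Barth",))

remaining = cursor.execute('SELECT title FROM books ORDER BY id').fetchall()
conn.commit()
conn.close()
```

Without the ON DELETE CASCADE clause (and with enforcement on), the same DELETE would instead fail with an integrity error, which is another way to keep orphaned rows out of the database.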

	>>> import sqlite3
	>>> conn = sqlite3.connect(':memory:')
	>>> conn
	<sqlite3.Connection object at 0x10507b110>

	cq = '''CREATE TABLE books (
	title TEXT, author TEXT, year INTEGER
	)'''
	cursor.execute(cq)

	iq = '''INSERT INTO books VALUES (
	'2001: A Space Odyssey',
	'Arthur C. Clarke',
	1968
	)'''
	cursor.execute(iq)

	data = [
	("I, Robot", "Isaac Asimov", 1950),
	("The Martian", "Andy Weir", 2012),
	("The Left Hand Of Darkness", "Ursula K. Le Guin", 1969)
	]

	sq = '''SELECT title FROM books'''
	books = cursor.execute(sq).fetchall()
	print(books)
	[('2001: A Space Odyssey',), ('I, Robot',), ('The Martian',), ('The Left Hand Of Darkness',)]

Relational Databases

Introduction

Tools

Tools: SQLite

Tools: sqlite3

Tools: DB Browser for SQLite

Databases in Theory

The Relational Model

Other DB Systems

Normalization

SQL

CRUD

Databases in Practice

Connecting: in-memory

Connecting: database file

The connection object

Setting up the database

Create entries

Scaling up

Scaling up (continued)

Read

Update

Verify Update

Delete

Committing Changes

Summary

Normalization in Depth

Why Normalize?

Identifiers (Keys)

Using Keys to Create Joins

Creating Normalized Data

Creating Normalized Data (continued)

Flat Data

Add Primary Keys

Move Authors to Own Table

Selecting Normalized Data

Deleting Normalized Data

Deleting Normalized Data (continued)

	>>> conn2 = sqlite3.connect('test.sqlite')
	>>> conn2
	<sqlite3.Connection object at 0x10507b030>

	conn = sqlite3.connect('biblio.sqlite')
	cursor = conn.cursor()

	imq = '''INSERT INTO books VALUES (?,?,?)'''
	cursor.executemany(imq, data)

	# Bind the title as a parameter instead of a double-quoted literal
	uq = '''UPDATE books
	SET year=2011
	WHERE title=?'''
	cursor.execute(uq, ("The Martian",))

	vq = '''SELECT *
	FROM books
	WHERE title=?'''
	cursor.execute(vq, ("The Martian",))
	print(cursor.fetchall())
	[('The Martian', 'Andy Weir', 2011)]

	dq = '''DELETE
	FROM books
	WHERE author=?'''
	cursor.execute(dq, ("Isaac Asimov",))

	jq = '''SELECT authors.name, books.title, books.year
	FROM books JOIN authors
	ON books.author_id=authors.id'''
	books = cursor.execute(jq).fetchall()

	conn.commit()
	conn.close()