CSCI 403 Database Management

Spring 2018

Project 3: Make

(Back to Projects)

Purpose



Overview


In the last project, you practiced some basic SQL SELECT queries (part of the data manipulation language, or DML). This project will have you do some very basic data definition language (DDL) as well as modification queries (INSERT, UPDATE, and DELETE) from the DML. For this project you will be creating a small data table containing information about the Harry Potter books by J.K. Rowling. Along the way we'll also explore some of the issues that arise in dealing with complex data.


Before You Begin


It is a good idea to keep the PostgreSQL SQL reference handy at all times!


Instructions


Follow the steps below to create a simple data table of Harry Potter books. The deliverable for this project is a SQL script implementing each step below. Please include comments identifying which step the following query or queries performs.

Stage 1

For starters, you are going to create a table in your schema in the database and get the information below into it:

Number Title ISBN Publication date Pages
1 Harry Potter and the Philosopher's Stone 0-7475-3269-9 1997-06-26 223
2 Harry Potter and the Chamber of Secrets 0-7475-3849-2 1998-07-02 251
3 Harry Potter and the Prisoner of Azkaban 0-7475-4215-5 1999-07-08 317
4 Harry Potter and the Goblet of Fire 0-7475-4624-X 2000-07-08 636
5 Harry Potter and the Order of the Phoenix 0-7475-5100-6 2003-06-21 766
6 Harry Potter and the Half-Blood Prince 0-7475-8108-8 2005-07-16 607
7 Harry Potter and the Deathly Hallows 0-545-01022-5 2007-07-21 607
8 Harry Potter and the Bunnies of Doom 1-234-56789-0 2010-01-15

Step 1: Create the table

Create a table (using CREATE TABLE) named 'books1' with columns for number, title, isbn number, publication date, and page count. Use your judgment for the names and data types of the columns, being certain you have enough room for the data in each column. (Your choices don't matter a whole lot at this point, but may make future steps easier or harder.) Make the number column a primary key, just for practice.

Pro-tip: before your CREATE TABLE "query", put in the query "DROP TABLE IF EXISTS books1;". As the code suggests, this will drop (remove) the table books1, if it exists in the database. Having this in your script means you can re-run your entire script as often as you want.

Step 2: Insert the data

Write an INSERT query to put the first row from the above table into your new books1 table. Then (your choice), either write one additional query to insert all of the rest of the data, or write separate queries to insert the rest of the books. (This is a tedious way to add data to a database, even with cut-and-paste; we'll see a faster/easier way in a future lecture.) Watch out for the missing value in the last row - that should end up in your table as a NULL value.

Step 3: Fix errors

Oops - there is no eighth book named "Harry Potter and the Bunnies of Doom" (at least to my knowledge - the Harry Potter fanfic community is pretty large and active). Write a DELETE query to eliminate this book from the table. There are many valid ways to do this; typically you would want to eliminate records via a primary key value, just to ensure you don't accidentally eliminate something that happens to share a similar value. In this instance, you can use any of the fields in a WHERE condition to determine the data to be eliminated - you might practice with different approaches here (but just give us one query).

Pro-tip: until we learn about transactions, queries that modify data, such as DELETE queries, take effect immediately and are irrevocable. How do you make sure you are deleting the rows you really want to delete, and nothing more (or less)? A good practice is to always do the query first as a SELECT query, then if the rows you SELECT are the ones you want to delete, you can go ahead with the DELETE query.

Step 4: (Optional) Play

At this point you should have a script that you can rerun as needed to reproduce your work up to this point. This is a perfect time to experiment with whatever DDL or DML you are curious about - drop the table, alter the table in some way, create another table with a foreign key reference, delete data, update data, insert data as you desire. (You can do this anytime, as a matter of fact - your schema is your playground, and we won't grade you on anything in your schema. We will be running your scripts in a separate database.)

Stage 2

Next, you will get some data about the Harry Potter books as published in the United States into your database. (The data in books1 comes from the first UK publication of the books, per Wikipedia.)

Step 5: Create a second table and copy data

Your second table will be named 'books2' and should be designed to hold the data from the table public.project3_us_books. Note you have a few choices here, and whichever way you go is fine; in the spirit of practicing and learning, why not try several? Here are three options to try (but only give us one in your solution script):

  1. CREATE TABLE, setting up the fields how you desire. Then follow up with an INSERT ... SELECT query.
  2. CREATE TABLE ... AS SELECT ... WITH NO DATA - create the table, but without loading the data. Then follow up with an INSERT ... SELECT query.
  3. CREATE TABLE ... AS SELECT ... - create the table and get all of the data from public.project3_us_books loaded in one step.

Stage 3

Time to combine the data into one table.

Step 6: Create the final table

Your final table should be named 'books', and should have the following columns:

Column Type Properties
number integer primary key
title text or varchar(50) not null
isbn text or varchar(17) not null, unique
publicationdate date not null
pages integer
ustitle text or varchar(50)
uspublicationdate date
uspages integer

Step 7: Populate books

Using a single INSERT...SELECT query, populate all of the columns in books except ustitle using a join SELECT query combining data from books1 and books2 (join on number). Yes, we don't need to do this separate from step 8, but you need the practice with UPDATE, so do two separate steps.

Step 8: Create the US titles

Using a single UPDATE query, set all of the ustitle column values equal to the corresponding title column.

Step 9: Fix errors

The title of Book 1 in the United States is "Harry Potter and the Sorcerer's Stone", which is not the same as the title you currently have in your table (which is the UK title). Fix the ustitle data for book 1 with a query.


Notes


Databases about books seem simple at first, but in reality are quite complex. Consider our example above - Harry Potter books have been published around the world, in 73 different languages. In addition, a given book might be published in hardcover, as a "trade" paperback, and as a "mass-market" paperback. It might later be packaged into a multi-volume "collector's edition", or short works might be combined into single volumes. Different publishings of the same book might feature forewords by different people. This means you might have hundreds of different individual publications - all of the same literary work!

The design we used in the final table above obviously would not scale well if we tried to track the publication information for each book in all its forms. As we continue in the semester, we'll start to look at ways to design a database that can more easily accommodate data of this type. Even with the best design, though, data can be difficult to wrangle - be prepared to get messy and accept a certain degree of imprecision when necessary.

For a further taste of the complexities of book databases, consider this booklet from the Library of Congress on the standard format used to exchange data about books between libraries.


Grading


Each step except #4 6 points
README 2 points
Total: 50 points

Documentation:


README:

  1. Include names of all people who helped/collaborated as per the syllabus
  2. Describe the challenges you encountered and how you surmounted them
  3. What did you like/dislike about the assignment?
  4. How long did you spend on this assignment?

Submit a zip file on Canvas containing: