CSCI 403 Database Management

Spring 2018

Project 2: Query (1)

(Back to Projects)

Purpose



Overview


The Structured Query Language (SQL) is a key factor in the success of the relational model of databases. By providing a standard, high-level interface for working with relational databases, SQL enables data abstraction and program-data independence, leading to cleaner and more robust application code. SQL is standardized by the ISO and ANSI standards bodies, most recently as SQL:2011; while all DB vendors provide their own variations on the language (usually by extending it, rather than limiting it), the fundamentals are fairly constant from one vendor to another.

SQL can roughly be divided into three parts: the data definition language (DDL), the data manipulation language (DML), and everything else (primarily security and administration commands). Applications programmers primarily need concern themselves with the DDL and DML, leaving administration and security to dedicated database administrators (DBAs). (Of course, in small companies the programmers may also act as DBAs.)

For this project, you will practice some basic SQL SELECT queries (part of the DML), essentially covering everything we discussed in lecture 3. The setting for this project is the Hogwarts school (from the Harry Potter books by J.K. Rowling). If you haven't read the books or seen the movies, here is a summary of the points you need to know for this assignment:

Hogwarts is a school for wizards and witches; starting at age 11, children with magical powers are sent to Hogwarts for a 7 or 8 year curriculum covering all things magical. Students are placed (sorted) at the start of school into one of four houses, each with its own unique character. Each house was founded by an illustrious wizard when Hogwarts began; each founder had a unique relic with special powers; and each house has a crest with a representative animal.

Most of the faculty at Hogwarts remain the same from year to year, but one position was particularly unlucky during the years of Harry Potter's attendance: the position of Defence Against the Dark Arts (DADA) teacher. Every year from 1991 through 1997 there was a new DADA teacher, as the previous teacher died, went mad, or was otherwise removed from the position.

Despite this being a fictional universe, there is a surprising amount of information about characters from the books online; I have extracted data about a number of Hogwarts students from the internet for your use in this project. For each student there is a first and last name, a house (where known), starting year at Hogwarts (where known), and ending year at Hogwarts (where known). This data is in one table; in addition there is a table with details about each Hogwarts house, and another table giving the names of the DADA teachers in the years 1991-1998.

Your job is to write a number of SELECT queries to answer questions about Hogwarts and its students over the years.


Before You Begin


It is a good idea to keep the PostgreSQL SQL reference handy at all times!

You may wish to download project2.sql as a template for your solution file for this project.

Now, you should get acquainted with some features of your query tool for discovering what tables you have and what the structure of those tables is. Details differ if you are using psql vs SQuirreL, so each tool gets its own section below.

psql

In psql, first try issuing the command \d to list all the tables currently in your search path. (The way I set up your account, your default search path looks at your schema - named the same as your user - first, then the public schema.) The tables for this project are in the public schema.

Next, take a look at the hogwarts_students table by issuing the command \d hogwarts_students. You should see something like the following:

    csci403=# \d hogwarts_students
     Table "public.hogwarts_students"
     Column |     Type     | Modifiers 
    --------+--------------+-----------
     last   | text         | not null
     first  | text         | not null
     house  | text         | 
     start  | numeric(4,0) | 
     finish | numeric(4,0) | 
    Indexes:
        "hogwarts_students_pkey" PRIMARY KEY, btree (last, first)
    Foreign-key constraints:
        "hogwarts_students_house_fkey" FOREIGN KEY (house) REFERENCES hogwarts_houses(house)
    

This block is showing you that there are columns named last, first, house, start, and finish in the hogwarts_students table. The first three columns are of type (or domain) "text", which is a PostgreSQL type that allows arbitrary length strings to be stored. (Most modern databases have a type like this, but it is not standard; the standard only specified types like "varchar" which have a maximum character length.) The last two columns have type "numeric(4,0)", which means they store 4-digit numbers with no decimal component. You can also see that last and first are constrained to not allow NULL values.

An explanation of the fields in this table (this is often called a data dictionary):

Column Description
last Student last name
first Student first name
house Hogwarts house the student sorted into (if known)
start Year the student started at Hogwarts (if known)
finish Year the student graduated from Hogwarts (if known, and if the student graduated)

The rest of the information given by the \d hogwarts_students command tells you about the primary key and a foreign key of the table. The "hogwarts_students_pkey" gives the name of the primary key constraint object, "btree" tells you the data structure used for the index (more on this in a future lecture), and "(last, first)" tells you the columns in the primary key. The information about the foreign key similarly gives you a name, the column in hogwarts_student comprising the foriegn key, and the referenced table and column.

Take a look at the other two tables that start with hogwarts_. The hogwarts_houses table should be pretty self-explanatory from the Overview section above; the hogwarts_dada table columns are similar in meaning to the columns of the same name in hogwarts_students. (If you have any questions, as always, post them on Piazza!)

For more on what you can learn about the database from psql, try issuing the command \?

SQuirreL

In SQuirreL, select the Objects tab to browse objects in the database. The left hand pane lists all the schemas; clicking on the icon to the left of the schema expands to show you a list of the object types, which again can be expanded to show objects of the corresponding type (see image below).

Expand the public schema, and then the TABLE type to see all of the tables in the public schema. Select hogwarts_students to start with. You should now see a number of tabs in the right hand pane which you can click to select which information about the object you want to look at. (Most of these give a lot more detail than you are likely to ever need, unfortunately - they basically dump the catalog information about the object.)

Take a look under the Columns tab to see what attributes are in the hogwarts_students table. You should see a view like the one below:

This view is showing you that there are columns named last, first, house, start, and finish in the hogwarts_students table. The first three columns are of type (or domain) "text", which is a PostgreSQL type that allows arbitrary length strings to be stored. (Most modern databases have a type like this, but it is not standard; the standard only specified types like "varchar" which have a maximum character length.) The last two columns have type numeric with a COLUMN_SIZE value of 4 and a DECIMAL_DIGITS value of 0, which means they store 4-digit numbers with no decimal component. (In SQL, this type is written "numeric(4,0)".) You can also see that last and first are constrained to not allow NULL values.

An explanation of the fields in this table (this is often called a data dictionary):

Column Description
last Student last name
first Student first name
house Hogwarts house the student sorted into (if known)
start Year the student started at Hogwarts (if known)
finish Year the student graduated from Hogwarts (if known, and if the student graduated)

Some of the other tabs give additional important information. The Primary Key tab gives you information about the primary key constraint on the hogwarts_students table, showing that the key uses two attributes (last, first), and the constraint object is named "hogwarts_students_pkey". The Imported Keys tab shows you information about foreign keys of the hogwarts_students table; this view is one of those that is unfortunately a bit difficult to decode. In short, it tells you that there is a foreign key constraint named "hogwarts_students_house_fkey" on the house column referencing the house column of the hogwarts_houses table.

Take a look at the other two tables that start with hogwarts_. The hogwarts_houses table should be pretty self-explanatory from the Overview section above; the hogwarts_dada table columns are similar in meaning to the columns of the same name in hogwarts_students. (If you have any questions, as always, post them on Piazza!)


Instructions


All of the questions below should be answered strictly via queries executed on the database, e.g., using psql or SQuirreL. That is, although you could query the database and do some post-processing within a client programming environment, you are required for this project to do all of the work on the database (server) side, using pure SQL. The deliverable for this project is a SQL script (a text file containing your queries - traditionally with a .sql extension - see project2.sql for a template you can edit) containing one query for each question posed. Please put a comment (anything between /* and */ will be ignored by the SQL processor, or anything after a -- on a line) before each query identifying which question the query answers. Be sure to terminate each query with a semi-colon!

Example:

If the question is "What do we know about Gryffindor house?", then your script might include the query

        SELECT * FROM hogwarts_houses WHERE house = 'Gryffindor';
	

Questions:

    Easy starters
  1. What year did Nymphadora Tonks start at Hogwarts?
  2. What records do we have for students who started at Hogwarts before 1900?
  3. What house did Padma Patil sort into?

  4. Functions and operators
  5. How many years was Percy Weasley at Hogwarts?
  6. What students have a last name starting with "Q" or a first name starting with "Ph"?
  7. What students' houses are unknown?
  8. Who founded the house whose crest displays a badger? Hint: you may need the function lower() to answer this one, unless you cheat and capitalize 'Badger' in your query.
  9. What are the names of all Gryffindor students, given as "firstname lastname", without extra spaces, ordered by last name and first name? E.g., the answer should include strings like
  10.         Hermione Granger
            Harry Potter
            Ronald Weasley
            
  11. What defense against the dark arts teacher's first name started with 'A' whose last name did not start with 'M'?

  12. Miscellaneous
  13. What are the names of the Gryffindor students who started in 1991, sorted by last name then first name?
  14. What unique ending years do we have student records for, ordered by ending year?

  15. Simple Joins - you must use a join in each of your answers
  16. What are the names and years of all the students whose houses are known, together with their house colors, ordered by starting year?
  17. Who founded the house that Morag McDougal sorted into?
  18. What are the names and houses of the defense against the dark arts teachers (you only need worry about the teachers for whom we have student records)?

Grading:


Each question/query (14 total) 5 points each
README 5 points
Total: 75 points

Documentation:


README:

  1. Include names of all people who helped/collaborated as per the syllabus
  2. Describe the challenges you encountered and how you surmounted them
  3. What did you like/dislike about the assignment?
  4. How long did you spend on this assignment?

Submit a zip file on Canvas containing: