CSCI 262 Data Structures

Fall 2018

Lab 3: Analysis

Goals

Practice analyzing algorithms
Validate your analysis with experimental data

Overview

This lab will walk you through some exercises concerning analysis. As usual, we ask that you work with a partner or a team. If you end up solo, please raise your hand and we'll pair you with someone else (or make a three-person team if necessary). You are welcome to change partners throughout the semester!

Instructions

Step 1: analyze

For this lab you will try to find the "big-O" complexity of three common algorithms: binary search, insertion sort, and the Fisher-Yates shuffling algorithm. (Yes, you can look these up on Wikipedia, but don't until you've at least tried to analyze them yourself, first.)

To start with, look at the pseudo-code given below for each algorithm, and try to work out the "big-O" complexity using the techniques we discussed in class. For all of these, express the "big-O" in terms of the size of the input (which is a vector in the code we provided, but you can treat it as if it were an array - the operations we use on the vector all take constant time). If you find the pseudo-code too confusing, you can try the actual code implementing these functions in lab3.cpp, but the pseudo-code is likely to be easier to work with! Write up your analyses in the lab 3 quiz in Canvas.

Fisher-Yates shuffle

    shuffle(A) : A is a vector
        let i = A.size - 1
        while i > 0 do
            let index = a random integer between 0 and i, inclusive
            swap(A[i], A[index])
            i--

In words, Fisher-Yates starts at the end of the vector and swaps the last value with the value in a position chosen at random from all possible positions in the vector. The last position is then fixed, and the remaining values are shuffled, i.e., the second-to-last value is swapped with a value chosen at random from all but the last position, etc. When only one position is left, there is no reason to continue and the algorithm is finished. Given a true random value function, this algorithm produces a "perfect" shuffle (i.e., all permutations are equally probable).

Insertion sort

    insertion_sort(A) : A is a vector
        for i = 1 to A.size - 1 do
            let x = A[i]
            let j = i - 1
            while j >= 0 and A[j] > x do
                A[j + 1] = A[j]
                let j = j - 1  
            A[j + 1] = x

In words, insertion sort starts by assuming that the first value (e.g., the sub-vector of length 1 starting at index 0) is "sorted". Now, starting with the second element and moving up, each element in turn is shifted left (into the sorted part of the vector) until it finds its proper place in the sorted part (which is when the element is no longer less than the element to its left).

Binary search

    binary_search(A, x) : A is a vector, x is an integer
        if A.size == 0 return -1
        let pivot = floor(A.size / 2)
        if A[pivot] == x return pivot
        else if A[pivot] > x return binary_search(A[0..pivot-1], x)
        else return binary_search(A[pivot+1..A.size-1], x)

We will soon study this algorithm in lecture. The basic approach is this: we know the input A is sorted. Look at the middle element - if the middle element is the value we are searching for, then we're done. Else, if the middle element is greater than what we're searching for, recursively search only in the lower half of A. Else, the middle element is less than the search value, so recursively search in the upper half of A. If we ever start our search on an empty vector, then we return -1 to indicate the element isn't there.

Step 2: experiment

How well do these algorithms perform in practice? You will do some experiments to find out. Download lab3.cpp and put it into a new project. The code should compile and run as is. You can do anything you want to with this code - it is your microscope for this lab, but you won't be turning it in.

As written, the lab code prompts you for a size of vector to test with and a number of trials to run. For some algorithms, it is difficult to accurately measure timing with just one run, particularly if the algorithm runs very fast. So we measure a lot of runs of the same algorithm and average.

The code provided only runs trials on insertion sort - the code is there for the other two algorithms, but commented out. It is recommended that you experiment separately on each algorithm. For each algorithm, run timing experiments on each of the algorithms for various size inputs - you will need quite a few data points to get a good experimental result. Collect the timing data and (optionally) plot it in Excel (or the tool of your choice). Your x-axis should show the size of the input, while your y-axis should indicate timing information. Label your axes! You will want to experiment with and plot each algorithm separately, as it is difficult to get them all on one plot in a useful form. If you choose not to plot your data, please present it in Canvas in a well-labeled tabular form, along with some text explaining what you see in the data.

Based on your experimental data, revisit your "big-O" analyses. Provide an analysis in the lab 3 quiz in Canvas.