sort
functionThis lab will walk you through some exercises using the standard library sort
function. As usual, we ask that you work with a partner or a team. If you end up solo, please raise your hand and we'll pair you with someone else (or make a three-person team if necessary). You are welcome to change partners throughout the semester!
As we discussed in class, sorting is a first step in many algorithms. Some of the problems we'll explore below, such as finding the mode of a collection, are quite expensive to solve without sorting, but become linear or cheaper once a collection is sorted. (Another example is binary search, which we know becomes an O(log n) problem after sorting. While sorting has a cost of O(n log n), if we search the collection many (>n) times, sorting pays off.)
The standard library contains multiple sort functions for different applications. We'll just focus on the basic sort
function, which has O(n log n) worst-case performance and uses no additional memory (in a big-O sense) over the collection being sorted.
The sort
function is actually a sort
knows how to sort collections of different types without any special work. For this assignment, we'll be sorting vectors of integers, vectors of strings, and strings.
As we saw in class, to sort a vector named vec
, we only need to #include <algorithm>, and then simply:
sort(vec.begin(), vec.end());
The begin
and end
methods return special objects called
The
To explore this particular problem, you will work with the file words.txt, which contains 50,000 unique words chosen randomly from the Scrabble dictionary we used in a previous lab. For this task, you need to read in all the words from words.txt and then tell us the 8489th lexicographically smallest, 38719th lexicographically smallest, and 2796th lexicographically smallest words in the file. (A word is lexicographically smaller than another if it sorts less than the other word - lexicographical sorting is like alphabetical, except it uses ASCII codes rather than the alphabet. In our setting, we only have lower-case letters, so it is the same as alphabetical.) Enter your answer in the Canvas quiz for this lab. Don't forget about zero-based indexing!
You can write all the code yourself if you want the practice, but since we are focusing on sorting and not file I/O for this lab, here's some starter code you are welcome to paste into main.cpp in a new project:
/* * solution1.cpp * * Starter code for solving Problem 1: Selection & Median * Lab 4 - Sorting * CSCI 262 Data Structures * */ #include <string> #include <iostream> #include <fstream> #include <vector> using namespace std; int main() { vector<string> vec; ifstream fin("words.txt"); if (!fin) { cerr << "File 'words.txt' not found. Exiting." << endl; return -1; } for (int i = 0; i < 50000; i++) { string word; fin >> word; if (!fin) { cerr << "Error reading words from 'words.txt'. Exiting." << endl; return -1; } vec.push_back(word); } fin.close(); // print 8489th, 38719th, and 2796th words from vec return 0; }
Note that the words.txt file must be in the directory where your program is running from. In CLion by default, that is the "cmake-build-debug" directory.
The
Sorting makes this problem easier by putting equivalent values adjacent to each other. You can then find the mode in linear time by looping over the values and counting adjacent "runs" of values (sequences of the same value). Find the mode of the values in the file numbers.txt, which contains 50,000 integers. (Note that multiple modes are possible, but we've fixed it so there is only one mode in this instance.) Report both the mode and how many times it appears.
As for the previous problem, here's some starter code.
/* * solution2.cpp * * Starter code for solving Problem 2: Mode * Lab 4 - Sorting * CSCI 262 Data Structures * */ #include <iostream> #include <fstream> #include <vector> using namespace std; int main() { vector<int> vec; ifstream fin("numbers.txt"); if (!fin) { cerr << "File 'numbers.txt' not found. Exiting." << endl; return -1; } for (int i = 0; i < 50000; i++) { int number; fin >> number; if (!fin) { cerr << "Error reading number from 'numbers.txt'. Exiting." << endl; return -1; } vec.push_back(number); } fin.close(); // print the mode of the values in numbers.txt return 0; }
How can you tell if two strings are anagrams of each other? One way is to simply count the number of times each letter appears in each string, then compare counts. A less tedious way, though, is to simply sort the values in each string and then compare. If the original strings were anagrams of each other, the sorted strings will be the same!
Sorting a string is the same as sorting a vector. Given string s
, just do:
sort(s.begin(), s.end());
For this problem, we're giving you a vector of strings, all of the same length. Find the string that is an anagram of "computer science rocks", and report it on the Canvas quiz! Here's your starter code with the problem embedded:
/* * solution3.cpp * * Starter code for solving Problem 3: Detecting anagrams * Lab 4 - Sorting * CSCI 262 Data Structures * */ #include <string> #include <iostream> #include <vector> #include <algorithm> using namespace std; int main() { vector<string> possibles = { "poor musket conscience", "cricketer unstop mucus", "concupiscent elks roam", "eccentric coke possums", "crescent pocket curium", "mockup curt reticences", "poke concentric mouses", "skip comets occurrence", "concrete cupric musket", "circumspect coke noses", "sum concoction keepers", "succinct composer ekes", "mercuric petcock tunes", "muskie concocts crepes", "smoker conscience pout", "cement truck occupiers", "eccentrics truck poems" }; string match = "computer science rocks"; // print the anagram of "computer science rocks" return 0; }
Answer the questions for the lab 4 quiz on Canvas.