→This assignment is due by Friday, July 01, 2022, 11:59 PM.←
→ As with all assignments, this must be an individual effort and cannot be pair programmed. Any debugging assistance must follow the course collaboration policy and be cited in the comment header block for the assignment.←
→ Do not forget to complete the following labs with this set: L2A, L2B, L2C, L2D ←
In this assignment, we will focus on arrays, vectors, strings, file I/O, and functions!
Overview
Have you ever finished a book and wondered, "Geez, I wonder how many times each word occurs in this text?" No? This assignment illustrates a fundamental use of the array & vector: storing related values in a single data structure, and then using that data structure to reveal interesting facts about the data.
For this assignment, you will read in a text file containing the story Green Eggs and Ham (plus some others). You will then need to count the number of occurrences of each word & letter and display the frequencies. You'll be amazed at the results!
The Specifics
For this assignment, download the starter code pack. This zip file contains several files:
- main.cpp - the predetermined main.cpp. This file shows the usage and functionality that is expected of your program. You are not allowed to edit this file. You will not be submitting this file with your assignment.
- Makefile - the preset Makefile to build with your program.
- input/aliceChapter1.txt - the first chapter of Alice in Wonderland in text format.
- input/greeneggsandham.txt - the contents of Green Eggs and Ham in text format.
- input/romeoandjuliet.txt - the contents of Romeo and Juliet in text format.
- output/aliceChapter1.out - the expected output when running your program against the aliceChapter1.txt file.
- output/greeneggsandham.out - the expected output when running your program against the greeneggsandham.txt file.
- output/romeoandjuliet.out - the expected output when running your program against the romeoandjuliet.txt file.
The contents of main.cpp
are shown below:
#include "word_functions.h"
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main(int argc, char* argv[]) {
    // get filename to open
    string filename;
    if(argc == 1) {
        cout << "Enter the name of the file you wish to open: ";
        cin >> filename;
    } else if(argc == 2) {
        filename = argv[1];
    } else {
        cerr << "Usage: " << argv[0] << " [filename]" << endl;
        cerr << " filename - optional file to open upon start" << endl;
        return -2;
    }

    // open file for parsing
    ifstream fileIn;
    if( !open_file(fileIn, filename) ) {
        cerr << "Could not open file \"" << filename << "\"" << endl;
        cerr << "Shutting down" << endl;
        return -1;
    }

    // read all the words in the file
    vector<string> allWords = read_words_from_file( fileIn );
    fileIn.close();
    cout << "Read in " << allWords.size() << " words" << endl;

    // clean the words to remove punctuation and convert to uppercase
    const string PUNCTUATION_TO_REMOVE = "?!.,;:\"()_-'&[]";
    remove_punctuation(allWords, PUNCTUATION_TO_REMOVE);
    capitalize_words(allWords);

    // find only the unique words in the file
    vector<string> uniqueWords = filter_unique_words(allWords);
    cout << "Encountered " << uniqueWords.size() << " unique words" << endl;

    // count the occurrences of each unique word in the entire text
    vector<unsigned int> uniqueWordCounts = count_unique_words(allWords, uniqueWords);
    print_unique_word_counts(uniqueWords, uniqueWordCounts);

    // count the occurrences of every letter in the entire text
    unsigned int letters[26] = {0};
    count_letters(allWords, letters);
    print_letter_counts(letters);

    // print statistics on letter frequencies
    print_max_min_letter(letters);

    return 0;
}
main() Parameters
Hmmm, there's something slightly different about our main()...it can take parameters! Every program we write must contain a main() function. When we execute our programs, the computer executes a sequence of functions. Each function corresponds to a stack frame. The stack frame contains a list of the local variables (declared within the scope of the function) and a list of the function parameters. Our main() function is just like all the other functions we write and it can have parameters. So where does main() get its arguments from? From the command line!
When we run our program, we enter a command. Currently it looks like:
./A2
The parameters to main() correspond to the arguments we enter on the command line to run our program, or call our main() function. The two parameters for main() correspond to the following:
- int argc - the argument count. The number of arguments the user entered. (For the example call above, the count is 1.)
- char* argv[] - the argument values. This is an array of C-strings. The size of the array is equal to the argument count. The elements of the array correspond to the specific text the user typed in the terminal. The first entry in the array always corresponds to the executable name. (For the example call above, the argument values are { "./A2" }.)
Why is this advantageous? We can provide additional arguments to the program via the command line. These are called command line arguments. Our program will then read these arguments and we can access them via the parameters. Our program call can now look like:
./A2 input/greeneggsandham.txt
We can then look at how the parameters are used to make sense of the above call.
int main(int argc, char* argv[]) {
    // get filename to open
    string filename;
    if(argc == 1) {
        cout << "Enter the name of the file you wish to open: ";
        cin >> filename;
    } else if(argc == 2) {
        filename = argv[1];
    } else {
        cerr << "Usage: " << argv[0] << " [filename]" << endl;
        cerr << " filename - optional file to open upon start" << endl;
        return -2;
    }
When the program begins, it first checks how many arguments were supplied to the command line. If only one argument was supplied, then we prompt the user at runtime to provide a filename to open. However, if two arguments were supplied via the command line, then we'll take the second argument from the array and use that as the filename. Now the user has the ability to specify the input at start up and there's no need for the program to wait for input. Lastly, if the user gives too many arguments, then we'll display a message showing the expected usage of our program.
Procedural Programming
Referring back to the full code in main.cpp, take note how the program now reads as a series of subtasks and the provided comments are redundant. The code is "self documenting" with the function names providing the steps that are occurring. Your task is to provide the implementations for all the called functions. You will need to create two files, word_functions.h and word_functions.cpp, to make the program work as intended.
You will want to make your program as general as possible by not having any assumptions about the data hardcoded in. Three public input files have been supplied with the starter pack. We will run your program against a fourth private input file.
Function Requirements
The requirements of each function are given below. The input, output, and task of each function is described. The functions are:
- open_file()
- read_words_from_file()
- remove_punctuation()
- capitalize_words()
- filter_unique_words()
- count_unique_words()
- print_unique_word_counts()
- count_letters()
- print_letter_counts()
- print_max_min_letter()
open_file()
Input: (1) The input file stream (2) The string filename to open
Output: True if the file successfully opened, False if the file could not be opened
Task: Open the input file stream for the corresponding filename. Check that the file
opened correctly. The string filename will remain unchanged.
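A minimal sketch of how this could look. The signature is an assumption inferred from the call in main.cpp (the assignment does not state parameter types): the stream is taken by reference so the caller's stream is the one opened, and the filename is const because it must remain unchanged.

```cpp
#include <fstream>
#include <string>

// Assumed signature, inferred from the call open_file(fileIn, filename).
bool open_file(std::ifstream& fileIn, const std::string& filename) {
    fileIn.open(filename);
    // is_open() reports whether the stream is now associated with a file
    return fileIn.is_open();
}
```

Because the stream is passed by reference, main() can keep reading from fileIn after the call returns true.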
read_words_from_file()
Input: The input file stream
Output: A vector of strings
Task: Read all the words that are in the file stream and return a vector of all the words
in the order present in the file.
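One possible sketch of the reading loop, assuming a "word" means any run of non-whitespace characters (which is how the stream extraction operator splits input). The signature is again an assumption inferred from main.cpp.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Assumed signature, inferred from the call in main.cpp.
std::vector<std::string> read_words_from_file(std::ifstream& fileIn) {
    std::vector<std::string> words;
    std::string word;
    // operator>> skips leading whitespace and stops at the next whitespace,
    // so each successful extraction yields one word; the loop ends at end of file
    while (fileIn >> word) {
        words.push_back(word);
    }
    return words;
}
```

Punctuation is still attached to the words at this point (for example "am!"), which is why the cleaning functions run afterwards.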
remove_punctuation()
Input: (1) A vector of strings (2) A string of all the punctuation characters to remove
Output: None
Task: For each word in the vector, remove all occurrences of all the punctuation characters denoted
by the punctuation string. When complete, the input vector will now hold all the words with punctuation removed. The
punctuation string will remain unchanged.
capitalize_words()
Input: A vector of strings
Output: None
Task: For each word in the vector, convert each character to its upper case equivalent. When complete,
the input vector will now hold all the words capitalized.
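The two cleaning functions above could be sketched as follows. The signatures are assumptions inferred from main.cpp, and contains_char() is a hypothetical helper written by hand so that nothing from <algorithm> is needed.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if c appears anywhere in chars.
bool contains_char(const std::string& chars, char c) {
    for (std::size_t i = 0; i < chars.size(); ++i) {
        if (chars[i] == c) return true;
    }
    return false;
}

// The vector is modified in place; the punctuation string is const.
void remove_punctuation(std::vector<std::string>& words, const std::string& punctuation) {
    for (std::size_t w = 0; w < words.size(); ++w) {
        std::string cleaned;
        for (std::size_t i = 0; i < words[w].size(); ++i) {
            // keep only the characters that are not punctuation
            if (!contains_char(punctuation, words[w][i])) {
                cleaned += words[w][i];
            }
        }
        words[w] = cleaned;
    }
}

void capitalize_words(std::vector<std::string>& words) {
    for (std::size_t w = 0; w < words.size(); ++w) {
        for (std::size_t i = 0; i < words[w].size(); ++i) {
            // cast through unsigned char, as std::toupper expects
            unsigned char c = static_cast<unsigned char>(words[w][i]);
            words[w][i] = static_cast<char>(std::toupper(c));
        }
    }
}
```

With the punctuation string from main.cpp, "Sam-I-am!" would become "SAMIAM" and "don't" would become "DONT".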
filter_unique_words()
Input: A vector of strings
Output: A vector of strings
Task: The function will return only the unique words present in the input vector. The output vector
will not contain any duplicate words.
count_unique_words()
Input: (1) A vector of strings containing all the words (2) A vector of strings containing only the unique words
Output: A vector of unsigned integers
Task: The function will count the number of occurrences of each unique word in the entire text. The output vector
will be the same size as the vector of unique words with element positions corresponding to the same word and count.
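A rough sketch of filtering and counting with hand-written linear searches, assuming the unique words should be kept in order of first appearance (the Happy Birthday example suggests this, but the expected output files are the authority). The signatures are assumptions inferred from main.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> filter_unique_words(const std::vector<std::string>& allWords) {
    std::vector<std::string> unique;
    for (std::size_t i = 0; i < allWords.size(); ++i) {
        bool seen = false;
        // linear search through the words collected so far
        for (std::size_t j = 0; j < unique.size(); ++j) {
            if (unique[j] == allWords[i]) { seen = true; break; }
        }
        if (!seen) unique.push_back(allWords[i]);
    }
    return unique;
}

std::vector<unsigned int> count_unique_words(const std::vector<std::string>& allWords,
                                             const std::vector<std::string>& uniqueWords) {
    // counts[i] is the number of occurrences of uniqueWords[i]
    std::vector<unsigned int> counts(uniqueWords.size(), 0);
    for (std::size_t i = 0; i < allWords.size(); ++i) {
        for (std::size_t j = 0; j < uniqueWords.size(); ++j) {
            if (uniqueWords[j] == allWords[i]) { ++counts[j]; break; }
        }
    }
    return counts;
}
```

Both functions are quadratic in the worst case, which is acceptable for texts of this size and avoids anything from <algorithm>.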
print_unique_word_counts()
Input: (1) A vector of strings (2) A vector of unsigned integers
Output: None
Task: For each word, print out the word and its corresponding count. Format the output as follows:
WORD1 : #C
WORD2 : #C
...
WORDN : #C
Notice how there are two columns. We want the values aligned in each column. The columns correspond to the following values:
- WORD - The word. Left align all values. Allocate enough space for the length of the longest word present. (Assume the longest word will be at most 20 characters long.)
- #C - The corresponding count of the word. Right align all values. Allocate enough space for the count of the most frequent word present in the file. (Assume there will be at most 10^10 occurrences of each word.)
An example (based on singing Happy Birthday to Bjourne) is shown below:
HAPPY : 4
BIRTHDAY : 4
TO : 4
YOU : 3
BJOURNE : 1
Refer to the expected output files for longer examples on the expected formatting.
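The column alignment described above can be done with the stream manipulators from <iomanip>. The sketch below uses a hypothetical helper, format_word_count_line(), and assumed column widths of 20 and 10; the exact widths must be checked against the expected output files.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper; WORD_WIDTH and COUNT_WIDTH are assumptions.
std::string format_word_count_line(const std::string& word, unsigned int count) {
    const int WORD_WIDTH = 20;
    const int COUNT_WIDTH = 10;
    std::ostringstream line;
    // setw applies only to the next value written, so it is repeated per column
    line << std::left << std::setw(WORD_WIDTH) << word
         << " : "
         << std::right << std::setw(COUNT_WIDTH) << count;
    return line.str();
}
```

The same manipulators work directly on cout inside print_unique_word_counts(); building a string first simply makes the formatting easy to check in isolation.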
count_letters()
Input: (1) A vector of strings (2) An array of 26 unsigned integers
Output: None
Task: Count the number of occurrences of each letter present in all words. Each position of the array
corresponds to each letter as ordered by the English alphabet. Upon completion, the array will hold the counts of
each letter and the vector of strings will remain unchanged.
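A minimal sketch of the counting step, assuming the words are already uppercase, the caller has zeroed the array (as main.cpp does), and the character set stores 'A' through 'Z' contiguously (true for ASCII). The signature is an assumption inferred from main.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// An array parameter refers to the caller's array, so the counts
// written here are visible in main() after the call.
void count_letters(const std::vector<std::string>& words, unsigned int letters[26]) {
    for (std::size_t w = 0; w < words.size(); ++w) {
        for (std::size_t i = 0; i < words[w].size(); ++i) {
            char c = words[w][i];
            // c - 'A' maps 'A' to index 0, 'B' to 1, ..., 'Z' to 25
            if (c >= 'A' && c <= 'Z') {
                ++letters[c - 'A'];
            }
        }
    }
}
```

Characters outside 'A' to 'Z', such as digits, are skipped rather than indexed, which keeps the array access in bounds.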
print_letter_counts()
Input: An array of 26 unsigned integers
Output: None
Task: For each letter, print out the letter and its corresponding count. Format the output as follows:
A: #C
B: #C
...
Y: #C
Z: #C
Notice how there are two columns. We want the values aligned in each column. The columns correspond to the following values:
- A - The letter.
- #C - The corresponding count of the letter. Right align all values. Allocate enough space for the count of the most frequent letter present in the file. (Assume there will be at most 10^10 occurrences of each letter.)
An example (based on singing Happy Birthday to Bjourne) is shown below:
A: 8
B: 5
C: 0
D: 4
E: 1
F: 0
G: 0
H: 8
I: 4
J: 1
K: 0
L: 0
M: 0
N: 1
O: 8
P: 8
Q: 0
R: 5
S: 0
T: 8
U: 4
V: 0
W: 0
X: 0
Y: 11
Z: 0
Refer to the expected output files for longer examples on the expected formatting.
print_max_min_letter()
Input: An array of 26 unsigned integers
Output: None
Task: Print out the two letters that occur least often and most often. If there is more than one
letter that occurs the same number of times, print the one that comes first alphabetically. Upon completion,
the input array will remain unchanged. Print out the following pieces of information:
- The letter
- The number of occurrences
- The frequency of appearance as a percentage to 3 decimal places
Format the output as follows:
Least Frequent Letter: A #C (#P%)
Most Frequent Letter: Z #C (#P%)
Notice how there are three columns of values. The columns correspond to the following values:
- A - The letter.
- #C - The corresponding count of the letter. Right align all values. Allocate enough space for the count of the most frequent letter present in the file. (Assume there will be at most 10^10 occurrences.)
- #P - The frequency of the letter. Right align all values. Print to three decimal places.
An example with actual values is shown below:
Least Frequent Letter: C 0 ( 0.000%)
Most Frequent Letter: Y 11 ( 14.667%)
Refer to the expected output files for longer examples on the expected formatting.
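The search for the least and most frequent letters can be sketched as a single pass over the array. LetterExtremes, find_letter_extremes(), and percent_of_total() are hypothetical names for illustration, and the sketch assumes the percentage is the letter's count divided by the total number of letters counted.

```cpp
#include <cstddef>

// Hypothetical result type: indices are positions in the array (0 = 'A').
struct LetterExtremes {
    std::size_t minIndex;
    std::size_t maxIndex;
    unsigned int total;
};

LetterExtremes find_letter_extremes(const unsigned int letters[26]) {
    LetterExtremes result = {0, 0, 0};
    for (std::size_t i = 0; i < 26; ++i) {
        result.total += letters[i];
        // strict comparisons keep the alphabetically first letter on ties
        if (letters[i] < letters[result.minIndex]) result.minIndex = i;
        if (letters[i] > letters[result.maxIndex]) result.maxIndex = i;
    }
    return result;
}

double percent_of_total(unsigned int count, unsigned int total) {
    if (total == 0) return 0.0;  // avoid dividing by zero for an empty file
    return 100.0 * count / total;
}
```

The letter itself can be recovered with static_cast<char>('A' + index), and writing fixed and setprecision(3) to the stream before the percentage gives three decimal places.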
Extra Credit
For extra credit, sort the unique words and their associated counts. Sample outputs are provided and denoted by output/*_xc.out.
Functional Requirements
- You may not make use of the standard library functions sort(), find(), any_of(), or anything else from #include <algorithm>. You must implement your own functions.
- DO NOT use global variables.
- You must use parameters properly, either pass-by-value or pass-by-reference.
- Mark parameters as const appropriately if the function is not modifying the parameter value.
- For this assignment, the output must match the example outputs exactly.
- For this assignment, the output must match the example outputs exactly.
Hints
- Do not wait until the day before this is due to begin.
- The first step is to create the files and function stubs to get the program to compile and run.
- The second step is to implement each function one at a time. Verify the function is correct before moving on to the next function.
- Do not just dive into the assignment. Create a mental plan of what tasks your program needs to accomplish. Convert this to pseudocode. Tackle the first task (eg, "can I open the file ok?") and conduct a sanity check. Then tackle the next task (eg, "can I read all the words in the file, and store the frequencies of each word?") and conduct another sanity check. We strongly suggest writing your program (one step at a time!)
- You may modify main.cpp to verify each step is working properly.
- You may add additional functions to assist if you deem it necessary. A common task is determining how many digits are present in an integer.
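As an illustration of the last hint, here is one way a digit-counting helper might look. The name count_digits() is hypothetical, and the sketch treats 0 as having one digit.

```cpp
// Hypothetical helper: number of decimal digits needed to print value.
unsigned int count_digits(unsigned int value) {
    unsigned int digits = 1;
    // each division by 10 removes the last digit
    while (value >= 10) {
        value /= 10;
        ++digits;
    }
    return digits;
}
```

A result like this could be passed to setw() if a column width needs to depend on the largest count.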
Testing
The graders will test your program with the following executions:
./A2 input/greeneggsandham.txt
./A2 input/aliceChapter1.txt
./A2 input/romeoandjuliet.txt
./A2 input/privateTestFile.txt
Your program's output for the public test files must match the provided output files exactly. The private test file must produce the expected output as well.
Grading Rubric
Your submission will be graded according to the following rubric.
Points | Requirement Description |
20 | All labs completed and submitted L2A, L2B, L2C, L2D |
+2 | L2A Extra Credit Completed |
30 | Each function input/output correct as specified and performs correct task meeting the functional requirements. |
+3 | A2 Extra Credit Completed |
3 | Public input test files generate correct results. |
1 | Private input test file generates correct results. |
4 | (1) Comments used (2) Coding style followed (3) Appropriate variable names, constants, and data types used (4) Instructions followed |
58 | Total Points |
Submission
Always, always, ALWAYS update the header comments at the top of your main.cpp file. And if you ever get stuck, remember that there is LOTS of help available.
It is critical that you follow these steps when submitting homework.
If you do not follow these instructions, your assignment will receive a major deduction. Why all the fuss? Because we have several hundred of these assignments to grade, and we use computer tools to automate as much of the process as possible. If you deviate from these instructions, our grading tools will not work.
Submission Instructions
Here are step-by-step instructions for submitting your homework properly:
- Make sure you have the appropriate comment header block at the top of every source code file for this set. The header block should include the following information at a minimum.
  /* CSCI 261: Assignment 2: A2 - Green Eggs and Ham
   *
   * Author: XXXX (INSERT_NAME)
   * Resources used (Office Hours, Tutoring, Other Students, etc & in what capacity):
   *     // list here any outside assistance you used/received while following the
   *     // CS@Mines Collaboration Policy and the Mines Academic Code of Honor
   *
   * XXXXXXXX (MORE_COMPLETE_DESCRIPTION_HERE)
   */
  Be sure to fill in the appropriate information, including:
  - Assignment number
  - Assignment title
  - Your name
  - If you received any type of assistance (office hours - whose, tutoring - when), then list where/what/who gave you the assistance and describe the assistance received
  - A description of the assignment task and what the code in this file accomplishes.
  Additionally, update the Makefile for A2 to generate a target executable named A2.
- File and folder names are extremely important in this process. Please double-check carefully, to ensure things are named correctly.
  - The top-level folder of your project must be named Set2.
  - Inside Set2, create 5 sub-folders that are required for this Set. The name of each sub-folder is defined in that Set (e.g. L2A, L2B, L2C, L2D, and A2).
  - Copy your files into the subdirectories of Set2 (steps 2-3), zip this Set2 folder (steps 4-5), and then submit the zipped file (steps 6-11) to Canvas.
  - For example, when you zip/submit Set2, there will be 5 sub-folders called L2A, L2B, L2C, L2D, and A2 inside the Set2 folder, and each of these sub-folders will have the associated files.
- Using Windows Explorer (not to be confused with Internet Explorer), find the files named word_functions.h, word_functions.cpp.
  STOP: Are you really sure you are viewing the correct assignment's folder?
- Now, for A2, right click on word_functions.h, word_functions.cpp to copy the files. Then, return to the Set2/A2 folder and right click to paste the files. In other words, put a copy of your homework's word_functions.h, word_functions.cpp source code into the Set2/A2 folder.
  Follow the same steps for each lab to put a copy of each lab's deliverable into the Set2/L2 folders. Do this process for Set2/L2A (main.cpp, Makefile), Set2/L2B (main.cpp, coordinate_conversion.h, coordinate_conversion.cpp, Makefile), Set2/L2C (main.cpp, Makefile), Set2/L2D (string_functions.cpp).
  STOP: Are you sure your Set2 folder now has all your code to submit?
- Now, right-click on the "Set2" folder.
  - In the pop-up menu that opens, move the mouse to "Send to..." and expand the sub-menu.
  - In the sub-menu that opens, select "Compressed (zipped) folder".
  STOP: Are you really sure you are zipping a Set2 folder with sub-folders that each contain a main.cpp file in it?
- After the previous step, you should now see a "Set2.zip" file.
- Now visit the Canvas page for this course and click the "Assignments" button in the sidebar.
- Find Set2, click on it, find the "Submit Assignment" area, and then click the "Choose File" button.
- Find the "Set2.zip" file created earlier and click the "Open" button.
  STOP: Are you really sure you are selecting the right homework assignment? Are you double-sure?
- WAIT! There's one more super-important step. Click on the blue "Submit Assignment" button to submit your homework.
- No, really, make sure you click the "Submit Assignment" button to actually submit your homework. Clicking the "Choose File" button in the previous step kind of makes it feel like you're done, but you must click the Submit button as well! And you must allow the file time to upload before you turn off your computer!
- Canvas should say "Submitted!". Click "Submission Details" and you can download the zip file you just submitted. In other words, verify you submitted what you think you submitted!
In summary, you must zip the "Set2" folder and only the "Set2" folder, this zip folder must have several sub-folders, you must name all these folders correctly, you must submit the correct zip file for this homework, and you must click the "Submit Assignment" button. Not doing these steps is like bringing your homework to class but forgetting to hand it in. No concessions will be made for incorrectly submitted work. If you incorrectly submit your homework, we will not be able to give you full credit. And that makes us unhappy.