→This assignment is due by Thursday, March 12, 2026, 11:59 PM.←
→ As with all assignments, this must be an individual effort and cannot be pair programmed. Any debugging assistance must follow the course collaboration policy and be cited in the comment header block for the assignment.←
In this assignment, we will focus on classes, vectors, strings, file I/O, and functions!
Overview
Have you ever finished a book and wondered, "Geez, I wonder how many times each word occurs in this text?" No? This assignment illustrates a fundamental use of collections: storing related values in a single data structure, and then using that data structure to reveal interesting facts about the data.
For this assignment, you will read in a text file containing the story Green Eggs and Ham (plus some others). You will then need to count the number of occurrences of each word & letter and display the frequencies. You'll be amazed at the results!
The Specifics
For this assignment, download the starter code pack. This zip file contains several files:
- main.cpp - the predetermined main.cpp. This file shows the usage and functionality that is expected of your program. You are not allowed to edit this file for submission.
- Makefile - the Makefile to build with your program.
- input/aliceChapter1.txt - the first chapter of Alice in Wonderland in text format.
- input/greeneggsandham.txt - the contents of Green Eggs and Ham in text format.
- input/happybirthday.txt - the contents of singing Happy Birthday to Bjourne in text format.
- input/romeoandjuliet.txt - the contents of Romeo and Juliet in text format.
- solutions/aliceChapter1.ans - the expected output when running your program against the aliceChapter1.txt file
- solutions/greeneggsandham.ans - the expected output when running your program against the greeneggsandham.txt file
- solutions/happybirthday.ans - the expected output when running your program against the happybirthday.txt file
- solutions/romeoandjuliet.ans - the expected output when running your program against the romeoandjuliet.txt file
Object Oriented Programming
Referring to the implementation in main.cpp, take note how the program reads as a series of subtasks and the provided comments are redundant.
The code is "self documenting" with the function names providing the steps that are occurring. Your task is to
provide the implementations for all the called functions. You will need to create six files: StreamUtility.h & StreamUtility.cpp,
InputProcessor.h & InputProcessor.cpp,
and OutputProcessor.h & OutputProcessor.cpp to make the program work as intended.
You will want to make your program as general as possible by not having any assumptions about the data hardcoded in. Your program will need to adapt to the contents of the specific file that is being processed. Four public input files have been supplied with the starter pack. We will run your program against additional private input files.
Class Requirements
The UML of each class is given below.
| StreamUtility |
| + selectInput() : istream* |
| + selectOutput() : ostream* |

| InputProcessor |
| - words : vector< string > |
| + InputProcessor() |
| + getAllWords() : vector< string > |
| + readFromStream( istream ) : void |
| + sanitizeWords( string ) : void |

| OutputProcessor |
| - uniqueWords : vector< string > |
| - uniqueWordCounts : vector< unsigned long > |
| - letterCounts : vector< unsigned long > |
| + OutputProcessor() |
| + analyzeWords( vector< string > ) : void |
| + writeToStream( ostream ) : void |
You may add additional data members and/or member functions as appropriate to assist with the needed tasks.
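As a rough illustration of how one UML box maps onto a class declaration, here is a minimal sketch of InputProcessor. The parameter types (a stream reference, a const string reference) and the const qualifier on the getter are assumptions drawn from the hints in this write-up; the calls in main.cpp are the authority on the exact signatures. Two of the functions are only declared here, and the other two are defined inline purely to keep the sketch short. In your submission the definitions belong in InputProcessor.cpp.

```cpp
#include <istream>
#include <string>
#include <vector>

// Sketch of a declaration matching the InputProcessor UML box.
class InputProcessor {
public:
    // The vector default-constructs to empty, which is a sensible start.
    InputProcessor() {}

    // Streams cannot be copied, so the stream is taken by reference.
    void readFromStream(std::istream& input);

    // Assumption: the punctuation string is only read, never changed.
    void sanitizeWords(const std::string& punctuation);

    // const member function: calling it does not modify the object.
    std::vector<std::string> getAllWords() const { return words; }

private:
    std::vector<std::string> words;
};
```

A real header would also carry an include guard around the declaration.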
The input, output, and task of each member function is described below as well. The functions are:
- StreamUtility::selectInput()
- StreamUtility::selectOutput()
- InputProcessor::InputProcessor()
- InputProcessor::readFromStream()
- InputProcessor::sanitizeWords()
- InputProcessor::getAllWords()
- OutputProcessor::OutputProcessor()
- OutputProcessor::analyzeWords()
- OutputProcessor::writeToStream()
StreamUtility::selectInput()
Input: None
Output: Pointer to an istream object
Task: Prompt the user if they want to read from the standard input or a file. If they wish to
read from the standard input, then return a pointer to our standard input object. If they wish to read
from a file, then prompt them for a file name and open the corresponding file. If the file can't be opened,
then return a null pointer. Otherwise, return a pointer to the input file stream object. (hint: the
file stream will need to be allocated on the free store to prevent dangling pointers)
Note: User interaction must match exactly. See example program flow at end.
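The pointer handling described above can be sketched as follows. The helper names openInputFile and useStandardInput are hypothetical and are not part of the required interface. The prompting is deliberately left out because the exact prompt text comes from the sample runs.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper: opens a file on the free store so the stream
// outlives this function. Returns a null pointer if the open fails.
std::istream* openInputFile(const std::string& filename) {
    std::ifstream* fileIn = new std::ifstream(filename.c_str());
    if (fileIn->fail()) {
        delete fileIn;   // nothing usable was opened, so release it
        return nullptr;
    }
    return fileIn;
}

// Hypothetical helper: std::cin already lives for the whole program,
// so returning its address does not dangle.
std::istream* useStandardInput() {
    return &std::cin;
}
```

One consequence of this design is that the caller has to know which case occurred: a file stream created with new should eventually be deleted, while the address of std::cin must never be deleted. The same pattern applies to std::ofstream and std::cout on the output side.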
StreamUtility::selectOutput()
Input: None
Output: Pointer to an ostream object
Task: Prompt the user if they want to write to the standard output or a file. If they wish to
write to the standard output, then return a pointer to our standard output object. If they wish to write
to a file, then prompt them for a file name and open the corresponding file. If the file can't be opened,
then return a null pointer. Otherwise, return a pointer to the output file stream object. (hint: the
file stream will need to be allocated on the free store to prevent dangling pointers)
Note: User interaction must match exactly. See example program flow at end.
InputProcessor::InputProcessor()
Input: None
Output: N/A
Task: Initializes any private data members to sensible values
InputProcessor::readFromStream()
Input: An input stream object (hint: istream objects cannot be copied and thus cannot be passed-by-value)
Output: None
Task: Read all the words from the input stream until ENDEND is encountered and store each word
in the private vector of all words. Don't store ENDEND in your vector.
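A minimal sketch of the reading loop, written as a free function (the name readWords is hypothetical) so it can be tried on its own. It assumes words are whitespace-separated and that reading should also stop if the stream runs out before ENDEND appears. That second condition is an assumption, not something stated above.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated words until the sentinel "ENDEND" or the
// end of the stream, whichever comes first. The sentinel is not stored.
std::vector<std::string> readWords(std::istream& input) {
    std::vector<std::string> words;
    std::string word;
    while (input >> word && word != "ENDEND") {
        words.push_back(word);
    }
    return words;
}
```

Because the function takes a std::istream&, it can be exercised with a std::istringstream in a quick test before being pointed at std::cin or a file stream.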
InputProcessor::sanitizeWords()
Input: A string denoting punctuation to remove
Output: None
Task: For each word in the private vector of words, remove all occurrences of all the punctuation characters denoted
by the punctuation string and convert each character to its upper case equivalent.
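The per-word step can be sketched as below. The function name sanitizeWord is hypothetical, and it works on a single word rather than on the class's vector. It relies on std::string::find(), which is permitted, rather than anything from the algorithm library.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns a copy of word with every character found in punctuation
// removed and every remaining character converted to upper case.
std::string sanitizeWord(const std::string& word, const std::string& punctuation) {
    std::string cleaned;
    for (std::size_t i = 0; i < word.length(); ++i) {
        // npos means the character is not one of the punctuation marks
        if (punctuation.find(word[i]) == std::string::npos) {
            unsigned char c = static_cast<unsigned char>(word[i]);
            cleaned += static_cast<char>(std::toupper(c));
        }
    }
    return cleaned;
}
```

For example, sanitizeWord("It's", "'") yields "ITS", which matches the first word in the standard-input sample run.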
InputProcessor::getAllWords()
Input: None
Output: A vector of strings containing all the words
Task: The function will return the private vector of strings.
OutputProcessor::OutputProcessor()
Input: None
Output: N/A
Task: Initializes private data members to sensible values.
OutputProcessor::analyzeWords()
Input: A vector of strings containing all the words
Output: None
Task: Compute the unique set of words present in the modified words vector. It will also count
the number of occurrences of each unique word in the entire text. The private vectors
will be the same size with element positions corresponding to the same word and count. Additionally, counts the
number of occurrences of each letter A-Z.
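One way to picture the parallel vectors is the sketch below. The free functions tallyWord and countLetters are hypothetical stand-ins for logic that would live inside the class. The sketch assumes the words have already been sanitized to upper case and that the letters A-Z are contiguous in the character set, as they are in ASCII.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Adds one occurrence of word. Position i in uniqueWords always
// corresponds to position i in counts.
void tallyWord(const std::string& word,
               std::vector<std::string>& uniqueWords,
               std::vector<unsigned long>& counts) {
    for (std::size_t i = 0; i < uniqueWords.size(); ++i) {
        if (uniqueWords[i] == word) {
            counts[i] += 1;
            return;
        }
    }
    // first time this word has been seen
    uniqueWords.push_back(word);
    counts.push_back(1);
}

// Returns 26 counts, index 0 for 'A' through index 25 for 'Z'.
std::vector<unsigned long> countLetters(const std::vector<std::string>& words) {
    std::vector<unsigned long> letterCounts(26, 0UL);
    for (std::size_t i = 0; i < words.size(); ++i) {
        for (std::size_t j = 0; j < words[i].length(); ++j) {
            char c = words[i][j];
            if (c >= 'A' && c <= 'Z') {
                letterCounts[static_cast<std::size_t>(c - 'A')] += 1;
            }
        }
    }
    return letterCounts;
}
```

The linear search keeps the unique words in first-encountered order, which is the order the unsorted word list is printed in.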
OutputProcessor::writeToStream()
Input: An output stream object (hint: ostream objects cannot be copied and thus cannot be passed-by-value)
Output: None
Task: This function will print the following information to the output stream, in the order and format specified.
This output MUST MATCH the specification EXACTLY.
- How many words were read in
- How many unique words were read in
- The complete list of unique words and their associated counts
- The most frequent word
- The least frequent word
- A list of letters and their associated counts
- The most frequent letter
- The least frequent letter
How many words were read in & How many unique words were read in
Print how many total words and how many unique words were read in. Format the output as follows:
Analyzed #T total words
Counted #U unique words
The values correspond to the following :
- #T - The number of total words read in.
- #U - The number of unique words read in.
An example (based on singing Happy Birthday to Bjourne) is shown below:
Analyzed 16 total words
Counted 5 unique words
Refer to the solution files for longer examples on the expected formatting.
The complete list of unique words and their associated counts
For each word, print out the word and its corresponding count. Format the output as follows:
WORD1 - #C
WORD2 - #C
...
WORDN - #C
Notice how there are two columns. We want the values aligned in each column. The columns correspond to the following values:
- WORD - The word. Left align all values. Allocate enough space for the length of the longest word present in the file.
- #C - The corresponding count of the word. Right align all values. Allocate enough space for the number of digits in the count of the most frequent word present in the file.
An example (based on singing Happy Birthday to Bjourne) is shown below:
HAPPY - 4
BIRTHDAY - 4
TO - 4
YOU - 3
BJOURNE - 1
Refer to the solution files for longer examples on the expected formatting.
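The two-column alignment can be sketched with the stream manipulators from iomanip. The helper names countDigits and formatWordRow are hypothetical, and the " - " separator is taken from the format template above. The solution files remain the authority on exact spacing.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Number of decimal digits needed to print value (0 needs one digit).
int countDigits(unsigned long value) {
    int digits = 1;
    while (value >= 10) {
        value /= 10;
        ++digits;
    }
    return digits;
}

// Builds one row: word left-aligned in wordWidth columns, then " - ",
// then count right-aligned in countWidth columns.
std::string formatWordRow(const std::string& word, unsigned long count,
                          int wordWidth, int countWidth) {
    std::ostringstream row;
    row << std::left << std::setw(wordWidth) << word
        << " - "
        << std::right << std::setw(countWidth) << count;
    return row.str();
}
```

With the Happy Birthday data, wordWidth would be 8 (the length of BIRTHDAY) and countWidth would be countDigits(4), which is 1.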
The most frequent word & The least frequent word
Print out the two words that occur most often and least often. If there is more than one word that occurs the same number of times, print the one that is encountered first. Print out the following pieces of information:
- The word
- The number of occurrences
- The frequency of appearance as a percentage to 3 decimal places
Format the output as follows:
Most Frequent Word: WORD1 #C (#P%)
Least Frequent Word: WORD2 #C (#P%)
Notice how there are three columns of values. The columns correspond to the following values:
- WORD# - The word. Left align all values. Allocate enough space for the length of the longer of the two words.
- #C - The corresponding count of the word. Right align all values. Allocate enough space for the number of digits in the count of the most frequent word present in the file.
- #P - The frequency of the word as a percentage. Right align all values. Print to three decimal places.
An example with actual values is shown below:
Most Frequent Word: HAPPY 4 ( 25.000%)
Least Frequent Word: BJOURNE 1 ( 6.250%)
Refer to the solution files for longer examples on the expected formatting.
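The percentage itself is count divided by total, times 100, printed with three digits after the decimal point. A minimal sketch follows. The helper name formatPercent is hypothetical, and treating an empty input as 0.000% is an assumption made so the division never sees a zero total.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Returns count/total as a percentage with exactly three decimal places.
// Assumption: when total is zero the result is reported as 0.000.
std::string formatPercent(unsigned long count, unsigned long total) {
    double percent = 0.0;
    if (total > 0) {
        percent = 100.0 * static_cast<double>(count) / static_cast<double>(total);
    }
    std::ostringstream text;
    text << std::fixed << std::setprecision(3) << percent;
    return text.str();
}
```

For the example above, 4 occurrences out of 16 words gives 25.000 and 1 out of 16 gives 6.250. Right alignment in the final output would be handled separately with std::setw.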
A list of letters and their associated counts
For each letter, print out the letter and its corresponding count to the provided output stream. Format the output as follows:
A.....#C
B.....#C
...
Y.....#C
Z.....#C
The width needs to match the width of the word table from above. We want the values aligned in each column with a period filling in the gaps. The columns correspond to the following values:
- A - The letter left aligned.
- #C - The corresponding count of the letter. Right align all values.
An example (based on singing Happy Birthday to Bjourne) is shown below:
A..........8
B..........5
C..........0
D..........4
E..........1
F..........0
G..........0
H..........8
I..........4
J..........1
K..........0
L..........0
M..........0
N..........1
O..........8
P..........8
Q..........0
R..........5
S..........0
T..........8
U..........4
V..........0
W..........0
X..........0
Y.........11
Z..........0
Refer to the solution files for longer examples on the expected formatting.
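The dotted rows can be produced by changing the fill character. In the sketch below the helper name formatLetterRow is hypothetical. It assumes totalWidth is the full width of a word-table row, which in the Happy Birthday example is 12: 8 for the longest word, 3 for the separator, and 1 for the count.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Builds one row: the letter, then the count right-aligned in the
// remaining columns with '.' filling the gap.
std::string formatLetterRow(char letter, unsigned long count, int totalWidth) {
    std::ostringstream row;
    row << letter
        << std::setfill('.') << std::right << std::setw(totalWidth - 1)
        << count;
    return row.str();
}
```

Note that std::setfill is sticky. When writing directly to a shared output stream instead of a temporary std::ostringstream, the fill character should be set back to a space before printing later sections.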
The most frequent letter & The least frequent letter
Print out the two letters that occur most often and least often to the provided output stream. If there is more than one letter that occurs the same number of times, print the one that comes first alphabetically. Print out the following pieces of information:
- The letter
- The number of occurrences
- The frequency of appearance as a percentage to 3 decimal places
Format the output as follows:
Most Frequent Letter: Z #C (#P%)
Least Frequent Letter: A #C (#P%)
Notice how there are three columns of values. The columns correspond to the following values:
- A - The letter.
- #C - The corresponding count of the letter. Right align all values. Allocate enough space for the number of digits in the count of the most frequent letter present in the file.
- #P - The frequency of the letter as a percentage. Right align all values. Print to three decimal places.
An example with actual values is shown below:
Most Frequent Letter: Y 11 ( 14.474%)
Least Frequent Letter: C 0 ( 0.000%)
Refer to the solution files for longer examples on the expected formatting.
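The tie-breaking rule falls out naturally from a left-to-right scan that only replaces the current best on a strict comparison. A sketch follows, with the hypothetical helper names indexOfLargest and indexOfSmallest. Both assume the vector is not empty, and return 0 if it is.

```cpp
#include <cstddef>
#include <vector>

// Index of the largest value. On ties the lowest index wins because
// only a strictly larger value replaces the current best.
std::size_t indexOfLargest(const std::vector<unsigned long>& counts) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < counts.size(); ++i) {
        if (counts[i] > counts[best]) {
            best = i;
        }
    }
    return best;
}

// Index of the smallest value, again preferring the lowest index on ties.
std::size_t indexOfSmallest(const std::vector<unsigned long>& counts) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < counts.size(); ++i) {
        if (counts[i] < counts[best]) {
            best = i;
        }
    }
    return best;
}
```

For letter counts stored in A-Z order, the lowest index is the alphabetically first letter. For word counts stored in reading order, the lowest index is the first word encountered, so the same scan serves both rules.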
Extra Credit
For extra credit, when analyzing the words ask the user if they wish to sort the unique words.
If they desire to sort the unique words, then ask if they want to sort the words alphabetically or by their associated counts.
If they want to sort alphabetically, then sort the words from A to Z. If they want to sort by counts, then sort the words
by decreasing counts. If two words have the same count, then sort them alphabetically.
Sample outputs are provided and denoted by solutions/*_xc_*.ans.
Note: User interaction must match exactly. See example program flow at end.
The sample output sorting by word for singing Happy Birthday is below:
BIRTHDAY : 4
BJOURNE : 1
HAPPY : 4
TO : 4
YOU : 3
Most Frequent Word: BIRTHDAY 4 ( 25.000%)
Least Frequent Word: BJOURNE 1 ( 6.250%)
The sample output sorting by count for singing Happy Birthday is below:
BIRTHDAY - 4
HAPPY - 4
TO - 4
YOU - 3
BJOURNE - 1
Most Frequent Word: BIRTHDAY 4 ( 25.000%)
Least Frequent Word: BJOURNE 1 ( 6.250%)
Notice the additional change in the most frequent word selected and the resultant alignment.
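Since std::sort() is off limits, the ordering has to be written by hand. The sketch below uses a selection sort over the two parallel vectors. The function name sortByCount is hypothetical, and selection sort is just one simple choice; any correct hand-written sort would do. The essential point is that every swap is applied to both vectors so that each word stays paired with its count.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Orders the parallel vectors by decreasing count, breaking ties
// alphabetically. Both vectors must have the same size.
void sortByCount(std::vector<std::string>& words,
                 std::vector<unsigned long>& counts) {
    for (std::size_t i = 0; i + 1 < words.size(); ++i) {
        std::size_t best = i;
        for (std::size_t j = i + 1; j < words.size(); ++j) {
            bool higherCount = counts[j] > counts[best];
            bool tieButEarlier = counts[j] == counts[best] && words[j] < words[best];
            if (higherCount || tieButEarlier) {
                best = j;
            }
        }
        // swap both vectors together to keep word and count aligned
        std::string tempWord = words[i];
        words[i] = words[best];
        words[best] = tempWord;
        unsigned long tempCount = counts[i];
        counts[i] = counts[best];
        counts[best] = tempCount;
    }
}
```

Applied to the Happy Birthday words in reading order (HAPPY, BIRTHDAY, TO, YOU, BJOURNE), this produces BIRTHDAY, HAPPY, TO, YOU, BJOURNE, the same order as the count-sorted sample above.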
Sample Program Executions
The execution of your program is expected to conform to the following I/O examples:
Select input source:
(1) - Standard Input
(2) - File Input
Choice: 2
Enter the name of the file to read from: input/happybirthday.txt
Do you wish to sort the words (Y|N): N
Select output destination:
(1) - Standard Output
(2) - File Output
Choice: 2
Enter the name of the file to write to: output/happybirthday.txt
Analysis complete, verify results
Select input source:
(1) - Standard Input
(2) - File Input
Choice: 2
Enter the name of the file to read from: input/happybirthday.txt
Do you wish to sort the words (Y|N): y
Do you wish to sort by (W)ord or (C)ount? W
Select output destination:
(1) - Standard Output
(2) - File Output
Choice: 2
Enter the name of the file to write to: out/sorted.txt
Analysis complete, verify results
Select input source:
(1) - Standard Input
(2) - File Input
Choice: 4
Select input source:
(1) - Standard Input
(2) - File Input
Choice: f
Select input source:
(1) - Standard Input
(2) - File Input
Choice: 2
Enter the name of the file to read from: input/happybirthday.txt
Do you wish to sort the words (Y|N): r
Do you wish to sort the words (Y|N): 5
Do you wish to sort the words (Y|N): y
Do you wish to sort by (W)ord or (C)ount? d
Do you wish to sort by (W)ord or (C)ount? 7
Do you wish to sort by (W)ord or (C)ount? c
Select output destination:
(1) - Standard Output
(2) - File Output
Choice: n
Select output destination:
(1) - Standard Output
(2) - File Output
Choice: 3
Select output destination:
(1) - Standard Output
(2) - File Output
Choice: 2
Enter the name of the file to write to: happybirthdayOut.txt
Analysis complete, verify results
Select input source:
(1) - Standard Input
(2) - File Input
Choice: 1
Enter text to analyze ("ENDEND" will cease input): It's good practice to perform additional tests for other scenarios that could possibly occur. ENDEND
Do you wish to sort the words (Y|N): n
Select output destination:
(1) - Standard Output
(2) - File Output
Choice: 1
Analyzed 14 total words
Counted 14 unique words
ITS - 1
GOOD - 1
PRACTICE - 1
TO - 1
PERFORM - 1
ADDITIONAL - 1
TESTS - 1
FOR - 1
OTHER - 1
SCENARIOS - 1
THAT - 1
COULD - 1
POSSIBLY - 1
OCCUR - 1
Most Frequent Word: ITS 1 ( 7.143%)
Least Frequent Word: ITS 1 ( 7.143%)
A............5
B............1
C............6
D............4
E............5
F............2
G............1
H............2
I............6
J............0
K............0
L............3
M............1
N............2
O...........11
P............3
Q............0
R............7
S............7
T............9
U............2
V............0
W............0
X............0
Y............1
Z............0
Most Frequent Letter: O 11 ( 14.103%)
Least Frequent Letter: J 0 ( 0.000%)
Analysis complete, verify results
Functional Requirements
- Your classes must match the UML Diagram specifications exactly. (You are permitted to add additional helper members, but the above interface must be conformed to.)
- You may not make use of the standard library functions std::sort(), std::find(), std::any_of() or anything else from #include <algorithm>. You must implement your own functions. (You are permitted, and encouraged, to use std::string::find().)
- DO NOT use global variables.
- DON'T even think about using goto.
- You must use parameters & class members properly.
- The graders will use the base main.cpp provided with the distribution so your solution must work with the provided implementation.
- For this assignment, the output must match the example solutions exactly. The public provided test files are expected to match the provided output files exactly. The private test files will need to generate the expected output as well.
- You must use const appropriately within your classes and functions.
Hints
- Do not wait until the day before this is due to begin.
- The first step is to create the files and class function stubs to get the program to compile and run.
- The second step is to implement each function one at a time. Verify the function is correct before moving on to the next function.
- Do not just dive into the assignment. Create a mental plan of what tasks your program needs to accomplish. Convert this to pseudocode. Tackle the first task (e.g., "can I open the file ok?") and conduct a sanity check. Then tackle the next task (e.g., "can I read all the words in the file, and store the frequencies of each word?") and conduct another sanity check. We strongly suggest writing your program one step at a time!
- You may modify main.cpp to verify each step is working properly. Be sure your classes work with the expected provided files.
- You may add additional functions to assist if you deem it necessary. A common task is determining how many digits are present in an integer.
- Do not make any assumptions about the file contents. Cleanly handle scenarios where counts may be zero.
Grading Rubric
As this is an introductory C++ course that teaches the fundamental concepts of the language and implementation details of each algorithm, the use of the C++ algorithm library, lambda functions, structured bindings, and smart pointers is prohibited. The use of auto is also discouraged so that you remain aware of the explicit type of every variable used throughout your program.
Your submission will be graded according to the following rubric:
| Points | Requirement Description |
| 0.5 | Submitted correctly by Thursday, March 12, 2026, 11:59 PM |
| 0.5 | Project builds without errors or warnings. |
| 2.0 | Best Practices and Style Guide followed. |
| 0.5 | Program follows specified user I/O flow. |
| 0.5 | Public and private tests successfully passed. |
| 2.0 | Fully meets specifications. |
| 6.00 | Total Points |
| Extra Credit Points | Requirement Description |
| +0.5 | Words sorted by criteria when output. |
Submission
Always, always, ALWAYS update the header comments at the top of your main.cpp file. And if you ever get stuck, remember that there is LOTS of help available.
Zip together your main.cpp, Makefile, InputProcessor.h, InputProcessor.cpp, OutputProcessor.h, OutputProcessor.cpp, StreamUtility.h, StreamUtility.cpp files and name the zip file A3_USERNAME.zip where USERNAME is your user id. Upload this zip file to Canvas under A3.
After submitting to Canvas, download your submission to ensure your submission is correct and complete. Submissions that are empty or contain only the starter code will not be considered.