→This assignment is due by Friday, June 02, 2023, 11:59 PM.←
→ As with all assignments, this must be an individual effort and cannot be pair programmed. Any debugging assistance must follow the course collaboration policy and be cited in the comment header block for the assignment.←
→ Do not forget to complete the following labs with this set: L3A, L3B←
→ Do not forget to complete zyBooks Assignment 3 for this set.←
In this assignment, we will focus on classes, vectors, strings, file I/O, and functions!
Overview
Have you ever finished a book and wondered, "Geez, I wonder how many times each word occurs in this text?" No? This assignment illustrates a fundamental use of collections: storing related values in a single data structure, and then using that data structure to reveal interesting facts about the data.
For this assignment, you will read in a text file containing the story Green Eggs and Ham (plus some others). You will then need to count the number of occurrences of each word & letter and display the frequencies. You'll be amazed at the results!
The Specifics
For this assignment, download the starter code pack. This zip file contains several files:
- main.cpp - the predetermined main.cpp. This file shows the usage and functionality that is expected of your program. You are not allowed to edit this file. You will not be submitting this file with your assignment.
- processor.h - declaration of function to execute your classes in the expected order
- processor.cpp - definition of function to execute your classes in the expected order
- Makefile - the preset Makefile to build with your program.
- input/aliceChapter1.txt - the first chapter of Alice in Wonderland in text format.
- input/greeneggsandham.txt - the contents of Green Eggs and Ham in text format.
- input/romeoandjuliet.txt - the contents of Romeo and Juliet in text format.
- solutions/aliceChapter1.out - the expected output when running your program against the aliceChapter1.txt file
- solutions/greeneggsandham.out - the expected output when running your program against the greeneggsandham.txt file
- solutions/romeoandjuliet.out - the expected output when running your program against the romeoandjuliet.txt file
Object Oriented Programming
Referring to the implementation in processor.cpp, take note how the program reads as a series of subtasks and the provided comments are redundant. The code is "self documenting" with the function names providing the steps that are occurring. Your task is to provide the implementations for all the called functions. You will need to create four files: StringCounter.h & StringCounter.cpp and StringFilter.h & StringFilter.cpp to make the program work as intended.
You will want to make your program as general as possible by not having any assumptions about the data hardcoded in. Three public input files have been supplied with the starter pack. We will run your program against a fourth private input file.
Class Requirements
The UML of each class is given below.
The input, output, and task of each member function is described below as well. The functions are:
- StringCounter::StringCounter()
- StringCounter::readAllWords()
- StringCounter::printLetterCounts()
- StringCounter::printLetterStats()
- StringCounter::getAllWords()
- StringFilter::StringFilter()
- StringFilter::addWords()
- StringFilter::printUniqueWordCounts()
- StringFilter::printUniqueWordStats()
- StringFilter::getUniqueWords()
StringCounter::StringCounter()
Input: None
Output: N/A
Task: Initializes private data members to sensible values (there are no letters present)
StringCounter::readAllWords()
Input: (1) Reference to the input stream (2) a string of characters to remove from any read words
Output: None
Task: Read all the words that are in the input stream and store in the private vector of all words.
For each word, remove all occurrences of all the punctuation characters denoted
by the punctuation string and convert each character to its upper case equivalent.
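The per-word cleanup described above can be sketched as a small helper. This is only an illustration under stated assumptions: the name cleanWord is hypothetical and not part of the required interface, and the membership test is written by hand because <algorithm> is off limits for this assignment.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of the required class interface).
// Returns a copy of `word` with every character that appears in
// `punctuation` removed and all remaining characters upper-cased.
std::string cleanWord(const std::string& word, const std::string& punctuation) {
    std::string cleaned;
    for (const char c : word) {
        // hand-written membership test instead of std::find / any_of
        bool isPunctuation = false;
        for (const char p : punctuation) {
            if (c == p) {
                isPunctuation = true;
                break;
            }
        }
        if (!isPunctuation) {
            // cast to unsigned char first so std::toupper is well defined
            cleaned += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
    }
    return cleaned;
}
```

For example, cleanWord("Sam-I-am!", "-!") yields "SAMIAM". How a word that becomes empty after cleaning should be handled is not stated here, so check the solution files before deciding.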
StringCounter::printLetterCounts()
Input: Reference to the output stream
Output: None
Task: For each letter, print out the letter and its corresponding count to the standard out. Format the output as follows:
A: #C
B: #C
...
Y: #C
Z: #C
Notice how there are two columns. We want the values aligned in each column. The columns correspond to the following values:
- A - The letter
- #C - The corresponding count of the letter. Right align all values. Allocate enough space for the length of the most frequent letter present in the file. (Assume there will be at most 10^10 occurrences of each letter.)
An example (based on singing Happy Birthday to Bjourne) is shown below:
A: 8
B: 5
C: 0
D: 4
E: 1
F: 0
G: 0
H: 8
I: 4
J: 1
K: 0
L: 0
M: 0
N: 1
O: 8
P: 8
Q: 0
R: 5
S: 0
T: 8
U: 4
V: 0
W: 0
X: 0
Y: 11
Z: 0
Refer to the solution files for longer examples on the expected formatting.
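One possible way to tally letters and print a right-aligned column is sketched below. The function names countLetters and printLetterTable are hypothetical, the words are assumed to be upper case already, and the exact spacing around the colon is an assumption: the solution files are the authority on the real format.

```cpp
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical helper: counts[0] holds the number of 'A's, counts[25] the
// number of 'Z's. Assumes the words were already converted to upper case.
std::vector<unsigned long> countLetters(const std::vector<std::string>& words) {
    std::vector<unsigned long> counts(26, 0);
    for (const std::string& word : words) {
        for (const char c : word) {
            if (c >= 'A' && c <= 'Z') {
                ++counts[static_cast<std::size_t>(c - 'A')];
            }
        }
    }
    return counts;
}

// Hypothetical helper: prints one "LETTER: count" row per entry, starting at
// 'A', with each count right aligned in a field of `countWidth` characters.
void printLetterTable(std::ostream& out,
                      const std::vector<unsigned long>& counts,
                      const int countWidth) {
    for (std::size_t i = 0; i < counts.size(); ++i) {
        out << static_cast<char>('A' + i) << ": "
            << std::right << std::setw(countWidth) << counts[i] << '\n';
    }
}
```

The field width would normally be the number of digits in the largest count, so that every row lines up.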
StringCounter::printLetterStats()
Input: Reference to the output stream
Output: None
Task: Print out the two letters that occur least often and most often to the standard out. If there is more than one
letter that occurs the same number of times, print the one that comes first alphabetically. Print out the following pieces of information:
- The letter
- The number of occurrences
- The frequency of appearance as a percentage to 3 decimal places
Format the output as follows:
Most Frequent Letter: Z #C (#P%)
Least Frequent Letter: A #C (#P%)
Notice how there are three columns of values. The columns correspond to the following values:
- A - The letter.
- #C - The corresponding count of the letter. Right align all values. Allocate enough space for the length of the most frequent letter present in the file. (Assume there will be at most 10^10 occurrences.)
- #P - The frequency of the letter. Right align all values. Print to three decimal places.
An example with actual values is shown below:
Most Frequent Letter: Y 11 ( 14.667%)
Least Frequent Letter: C 0 ( 0.000%)
Refer to the solution files for longer examples on the expected formatting.
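The tie-breaking rule and the percentage formatting can be sketched as follows. All four function names are hypothetical, and the vectors are assumed to be non-empty. Using a strict comparison while scanning from the front keeps the earliest entry on a tie, which for a letter table means the first one alphabetically.

```cpp
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <vector>

// Hypothetical helper: index of the largest count (precondition: non-empty).
// The strict '>' means a later equal value never replaces an earlier one.
std::size_t indexOfMax(const std::vector<unsigned long>& counts) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < counts.size(); ++i) {
        if (counts[i] > counts[best]) {
            best = i;
        }
    }
    return best;
}

// Hypothetical helper: index of the smallest count, earliest entry on ties.
std::size_t indexOfMin(const std::vector<unsigned long>& counts) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < counts.size(); ++i) {
        if (counts[i] < counts[best]) {
            best = i;
        }
    }
    return best;
}

// Hypothetical helper: share of `count` in `total` as a percentage.
double percentOf(const unsigned long count, const unsigned long total) {
    if (total == 0) {
        return 0.0; // avoid dividing by zero when nothing was read
    }
    return 100.0 * static_cast<double>(count) / static_cast<double>(total);
}

// Hypothetical helper: prints a percentage right aligned with 3 decimals.
void printPercent(std::ostream& out, const double percent, const int width) {
    out << std::fixed << std::setprecision(3)
        << std::right << std::setw(width) << percent;
}
```

Note that std::fixed and std::setprecision stay in effect on the stream after the call, while std::setw applies only to the next value written.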
StringCounter::getAllWords()
Input: None
Output: A vector of strings containing all the words
Task: The function will return the private vector of strings.
StringFilter::StringFilter()
Input: None
Output: N/A
Task: Initializes private data members to sensible values (there are no words present).
StringFilter::addWords()
Input: A vector of strings containing all the words
Output: None
Task: The function will compute the unique set of words present in the input vector. It will also count
the number of occurrences of each unique word in the entire text. The private vectors
will be the same size with element positions corresponding to the same word and count.
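A minimal sketch of the parallel-vector bookkeeping is shown below, written as a free function so it stands alone. The name tallyWords and its reference parameters are assumptions made for illustration; inside the class, the two output vectors would be the private data members. The linear search is hand written since find() is not allowed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the body of addWords(). After the call,
// uniqueWords[i] occurred counts[i] times, and words appear in the order
// in which they were first encountered.
void tallyWords(const std::vector<std::string>& allWords,
                std::vector<std::string>& uniqueWords,
                std::vector<unsigned int>& counts) {
    for (const std::string& word : allWords) {
        bool found = false;
        // hand-written linear search through the words seen so far
        for (std::size_t i = 0; i < uniqueWords.size(); ++i) {
            if (uniqueWords[i] == word) {
                ++counts[i];
                found = true;
                break;
            }
        }
        if (!found) {
            // first sighting: grow both vectors together so they stay aligned
            uniqueWords.push_back(word);
            counts.push_back(1);
        }
    }
}
```

Because both vectors only ever grow in the same branch, they always have the same size, which is the invariant the task description asks for.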
StringFilter::printUniqueWordCounts()
Input: Reference to the output stream
Output: None
Task: For each word, print out the word and its corresponding count. Format the output as follows:
WORD1 : #C
WORD2 : #C
...
WORDN : #C
Notice how there are two columns. We want the values aligned in each column. The columns correspond to the following values:
- WORD - The word. Left align all values. Allocate enough space for the length of the longest word present. (Assume the longest word will be at most 20 characters long.)
- #C - The corresponding count of the word. Right align all values. Allocate enough space for the length of the most frequent letter present in the file. (Assume there will be at most 10^10 unique words.)
An example (based on singing Happy Birthday to Bjourne) is shown below:
HAPPY : 4
BIRTHDAY : 4
TO : 4
YOU : 3
BJOURNE : 1
Refer to the solution files for longer examples on the expected formatting.
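Mixing a left-aligned word column with a right-aligned count column can be done with the <iomanip> manipulators, as in the sketch below. The names longestLength and printWordTable are hypothetical, and the " : " separator is an assumption based on the format shown above; compare against the solution files for the exact spacing.

```cpp
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical helper: length of the longest word (0 for an empty vector).
std::size_t longestLength(const std::vector<std::string>& words) {
    std::size_t longest = 0;
    for (const std::string& word : words) {
        if (word.length() > longest) {
            longest = word.length();
        }
    }
    return longest;
}

// Hypothetical helper: one row per word, word padded on the right,
// count padded on the left. Assumes both vectors have the same size.
void printWordTable(std::ostream& out,
                    const std::vector<std::string>& words,
                    const std::vector<unsigned int>& counts,
                    const int countWidth) {
    const int wordWidth = static_cast<int>(longestLength(words));
    for (std::size_t i = 0; i < words.size(); ++i) {
        out << std::left << std::setw(wordWidth) << words[i]
            << " : "
            << std::right << std::setw(countWidth) << counts[i] << '\n';
    }
}
```

std::left and std::right persist on the stream until changed, so each row sets the alignment explicitly before the value it applies to.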
StringFilter::printUniqueWordStats()
Input: Reference to the output stream
Output: None
Task: Print out the two words that occur least often and most often. If there is more than one
word that occurs the same number of times, print the one that is encountered first. Print out the following pieces of information:
- The word
- The number of occurrences
- The frequency of appearance as a percentage to 3 decimal places
Format the output as follows:
Most Frequent Word: WORD1 #C (#P%)
Least Frequent Word: WORD2 #C (#P%)
Notice how there are three columns of values. The columns correspond to the following values:
- WORD# - The word. Left align all values. Allocate enough space for the length of the longest word present. (Assume the longest word will be at most 20 characters long.)
- #C - The corresponding count of the word. Right align all values. Allocate enough space for the length of the most frequent letter present in the file. (Assume there will be at most 10^10 occurrences.)
- #P - The frequency of the word. Right align all values. Print to three decimal places.
An example with actual values is shown below:
Most Frequent Word: HAPPY 4 ( 25.000%)
Least Frequent Word: BJOURNE 1 ( 6.250%)
Refer to the solution files for longer examples on the expected formatting.
StringFilter::getUniqueWords()
Input: None
Output: A vector of strings containing all the unique words
Task: The function will return the private vector of strings.
Extra Credit
For extra credit, sort the unique words and their associated counts. Sample outputs are provided and denoted by solutions/*_xc.out. The sample output for singing Happy Birthday is below:
BIRTHDAY : 4
BJOURNE : 1
HAPPY : 4
TO : 4
YOU : 3
Most Frequent Word: BIRTHDAY 4 ( 25.000%)
Least Frequent Word: BJOURNE 1 ( 6.250%)
Notice the additional change in the most frequent word selected.
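Since sort() is not allowed, the extra credit needs a hand-written sort that moves each count along with its word. The sketch below uses selection sort purely as one simple choice; the function name sortWordsWithCounts is hypothetical, and the swaps are written out manually to stay clear of library helpers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: selection sort over two parallel vectors.
// Words end up in ascending order and counts[i] still belongs to words[i].
// Assumes both vectors have the same size.
void sortWordsWithCounts(std::vector<std::string>& words,
                         std::vector<unsigned int>& counts) {
    for (std::size_t i = 0; i + 1 < words.size(); ++i) {
        // find the smallest remaining word
        std::size_t smallest = i;
        for (std::size_t j = i + 1; j < words.size(); ++j) {
            if (words[j] < words[smallest]) {
                smallest = j;
            }
        }
        if (smallest != i) {
            // swap the words and their counts together
            const std::string tempWord = words[i];
            words[i] = words[smallest];
            words[smallest] = tempWord;

            const unsigned int tempCount = counts[i];
            counts[i] = counts[smallest];
            counts[smallest] = tempCount;
        }
    }
}
```

Applied to the Happy Birthday data, this produces the order BIRTHDAY, BJOURNE, HAPPY, TO, YOU with counts 4, 1, 4, 4, 3, which is why the most frequent word changes once the "first encountered" tie-break is applied to the sorted list.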
Functional Requirements
- You may not make use of the standard library functions sort(), find(), any_of() or anything else from #include <algorithm>. You must implement your own functions.
- DO NOT use global variables.
- You must use parameters & class members properly.
- Mark parameters and member functions as const appropriately if the function is not modifying a value.
- For this assignment, the output must match the example solutions exactly. The public provided test files are expected to match the provided output files exactly. The private test file will need to generate the expected output as well.
Hints
- Do not wait until the day before this is due to begin.
- The first step is to create the files and class function stubs to get the program to compile and run.
- The second step is to implement each function one at a time. Verify the function is correct before moving on to the next function.
- Do not just dive into the assignment. Create a mental plan of what tasks your program needs to accomplish. Convert this to pseudocode. Tackle the first task (e.g., "can I open the file ok?") and conduct a sanity check. Then tackle the next task (e.g., "can I read all the words in the file, and store the frequencies of each word?") and conduct another sanity check. We strongly suggest writing your program one step at a time!
- You may modify main.cpp or processor.cpp to verify each step is working properly but you will not be submitting either of these files. Be sure your classes work with the expected provided files.
- You may add additional functions to assist if you deem it necessary. A common task is determining how many digits are present in an integer.
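The digit-counting task mentioned in the last hint can be sketched with repeated division. The name countDigits is hypothetical; the result is the kind of value that could be passed to std::setw when sizing a count column.

```cpp
// Hypothetical helper: number of decimal digits needed to print `value`.
// Zero still needs one digit, so the count starts at 1.
int countDigits(unsigned long value) {
    int digits = 1;
    while (value >= 10) {
        value /= 10;
        ++digits;
    }
    return digits;
}
```

For instance, countDigits(0) and countDigits(8) both give 1, while countDigits(11) gives 2.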
Best Practices To Follow
Code Style
The following set of guidelines ensure all code in this class will be written in a similar and consistent manner, allowing any reader to understand the program's intent and contents.
- One clear and consistent coding style used (such as K&R, 1TBS, or Allman).
- Course naming scheme is followed for variable, function, class, and other identifiers. See the course style guide for more specifics.
- Code is self-documenting. Variables sensibly named, function names descriptive of their purpose.
Code Correctness
The following set of guidelines ensure all programs written in this class behave properly without side effects.
- Code compiles and links without any errors or warnings.
- Program runs without any run time errors. Exceptions are properly caught, user input is validated appropriately, and program exits successfully without error.
- Use const wherever possible:
  - If you declare a variable and that variable is never modified, that variable should be const.
  - If your function takes a parameter and does not modify that parameter, that parameter should be const.
  - If a member function does not modify the callee, that member function should be const.
  - If you are pointing at a value that does not change, the pointer should point at a constant value (e.g. const T*).
  - If the pointer itself is never modified, the pointer should be a constant pointer (e.g. T* const).
  - If the pointer itself is never modified AND the value pointed at does not change, the pointer should be a constant pointer AND the pointer should point at a constant value (e.g. const T* const).
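The const guidelines above can be seen together in a short sketch. The class WordList, its member name mWords, and the function sumFirstTwo are hypothetical examples, not part of this assignment, and the member naming is an assumption rather than the course style guide.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class used only to illustrate const placement.
class WordList {
public:
    // the parameter is read but never modified: pass by const reference
    void add(const std::string& word) { mWords.push_back(word); }

    // the object is not modified: mark the member function const
    std::size_t size() const { return mWords.size(); }

private:
    std::vector<std::string> mWords;
};

// `values` is a constant pointer to constant ints: neither the pointer
// nor the values it points at may be changed inside the function.
int sumFirstTwo(const int* const values) {
    const int sum = values[0] + values[1]; // never reassigned, so const
    return sum;
}
```

A const member function such as size() can be called through a const reference, whereas add() cannot, and the compiler enforces the difference.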
Code Structure
The following set of guidelines ensure all programs written in this class are done in an abstracted, modular, extendable, and flexible manner.
- Do not use global variables unless absolutely necessary. Instead, encapsulate them and design your interfaces effectively. If there is no way around using a global variable, be prepared to defend and justify its usage.
- Program flow uses structural blocks (conditionals/loops) effectively, appropriately, and efficiently.
- Keep your headers clean. Put the absolute minimum required in your headers for your interface to be used. Anything that can go in a source file should. Do not #include any system headers in your .h files that are not absolutely required in that file specifically. Do not add using namespace in headers.
- Use header guards correctly and appropriately.
- Place templated class and function definitions in a *.hpp file.
- Place static class and function definitions in abstracted *.h and *.cpp files.
- Place each class and structure in their own files as appropriate based on their makeup.
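A header that follows these guidelines might look like the sketch below. The class WordTally, its file names, and the guard macro are all hypothetical. The header and source parts are shown in one listing only so the sketch compiles as a single unit; in a real project they would be two separate files.

```cpp
// ---- WordTally.h (hypothetical file) ----
#ifndef WORD_TALLY_H
#define WORD_TALLY_H

#include <cstddef> // std::size_t appears in the interface
#include <string>  // std::string appears in the interface
#include <vector>  // std::vector is a data member

// declarations only, and no "using namespace" in the header
class WordTally {
public:
    WordTally();
    void add(const std::string& word);
    std::size_t size() const;

private:
    std::vector<std::string> mWords;
};

#endif // WORD_TALLY_H

// ---- WordTally.cpp (hypothetical file) ----
// In a real project this file would begin with: #include "WordTally.h"

WordTally::WordTally() : mWords() {}

void WordTally::add(const std::string& word) {
    mWords.push_back(word);
}

std::size_t WordTally::size() const {
    return mWords.size();
}
```

The guard macro prevents the class from being declared twice when the header is included from several places, and the header pulls in only the system headers its own declarations need.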
Dynamic Memory Management
The following set of guidelines ensure all programs written in this class behave properly without side effects.
- Implement the Big-3 as appropriate.
- Do not leak memory. Every allocation using new needs to have a corresponding delete.
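As an illustration of the Big-3 (copy constructor, copy assignment operator, destructor) on a class that owns memory, consider the sketch below. The class IntBuffer and its members are hypothetical and unrelated to the A3 classes, which can rely on std::vector and std::string to manage their own memory.

```cpp
#include <cstddef>

// Hypothetical class that owns a dynamically allocated array.
class IntBuffer {
public:
    explicit IntBuffer(const std::size_t size)
        : mSize(size), mData(new int[size]()) {} // elements start at zero

    // copy constructor: allocate separate storage and copy the values
    IntBuffer(const IntBuffer& other)
        : mSize(other.mSize), mData(new int[other.mSize]) {
        for (std::size_t i = 0; i < mSize; ++i) {
            mData[i] = other.mData[i];
        }
    }

    // copy assignment: build the new array first, then release the old one
    IntBuffer& operator=(const IntBuffer& other) {
        if (this != &other) {
            int* const fresh = new int[other.mSize];
            for (std::size_t i = 0; i < other.mSize; ++i) {
                fresh[i] = other.mData[i];
            }
            delete[] mData;
            mData = fresh;
            mSize = other.mSize;
        }
        return *this;
    }

    // destructor: every new[] is matched by exactly one delete[]
    ~IntBuffer() { delete[] mData; }

    int& at(const std::size_t index) { return mData[index]; }
    int at(const std::size_t index) const { return mData[index]; }
    std::size_t size() const { return mSize; }

private:
    std::size_t mSize;
    int* mData;
};
```

Without the copy constructor and copy assignment operator, two objects would share one array and both destructors would try to delete it.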
Software Engineering Design Principles
The following set of guidelines ensure all program components written in this class are done in an abstracted, modular, extendable, and flexible manner.
- Follow and apply the following design principles:
  - Write Once, Use Many / Write Once, Read Many (WORM) / Don't Repeat Yourself (DRY): Use loops, functions, classes, and const as appropriate.
  - Encapsulate what varies: Use functions and classes as appropriate. Identify the aspects that vary and separate them from what stays the same.
  - Favor composition over inheritance.
  - Program to an interface, not an implementation & SOLID Principles: When using object-oriented inheritance & polymorphism, do the following:
    - No variable should hold a reference to a concrete class.
    - No class should derive from a concrete class.
    - No method should override an implemented method from any of its base classes.
    - Use appropriate inheritance access. Only expose necessary members to derived classes and/or publicly.
    - Use virtual and override as appropriate. Mark members as final wherever possible and/or appropriate on derived classes.
Grading Rubric
Your submission will be graded according to the following rubric.
Points | Requirement Description |
10 | All labs completed and submitted L3A, L3B |
30 | Each function input/output correct as specified and performs correct task meeting the functional requirements. |
+4 | A3 Extra Credit Completed. |
2 | Public input test files generate correct results. |
1 | Private input test file generates correct results. |
5 | Best practices are followed. |
0 | Submission structured appropriately. Submissions structured improperly will receive deductions. |
48 | Total Points |
Submission
Always, always, ALWAYS update the header comments at the top of your main.cpp file. And if you ever get stuck, remember that there is LOTS of help available.
It is critical that you follow these steps when submitting homework.
If you do not follow these instructions, your assignment will receive a major deduction. Why all the fuss? Because we have several hundred of these assignments to grade, and we use computer tools to automate as much of the process as possible. If you deviate from these instructions, our grading tools will not work.
Submission Instructions
Here are step-by-step instructions for submitting your homework properly:
- Make sure you have the appropriate comment header block at the top of every source code file for this set. The header block should include the following information at a minimum.
  /* CSCI 200: Assignment 3: A3 - Green Eggs and Ham Classes
   *
   * Author: XXXX (INSERT_NAME)
   * Resources used (Office Hours, Tutoring, Other Students, etc & in what capacity):
   *     // list here any outside assistance you used/received while following the
   *     // CS@Mines Collaboration Policy and the Mines Academic Code of Honor
   *
   * XXXXXXXX (MORE_COMPLETE_DESCRIPTION_HERE)
   */
  Be sure to fill in the appropriate information, including:
  - Assignment number
  - Assignment title
  - Your name
  - If you received any type of assistance (office hours - whose, tutoring - when), then list where/what/who gave you the assistance and describe the assistance received
  - A description of the assignment task and what the code in this file accomplishes.
  Additionally, update the Makefile for A3 to generate a target executable named A3.
- File and folder names are extremely important in this process. Please double-check carefully, to ensure things are named correctly.
  - The top-level folder of your project must be named Set3
  - Inside Set3, create 3 sub-folders that are required for this Set. The name of each sub-folder is defined in that Set (e.g. L3A, L3B, and A3).
  - Copy your files into the subdirectories of Set3 (steps 2-3), zip this Set3 folder (steps 4-5), and then submit the zipped file (steps 6-11) to Canvas.
  - For example, when you zip/submit Set3, there will be 3 sub-folders called L3A, L3B, and A3 inside the Set3 folder, and each of these sub-folders will have the associated files.
- Using Windows Explorer (not to be confused with Internet Explorer), find the files named StringCounter.h, StringCounter.cpp, StringFilter.h, StringFilter.cpp.
  STOP: Are you really sure you are viewing the correct assignment's folder?
- Now, for A3, right click on StringCounter.h, StringCounter.cpp, StringFilter.h, StringFilter.cpp to copy the files. Then, return to the Set3/A3 folder and right click to paste the files. In other words, put a copy of your homework's StringCounter.h, StringCounter.cpp, StringFilter.h, StringFilter.cpp source code into the Set3/A3 folder.
  Follow the same steps for each lab to put a copy of each lab's deliverable into the Set3/L3 folders. Do this process for Set3/L3A (main.cpp, Makefile), Set3/L3B (string_functions.cpp).
  STOP: Are you sure your Set3 folder now has all your code to submit?
  The structure of the submission is as follows:
  - Set3/
    - A3/
      - StringCounter.h
      - StringCounter.cpp
      - StringFilter.h
      - StringFilter.cpp
    - L3A/
      - main.cpp
      - Makefile
    - L3B/
      - string_functions.cpp
  * only if present and appropriate to the implementation.
- Now, right-click on the "Set3" folder.
  - In the pop-up menu that opens, move the mouse to "Send to..." and expand the sub-menu.
  - In the sub-menu that opens, select "Compressed (zipped) folder".
  STOP: Are you really sure you are zipping a Set3 folder with sub-folders that each contain the files listed above?
- After the previous step, you should now see a "Set3.zip" file.
- Now visit the Canvas page for this course and click the "Assignments" button in the sidebar.
- Find Set3, click on it, find the "Submit Assignment" area, and then click the "Choose File" button.
- Find the "Set3.zip" file created earlier and click the "Open" button.
  STOP: Are you really sure you are selecting the right homework assignment? Are you double-sure?
- WAIT! There's one more super-important step. Click on the blue "Submit Assignment" button to submit your homework.
- No, really, make sure you click the "Submit Assignment" button to actually submit your homework. Clicking the "Choose File" button in the previous step kind of makes it feel like you're done, but you must click the Submit button as well! And you must allow the file time to upload before you turn off your computer!
- Canvas should say "Submitted!". Click "Submission Details" and you can download the zip file you just submitted. In other words, verify you submitted what you think you submitted!
In summary, you must zip the "Set3" folder and only the "Set3" folder, this zip folder must have several sub-folders, you must name all these folders correctly, you must submit the correct zip file for this homework, and you must click the "Submit Assignment" button. Not doing these steps is like bringing your homework to class but forgetting to hand it in. No concessions will be made for incorrectly submitted work. If you incorrectly submit your homework, we will not be able to give you full credit. And that makes us unhappy.