CSCI 261 - Programming Concepts (C++)

Summer 2018 - A6 - Sam I Am (Part I)

This assignment is due by Friday, June 08, 2018, 11:59 PM.



In this homework, we will focus on arrays, vectors, strings, structs, and File I/O.


Overview



Have you ever finished a book and wondered, "Geez, I wonder how many times each word occurs in this text?" No? This week's assignment illustrates a fundamental use of the array & vector: storing related values in a single data structure, and then using that data structure to reveal interesting facts about the data.

For this assignment, you will read in a text file containing the story Green Eggs and Ham. You will then need to count the number of occurrences of each word and display the frequencies. You'll be amazed at the results!


Program from what YOU KNOW



Did you know that most programs aren't well-designed? They really aren't. Often, they're put together by geographically diverse people with different styles, knowledge, programming ability and design taste. But, the programs work. It's a freaky miracle, but they do work.

When given a task in your engineering field that could be solved with software, we do not expect you to be expert programmers (unless your field is software engineering). However, we do expect you to be able to assemble the things you know how to code into cohesive, meaningful, useful programs.

You know how to open a file and read data from it. You know how to declare and work with arrays. You know that C++ can be your friend or your enemy. You know how to use a framework to draw some simple graphics. You know how to conduct repetitive tasks with loops. You know about data types.

Try to rely on these things that you know, before over-complicating your own solution with things that you don't yet know. This doesn't mean you shouldn't explore and learn more. You should! Always! Until your last dying breath! But, start with what you know first, and build from there.


ABCUF



Always Be Using Functions. Let that be a rule of thumb for the remainder of the semester. When working on your programs, try to reach an end result where main doesn't do much "low level stuff" but rather leverages functions in order to do what your program needs to do.

For example, if you were to write a pizza-making program, don't write one super-long, hard-to-read, scare-your-date-away implementation of main. Use functions:

int main() {
    CreateShoppingList();
    BuyIngredients();
    GatherIngredientsInKitchen();
    PourYourselfGlassOfFineChianti();
    MakeDough();
    MakeSauce();
    //...
    CookPizza();
    return 0;
}


The Specifics Part I: File I/O



You must first create a struct named WordCount to represent a word and its number of occurrences. It must have two members: a count stored as an int and a word stored as a string. Since this struct definition will be used in multiple files, you will probably want to define it in its own header file and then include that header in any other file that uses the WordCount struct. (Hmm, this seems similar to the Yahtzee setup.)
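As a minimal sketch of what such a header could look like: the file name WordCount.h and the include-guard name are assumptions, since the assignment only fixes the struct name and its two members.

```cpp
// WordCount.h (hypothetical file name; only the struct itself is required)
#ifndef WORDCOUNT_H
#define WORDCOUNT_H

#include <string>

// One distinct word and the number of times it has been seen so far.
struct WordCount {
    std::string word;
    int count;
};

#endif // WORDCOUNT_H
```

The include guard keeps the struct from being defined twice when several of your files include the same header.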

Next, you need to open the following file using an input file stream: greeneggsandham.txt. Read this file one word at a time. Every time you read a word in, before you do anything else, you must remove the following special characters from the word (if present): . ? ! , ( ) - ; ' " _ :
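One possible way to strip those characters is sketched below. The function name removeSpecialCharacters is hypothetical, and the sketch builds a cleaned copy rather than erasing in place, which is only one of several reasonable approaches. It uses the string member function find(), not anything from <algorithm>.

```cpp
#include <string>

// Returns a copy of `word` with every special character from the
// assignment's list removed. (Function name is an assumption.)
std::string removeSpecialCharacters(const std::string& word) {
    // The characters listed in the assignment: . ? ! , ( ) - ; ' " _ :
    const std::string specialCharacters = ".?!,()-;'\"_:";

    std::string cleaned;
    for (std::size_t i = 0; i < word.size(); ++i) {
        // Keep the character only if it is NOT in the special list.
        if (specialCharacters.find(word[i]) == std::string::npos) {
            cleaned += word[i];
        }
    }
    return cleaned;
}
```

With an open std::ifstream, a loop of the form while (inputFile >> word) reads one whitespace-separated word per iteration, and each word can be passed through this function. A token made up entirely of special characters comes back as an empty string, so you may want to check for that before counting it.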

Create a vector of WordCount to store all of the words and their counts. Once you have stripped the above characters from a word, check whether you have seen it before. If this is the first time you are seeing the word, insert it into your vector with a count of 1. Otherwise, increment that word's count. (Hint: you'll need to use a searching algorithm on your vector.)
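The search-then-insert-or-increment step could be sketched with a simple linear search, as below. The function names findWord and recordWord are hypothetical, and the struct is repeated only so the sketch compiles on its own; in your project it would come from your header file.

```cpp
#include <string>
#include <vector>

// Repeated here so the sketch is self-contained.
struct WordCount {
    std::string word;
    int count;
};

// Linear search: returns the index of `word` in `counts`,
// or -1 if the word has not been seen yet.
int findWord(const std::vector<WordCount>& counts, const std::string& word) {
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (counts[i].word == word) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Adds a new entry with a count of 1 the first time a word is seen;
// otherwise increments the existing entry's count.
void recordWord(std::vector<WordCount>& counts, const std::string& word) {
    int index = findWord(counts, word);
    if (index == -1) {
        WordCount entry;
        entry.word = word;
        entry.count = 1;
        counts.push_back(entry);
    } else {
        counts[index].count++;
    }
}
```

Note that the vector is passed by const reference to the search (it only reads) and by non-const reference to recordWord (it modifies the vector).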

After you have read all of the words in the file, you will then need to sort the vector alphabetically by words. (Hint: you'll need to use a sorting algorithm on your vector). Print out all the words and their counts using the following format (substituting the actual words and counts):

# 1 AWORD   :  3
# 2 WORDS2  : 14
...
#21 WORDS21 :  1
Most Frequent:  WORDS2  (14)
Least Frequent: WORDS21 ( 1)
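The aligned layout above can be produced with the stream manipulators in <iomanip>. The sketch below is one way to do it, not the required one: formatWordLine and its width parameters are hypothetical names, and the widths are assumed to be computed from your data (for example, the length of the longest word) rather than hardcoded.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Builds one output line such as "# 1 AWORD   :  3".
// positionWidth, wordWidth, and countWidth are the column widths.
std::string formatWordLine(int position, const std::string& word, int count,
                           int positionWidth, int wordWidth, int countWidth) {
    std::ostringstream line;
    line << "#" << std::right << std::setw(positionWidth) << position  // right-aligned number
         << " " << std::left << std::setw(wordWidth) << word           // left-aligned word
         << " : " << std::right << std::setw(countWidth) << count;     // right-aligned count
    return line.str();
}
```

The same manipulators (setw, left, right) work directly on cout, so the string stream is only used here to make the result easy to inspect.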

Note how the data is aligned and the words are alphabetical. Finally, print out the most frequent word and its count along with the least frequent word and its count. To verify your output, the most frequent word is "I" with a count of 83. The least frequent word is "IF" with a count of 1. Additionally, there are 50 unique words. (Hint: you'll need to use the minMax algorithm.)
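Since sort() and the rest of <algorithm> are off limits, the sorting and min/max steps have to be written by hand. Below is a minimal sketch using selection sort, which is just one of several sorting algorithms you could choose. All function names are hypothetical. When several words share the same count, this sketch reports the first one encountered; the assignment does not state a tie rule for words, so that behavior is an assumption.

```cpp
#include <string>
#include <vector>

// Repeated here so the sketch is self-contained.
struct WordCount {
    std::string word;
    int count;
};

// Selection sort: orders the vector alphabetically by word.
void sortByWord(std::vector<WordCount>& counts) {
    for (std::size_t i = 0; i + 1 < counts.size(); ++i) {
        std::size_t smallest = i;
        for (std::size_t j = i + 1; j < counts.size(); ++j) {
            if (counts[j].word < counts[smallest].word) {
                smallest = j;
            }
        }
        if (smallest != i) {
            // Manual swap, avoiding any library helper.
            WordCount temp = counts[i];
            counts[i] = counts[smallest];
            counts[smallest] = temp;
        }
    }
}

// Returns the index of the entry with the largest count, or -1 if empty.
int indexOfMostFrequent(const std::vector<WordCount>& counts) {
    if (counts.empty()) return -1;
    std::size_t best = 0;
    for (std::size_t i = 1; i < counts.size(); ++i) {
        if (counts[i].count > counts[best].count) best = i;
    }
    return static_cast<int>(best);
}

// Returns the index of the entry with the smallest count, or -1 if empty.
int indexOfLeastFrequent(const std::vector<WordCount>& counts) {
    if (counts.empty()) return -1;
    std::size_t best = 0;
    for (std::size_t i = 1; i < counts.size(); ++i) {
        if (counts[i].count < counts[best].count) best = i;
    }
    return static_cast<int>(best);
}
```

The string comparison with < is case-sensitive, so this sketch assumes all words have already been converted to a single case before sorting.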

We used a vector to store the words, and in that context it was an appropriate choice. Next, we are interested in the frequency of every letter that appears in the book. For this use case, an array is more appropriate (think about why). Be sure to treat uppercase and lowercase letters the same when counting. Once the counts for each letter have been stored in an array (what is the appropriate type and size of the array?), print each letter and its frequency as a percentage with 3 decimal places. Also print the most and least frequent letters with their counts. In the event that two letters have the same frequency, report the letter that comes first alphabetically. To verify your program is running properly, E occurs 277 times (11.58%) and J occurs 0 times (0.0%). Use the following format (substituting the actual letters, percentages, and counts):

A: 30.123%
B:  0.532%
C: 10.001%
...
Z:  5.330%
Most Frequent: Q (1000)
Least Frequent: E (0)
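The letter-counting step could be sketched as follows. The names ALPHABET_SIZE, countLetters, indexOfLargest, indexOfSmallest, and percentOfTotal are all hypothetical. The sketch assumes each percentage is relative to the total number of letters counted, and it uses strict comparisons so that ties resolve to the letter that comes first alphabetically.

```cpp
#include <cctype>
#include <string>

// A named constant (not a global variable): one slot per letter A-Z.
const int ALPHABET_SIZE = 26;

// Adds the letters found in `text` to `letterCounts`, treating
// uppercase and lowercase the same. Non-letters are ignored.
void countLetters(const std::string& text, int letterCounts[]) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalpha(c)) {
            char upper = static_cast<char>(std::toupper(c));
            letterCounts[upper - 'A']++;  // 'A' -> 0, 'B' -> 1, ..., 'Z' -> 25
        }
    }
}

// Index of the largest value; on a tie the earliest index wins.
int indexOfLargest(const int values[], int size) {
    int best = 0;
    for (int i = 1; i < size; ++i) {
        if (values[i] > values[best]) best = i;
    }
    return best;
}

// Index of the smallest value; on a tie the earliest index wins.
int indexOfSmallest(const int values[], int size) {
    int best = 0;
    for (int i = 1; i < size; ++i) {
        if (values[i] < values[best]) best = i;
    }
    return best;
}

// Percentage of `total` represented by `count` (0 if total is 0).
double percentOfTotal(int count, int total) {
    if (total == 0) return 0.0;
    return 100.0 * count / total;
}
```

An index converts back to its letter with static_cast<char>('A' + index). For the output, cout << fixed << setprecision(3) << setw(6) prints a percentage with 3 decimal places, right-aligned as in the sample above.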

You will want to make your program as general as possible, with no assumptions about the data hardcoded in. We will run your program against the greeneggsandham.txt input file. We will also run your program against a second, secret input file to ensure your program is flexible and will work on any input file.


Functional Requirements



  • You may not make use of the standard library functions sort(), find(), any_of() or anything else from #include <algorithm>. You must implement your own sorting and searching functions.
  • Use functions. The function prototypes must be written in a separate header file. The function definitions must be written in a separate implementation file. DO NOT use global variables. You must use parameters properly, either pass-by-value or pass-by-reference.


Hints



  • Do not wait until the day before this is due to begin.
  • As discussed above, we encourage you to Always Be Using Functions. However, there is nothing wrong with doing all of your steps inside main at first and then refactoring your work into functions later, once your program is working.
  • Do not just dive into the assignment. Create a mental plan of what tasks your program needs to accomplish. Convert this to pseudocode. Tackle the first task (e.g., "can I open the file ok?") and conduct a sanity check. Then tackle the next task (e.g., "can I read all the words in the file, and store the frequencies of each word?") and conduct another sanity check. We strongly suggest writing your program one step at a time!


Grading Rubric


Your submission will be graded according to the following rubric.

Points  Requirement Description
 2      All code submitted properly.
 4      All labs completed and submitted
 2      Towers of Hanoi AutoGrader lab completed
 4      Output matches for public Green Eggs and Ham test file & private hidden test file
 4      Output format matches specifications from example
 4      Words and letters are properly sorted ignoring case
 4      Arrays, vectors, string, structs used appropriately
 2      File I/O structured appropriately
 2      Functional requirements above met.
 2      (1) Comments used
        (2) Coding style followed
        (3) Appropriate variable names, constants, and data types used
        (4) Instructions followed
30      Total Points



Submission


Always, always, ALWAYS update the header comments at the top of your main.cpp file. And if you ever get stuck, remember that there is LOTS of help available. The following instructions are copied from How to Submit Homework.


It is critical that you follow these steps when submitting homework.

If you do not follow these instructions, your assignment will receive a major deduction. Why all the fuss? Because we have several hundred of these assignments to grade, and we use computer tools to automate as much of the process as possible. If you deviate from these instructions, our grading tools will not work. And that makes us very unhappy. And when we're unhappy, we give penalties. Thus, make us happy.


Submission Instructions



Here are step-by-step instructions for submitting your homework properly:
  1. File and folder names are extremely important in this process. Please double-check carefully, to ensure things are named correctly.
    1. The top-level folder of your project must be named Set6
    2. Inside Set6, create 3 sub-folders that are required for this Set. The name of each sub-folder is defined in that Set (e.g., L6A, L6B, and A6).
    3. Copy your program main.cpp and supporting files into the subdirectories of Set6 (steps 1-2), zip this Set6 folder (steps 3-4), and then submit the zipped file (steps 5-11) to Canvas.
    4. For example, when you zip/submit Set6, there will be 3 sub-folders called L6A, L6B, and A6 inside the Set6 folder, and each of these sub-folders will have a file called main.cpp and nothing else.

  2. Using Windows Explorer (not to be confused with Internet Explorer), find the file named "main.cpp" located inside the folder for the particular lab or homework assignment you will submit. For example, find main.cpp in your Z:\CSCI261\Set6\ folder. Repeat this for all supporting files.

    STOP: Are you really sure you are viewing the correct assignment's folder?

  3. Now, for A6, right click on the main.cpp to copy the file. Then, return to the Set6/A6 folder and right click to paste the file. In other words, put a copy of your homework's main.cpp source code and supporting files into the Set6/A6 folder.

    Follow the same steps for L6A, to put a copy of your lab's main.cpp and supporting files into the Set6/L6A folder. Repeat this process for Set6/L6B.

    STOP: Are you sure your Set6 folder now has all your code to submit?

  4. Now, right-click on the "Set6" folder.
    1. In the pop-up menu that opens, move the mouse to "Send to..." and expand the sub-menu.
    2. In the sub-menu that opens, select "Compressed (zipped) folder".

    STOP: Are you really sure you are zipping a Set6 folder with sub-folders that each contain a main.cpp file?

  5. After the previous step, you should now see a "Set6.zip" file.

  6. Now visit the Canvas page for this course and click the "Assignments" button in the sidebar.

  7. Find Set6, click on it, find the "Attach file" area, and then click the "Browse My Computer" button.

  8. Find the "Set6.zip" file created earlier and click the "Open" button.

    STOP: Are you really sure you are selecting the right homework assignment? Are you double-sure?

  9. WAIT! There's one more super-important step. Click on the blue "Submit" button to submit your homework.

  10. No, really, make sure you click the "Submit" button to actually submit your homework. Clicking the "Open" button in the previous step kind of makes it feel like you're done, but you must click the Submit button as well! And you must allow the file time to upload before you turn off your computer!

  11. Canvas should say "This assignment is complete. Click OK to review the results.". Click "OK" and view the files within the zip file you submitted. In other words, verify you submitted what you think you submitted!
In summary, you must zip the "Set6" folder and only the "Set6" folder, this zip folder must have several sub-folders, you must name all these folders correctly, you must submit the correct zip file for this homework, and you must click the "Submit" button. Not doing these steps is like bringing your homework to class but forgetting to hand it in. No concessions will be made for incorrectly submitted work. If you incorrectly submit your homework, we will not be able to give you full credit. And that makes us unhappy.



Last Updated: 05/30/18 15:49

