Dictionaries

You're going to write some functions in a single program file. Submit the file itself to the homework server, and submit the answers that the functions calculate in the Comments-to-Teacher.


Make sure that you have read and understood the little tutorial on dictionaries.

Suppose you have a list of random numbers:
L = [2, 9, 8, 6, 6, 9, 19, 8, 2, 5, 11, 6, 4, 9, 3, 6, 2, 13, 7, 6, 6, 7, 4, 5, 4, 13, 3, 6, 2, 13]

Suppose you want to create a dictionary D whose keys are the numbers that show up in the list L, and whose values are how many times each number appears.  For instance, D[9] would be 3 because 9 appears 3 times in L.

Here's a way NOT to do it:

D = {}
for anum in L:
   D[anum] = L.count(anum)

While this will give you the correct answer, it is very wasteful of the computer's time.  Suppose the list consisted of 100 sixes: [6, 6, 6, 6, ..., 6].  The routine above would go to the first 6 and count all the sixes, then go to the next 6 and again go through the entire list counting sixes, and so on.  The routine goes through the entire list 100 times, so the work grows with the square of the list's length.  Not efficient.
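
By contrast, a single pass can update the tally as each number is seen. Here is a minimal sketch of that idea, assuming the same list L as above. It is one possible approach, not the only one:

```python
L = [2, 9, 8, 6, 6, 9, 19, 8, 2, 5, 11, 6, 4, 9, 3, 6, 2, 13, 7, 6, 6, 7, 4, 5, 4, 13, 3, 6, 2, 13]

D = {}
for anum in L:
    # D.get(anum, 0) returns 0 the first time anum is seen,
    # so each number in L is looked at exactly once.
    D[anum] = D.get(anum, 0) + 1

print(D[9])   # prints 3
```

Each element of L is visited once, and each dictionary update takes roughly constant time, so a list of 100 sixes costs about 100 steps instead of 100 x 100.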


Problem 1: Create the function MinMaxList(filename) that reads a file consisting of one integer per line.  It should go through those lines, processing each number ONLY ONCE, and determine which number is the most frequent in the list and how many times it occurs.  It should report the same for the least frequent number in the list: which number it is, and its frequency.



The output for your function above should look like:

Max number 6 appears 7 times
Min number 11 appears 1 time
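
Once you have a dictionary of tallies, picking out the extremes and printing them in this format might look like the sketch below. The dictionary contents are made up for illustration, and the helper name describe is hypothetical, not something the assignment requires:

```python
def describe(label, number, count):
    # Use "time" for a count of 1 and "times" otherwise
    unit = "time" if count == 1 else "times"
    return f"{label} number {number} appears {count} {unit}"

counts = {4: 3, 7: 1, 5: 2}   # made-up tallies, not the real data

# max/min walk over the keys and compare them by their tallies
most = max(counts, key=counts.get)
least = min(counts, key=counts.get)

print(describe("Max", most, counts[most]))
print(describe("Min", least, counts[least]))
```

Note that when several numbers are tied, max and min return whichever tied key they encounter first, so a different but equally valid number may be reported.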

Here is the file you'll be working on:  randomNums.txt

Write the 2 resulting lines in the Comments-to-Teacher.


Problem 2:

We'll be working on Shakespeare's Hamlet.  You'll find it here: https://www.gutenberg.org/ebooks/1524  Download the "Plain text UTF-8" version into a file that you'll later read.

Now we'll look at the datafile, and clean it of everything that doesn't belong there for our analysis...

Get rid of everything before the actual play starts: It starts at...

THE TRAGEDY OF HAMLET, PRINCE OF DENMARK

And get rid of everything after the play ends.  It ends just before:

End of the Project Gutenberg EBook of Hamlet,
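
You can trim the file by hand in an editor, or do it in code. As a rough sketch of the second option, assuming the whole file has already been read into one string and that the first occurrence of the title line is where the play starts (check this against your download; the function name trim_play is just illustrative):

```python
START_MARKER = "THE TRAGEDY OF HAMLET, PRINCE OF DENMARK"
END_MARKER = "End of the Project Gutenberg EBook of Hamlet,"

def trim_play(text):
    # Find where the play begins, then look for the end marker after that point
    start = text.find(START_MARKER)
    end = text.find(END_MARKER, start)
    if start == -1 or end == -1:
        raise ValueError("could not find one of the markers")
    # Keep the title line, drop the end marker and everything after it
    return text[start:end]

# A tiny stand-in for the real file contents
sample = ("license header\n"
          "THE TRAGEDY OF HAMLET, PRINCE OF DENMARK\n"
          "Who's there?\n"
          "End of the Project Gutenberg EBook of Hamlet, ...\n")
print(trim_play(sample))
```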

Now, we are going to be asking which characters (including space and punctuation) are the most common and the least common in the file.  An upper-case character and a lower-case character should be counted as the same character, so an "E" is the same as "e".  Also, there may be some weird characters in the file, so ignore any character whose ASCII code is larger than 127.

Use your programming skills to...

Find the 3 most frequent characters and how many times each occurs, and the 3 least frequent characters and how many times each occurs.  You can get the answers from your program in any form that it gives them to you, and then just write them in an understandable way in the Comments-to-Teacher.
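
One way to pull the top and bottom entries out of a dictionary of tallies is to sort its (key, value) pairs by value. A small sketch with made-up tallies, not the real Hamlet numbers:

```python
counts = {"a": 5, "b": 2, "c": 9, "d": 1, "e": 7}   # made-up tallies

# Sort the (character, count) pairs from smallest count to largest
by_count = sorted(counts.items(), key=lambda pair: pair[1])

least_three = by_count[:3]
most_three = by_count[-3:][::-1]   # last three, largest first

print(least_three)   # [('d', 1), ('b', 2), ('a', 5)]
print(most_three)    # [('c', 9), ('e', 7), ('a', 5)]
```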


Problem 3 (Challenge):

What are the 5 most frequent words in Hamlet and how many of each are there in the play?

We'll define a "word" as an unbroken sequence of alphabetic characters.  That means that "hamlet" is a word, but "ham.lettuce" is not, and neither is "Hamlet!!".  In other words, any non-letter can signal the end of a word, not just the space character.  Also, ignore case when counting duplicates: "HAMLET" and "hamlet" are the same word.
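
As a sketch of this definition, a regular expression can pull out every run of letters. This version assumes that only the letters a to z count as alphabetic, and the name split_words is just illustrative:

```python
import re

def split_words(text):
    # Lower-case first so "HAMLET" and "hamlet" match,
    # then grab every unbroken run of the letters a-z
    return re.findall(r"[a-z]+", text.lower())

print(split_words("Hamlet!! ham.lettuce HAMLET"))
# ['hamlet', 'ham', 'lettuce', 'hamlet']
```

Under this reading the period in "ham.lettuce" ends one word and starts another, so it yields the two words "ham" and "lettuce".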

Write your answers to the Comments-to-Teacher.