{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Python Review Exercises ##\n",
"\n",
"As usual, I'm recommending this French summary page (http://rgruet.free.fr/PQR27/PQR2.7.html) of Python built-in data-types and their primary methods. It's excellent despite the fact that it hasn't been updated for Python 3.x\n",
"\n",
"Also, you need to have covered the Python Review material before doing these exercises. That refers to the notebooks: **PythonReview-1.ipynb** and **PythonReview-2.ipynb** and the links found therein.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ex. 1 ##\n",
"This is a quickie exercise of some of the Python built-in functions and features you should be familiar with.\n",
"\n",
"Figure out or remember the built-in functions *chr()* and *ord()* (you can look them up on the webpage linked above).\n",
"\n",
"Create a one-line function that prints out the lower-case alphabet (\"abcdefghijklmnopqrstuvwxyz\"), using *range()* and a list-comprehension. Or multiple lines if collapsing to one line is not your cup of tea."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def Ex1():\n",
" # Your excellent answer..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ex. 2: the Baseball Card Collection Problem ##\n",
"There's a wonderful old math problem called the Baseball Card Collection Problem. \n",
"\n",
"In the Olde Dayes, a lot of kids collected baseball cards -- each card showing a picture of a baseball player along with his baseball stats. As a kid, you'd go to the corner store, plunk down a couple of cents and get a pack of 3 random baseball cards plus bubble gum. By the end of the summer, dedicated to chewing vast amount of gum, you'd hope to collect all the different baseball cards in a set.\n",
"\n",
"Naturally, since we're hostile to gum, we're going to turn this into a math (optional) and CS (required) problem. Actually, into a sequence of problems. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Version 2a: 9 baseball cards ###\n",
"Suppose you're just interested in seeing every card, at least once, for the 9 Mets players who lost (won) the game last week. Your dad, a creative labor organizer, happens to have those very nine cards in his pocket. At the end of each day when you've finished your household chores, he will pick a card, at random, from his pocket and show it to you, and then put it back into his pocket. You can keep track of the cards you've seen. \n",
"\n",
"#### Problem: ####\n",
"How many days of labor, on the average, will it take you to see all 9 cards? It may take only 9 days, though that's extremely unlikely, since most days, he'll be pulling out a card that you'd have already seen. \n",
"\n",
"The CS way of approaching this problem is to say, how can we use our easy access to heavy computing facilities instead of trying to think too deeply about the math. So, let's suppose we model this process by repeatedly asking the machine for a random integer between 1 and 9 (one \"ask\"), keeping track of which have come up, and stopping when all 9 numbers have been produced. We call this whole process a \"trial\" and note the number of \"asks\" it took to see the whole set. What we're interested in is the number of \"asks\" per \"trial\", but every trial is likely to be different, and so we'll do many, many trials and then average the number of \"asks\" per trial, and that will be our answer.\n",
"\n",
"You'll probably want to take a look at the *random.randint()* function. I'd recommend at least 1000 trials to get a good stable answer. \n",
"\n",
"**BTW, just so that we're on the same page, the answer I got for 9 cards is: approx. 25.7**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# estimate for 9 cards:\n",
"def BaseBallEstimate(ntrials):\n",
" # your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Version 2b: Math (optional) ###\n",
"**Yes there is a math solution for this problem.** Do try to find it, ***but don't get hung up on it.*** It's more important to solve the CS problems below this one, so feel free to skip the attempt to solve this problem mathematically and go on (though I know some of you will take the dare).\n",
"\n",
"There is an exact math solution, and for a large number of baseball cards (say 1,000), there's also a nice approximation.\n",
"\n",
"**BTW, just so you can check your math, the exact answer I get for 9 cards is 7129/280 or approx. 25.5**\n",
"\n",
"And because it's a non-trivial math problem to derive the exact solution and yet another problem to derive the approximation for larger numbers of baseball cards, I'll give a hint (don't read it if you don't want):\n",
"\n",
"?drac ,tnereffid ,txen eht ees ot ekat ti lliw seirt ynam woh\n",
" ,egareva eht nO .sdrac N fo K nees evah ydaerla uoy taht emussA :tniH\n",
" \n",
"**BTW, the mathematical approximation for 1,000 baseball cards is 7,484.** (thanks to Zeynep for finding that I had forgotten to correct my initial result with Euler-Mascheroni constant).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using the mathematical solution to calculate the exact answer for 1,000 cards:\n",
"def MathSolution(ncards):\n",
" # ..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Version 2c: Lots of baseball cards ###\n",
"Let's scale the problem up. Let's say that you have not 9 cards, but 100,000. Will your code still provide the right answer? Try it...\n",
"\n",
"The likelihood is that you're not going to wait while your code tries to compute 1,000 trials of 100,000 cards each, in order to get an average. \n",
"\n",
"So try only one trial of 100,000 cards. Can your code calculate this in, say, under 15 seconds of compute time? How about just one trial of just 10,000 cards? If this takes too long (longer than 15 seconds), then it's time to rethink your algorithm.\n",
"\n",
"So, if you need to, rethink your algorithm for large numbers. And your code should calculate the number of \"asks\" for one trial of *ncards* but also the time to compute it. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a better way for large numbers of cards\n",
"def Faster(ncards):\n",
" import time.\n",
" start = time.time()\n",
" \n",
" # your code here\n",
" \n",
" elapsed = time.time() - start\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Version 2d: Removing duplicates ###\n",
"To a large extent, your job so far has been to detect and ignore duplicates as new cards are randomly selected from a fixed set. \n",
"\n",
"Let's take that, now, as our primary task. Suppose you are fed a stream of data, say a string at a time, and are asked, periodically, how many unique strings you've seen so far. If the number if small, say, under 1,000, this is not a big problem. But suppose the number is big, like in the hundreds of thousands or millions? You really do not have the luxury of simply storing all the strings, sorting them and eliminating duplicates and counting the uniques ones that are left.\n",
"\n",
"To test various methods of doing this, I've created a small function below to create a random alphabetic string of a requested number of characters in the cell below.\n",
"\n",
"In the cell below that, exercise that function to create, say, 100,000 strings of length 4, and add your code to compute the number of unique ones that you've seen.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"from random import randint\n",
"alpha = 'abcdefghijklmnopqrstuvwxyz'\n",
"def random_string(nchars):\n",
" answer = ''\n",
" for i in range(nchars):\n",
" answer += alpha[randint(0,25)]\n",
" return answer\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create 100,000 4-char random strings and compute the number of unique ones that you see\n",
"for i in range(10000): \n",
" astring=random_string(4)\n",
" # your code here\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}