Tuesday, 21 October 2014

Infinite Monkey Theorem using Python

They say that if you sat an infinite number of monkeys with an infinite number of typewriters, then they would write the complete works of Shakespeare. This is known as the Infinite Monkey Theorem. I always feel it would be nice to try the actual experiment out. I mean all these monkeys with typewriters would be kinda cool.



However wouldn't it be easier to get one monkey and one typewriter and give him an infinite amount of time to see how he goes?

Although in theory it sounds simpler, getting hold of even a single monkey is not that easy.

But what if we created a simulated monkey in Python? Now we are talking! That would be easy, and unlike a real monkey our simulated one could work all day and all night. No food required.

Sounds like a plan for a blog post.

However I want to give our virtual monkey a chance. If he managed to type the complete works of Shakespeare without worrying about punctuation, we would be happy right? Same goes for capital letters, let's just ignore them for now. We want to give him a fighting chance of completing his task.

To help you get started I have taken a text file of the complete works of Shakespeare, and removed all punctuation and made all letters small. If any of you are thinking I did that by hand, think again. Python is your friend for automating tasks like that.

Because the complete works is a large tome, I have also included a similar file, with just Hamlet. You can choose which one you want your monkey to attempt!

As with all my blog posts I will give you the complete program first of all. Have a read through this and see how much of it you understand. Try and figure out those sections you don’t. I will then go through the program a line at a time explaining it in more detail.

import random
import string

def typeShakespeare(scriptRead):
    monkeyTyped = ''
    while monkeyTyped == '' or scriptRead.count(monkeyTyped) >= 1:
        monkeyTyped = monkeyTyped + (random.choice(string.ascii_lowercase + ' '))
            
    else:
        return (monkeyTyped[:-1])

script = open('Shakespeare.txt','r') # opens the files
scriptRead = script.read()
script.close()

keyPresses = 0
monkeyTyped = ''
while len(monkeyTyped) != len(scriptRead): ## infinite loop ... unless the monkey does it!

    monkeyTyped = typeShakespeare(scriptRead)
    keyPresses = keyPresses + len(monkeyTyped) + 1
    
    if len(monkeyTyped) >=5:
        monkeyTyped = "'"+monkeyTyped+"'"
        print monkeyTyped, keyPresses
    if keyPresses%10000==0:
        print keyPresses

The first thing we do is import a couple of libraries we will use. We want our monkey to type some random letters, so we will call on the random library. We will also be needing strings (many letters joined together) so we should import the string library as well.

import random
import string

There is now a function called typeShakespeare(). It will make sense to explain this when we call it, so lets skip it for now.

The aim of the program is to work out if the monkey has managed to type the complete works of Shakespeare, or Hamlet, if thats what you prefer. To do this we need to judge his effort against the finished script. Therefore we will open the file with the completed text.

script = open('Shakespeare.txt','r') # opens the files

Change this to read Hamlet.txt if you want to use Hamlet instead.

We then store the text into a variable called scriptRead. To do this we use the .read() command on the script we have opened. Finally as we have the data stored in memory we will close the script.

scriptRead = script.read()
script.close()

I have now created a variable called keyPresses and set this to 0. There are two reasons for this. When he has typed a word worth reporting I want to know how many key presses it took before he got there. I also want to keep an eye on my monkey, to make sure he is working. Therefore every time he has pressed a key a set number of times I will report back.

keyPresses = 0

We also need to know what the monkey has typed and to keep an eye on that. Every-time he types a letter we will check he is on the right path. If he isn't, we will get him to start from scratch. We will create a blank string to store his effort in.

monkeyTyped = ''

Now we enter into our main loop which is a while loop.

We want the monkey to keep typing until he has finished the job. If he hasn't then we will make him keep going. The first line of our while loop checks to see if what is stored in monkeyTyped is as long as the file we are looking into him typing. As we are checking the monkey is on the right track every letter, should he type something the length of the complete works of Shakespeare, then we will know he has achieved his goal.

while len(monkeyTyped) != len(scriptRead): ## infinite loop ... unless the monkey does it!

Now we call the function typeShakespeare we skipped over earlier and we store the result into monkeyTyped.

    monkeyTyped = typeShakespeare(scriptRead)

I think it would be a good time to now go and write that function. It is in this function that the monkey does most of his work. He tries to type the complete works of Shakespeare. We will check his work every time he presses a key. If he is on the right track, we will let him continue. If what he has typed is not in Shakespeare, we will return his best efforts to our main program.

So directly below import string we should type

def typeShakespeare(scriptRead):

This defines our function and tells us we are passing scriptRead into it.

As I mentioned we are going to be in this function while the monkey types something. The first thing we need to do is to set monkeyTyped to = ''

    monkeyTyped = ''

Now we enter a new while loop. This one needs to keep looping while the monkey has typed something which appears in Shakespeare.

    while monkeyTyped == '' or scriptRead.count(monkeyTyped) >= 1:

We do this by using scriptRead.count(monkeyTyped) >= 1 What does that even mean? Well we take the complete works of Shakespeare stored in scriptRead, and we do a count of how many times the thing in the brackets appears. If we put monkeyTyped into the brackets then it will count how many times monkeyCount appeared in scriptRead. We only need it to appear once but if it appears more than once, then that is ok as well. We check this with the greater than or equal to symbol >=.

But the line also has monkeyTyped = ''. Why is that? Well when we start off this while loop the monkey has not typed anything, as we have set the variable to = '' , which means an empty string. This would mean our while loop would be false straight away, and our monkey would never start typing.

To get around this we us an or condition. We check that monkeyTyped is '' or scriptRead.count(monkeyTyped) >= 1

The next line is the line which does the actual typing.

monkeyTyped = monkeyTyped + (random.choice(string.ascii_lowercase + ' '))

We take monkeyTyped and we add the result of (random.choice(string.ascii_lowercase + ' ')) onto it and save the result as monkeyTyped.

What on earth is (random.choice(string.ascii_lowercase + ' ')) ?

Well we want to simulate the monkey typing on a keyboard. We have already said to make it easy we will not ask for punctuation, although we should ask him for a space between words. Thats not too much to ask is it? We also said we will not be too concerned about it being in capital letters.

So random.choice picks something from inside the brackets by random selection. So what have we put in the brackets. The first thing is

string.ascii_lowercase

This creates a string of the lower case ascii letters. i.e

abcdefghijklmnopqrstuvwxyz

We also add onto the end of that a space using ' '

Now remember this has a space between the two speech-marks. This is different from when we are creating a blank string, which doesn't have a space.

Once he has typed a word, the loop will start again and check if what has been typed is in the complete works. If it is he will continue. If not he will go to the else statement.

    else:
        return (monkeyTyped[:-1]

Here we return what he has typed. However remember we only get to this place if what is typed is not in the complete works. That's no use to us! We will need to remove the last letter of what he typed, as it was that last letter we know to have made perfect Shakespeare prose into gibberish.

So we return monkeyTyped[:-1] which returns everything up to the last letter but not the last letter.

Ok back to where we were in the main program.

Remember we said we would keep track of the number of key presses? Well lets update that now. We know the monkey has just typed a word, so it is easy to determine the size of that word and increase keyPresses by that amount. Oh but the monkey typed a letter which turned his prose into gibberish. We should add that on as well as we want our count to be accurate!

    keyPresses = keyPresses + len(monkeyTyped) + 1

Whats the point of doing something if no one knows about it? Mmmmm... this blog post is perhaps not the best place to get into a philosophical debate. We want our monkey to report back if he has typed some Shakespeare. However we don't want him reporting back if he has just typed one letter do we?  No one likes a show off!

We only want to be informed if the length of what he has typed is greater than 5, although you can vary this depending on how much chat you want with your monkey.

    if len(monkeyTyped) >=5:

As we are reporting the monkeys best effort in that round of typing, and his efforts are going to be set back to the beginning, it doesn't matter if we play around with our monkeyTyped variable to help make his reporting a little nicer. Lets first of all add speech marks onto his typing, so we can see what he has typed.

        monkeyTyped = "'"+monkeyTyped+"'"

Pay careful attention to those speech marks in there. Each side is a single quote ' surrounded by double quotes "

Now we will print what the monkey has typed, and the number of keyPresses he has made in total so far.

        print monkeyTyped, keyPresses

Finally if you are only checking for big words the monkey may be away working some time before he gets to report back. I think its wise to check in on him every so often.

    if keyPresses%10000==0:
        print keyPresses

This line checks to see if the number of key presses he has made is a multiple of 10,000. The % checks if keyPresses / 10,000 has no remainder i.e. it is a multiple of 10,000. If it is then it will print the number of key-presses made. This only works if the key-presses are an exact multiple of 10,000. There will be times when the monkey is in the middle of a word when this happen, so it will not report back. However it reports enough to make you realise the monkey is still working and not fallen asleep.

Remember to press F5 to save and run your program.

That is the end of the program. All you need now it to download the complete works of Shakespeare or just Hamlet if thats what you want to use. Use the links below to do that.

Download complete works of Shakespeare

Download Hamlet

Well I hope you have enjoyed this simple program, and that it helped you learn a little bit of Python! Good luck to all your Monkeys typing away, let me know how they get on!

I have created a version of this program which tweets the monkeys' progress. Check out the twitter feed @TheMonkeyBard.