Monday, 5 August 2013

Python - Crossword Solver

As I have mentioned in a previous blog, one of the first programs I wrote in Python was a simple word checker. You type in a word and it checks to see if the word exists in a dictionary. Once I could program that I was able to expand the program into a crossword solver.

One of the great things about programming is you can re-use code you have already written. Some of the code I wrote in my word checker program can be reused here. If you have not already done so, it might be worth you reading through my blog post covering the programming of the word checker, before progressing with this post.

Blog post of Word Checker.

Again this program is quite small, but I think it demonstrates the power of Python, that you can achieve a lot in a few lines of code.



The idea behind the Crossword Solver is you type a word with some letters missing, and the program works out all the possible words that would fit with the known letters and spaces.

So for instance if you had an incomplete word in a crossword and you knew the first letter was 'f', you did not know the second or third letters, but knew the fourth was an 's' and the fifth was 't'. i.e. you are trying to find all words that would fit with 'f??st'

The solution would be

faust
feast
feist
first
foist
frost

There would be 6 words which fit.

This blog is a tutorial teaching you how to create a program to solve this for you.

Before programming this you will need access to a dictionary that your program can access. If you have not already done so, please download the file DictionaryE.txt from the link below. This is a text file which contains a lot of words.

Click here to download DictionaryE.txt

Throughout this blog, where there could be confusion, I will show you what to type, and then show you a snippet of code showing the line you need to type and the preceding line. This should avoid any confusion about where you should be typing the code, and will show you the required level of indentation for each line.

Ok let's get started!

When writing a program I always roughly plan the program out first of all, and break it down into smaller sections. It always seems much less daunting when you think about each section individually, rather then the program as a whole. So for a crossword solver what do we need to be able to do?


  • Stage 1. We need to somehow input our incomplete word to let the program know what we are trying to solve.
  • Stage 2. We also need to be able to access a list of real words in a dictionary.
  • Stage 3. We will then have to compare our incomplete word with the words in the dictionary and determine if they are a potential match.
  • Stage 4. Finally we will need to print any words which match the criteria.


Now we have planned our program, we can now start to program it.

Stage 1


Remember earlier I said that we can re-use code we have already written? Well in our word checker program we had the same requirement to input a word, and recall that word later. So let's use the same code here.

In a new Python window type the following:

testWord = raw_input ('Please input a word to solve.\nUse Spaces to signify unkown letters: ').lower()

This stores our word into a variable called testWord.

The \n tells the program to print the next text on a new line.

The only difference between this line and the one in the word checker program is the fact I have changed the text explaining what input is required.

Stage 2


We need to be able to look at a list of real words. This is exactly the same as in the word checker program.

Therefore you can copy the getDictionary() function into the top of your program.

def getDictionary():
    dictionaryOpen = open('dictionaryE.txt','r')
    dictionary = dictionaryOpen.read().split()
    dictionaryOpen.close()
    return dictionary


Ok that completed Stage 2. We have done half our program, and not had to type any new code. You should re-use code wherever possible, as it saves you so much time.

Stage 3


In this stage we need to compare our incomplete word with all the words in the dictionary.

This is a little more tricky than the testWord function in the word checker program, so lets first think about how we will approach this.

We now have a list of real words in the dictionary and an incomplete word. We want to find all the potential solutions.

We can put together some rules which, if followed, would test a word from the dictionary and find potential match to our incomplete word.

Rule 1
We know that for a word in the dictionary to be a potential match with our incomplete word, they need to be the same length. So we will only look at words the same length and rule out all those which are different lengths.

Rule 2
As blanks are in effect wild cards and can be any letter we want to skip over any blanks in our incomplete word.

Rule 3
For any letters in our incomplete word we want to see if it matches with the respective letter from each word in the dictionary. i.e. We want to make sure the first letter in our word matches the first letter of the word in the dictionary. We want to do the same for the second letter and the third letter and so on. We want to be able to know if all the non blank letters from our incomplete word match the respective letters for the word from the dictionary.

So here are our three rules in summary.

  • Check words are the same length.
  • Ignore any blanks.
  • Compare the 1st letter of the incomplete word with the 1st from the word in the dictionary. Then the second etc. Check they all match.


Are these three rules enough to solve the puzzle?

Lets look at an example, and we will use the example of our incomplete word being

'f  st' 

with two blanks after the f.

Lets compare this with the word apple using our rules above.

They are both the same length, so rule 1 is satisfied.

We can manually skip over each letter and compare letter 1 from 'f  st' to 'apple'
this would compare
f --> a
skip the next two letters as they are blanks (Rule 2)
s --> l
t --> e

To satisfy Rule 3 all the non blank letters would have to match the respective letters in the word from the dictionary. In this case 0 out of the three non blanks matched. This is not a potential match.

What about if we compared 'f  st' to 'first'

f --> f
skip the next two letters as they are blanks. (Rule 2)
s --> s
t --> t

We count those that match, which is 3 in this case, and compare this to the number of none blank letters which is 3. Therefore we can say this is a potential match.

Im fairly confident that these three rules should satisfy what we need, so I will start to program a function to carry out this comparison.

Let's start with creating our function. Underneath the getDictionary function type the following.

def checkWord (testWord,dictionary):

def checkWord (testWord,dictionary):

We will call our function checkWord and will pass both our testWord variable, and the dictionary into this function so we can use both of these in a comparison.

We know from our rules that in order to be a potential match all of the non blank letters have to match with the word from the dictionary. Therefore we will need to determine the number of non blank letters in our test word. This can be done before we look at any words in a dictionary, so lets look at this now.

Before typing any more into our program, we shall look at two useful python commands we will use to work out the number of none blanks.

The first is len()

len(testWord)

tells us how long testWord is.

So that's the total length of the word including blanks.

We can use a second command on testWord to count the number of blanks.

This is:

testWord.count(' ')

So if there are two blanks, this will produce a number 2.

So if we know how many letters are in the word, and how many are blanks, we can deduce the number which are not blanks.

Therefore if we create a variable called nonBlanks, we should easily be able to work out the number of nonBlank characters in our word. Therefore type this equation into your function.

    nonBlanks = len(testWord)-testWord.count(' ')


def checkWord (testWord,dictionary):
    
    nonBlanks = len(testWord)-testWord.count(' ')


For the rest of the function we need to check through each word in the dictionary and compare them to our incomplete word.

To iterate through each word in our dictionary one at a time we can simply say:

    for word in dictionary:

    nonBlanks = len(testWord)-testWord.count(' ')
    for word in dictionary:

Everything inside of this for loop will be carried out on each individual word.

As we check each word there are a couple of things we should keep track of.


  • We need to know which letter in both our word and the word from the dictionary we are comparing.
  • We want to be able to keep a count of any non blank letters which match.


So lets create two variables which we can increase as we do our counting.

        incLetter = 0
        incMatch = 0

    for word in dictionary:
        incLetter = 0
        incMatch = 0

incLetter will increase as we check each letter. This will keep track of what letter we are looking at.
incMatch will increase as we match a letter.

With these being inside the for loop they will reset to 0 each time a new word is checked. Just what we want.

The next step is to start narrowing down potential options. The best way to do this is to discard any words which are not the same length as our potential word, which is Rule 1.

We can use the len() function we used earlier to help with this.

    if len(word) == len(testWord):

        incMatch = 0
        if len (word) == len (testWord):

So we will only progress further if the words match in length.

now lets start another for loop to allow us to check each individual letter.

        for letter in testWord:

        if len (word) == len (testWord):
            for letter in testWord:

This will go through our testWord one letter at a time.

There are three options now.


  • If the letter is a blank, we want to ignore it, but we want to ensure we move onto the next letter in our word which covers Rule 2
  • If it is not a blank, we want to check this letter matches against the respective letter from the word in the dictionary. Increasing incMatch if the words match. We also want to then move onto the next letter in the word. This helps with Rule 3
  • If it doesnt match we just want to move onto the next letter. Again this helps with Rule 3.


The first part is to check if it is a blank and if so move onto the next letter.

if letter == ' ':
   incLetter += 1

            for letter in testWord:
                if letter == ' ':
                    incLetter += 1

So we say if the letter is equal to a blank, increase incLetter one.

incLetter += 1 is a short way to say incLetter = incLetter + 1
or increase incLetter by 1.

 The second option if the letter is not a blank should read

elif letter == word[incLetter]
   incLetter += 1
      incMatch += 1

                    incLetter += 1
                elif letter == word[incLetter]:
                    incLetter +=1
                    incMatch +=1

What is word[incLetter] saying? Well why not try this in a Python shell? A great thing with Python is you can easily test things out in the shell as you go along. Click on Window and then Python Shell to load a shell window.

Create a variable called word and store 'apple' in there.

word = 'apple'



We can then find out what each individial letter in the word is by typing the following:

word[0] returns 'a'
word[1] returns 'p'
word[2] returns 'p'
word[3] returns 'l'
word[4] returns 'e'




as incLetter = 0 we know we are looking at the first letter. If incLetter was 1 we would be looking at the second letter and so on. So its a way of moving through the word and keeing track of where we are up to.

If the respective letters from both words match we will increase incLetter to move onto the next letter, and we will also increase incMatch to say the letters match.

The third option is if the words are not a blank, and do not match. We dont want to do anything here other than move onto the next letter.

else:
   incLetter += 1


                    incMatch +=1
                else:
                    incLetter += 1

How is that looking? Well to me it looks as though we are repeating ourselves several times. Every option we increase incLetter, and only do something different, increase incMatch, if the letters match.

Lets rewrite that section from 'for letter in testWord:' to make it shorter.

for letter in testWord:
    if letter == word[incLetter]:
incMatch +=1
    incLetter += 1

            for letter in testWord:
                if letter == word[incLetter]:
                    incMatch += 1
                incLetter += 1

So this says if the letters match, increase incMatch but no matter what we need to increase incLetter so we start looking at the next letter.

I think that has tidied up our program a little. Often when you see the code on the page it becomes obvious that things can be improved. Its not a bad thing. One thing I would say is make sure whatever you do you will understand it when you look at it in a few weeks,  months or even years time. If you use a few extra lines, but it makes your code more readable, then leave the code longer.

Now we have been through each letter in the word and compared them we need to report back if the word matches or not. This takes us onto the last stage.

Stage 4


We said at the beginning we would print any matching words onto the screen, so lets just do that.

Our plan was that for words of the same length, if the number of matching letters equalled the number of non blank letters then we have a potential solution.  As we have already determined if the words are the same length we can say.

            if incMatch == nonBlanks:
                print (word)

                incLetter += 1
            if incMatch == nonBlanks:
                print (word)
We will then move onto the next word in the dictionary. This is already taken care for us in the for loop.

Once the loop has been through every word in the dictionary we shall exit out of the function with a simple return.

return

                print (word)
    return

So far we have done the following:


  • We have asked the user to input an incomplete word they want to solve.
  • We have a function to create a dictionary.
  • We have a function which compares a word against each word in the dictionary and reports any potential solutions back.


All we need now is to tie the whole thing together.

Underneath the user input lets get the dictionary and assign it to a variable we can pass into our function.

dictionary = getDictionary()

Finally we can run our testWord and the dictionary through the checkWord function.

checkWord (testWord, dictionary)

dictionary = getDictionary()
checkWord (testWord,dictionary)

Pressing F5 will now save and run your program.


Typing in 'f  st' when asked should result in the following:


If you would like the to download the source code you can from the following link.

Crossword Solver Source Code

There we go, that should help you solve any crosswords. We have seen a few new concepts today, and it may seem a little daunting at first. The key is to break everything down into its most basic element.

If you are struggling to understand, just read through the code one line at a time, and see if you can work out what it is doing.

No comments:

Post a Comment