Preparation: Crossword Creator in Python

I finally have found a programming problem again that kept me occupied for more than a day simply because I could not do it. But let’s start from the beginning. I wrote a “Search the Words”-Game recently (like this one here) and it worked pretty well actually. There were some limitations to it, but it gave me a good result 9 times out of 10 (I may even post the code for it).

I then figured: “Hey actually, the code I used in this program, I can easily adapt to write a CrossWord Creator!” I have tried this and I unfortunately got nowhere 😦 The main problem here is, that I used a fairly brute force approach in my “Search the Words”-script. In a crossword, there are many more constraints which leads to many, many dead ends. Thus either, the code does not yield a satisfactory result or it takes up all memory for a very long… and in the end does not yield a satisfactory result 😉 The problem here was twofold. My code was not optimized for speed (many unnecessary loops) and in most cases it was leading towards a dead end.

With this in mind, I looked a little bit into constraint logic programming, the branch that deals with these problems. Actually, crosswords are some of the more widely used constraint logic problems (alongside Sudoku and probably other Puzzles). I found a Python implementation by Bryan Helmig, as well, but I wanted to do it myself, so I only read through his approach and not his code.

I will now for myself, outline the approach I will try and possible implementations. If you, the reader, have anything to add, please feel free to chime in. Every constructive input is welcome. The only thing you need to remember is, that this is a hobby of mine and I have never ever formally studied computing science or software engineering, therefore sentence like “Just use xyz algorithm to implement concept abc…” do unfortunately not help me in the slightest 😉

So here is my analysis. The problem can be easily broken down into several steps.

  1. Order the list of words to be used in a useful way
  2. Place the words in a grid according to specific rules
  3. Check the quality of the crossword (if it yielded a useful result)
  4. Remember the result and repeat the first 3 steps X times to yield the best possible result.

This is a very general approach, but I will try to move within these confines and sum up my thoughts on each. How to optimise each section and get a meaningful result.

General Definition of Constraints

Before I even discuss the steps, it is first important to assess the constraints that we have for this problem. To this end, I will use the crosswords, that are common in German newspapers. I know there are international differences, but these are the ones I grew up with so I will use those.

  1. Except the first word, each successive word must be placed with one or more letter overlaps with another word already placed on the grid.
  2. Apart from at the overlaps, letters are not allowed to be in parallel cells of the grid (unless they are part of the same word).
  3. A word placement may not be longer than the boundaries of the grid.
  4. The two words of an overlap must be at right angles to each other, e.g. the words ‘creator’ and ‘random’ may not both be horizontal, otherwise you will see a grid with something like this CREATORANDOM

These are the constraints that I can think of and that I have to work with. Now onto the discussion of each point.

1. Order the list of words

This step is rather easy. I will actually follow Bryan Helmig’s assumption here that the longer the word, the harder it will be to place it. Therefore, putting the words in a list ordered by length of word could be an option and then placing it accordingly. If a word cannot be placed on the grid, because there is no good overlap, we skip it and try the next. Then we revisit the same word again.

It can also be envisioned to place the words in a dictionary, with the keys being the length of the words and the values a list with all words of each length. If this would give more comprehensive code or a faster lookup will have to be tested. In this case, it is foreseeable that we can have more variation, simply by being able to choose at random from a list of words of equal length. Something that may not be as easy when working with an order list.

2. Place the words in the grid according to specific rules

This algorithm will be the most crucial part and the one that will need most optimising. I need to subdivide this section into two parts.

  1. The search for a suitable space
  2. The actual placement of the word

The search for a suitable place

I have been pondering about this step for quite some time. Before, I was just brute-forcing it by going through the list of letters already placed and finding a match with a letter in the word to be placed. Unfortunately, this would always lead to the same options tested first and thus can lead into the same dead end over and over again. Additionally, you would always start at the beginning and go through all the letters. Again and again. That is not very efficient and will lead to long waiting times in calculations.

I came up with an idea that I will try for this approach. When placing a word, I will save the coordinates of each letter in a grid in a hash (python dictionary), so that it will look something like this at some point:

letter_coords = {
'a': [(1,0), (3,6)],
't': [(9,0), (1,4), (5,7)],

This format will avoid that we have to go through all letters in the grid again and again and again. When we want to place a word, we just look for the letters in the word and check the coordinates given from this dictionary. The speed advantage will most likely allow us, to go through all possible positions, to find the “best” position to place the word (what the best is, we need to discuss in the next section).


An addition to this format is to also note down in which direction they still free to (left to right or top to bottom). Secondly, once both direction are used up for a letter, it can be removed from the list.

The actual placement of the word

I have just discussed how to find in a quick manner possible positions of overlap. The actual placement of the word requires also a little thinking. In a random placement, you can imagine that it will be easy at first, but towards the end, you will run out of suitable options very quickly. Therefore, I need to think of a smart way of placing the words.

The first thought that came to my mind is to give a score for each placement according to several factors. Here are my current thoughts:

  • Start at 1
  • multiply with 0 (to get 0) when an undesirable situation is found
    • letter is parallel to a letter not belonging to the same word
    • word is too long
  • If score is ever 0, discard and go to next
  • depending on how much “space” (free cells) around the letters add a score
    • One free cell on either side +1
    • two on one side, one on the other + 2 etc.
    • +X for each additional overlap
  • Choose the word with the highest score, if more than one, choose at random.

Based on this, I should get a reasonably good placement of the word and generally good variability based on the first two major steps.

3. Check the ‘quality’ of the crossword

Here, I will try to recycle an idea from before. I will try to assign a score to each crossword. The best factor here would be overlaps. Another one would be how many of the words were placed. That is right. Maybe there is a word that just cannot be placed. So rather than aborting the entire mission, give the crossword with all the words that were possible to be placed. The more words in, the better, of course. The score needs to be carefully designed, however, so that there is no chance that fewer words with more overlaps is worth more than all words placed with minimum overlaps.

4. Repeat and choose the “best” crossword

This step is self-explanatory. Based on the score of the previous step, you choose after x repeats, the best crossword.

So, based on the above waffling, I will try to write this crossword generator script. If I manage it and it looks good, I will share the code on here. If you have any additions or improvements. Feel free to mention them here, as well :).