Writing a basic Vocabulary Trainer in 30 minutes with Python

(Original post from 31. May 2012)

Below is a vocabulary trainer I wrote to help me improve my French. It took me about 30 minutes to write, plus another 30 minutes with some good music on the headphones to write the relevant XML structure. I will probably expand the word list as I read books, articles, etc.

To start, you just need to call it from the command line with the correct parameters. For example, to test general vocabulary with English as the given language and French as the one to be tested, you would type (this is for Linux):

./main.py general en fr

#!/usr/bin/env python
# -*- coding: iso-8859-1 -*-

import random
import sys

from lxml import etree

vocab = {'chem': 'chemvocab.xml',
         'general': 'general.xml'}

def load_xml2dict(fname, lang1, lang2):
    """Load XML data from file into a dict mapping lang1 words to lang2 words"""

    d = {}
    with open(fname) as xml:
        tree = etree.parse(xml)
    root = tree.getroot()

    for elem in root:
        l1 = []
        l2 = []
        for child in elem:
            if child.tag == lang1:
                l1.append(child.text)
            elif child.tag == lang2:
                l2.append(child.text)
        d[tuple(l1)] = l2
    return d


def main(fname, lang1, lang2):
    d = load_xml2dict(fname, lang1, lang2)

    while True:
        choice = random.choice(d.keys())
        print '[%s]: %s' % (lang1.upper(), ', '.join(choice))
        raw_input('Press Enter for the answer')
        print '[%s]: %s' % (lang2.upper(), ', '.join(d[choice]))

        end = raw_input('Do you want to continue Enter/N: ')
        if end == "N" or end == "n":
            break

if __name__ == "__main__":
    args = sys.argv
    main(vocab[args[1]], args[2], args[3])
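
The script above indexes sys.argv directly, so a missing or misspelled parameter ends in an IndexError or KeyError. Below is a minimal sketch of stricter argument handling with the standard-library argparse module. The parse_args helper, the argument names and the help texts are hypothetical, and the list of language codes is an assumption taken from the tags in the XML sample further down.

```python
import argparse

VOCAB = {'chem': 'chemvocab.xml',
         'general': 'general.xml'}
LANGS = ('en', 'de', 'fr')  # assumption: the language tags used in the XML


def parse_args(argv=None):
    """Parse 'vocab given tested' and reject unknown values early."""
    parser = argparse.ArgumentParser(description='Basic vocabulary trainer')
    parser.add_argument('vocab', choices=sorted(VOCAB),
                        help='which vocabulary file to use')
    parser.add_argument('given', choices=LANGS,
                        help='language that is shown')
    parser.add_argument('tested', choices=LANGS,
                        help='language that is asked for')
    return parser.parse_args(argv)


# Equivalent to calling: ./main.py general en fr
args = parse_args(['general', 'en', 'fr'])
print(VOCAB[args.vocab], args.given, args.tested)
```

With this in place, a call with an unknown vocabulary or language prints a usage message and exits instead of raising a traceback.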

The XML structure is fairly straightforward and is designed with possible expansion and further filtering in mind. It looks something like this:

<vocabulary>
    <word type="verb">
        <en>to agree</en>
        <de>jmdm./etw. zustimmen</de>
        <de>in etw. einwilligen</de>
        <fr>acquiescer à qn./qc.</fr>
    </word>
    <word type="adj">
        <en>horrible</en>
        <de>schrecklich</de>
        <de>fürchterlich</de>
        <fr>épouvantable</fr>
        <fr>terrible</fr>
    </word>
    <word type="noun">
        <en>murmur</en>
        <de gender="n">Murmeln</de>
        <fr gender="m">frémissement</fr>
    </word>
</vocabulary>

Note the type attribute for <word>. It will hopefully allow me to filter by type of word. The gender attribute is only relevant for German and French, and I only really put it in for French as there are cases where it is not clear, e.g. l’émission. If I really expand on this thing, I may use it to put the gender in square brackets after the word.
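
As a rough idea of how the type filter and the gender display could work, here is a minimal sketch. It is not part of the script above: it uses the standard-library xml.etree.ElementTree instead of lxml so that it runs without extra packages, it is written for Python 3, and load_filtered, format_entry and the word_type parameter are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample structure, embedded so the sketch is self-contained
SAMPLE = """
<vocabulary>
    <word type="verb">
        <en>to agree</en>
        <fr>acquiescer à qn./qc.</fr>
    </word>
    <word type="noun">
        <en>murmur</en>
        <fr gender="m">frémissement</fr>
    </word>
</vocabulary>
"""


def format_entry(elem):
    """Return the word, followed by its gender in square brackets if present."""
    gender = elem.get('gender')
    return '%s [%s]' % (elem.text, gender) if gender else elem.text


def load_filtered(xml_text, lang1, lang2, word_type=None):
    """Map lang1 words to lang2 words, optionally keeping only one word type."""
    d = {}
    root = ET.fromstring(xml_text)
    for word in root.iter('word'):
        if word_type is not None and word.get('type') != word_type:
            continue  # skip words of any other type
        l1 = tuple(format_entry(c) for c in word if c.tag == lang1)
        l2 = [format_entry(c) for c in word if c.tag == lang2]
        if l1 and l2:  # ignore entries missing one of the two languages
            d[l1] = l2
    return d


nouns = load_filtered(SAMPLE, 'en', 'fr', word_type='noun')
print(nouns)
```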

Of course, this is nothing to write home about, but maybe I will expand it a little and create a GUI for it with some filtering options, or abandon Python altogether and put this on my website with some jQuery. We shall see. I thought it was worth sharing.


RPG Music’N’SFX – How to filter using tags

I’ve been pondering over the same question for the best part of a day today. It is a very simple question, but difficult to answer, because it depends on habits.

My filtering system relies on tags for each file, i.e. every file can have a potentially unlimited number of tags. But the big question I have is: how should I filter on them?

There are two possibilities that I can see. Say you are searching for the tags “safe” and… dunno, “arabic”. Does that mean that only the files that have both of those tags should show up, or should all files that have at least one of them show up?

In Python terms, for the “at least one tag” option, I imagine a dictionary with the tags as keys and lists of file paths as values. The lists for the chosen tags would feed into a set that serves as the tracklist pool, a bit like the following:

tag_dict = {"safe" : ["an/example/file.mp3", "another/file.wav"],
           "travel" : ["an/example/file.mp3",
                       "unrelated/other/file.ogg"]}

# selecting both tags would lead to a playlist that looks
# as follows:

playlist = set(["an/example/file.mp3",
                "another/file.wav",
                "unrelated/other/file.ogg"])

For the “all tags” option, I would probably just have a dictionary with the file path as the key and a list of its tags as the value. Then, whenever a tag is added or removed, I would iterate over the dictionary and check whether all chosen tags are present in the list using set.intersection (that is something I learnt from here). If so, the file is added to the playlist. A bit like so:

track_dict = {"an/example/file.mp3" : ["safe", "travel"],
              "another/file.wav" : ["safe"],
              "unrelated/other/file.ogg" : ["travel"]}
 
playlist = []

choices = set(["safe", "travel"])
# i.e. only the first should be in the playlist

for key in track_dict:
    tags = set(track_dict[key])
    # compare sets directly; comparing lists would depend on element order
    if tags.intersection(choices) == choices:
        playlist.append(key)

print playlist

As a user, I lean towards requiring all tags, as it seems more logical to me, but I have noticed on several separate occasions that this means, in fact, nothing. My brain tends not to realise the most “common” modus operandi.

I am tempted to try and implement both and give the user the choice…
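
Offering both behaviours need not take much code. Here is a minimal sketch, assuming the file-path-to-tags dictionary from above; the filter_tracks function and its mode parameter are hypothetical names, not something that exists in the project yet.

```python
def filter_tracks(track_dict, choices, mode="all"):
    """Return the sorted file paths matching the chosen tags.

    mode="all": the track must carry every chosen tag
    mode="any": the track must carry at least one chosen tag
    """
    choices = set(choices)
    if mode == "all":
        def keep(tags):
            return choices.issubset(tags)
    elif mode == "any":
        def keep(tags):
            return not choices.isdisjoint(tags)
    else:
        raise ValueError("mode must be 'all' or 'any'")
    return sorted(path for path, tags in track_dict.items()
                  if keep(set(tags)))


track_dict = {"an/example/file.mp3": ["safe", "travel"],
              "another/file.wav": ["safe"],
              "unrelated/other/file.ogg": ["travel"]}

print(filter_tracks(track_dict, ["safe", "travel"], mode="all"))
print(filter_tracks(track_dict, ["safe", "travel"], mode="any"))
```

The first call keeps only an/example/file.mp3, while the second returns all three files. A GUI could then map a simple toggle onto the mode argument.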