Converting data from ProVoc to Anki

After trying out about a dozen other vocabulary trainers, I decided to switch to Anki completely. I already knew Anki and was pretty sure that with some changes it could suit my needs. However, there were two problems to solve:

  1. How to convert all my vocabulary from ProVoc to Anki?
  2. How to convert my learning progress from ProVoc to Anki?

The decisions I made were:

  1. As ProVoc has an export facility and Anki can import various file formats, a simple script could convert my ProVoc vocabulary to Anki.
  2. As Anki uses a completely different learning system (spaced repetition) that is largely incompatible with ProVoc’s own system, which only partly resembles Anki’s SRS techniques, I decided to start afresh in Anki.
  3. Because I didn’t want a stack of tens of thousands of unlearned vocabulary entries in Anki after the initial import – most of which I probably already knew – I wanted the imported vocabulary to be correctly tagged with the names of the lessons and chapters from ProVoc. That way I could move the old vocabulary to a subdeck for later use and concentrate on adding new, current vocabulary directly to Anki.

Conversion from ProVoc to Anki

In ProVoc I had all my vocabulary sorted into lessons (which I used for the different books) and chapters (which I used for the lessons and chapters of each book).

I separated the different German translations of a single Hungarian word with " / " (space, slash, space), whereas the different forms of a Hungarian word were separated by ", " (comma, space). Alternatives for a single form were separated by "/" (slash with no surrounding white space). I used the few fields in ProVoc for

I wanted to have it converted to a simple card in Anki made up of the following fields:

  1. HUN Wort: Hungarian word (base form)
  2. HUN 1. Stammform: Hungarian word (first principal form)
  3. HUN 2. Stammform: Hungarian word (second principal form)
  4. HUN 3. Stammform: Hungarian word (third principal form)
  5. GER Wort: German translations (separated by ", ")
  6. Quelle: source of the vocabulary entry, like book, chapter and so on
  7. Note ID: internal Note ID by Anki
  8. Tags: source of the vocabulary entry, this time converted to a single word, white space converted to underscores, so one logical entry becomes a single tag in Anki

Export from ProVoc

The export file from ProVoc was easily created by File – Export. The settings used were:

Conversion

The conversion was a matter of a single awk one-liner:

awk -F "\t" '{ if ($0 ~ /^#/) { LEKTION = $0; gsub(/^# /, "", LEKTION); TAG = LEKTION; gsub(/\ /, "_", TAG); } else if ($0 ~ /^$/) {} else { gsub(/\ \/\ /, ", ", $1); print $2 "\t" $3 "\t" $1 "\t" LEKTION "\t" "HUN_"TAG}}' < export-provoc.txt > import-anki.txt
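
For readers who find the one-liner hard to follow, here is a rough Python sketch of the same logic, spelled out step by step. It is not part of the original workflow. The function names and the UTF-8 encoding are assumptions, and the meaning of the second and third columns is only inferred from the order in which the awk script prints them.

```python
def convert_provoc_lines(lines):
    """Yield Anki import lines from ProVoc export lines, mirroring the awk one-liner."""
    lesson = ""  # LEKTION in the awk script
    tag = ""     # TAG in the awk script
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            # Header line: strip the leading "# " and remember the lesson name.
            lesson = line[2:] if line.startswith("# ") else line
            # One tag per lesson: spaces become underscores.
            tag = lesson.replace(" ", "_")
        elif line == "":
            # Blank lines carry no data.
            continue
        else:
            fields = line.split("\t")
            # Pad short lines so missing columns are empty, as in awk.
            fields += [""] * (3 - len(fields))
            # Column 1: turn " / " between translations into ", ".
            first = fields[0].replace(" / ", ", ")
            yield "\t".join([fields[1], fields[2], first, lesson, "HUN_" + tag])


def convert_file(src="export-provoc.txt", dst="import-anki.txt", encoding="utf-8"):
    """Convert a whole export file; the UTF-8 encoding is an assumption."""
    with open(src, encoding=encoding) as infile, open(dst, "w", encoding=encoding) as outfile:
        for converted in convert_provoc_lines(infile):
            outfile.write(converted + "\n")
```

Calling convert_file() with its defaults reads export-provoc.txt and writes import-anki.txt, the same file names the awk command uses.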

Import into Anki

That one was straightforward:

  1. Create a new card type
  2. Choose the file to import: import-anki.txt
  3. Assign the fields from the imported file to fields of your new card type
  4. Let the import run
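
Before step 2 it can be worth checking that every line of the import file has the same number of tab-separated fields, since a stray tab in the source data would shift the field assignment. The following is a minimal Python sketch of such a check, not something the original workflow used. The function name is hypothetical, and the expected count of five comes from the five columns the awk command prints.

```python
def find_malformed_lines(lines, expected_fields=5):
    """Return (line_number, field_count) for lines with an unexpected field count."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        # Count tab-separated fields, ignoring the trailing newline.
        count = len(raw.rstrip("\n").split("\t"))
        if count != expected_fields:
            problems.append((number, count))
    return problems
```

Passing it the open import file, for example find_malformed_lines(open("import-anki.txt", encoding="utf-8")), returns an empty list when every line has five fields.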

The first import of about 40,000 Hungarian cards ran without any problems.
Now I only need to make cosmetic changes, like splitting up the different principal forms, weeding out duplicate cards that are no longer needed, and from time to time moving some blocks of old vocabulary to my current deck to review them and see what I remember from like ten years ago.
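
Splitting up the principal forms could be sketched roughly as follows in Python, using the separator conventions described earlier: ", " between forms and "/" between alternatives of one form. The function names are hypothetical, and which imported field actually holds the combined forms is an assumption.

```python
def split_forms(forms_field, max_forms=3):
    """Split a ", "-separated forms field into exactly max_forms slots."""
    forms = [part.strip() for part in forms_field.split(", ") if part.strip()]
    # Keep surplus forms together in the last slot rather than dropping them.
    if len(forms) > max_forms:
        forms = forms[:max_forms - 1] + [", ".join(forms[max_forms - 1:])]
    # Pad with empty strings so every note gets the same number of fields.
    return forms + [""] * (max_forms - len(forms))


def split_alternatives(form):
    """Split one form into its "/"-separated alternatives."""
    return form.split("/") if form else []
```

The three slots returned by split_forms are meant to line up with the HUN 1., 2. and 3. Stammform fields of the card type.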

Farewell and thank you, ProVoc! – Welcome, Anki!