Wenlin is the best piece of software around for students of Chinese. Among its other tools, it has a powerful offline dictionary with fast and very flexible search options.
I know many students of Chinese who use Wenlin to look up definitions and then enter the vocabulary into flashcard software. Most recently I saw someone do this in a coffee shop here in Taipei, and it brought back a lot of memories of doing the same thing in Beijing almost a decade ago.
Wenlin doesn’t make it easy, however, to get word entries into a format that flashcard applications can import. There is no “export” feature, presumably because the developer doesn’t like the idea of large parts of the Wenlin dictionary leaving the software and ending up in a separate database. Without such a feature, students have to copy and paste words out of Wenlin and add the tabs themselves. In my case, I also like to delete the alternate hanzi to keep my flashcards cleaner.
Although a more experienced programmer with good regular expression skills could easily take this further, I am releasing the results of an evening spent learning how to program in Ruby:
Wenlin Conversion Script
Here is a screencast explaining how to use the script:
Wenlin Conversion Script Screencast
This script takes a text file containing a list of Wenlin dictionary entries (saved from TextEdit, not from Wenlin). It puts a tab between the hanzi and the pinyin and another between the pinyin and the definition. It then saves the converted file, which can easily be imported into your favorite flashcard program.
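The core tab-insertion step can be sketched in Ruby roughly as follows. This is not the actual convert.rb. It assumes each entry sits on one line as hanzi, a space, pinyin with no internal spaces, a space, and then the definition. The method names `tabify` and `convert_file` are hypothetical.

```ruby
# Hypothetical sketch, not the real convert.rb.
# Assumes each line looks like "hanzi pinyin definition" and that the
# pinyin itself contains no spaces.

# Insert a tab after the hanzi and another after the pinyin.
def tabify(line)
  # split(" ", 3) splits on runs of whitespace into at most three fields,
  # so spaces inside the definition are preserved.
  hanzi, pinyin, definition = line.strip.split(" ", 3)
  [hanzi, pinyin, definition].compact.join("\t")
end

# Read a UTF-8 text file of entries, skip blank lines, and write the
# tab-separated result for import into flashcard software.
def convert_file(input_path, output_path)
  lines = File.readlines(input_path, encoding: "UTF-8")
  converted = lines.reject { |l| l.strip.empty? }.map { |l| tabify(l) }
  File.write(output_path, converted.join("\n") + "\n")
end
```

This naive version splits on the first two spaces, so it goes wrong whenever Wenlin writes a multi-syllable word's pinyin with a space in it.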
It is made up of two scripts: the convert.app AppleScript application, which is what you run, and the convert.rb Ruby script, which does the actual conversion. You can customize three options in convert.rb. Just open it and set the three option variables at the top to true or false. Each option is described in the Ruby file, but basically they control whether the alternate traditional/simplified hanzi are removed, whether the “|” character is changed to “Example: ”, and whether the “~” in examples is replaced by the pinyin of the word.
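The three options could be implemented along the lines below. Again, this is a hypothetical sketch rather than the code in convert.rb. The option names, the `apply_options` method, and the assumption that the alternate hanzi appear in square brackets directly after the headword (for example `反动分子[反動--]`) are all illustrative.

```ruby
# Hypothetical option names; the real convert.rb defines its own.
OPTIONS = {
  remove_alternate_hanzi: true, # drop a "[...]" alternate form
  pipe_to_example: true,        # turn "|" into "Example: "
  tilde_to_pinyin: true         # replace "~" with the word's pinyin
}.freeze

# Apply the enabled options to an entry already split into three fields.
def apply_options(hanzi, pinyin, definition, options = OPTIONS)
  if options[:remove_alternate_hanzi]
    # Assumes the alternate form is bracketed right after the headword.
    hanzi = hanzi.sub(/\s*\[[^\]]*\]/, "")
  end
  if options[:pipe_to_example]
    # Also swallow any whitespace after "|" to avoid a double space.
    definition = definition.gsub(/\|\s*/, "Example: ")
  end
  if options[:tilde_to_pinyin]
    # Assumes "~" only appears inside examples, standing for the headword.
    # The block form keeps the pinyin from being parsed as a backreference.
    definition = definition.gsub("~") { pinyin }
  end
  [hanzi, pinyin, definition]
end
```

Passing a different hash as the last argument switches individual options off without editing the constant.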
I haven’t tested this extensively, so if you see it do strange things with Wenlin vocabulary entries, let me know and I’ll tweak the script.
UPDATES:
-I just noticed in the screencast that the script split the word “fandong fenzi” and put “fenzi” into the definition. I need to update the regular expression so that it looks for the part of speech, rather than a space, to separate the pinyin from the definition. I didn’t realize that Wenlin sometimes puts spaces inside its pinyin words. I’ll release a fix soon.
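One way to make that fix is to anchor the split on a part-of-speech abbreviation followed by a period. In the sketch below, the tag list is an illustrative assumption and not Wenlin's complete set of abbreviations. The names `split_entry` and `tabify_entry` are hypothetical. The lazy `(.+?)` lets the pinyin group absorb spaces until the first recognized tag appears.

```ruby
# Illustrative part-of-speech abbreviations (assumed, not Wenlin's full list).
POS_TAGS = %w[n v s.v v.o r.v b.f m adv conj prep pr f.e id attr intj].freeze
POS_PATTERN = POS_TAGS.map { |tag| Regexp.escape(tag) }.join("|")

# Hanzi, then pinyin (which may contain spaces), then a definition that
# starts with a known tag followed by a period and whitespace.
ENTRY = /\A(\S+)\s+(.+?)\s+((?:#{POS_PATTERN})\.\s.*)\z/

# Returns [hanzi, pinyin, definition], or nil if no tag was found.
def split_entry(line)
  match = ENTRY.match(line.strip)
  match && match.captures
end

# Tab-separate an entry; leave unrecognized lines untouched for manual fixing.
def tabify_entry(line)
  parts = split_entry(line)
  parts ? parts.join("\t") : line.strip
end
```

Lines with no recognized tag pass through unchanged, so they are easy to spot and correct by hand after import.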
-I just uploaded version 1.1. See the enclosed Read Me file for what I have fixed and changed in this version of the script.
Managing Keyboard Input Methods
When you are working with a non-Roman language, one thing that makes it hard to enter large amounts of vocabulary quickly and directly into flashcard software is that you have to keep switching the keyboard input back and forth between English and the other language. This is a problem in every flashcard application I have seen so far, with the exception of some older versions of iFlash.
I wonder whether this is an insurmountable programming problem in OS X, or whether the Cocoa API offers some way around it.
Today I found this in the reference for the NSTextFieldCell class:
setAllowedInputSourceLocales - Sets an array of locale identifiers representing input sources that are allowed to be enabled when the receiver has the keyboard focus.
allowedInputSourceLocales - Returns an array of locale identifiers representing input sources that are allowed to be enabled when the receiver has the keyboard focus.
I don’t know much about Cocoa programming, but I wonder whether these two methods (available in OS X 10.5), or something similar, could be used to remedy the problem.
Also, programmers might want to read over this posting about keyboard events and non-Roman languages.