[sword-devel] New module: Chinese dictionary
Mon, 10 Jun 2002 07:01:49 -0600 (MDT)
On Mon, 10 Jun 2002, Christian Renz wrote:
> Date: Mon, 10 Jun 2002 13:36:19 +0800
> From: Christian Renz <email@example.com>
> Reply-To: firstname.lastname@example.org
> To: email@example.com
> Subject: [sword-devel] New module: Chinese dictionary
> First of all -- I discovered the Sword Project a few days ago, and I
> am very impressed by the huge amount of modules available. Praise God!
> I am working on creating dictionaries for the Sword Project, using the
> CEDICT project (a freely available Chinese dictionary; I already
Oh, interesting. I've made a few modules out of CEDICT myself already:
pinyin to Chinese
Chinese to English
English to Chinese
> contacted the author about his copyright terms and hope he'll reply
My email got returned and that's why I haven't pursued this route further.
> soon). As you might know, the standard Chinese (Mandarin) uses the
> Pinyin system for transliteration. Therefore, I created three
> dictionaries, so that words can be searched by English translation,
> characters and pinyin. (This is for simplified characters; once that
> works, I'll do the same for traditional characters.) I converted the
> GB2312 dictionary file to UTF-8, then used a perl script to generate
> the dictionary files (calling addld).
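For anyone following along, the conversion and indexing step described
above can be sketched roughly as follows. This is only a sketch in Python,
not the perl script mentioned above. It assumes each CEDICT line looks like
`headword [pin1 yin1] /gloss/gloss/`; the function names are made up for
illustration, and it stops before any call to addld.

```python
import re
from collections import defaultdict

# Assumed CEDICT line layout: "headword [pin1 yin1] /gloss 1/gloss 2/"
ENTRY = re.compile(r"^(\S+)\s+\[([^\]]+)\]\s+/(.+)/\s*$")


def parse_line(line):
    """Return (hanzi, pinyin, [glosses]), or None for comments and blanks."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = ENTRY.match(stripped)
    if match is None:
        return None
    hanzi, pinyin, glosses = match.groups()
    return hanzi, pinyin, glosses.split("/")


def build_indexes(raw_bytes):
    """Decode a GB2312 dictionary file and build three lookup tables."""
    text = raw_bytes.decode("gb2312")  # from here on everything is Unicode
    by_hanzi = defaultdict(list)
    by_pinyin = defaultdict(list)
    by_english = defaultdict(list)
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        hanzi, pinyin, glosses = parsed
        joined = "; ".join(glosses)
        by_hanzi[hanzi].append(f"[{pinyin}] {joined}")
        by_pinyin[pinyin].append(f"{hanzi} {joined}")
        for gloss in glosses:
            by_english[gloss].append(f"{hanzi} [{pinyin}]")
    return by_hanzi, by_pinyin, by_english
```

Each table can then be written out as UTF-8 with `str.encode("utf-8")`
before it is handed to whatever tool creates the module.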
> So far, so good. The dictionary has 15000 to 20000 entries (depending
> on direction) and the pinyin and English dictionaries work well. I
> still have to do some formatting (tone marks, nice layout etc).
> Now, after this lengthy introduction, on to my questions and issues
> (anybody still reading?):
> (general issues)
> - Has anybody come up with a utility to add more than one entry? It
> should be easy to modify addld to read its input from a file, but I
> don't have the time to do the programming right now, and I was
> hoping that somebody already did that. On my slow little Linux
> server, creating the dictionaries takes about fifty minutes -- just
> because the script has to start tens of thousands of processes!
I used a perl script running under cygwin. It goes slowly but works. I
installed a RAM drive afterwards which should make this go much faster.
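The process-per-entry cost would go away if all entries were written to
one file and a single program read them back, as suggested above. Here is
a minimal Python sketch of that idea. The tab-separated layout is purely
hypothetical; stock addld does not read it, so a modified addld or some
other loader would be needed on the receiving end.

```python
import io


def write_batch(entries, stream):
    """Write (key, definition) pairs, one tab-separated line per entry.

    Returns the number of entries written. The layout is a hypothetical
    interchange format, not something an existing Sword tool understands.
    """
    count = 0
    for key, definition in entries:
        # Collapse tabs and newlines so each entry stays on a single line.
        clean_key = " ".join(key.split())
        clean_def = " ".join(definition.split())
        stream.write(f"{clean_key}\t{clean_def}\n")
        count += 1
    return count


def read_batch(stream):
    """Yield (key, definition) pairs back from a batch file."""
    for line in stream:
        key, _, definition = line.rstrip("\n").partition("\t")
        if key:
            yield key, definition
```

With something like this, the script starts one process that loops over
the file instead of starting tens of thousands of them.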
> - What are the ThML tags for formatting available in the Sword Project
> viewers? Is there something like a table tag? I'd like to group
>   entries e.g. by same pronunciation. Also, does the big tag work? It
> would be useful to display the characters in a bigger typeface.
> (Issues with the Windows version of the Sword Project and the Glory
> Union Bible, Simplified Characters)
> - There is some confusion with the code tables used to display text in
>   the Windows software. Apparently, in the Bible text windows the text
> is displayed as GB2312 rather than UTF-8. The search combo box uses
Works for me (Win95, 98, & 2k) in the Bible window. You're right about
the search window & others.
>   the system font, which is not capable of displaying anything other
>   than ISO 8859-1 on my machine, so I don't see the characters at all. (Have
> to try out Windows 2000, though. Might be better on that platform.)
> What is the situation for other platforms (Linux, Mac OS X)? Is the
> text of the Glory Union Bible displayed as GB2312 also? If yes, I
> could try to keep the definitions as UTF-8, but encode the search
> terms as GB2312.
> - When looking up a term, it is displayed in the upper left corner, in
> a small, blue typeface. This is not useful for the characters
> dictionary, because the term is displayed in a Western encoding, not
> in Unicode... if I put the characters inside the definition, they
> are displayed just fine. Can the display of the search term in the
> definition window somehow be suppressed?
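On the idea above of keeping the definitions as UTF-8 while encoding the
search terms as GB2312: in Python that split could look like the sketch
below. The function names are invented for illustration, and whether any
of the front ends would actually display such mixed entries correctly is
untested.

```python
def is_gb2312(text):
    """Return True if every character of text exists in GB2312."""
    try:
        text.encode("gb2312")
    except UnicodeEncodeError:
        return False
    return True


def encode_entry(key, definition):
    """Encode the search key as GB2312 and the definition as UTF-8.

    Raises UnicodeEncodeError if the key has characters outside GB2312.
    """
    return key.encode("gb2312"), definition.encode("utf-8")
```

Keys that fail `is_gb2312` would have to be skipped or handled some other
way, since GB2312 covers only a limited character set.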
> Thanks for your help!
> Greetings and blessings,
> Christian Renz
> firstname.lastname@example.org - http://www.web42.com/crenz/ - http://www.web42.com/
> "The worst attitude of all would be the professional attitude which regards
> children in the lump as a sort of raw material which we have to handle."
> -- C.S. Lewis, On Three Ways of Writing for Children