[sword-devel] New module: Chinese dictionary

Steve Tang sword-devel@crosswire.org
Mon, 10 Jun 2002 07:01:49 -0600 (MDT)


On Mon, 10 Jun 2002, Christian Renz wrote:

> Date: Mon, 10 Jun 2002 13:36:19 +0800
> From: Christian Renz <crenz-swordproject@web42.com>
> Reply-To: sword-devel@crosswire.org
> To: sword-devel@crosswire.org
> Subject: [sword-devel] New module: Chinese dictionary
> 
> Hello,
> 
> First of all -- I discovered the Sword Project a few days ago, and I
> am very impressed by the huge number of modules available. Praise God!
> 
> I am working on creating dictionaries for the Sword Project, using the
> CEDICT project (a freely available Chinese dictionary; I already

Oh, interesting. I've made a few modules out of CEDICT myself already:
pinyin to Chinese
Chinese to English
English to Chinese

> contacted the author about his copyright terms and hope he'll reply

My email got returned and that's why I haven't pursued this route further.

> soon). As you might know, standard Chinese (Mandarin) uses the
> Pinyin system for transliteration. Therefore, I created three
> dictionaries, so that words can be searched by English translation,
> characters and pinyin. (This is for simplified characters; once that

Great.

> works, I'll do the same for traditional characters.) I converted the
> GB2312 dictionary file to UTF-8, then used a perl script to generate
> the dictionary files (calling addld).
> 
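A minimal sketch of the GB2312-to-UTF-8 conversion step described above, in Python rather than Perl. The function names are hypothetical, and the sketch assumes the source file is valid GB2312 throughout (strict decoding raises an error on any byte sequence outside that code table).

```python
def gb2312_to_utf8(raw: bytes) -> bytes:
    """Decode GB2312 bytes and re-encode the text as UTF-8."""
    # Strict decoding (the default) raises UnicodeDecodeError on invalid input.
    return raw.decode("gb2312").encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Copy a GB2312 text file to a UTF-8 text file, line by line."""
    with open(src_path, "r", encoding="gb2312") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)
```

Calling convert_file with the path of the GB2312 dictionary file and an output path would produce the UTF-8 copy that the later steps read.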
> So far, so good. The dictionary has 15000 to 20000 entries (depending
> on direction) and the pinyin and English dictionaries work well. I
> still have to do some formatting (tone marks, nice layout, etc.).
> 
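For the tone-mark formatting mentioned above, here is a hedged sketch that turns numbered pinyin ("Zhong1 guo2") into pinyin with tone marks. It assumes syllables are separated by spaces, that the tone is a trailing digit 1-5 (5 meaning neutral), and that "u:" stands for "ü"; those conventions are assumptions about the input, not something confirmed by the dictionary file. It uses the usual placement rule: mark "a" or "e" if present, the "o" in "ou", otherwise the last vowel.

```python
import re

# Marked forms for tones 1-4 of each vowel.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable: str) -> str:
    """Convert one numbered syllable such as 'guo2' to 'guó'."""
    m = re.fullmatch(r"([A-Za-zü:]+)([1-5])", syllable)
    if not m:
        return syllable  # not numbered pinyin; leave untouched
    letters = m.group(1).replace("u:", "ü")
    tone = int(m.group(2))
    if tone == 5:
        return letters  # neutral tone carries no mark
    lower = letters.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in "iouü"]
        if not vowels:
            return letters
        idx = vowels[-1]
    marked = TONE_MARKS[lower[idx]][tone - 1]
    if letters[idx].isupper():
        marked = marked.upper()
    return letters[:idx] + marked + letters[idx + 1:]


def to_tone_marks(pinyin: str) -> str:
    """Convert a space-separated string of numbered syllables."""
    return " ".join(mark_syllable(s) for s in pinyin.split())
```

With this, to_tone_marks("Zhong1 guo2") gives "Zhōng guó".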
> Now, after this lengthy introduction, on to my questions and issues
> (anybody still reading?):
> 
> (general issues) 
> 
> - Has anybody come up with a utility to add more than one entry? It
>   should be easy to modify addld to read its input from a file, but I
>   don't have the time to do the programming right now, and I was
>   hoping that somebody already did that. On my slow little Linux
>   server, creating the dictionaries takes about fifty minutes -- just
>   because the script has to start tens of thousands of processes!

I used a Perl script running under Cygwin. It runs slowly but works. I
installed a RAM drive afterwards, which should make this go much faster.
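
One way to cut the cost is to do all the parsing and grouping in a single process and only hand finished entries to whatever writes the module. The sketch below does just the first part: it reads dictionary lines once and builds the three lookup tables (by characters, by pinyin, by English gloss). It assumes each line looks like "中国 [Zhong1 guo2] /China/Middle Kingdom/" and that lines starting with "#" are comments; parse_line and build_indexes are hypothetical names. Writing the result into a Sword module is deliberately left out, since that depends on the tool used.

```python
import re
from collections import defaultdict

# Assumed line layout: characters, [pinyin], /gloss/gloss/.../
ENTRY_RE = re.compile(r"^(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


def parse_line(line):
    """Return (characters, pinyin, glosses) or None for comments and junk."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = ENTRY_RE.match(line)
    if not m:
        return None
    chars, pinyin, glosses = m.groups()
    return chars, pinyin, [g for g in glosses.split("/") if g]


def build_indexes(lines):
    """Build the three lookup tables in one pass over the input."""
    by_chars = defaultdict(list)
    by_pinyin = defaultdict(list)
    by_english = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        chars, pinyin, glosses = entry
        by_chars[chars].append((pinyin, glosses))
        # Lower-case the keys so lookups are case-insensitive.
        by_pinyin[pinyin.lower()].append((chars, glosses))
        for gloss in glosses:
            by_english[gloss.lower()].append((chars, pinyin))
    return by_chars, by_pinyin, by_english
```

Because each table maps a key to a list, entries that share a key (for example, several characters with the same pinyin) end up grouped together, so each key only has to be written once.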

> 
> - What are the ThML tags for formatting available in the Sword Project
>   viewers? Is there something like a table tag? I'd like to group
>   entries e.g. by same pronunciation. Also, does the big tag work? It
>   would be useful to display the characters in a bigger typeface.
> 
> (Issues with the Windows version of the Sword Project and the Glory
> Union Bible, Simplified Characters)
> 
> - There is some confusion with the code tables used to display text in
>   the Windows software. Apparently, in the Bible text windows the text
>   is displayed as GB2312 rather than UTF-8. The search combo box uses

Works for me (Win95, 98, & 2k) in the Bible window. You're right about
the search window & others.

>   the system font, which is not capable of displaying anything other
>   than ISO 8859-1 on my machine, so I don't see the characters at all.
>   (Have to try out Windows 2000, though. Might be better on that platform.)
> 
>   What is the situation for other platforms (Linux, Mac OS X)? Is the
>   text of the Glory Union Bible displayed as GB2312 also? If yes, I
>   could try to keep the definitions as UTF-8, but encode the search
>   terms as GB2312.
> 
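To make the mixed-encoding idea above concrete, here is a small sketch that encodes a search term as GB2312 while keeping its definition in UTF-8. encode_entry is a hypothetical helper; whether any front end actually handles a module with keys and definitions in different encodings is not something this sketch settles.

```python
def encode_entry(key: str, definition: str):
    """Return (key bytes in GB2312, definition bytes in UTF-8)."""
    # Raises UnicodeEncodeError if the key contains a character
    # that GB2312 cannot represent.
    return key.encode("gb2312"), definition.encode("utf-8")


key_bytes, def_bytes = encode_entry("中文", "中文 /Chinese (language)/")
print(key_bytes.hex())   # two bytes per character in GB2312
print(def_bytes[:6].hex())  # three bytes per character in UTF-8
```

The same two characters take four bytes as a GB2312 key and six bytes at the start of the UTF-8 definition, so a viewer has to know which code table applies to which field.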
> - When looking up a term, it is displayed in the upper left corner, in
>   a small, blue typeface. This is not useful for the characters
>   dictionary, because the term is displayed in a Western encoding, not
>   in Unicode... if I put the characters inside the definition, they
>   are displayed just fine. Can the display of the search term in the
>   definition window somehow be suppressed?
> 
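A short illustration of the symptom described above, assuming the term label interprets the UTF-8 bytes as ISO 8859-1 (the actual Western code page used on a given Windows system may differ). misread_as_latin1 is a hypothetical name used only for this demonstration.

```python
def misread_as_latin1(text: str) -> str:
    """Show what UTF-8 text looks like when read as ISO 8859-1."""
    return text.encode("utf-8").decode("latin-1")


garbled = misread_as_latin1("中文")
# Each Chinese character becomes three unrelated Western characters.
print(len(garbled))

# The bytes are intact, so reversing the misreading recovers the text.
print(garbled.encode("latin-1").decode("utf-8"))
```

Since no bytes are lost, the text itself is fine; only the display code table chosen for that label is wrong.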
> Thanks for your help!
> 
> Greetings and blessings,
>    Christian Renz
> 
> -- 
> crenz@web42.com - http://www.web42.com/crenz/ - http://www.web42.com/
> 
> "The worst attitude of all would be the professional attitude which regards
> children in the lump as a sort of raw material which we have to handle."
>     -- C.S. Lewis, On Three Ways of Writing for Children
> 

Steve Tang...