[sword-devel] New module: Chinese dictionary
Mon, 10 Jun 2002 13:36:19 +0800
First of all -- I discovered the Sword Project a few days ago, and I
am very impressed by the huge amount of modules available. Praise God!
I am working on creating dictionaries for the Sword Project, using the
CEDICT project (a freely available Chinese dictionary; I already
contacted the author about his copyright terms and hope he'll reply
soon). As you might know, standard Chinese (Mandarin) uses the
Pinyin system for transliteration. Therefore, I created three
dictionaries, so that words can be searched by English translation,
characters and pinyin. (This is for simplified characters; once that
works, I'll do the same for traditional characters.) I converted the
GB2312 dictionary file to UTF-8, then used a Perl script to generate
the dictionary files (calling addld).
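
As an illustration of the conversion step described above, here is a minimal sketch in Python (not the Perl script itself). It assumes each dictionary line looks like `HEADWORD [pin1 yin1] /gloss 1/gloss 2/`; that layout, the function names and the sample data are assumptions made for the example, and the real CEDICT file should be checked before relying on it.

```python
import re
from collections import defaultdict

# Assumed line layout: HEADWORD [pin1 yin1] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+)\s+\[([^\]]+)\]\s+/(.+)/\s*$")


def parse_line(line):
    """Return (characters, pinyin, glosses), or None for blank, comment or malformed lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    chars, pinyin, glosses = match.groups()
    return chars, pinyin, [g for g in glosses.split("/") if g]


def build_indexes(raw_bytes):
    """Decode GB2312 bytes and build one lookup table per search direction."""
    text = raw_bytes.decode("gb2312")  # from here on everything is Unicode text
    by_chars = defaultdict(list)
    by_pinyin = defaultdict(list)
    by_english = defaultdict(list)
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        chars, pinyin, glosses = parsed
        by_chars[chars].append((pinyin, glosses))
        by_pinyin[pinyin].append((chars, glosses))
        for gloss in glosses:
            by_english[gloss].append((chars, pinyin))
    return by_chars, by_pinyin, by_english


# Two made-up sample lines, encoded as GB2312 like the source file.
sample = (
    "中文 [zhong1 wen2] /Chinese/Chinese language/\n"
    "你好 [ni3 hao3] /hello/hi/\n"
).encode("gb2312")
by_chars, by_pinyin, by_english = build_indexes(sample)
```

Each of the three tables can then be written out as UTF-8 and fed to whatever tool creates the module. Several words can share one pinyin or English key, which is why every key maps to a list.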
So far, so good. The dictionary has 15000 to 20000 entries (depending
on direction) and the pinyin and English dictionaries work well. I
still have to do some formatting (tone marks, nice layout etc).
Now, after this lengthy introduction, on to my questions and issues
(anybody still reading?):
- Has anybody come up with a utility to add more than one entry at a time? It
should be easy to modify addld to read its input from a file, but I
don't have the time to do the programming right now, and I was
hoping that somebody already did that. On my slow little Linux
server, creating the dictionaries takes about fifty minutes -- just
because the script has to start tens of thousands of processes!
- What are the ThML tags for formatting available in the Sword Project
viewers? Is there something like a table tag? I'd like to group
  entries e.g. by same pronunciation. Also, does the big tag work? It
would be useful to display the characters in a bigger typeface.
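
On the first question above (adding many entries in one run), the idea of reading all input from a single file could look roughly like the sketch below. It is Python rather than a patch to addld, the tab-separated file format is invented for the example, and `add_entry` is a hypothetical stand-in for whatever call actually writes an entry into the module.

```python
import io


def write_batch(entries, out):
    """Write (key, definition) pairs as one tab-separated line each."""
    for key, definition in entries:
        # Collapse tabs and newlines inside the definition so they
        # cannot break the one-entry-per-line format.
        clean = " ".join(definition.split())
        out.write(f"{key}\t{clean}\n")


def read_batch(inp, add_entry):
    """Call add_entry(key, definition) for every line; return the entry count."""
    count = 0
    for line in inp:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, definition = line.partition("\t")
        add_entry(key, definition)
        count += 1
    return count


entries = [
    ("ni3 hao3", "你好 /hello/hi/"),
    ("zhong1 wen2", "中文 /Chinese/"),
]
buf = io.StringIO()
write_batch(entries, buf)
buf.seek(0)

module = {}  # stand-in for the real module writer
loaded = read_batch(buf, module.__setitem__)
```

The point is only that one process loops over all entries, so the per-process startup cost is paid once instead of tens of thousands of times.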
(Issues with the Windows version of the Sword Project and the Glory
Union Bible, Simplified Characters)
- There is some confusion with the code tables used to display text in
  the Windows software. Apparently, in the Bible text windows the text
  is displayed as GB2312 rather than UTF-8. The search combo box uses
  the system font, which cannot display anything other than
  iso8859-1 on my machine, so I don't see any characters at all. (Have
to try out Windows 2000, though. Might be better on that platform.)
What is the situation for other platforms (Linux, Mac OS X)? Is the
text of the Glory Union Bible displayed as GB2312 also? If yes, I
could try to keep the definitions as UTF-8, but encode the search
terms as GB2312.
- When looking up a term, it is displayed in the upper left corner, in
a small, blue typeface. This is not useful for the characters
  dictionary, because the term is displayed in a Western encoding, not
  in Unicode. If I put the characters inside the definition, they
are displayed just fine. Can the display of the search term in the
definition window somehow be suppressed?
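
The encoding mismatch described in the items above can be reproduced outside Sword with a few lines of Python. This is only a sketch of the effect and of the "GB2312 key, UTF-8 definition" workaround mentioned earlier; `make_entry` is a hypothetical helper, and nothing here says how the Sword viewers actually decode their text.

```python
term = "中文"

utf8_bytes = term.encode("utf-8")   # three bytes per character here
gb_bytes = term.encode("gb2312")    # two bytes per character

# What a widget shows if it interprets the UTF-8 bytes as ISO 8859-1:
# six unrelated Western characters instead of two Chinese ones.
mojibake = utf8_bytes.decode("latin-1")

# The damage is reversible as long as the bytes themselves are intact.
recovered = mojibake.encode("latin-1").decode("utf-8")


def make_entry(key, definition):
    """Return (key_bytes, body_bytes): search key as GB2312, definition as UTF-8."""
    return key.encode("gb2312"), definition.encode("utf-8")


key_bytes, body_bytes = make_entry(term, "中文 [zhong1 wen2] /Chinese/")
```

Whether mixing the two encodings in one module is a good idea depends on how each front end decodes keys and entry text, which is exactly the open question above.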
Thanks for your help!
Greetings and blessings,
firstname.lastname@example.org - http://www.web42.com/crenz/ - http://www.web42.com/
"The worst attitude of all would be the professional attitude which regards
children in the lump as a sort of raw material which we have to handle."
-- C.S. Lewis, On Three Ways of Writing for Children