[sword-devel] for the love of unicode

Martin Gruner sword-devel@crosswire.org
Mon, 18 Jun 2001 21:38:01 +0200

> > What about frontends which don't support UTF-8?
> > IMO they can normally only display latin-1 chars, so we should have a
> > conversion UTF-8 to Latin-1.
> Do you think we should do UTF-8 to x converters for all the formats we
> use, or just Latin-1?  We could also use the Greek & Hebrew ISO standards
> too, plus KOI8 for Cyrillic translations.
> What should happen if you want to access a text that uses characters
> outside of Latin-1, like the BHS or a Chinese/Japanese translation, from
> iraeneus?

That brings us back to the starting point: how to store the modules.
If we decide to provide functions for locale-specific output (ISO 8859-x, ...), 
we can also store the modules in these encodings, which reduces their size.
If not, we shouldn't store them that way, and that includes the Greek modules, 
which are currently encoded in "symbol".
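
To get a feel for the size argument, here is a minimal sketch in Python (not 
SWORD code). The sample strings and the helper name encoded_sizes are made up 
for illustration; the codec names are the ones Python's standard library uses.

```python
# Sketch only: compare the storage size of a text in a single-byte
# legacy encoding with its size in UTF-8. The samples are
# illustrative and not taken from any SWORD module.

def encoded_sizes(text, native_codec):
    """Return (bytes in native_codec, bytes in UTF-8) for text."""
    return len(text.encode(native_codec)), len(text.encode("utf-8"))

SAMPLES = [
    ("iso8859-1", "Im Anfang war das Wort"),   # German, plain ASCII letters
    ("iso8859-7", "Εν αρχη ην ο λογος"),       # Greek, no accents
    ("koi8-r", "В начале было Слово"),         # Russian
]

if __name__ == "__main__":
    for codec, text in SAMPLES:
        native, utf8 = encoded_sizes(text, codec)
        print(f"{codec}: {native} bytes native, {utf8} bytes UTF-8")
```

For plain ASCII text the two sizes are the same. For Greek or Cyrillic the 
UTF-8 form is close to twice as large, since each non-ASCII letter takes two 
bytes there and one byte in the single-byte encoding.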

Chris, you may want to take a look at the Qt sources (Qt 2.3.1, 
http://www.trolltech.com). They have all the conversion tables compiled in 
and working. The class is called QTextCodec.

I'd personally prefer to store the modules in their native (but 
standardized!) encodings and provide input and output functions for 
a) the native encoding
b) UTF-8
But this would be much more complicated (what about two-byte character sets?) 
than just switching to UTF-8, and may not even be worth the effort.
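
The native-storage idea could look roughly like the following Python sketch. 
It is only an illustration under assumptions: the function names are 
hypothetical, nothing here is SWORD API, and Shift_JIS stands in as an example 
of a two-byte charset. "Native" output would simply hand back the stored bytes 
unchanged, so only the converting paths are shown.

```python
# Sketch only: a module is stored in its native encoding; output is
# converted on demand. All function names here are hypothetical.

def output_utf8(raw, module_codec):
    """UTF-8 output: decode the stored bytes, then re-encode as UTF-8."""
    return raw.decode(module_codec).encode("utf-8")

def output_for_frontend(raw, module_codec, frontend_codec="latin-1"):
    """Output for a frontend limited to one charset.

    Characters the frontend charset cannot represent become '?'.
    """
    text = raw.decode(module_codec)
    return text.encode(frontend_codec, errors="replace")

if __name__ == "__main__":
    # A two-byte charset example: three Japanese characters.
    stored = "初めに".encode("shift_jis")
    print(len(stored), "bytes stored,",
          len(output_utf8(stored, "shift_jis")), "bytes as UTF-8")
    # A Latin-1-only frontend cannot display any of them.
    print(output_for_frontend(stored, "shift_jis"))
```

The replacement with '?' shows the cost of the Latin-1 fallback: a text 
outside Latin-1 comes out unreadable, so such a frontend gains nothing from 
the conversion for those modules.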