[bt-devel] RE: UTF-8 and new module classes

Martin Gruner bt-devel@crosswire.org
Thu, 24 May 2001 19:17:39 +0200


Hi Joachim,

> I think UTF-8 is a standard. Wouldn't it be better to have all modules
> available in UTF-8 so all the fonts problems go away?

Yes and no. UTF-8 is simply not necessary for the majority of modules.
Non-ASCII text (Greek, Cyrillic, ...) would roughly double in size, since
each such character takes 2 bytes in UTF-8 (3 or more for some scripts);
only plain ASCII stays at 1 byte. And there might be frontends which cannot
display Unicode at all (e.g. irenaeus).

But: if the modules are encoded with the correct language-specific encodings,
they still take 1 byte per character, and it is very easy to map these
encodings into UTF-8. So we could work with Unicode internally while other
apps do not have to, and the modules stay small.
The point is that the modules should be rebuilt using those iso8859-x
encodings, which is _much_ better than using some font-specific ASCII
encoding that we cannot map into Unicode.
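
A minimal sketch of that mapping, in Python rather than in Sword itself. It
assumes a module stored in iso8859-7; the helper name to_utf8 and the sample
word are made up for illustration and are not part of any Sword API.

```python
def to_utf8(raw: bytes, encoding: str) -> bytes:
    """Map text stored in a 1-byte legacy encoding to UTF-8.

    `encoding` is whatever a (hypothetical) encoding= tag would name,
    e.g. "iso8859-7" for Greek.
    """
    return raw.decode(encoding).encode("utf-8")


# Sample Greek word as it would be stored in the module: 1 byte per character.
stored = "λόγος".encode("iso8859-7")

# What a Unicode-aware frontend would work with internally.
internal = to_utf8(stored, "iso8859-7")

print(len(stored), len(internal))  # Greek letters take 2 bytes each in UTF-8
```

The module on disk keeps the small 1-byte form; only a frontend that wants
Unicode pays for the conversion, and it can do so when loading the text.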

I wonder how searching in Unicode modules works. Does Sword now use Unicode
internally?

Martin



> > I favor moving from the font= tag to an encoding= tag. This way we'd not
> > have to use huge fonts, but still the flexibility to let the user choose
> > his/her font. E.g. encoding=iso8859-7 would define greek text. You can
> > then just display this text with a 1 Byte iso8859-7 font or map it into
> > unicode for different purposes.
> > IMO using standards is always a good way to go.
> > We could implement some mapping filters in sword which map from
> > fontspecific ascii encodings to the correct language specific encodings
> > (Like a bstgreek2iso8859-7 filter) to also support frontends favoring the
> > font= solution.
> >
> > Some good links I want to recommend to you:
> > http://czyborra.com/
> > http://czyborra.com/charsets/iso8859.html
> > http://czyborra.com/charsets/cyrillic.html
> >
> > Martin