[sword-devel] unicode / utf-8
Fri, 25 May 2001 14:18:23 +0200
> > > Lots of things to consider over the next few weeks as we try to hash
> > > out an initial shot at supporting this new range of modules.
> If Martin talks us into using iso8859 and other 8/16-bit encodings to save
> space, there are some very nice conversion tables at
> http://www.unicode.org/Public/MAPPINGS/. And it might be nice to provide
> mechanisms for this to aid front-ends that have no hope of Unicode support.
Well, Troy's comments on UTF-8 were a pleasant surprise for me. I didn't know
that UTF-8 stores characters with a variable number of bytes, and therefore
does not blow up the size of most of the modules.
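
To make the variable-length point concrete, here is a minimal sketch of a
UTF-8 encoder in modern C++. The function name encodeUtf8 is hypothetical and
not part of the Sword API. It only illustrates that plain ASCII text still
takes one byte per character, while other characters take two to four bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of the Sword API.
// Encodes one Unicode code point as UTF-8.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return out;  // not a valid Unicode scalar value
    }
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx (plain ASCII, unchanged)
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        assert(cp <= 0x10FFFF);
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

For example, encodeUtf8('A') yields one byte, encodeUtf8(0xE4) (a-umlaut)
yields two, and encodeUtf8(0x20AC) (the euro sign) yields three, so a module
that is mostly ASCII hardly grows at all.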
So I suggest using UTF-8 for _all_ of the Sword modules instead of
iso8859-x etc. Sword could handle the characters as unsigned long internally,
which may be easier than handling variable-length characters.
Using fixed-size chars internally would make the handling much simpler.
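
A rough sketch of what "fixed-size chars internally" could look like: decode
the UTF-8 bytes of a module once into a vector of 32-bit code points, so that
indexing and length are per character. Everything here is an assumption for
illustration. decodeUtf8 is a hypothetical name, not a Sword function, and
std::uint32_t stands in for the "unsigned long" mentioned above. Malformed
input is replaced with U+FFFD, which is one possible policy among several.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of the Sword API.
// Decodes UTF-8 bytes into fixed-width 32-bit code points.
// Each malformed byte is replaced by U+FFFD.
std::vector<std::uint32_t> decodeUtf8(const std::string &in) {
    const std::uint32_t kReplacement = 0xFFFD;
    // Smallest code point allowed for 1-, 2-, 3- and 4-byte sequences;
    // anything below that is an overlong encoding.
    static const std::uint32_t kMin[4] = {0x0, 0x80, 0x800, 0x10000};

    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        int extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F; extra = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F; extra = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07; extra = 3;
        } else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(kReplacement);
            ++i;
            continue;
        }
        assert(extra >= 0 && extra <= 3);

        // The whole sequence must fit into the remaining input.
        bool ok = (i + static_cast<std::size_t>(extra) < in.size());
        for (int k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                ok = false;  // not a continuation byte
            } else {
                cp = (cp << 6) | (c & 0x3F);
            }
        }
        // Reject overlong forms, surrogates and values above U+10FFFF.
        if (ok && (cp < kMin[extra] || cp > 0x10FFFF ||
                   (cp >= 0xD800 && cp <= 0xDFFF))) {
            ok = false;
        }
        if (!ok) {
            out.push_back(kReplacement);
            ++i;  // resynchronise on the next byte
            continue;
        }
        out.push_back(cp);
        i += static_cast<std::size_t>(extra) + 1;
    }
    return out;
}
```

After decoding, text[n] is the n-th character regardless of how many bytes it
occupied on disk. The cost is four bytes per character in memory, which only
applies to the text currently being handled, not to the stored module.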
We could still support modules with different encodings, which would be
mapped into Unicode internally. And there would be output routines which
convert the Unicode chars to a frontend-specific encoding, say iso8859-1 for
irenaeus in the western locale.
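
Such an output routine could be sketched as follows for the iso8859-1 case.
This is an illustration under assumptions, not existing Sword code: toLatin1
and its fallback parameter are hypothetical names. It relies on the fact that
iso8859-1 corresponds one-to-one to the first 256 Unicode code points. Other
iso8859-x targets would need a lookup table, for instance one built from the
mapping files mentioned in the quoted text above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of the Sword API.
// Converts 32-bit code points to ISO-8859-1 bytes. Characters that do not
// exist in ISO-8859-1 are replaced by `fallback`.
std::string toLatin1(const std::vector<std::uint32_t> &text,
                     char fallback = '?') {
    std::string out;
    out.reserve(text.size());
    for (std::uint32_t cp : text) {
        if (cp > 0xFF) {
            out.push_back(fallback);  // not representable in ISO-8859-1
            continue;
        }
        // U+0000..U+00FF map directly to the same byte values.
        assert(cp <= 0xFF);
        out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
    }
    return out;
}
```

With this, the code points 0x47 0xF6 0x74 0x74 come out as the four bytes of
"Gött" in iso8859-1, while a Greek alpha (0x3B1) comes out as "?". A real
frontend might prefer transliteration over a single fallback character.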
This would increase the usability and efficiency of sword a lot.