[sword-devel] Unicode Bible program

Uwe Koloska sword-devel@crosswire.org
Fri, 25 Feb 2000 11:07:57 +0100


You wrote on Fri, 25 Feb 2000:
>On Thu, 24 Feb 2000, Troy A. Griffitts wrote:
>> I'm sure we would all love for this project to support unicode.  If I
>> knew anything technical about unicode I'd comment on if there is
>> anything that would hinder you from subclassing SWText and creating your
>> own.  I believe the code is modular enough so that if say, the search
>> method did ASCII specific things like the call to strstr(...), you may
>> still override the search method in your own SWText subclass and provide
>> unicode searching algorithms.
>> 
>> I would love to learn more and take out all 8bit dependencies to help
>> you, but it's just a question of time.
>
>I understand that completely! I don't expect you to put aside what you're
>doing to look into the Unicode issue. There are a lot of people who need SWORD
>who don't need Unicode. I am reading through the Unicode standard now and
>experimenting with different options. If there is anyone on this list who is a
>Unicode expert, I'd really appreciate some help in understanding some of the
>issues involved.

I'm no expert, but I have had to dig deeper into Unicode for a project.  Unicode
is the name for the "grand unified encoding" (like the "grand unified theory"
Einstein was looking for), and there are several ways to represent Unicode
characters.  The newest one uses 32 bits per character, because the older
16-bit one has turned out to be too small ...  For the 16-bit code that is
actually in use there is UTF-8, which represents each character as a sequence
of one or more bytes, with the benefit that real ASCII texts are still real
ASCII texts (all chars from 0 to 127 are represented by one byte).  In recent
years the C library has been extended with functions for wide characters
(wchar_t) and multibyte strings, and I think C++ has something similar.
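
To make the UTF-8 point concrete, here is a minimal C++ sketch of the encoding
step.  The function name encode_utf8 is made up for this illustration and is
not part of SWORD; the sketch assumes UTF-8 as it is defined today (at most
four bytes per character, RFC 3629).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for values that cannot be encoded
// (surrogates U+D800..U+DFFF or anything above U+10FFFF).
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 0..127: one byte, identical to ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate, not a character
        // three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, 'A' (U+0041) stays a single byte, while the Hebrew letter
alef (U+05D0) becomes the two bytes D7 90.
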
  And just another thought: Tcl (http://www.scriptics.com) has supported
Unicode since version 8.1.  They had to face the problem that most Tcl code is
written in plain ASCII.  So they decided to use UTF-8 internally, with the
result that a program written in plain 7-bit ASCII can stay exactly as it was
for the earlier versions.
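
The same property matters for the strstr(...) call mentioned in the quoted
text.  In UTF-8 the bytes of one character never look like the start of
another, so a byte-wise search does not produce false hits in the middle of a
character.  A small sketch, where find_utf8 is only a name invented for this
example (it is not a SWORD function):

```cpp
#include <cstring>

// Hypothetical helper: byte offset of the first occurrence of `needle`
// in `haystack`, or -1 if it does not occur.  Both arguments are assumed
// to be valid, NUL-terminated UTF-8 strings.
long find_utf8(const char* haystack, const char* needle) {
    const char* hit = std::strstr(haystack, needle);
    return hit ? static_cast<long>(hit - haystack) : -1;
}
```

Note that the result is a byte offset, not a character count, and that this
does nothing about case folding or about the same text being stored in
different (composed or decomposed) forms; a real Unicode search would still
have to deal with those.
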
  Unicode does not only provide support for more than 128 characters; some
writing systems also run in a different direction (Hebrew, for example).  This
is covered by the Unicode bidi algorithm, a free implementation of which is
available as FriBidi
    http://freshmeat.net/news/2000/01/01/946762146.html
(sorry, I only have the freshmeat entry)
So there are many languages that need Unicode support.  And the internal
structure of SWORD can be cleaned up by supporting it.  The frontends don't
have to differ between platforms; there "only" has to be a proper translation
from Unicode for non-Unicode systems.  But this translation is IMHO clearer and
less error-prone than thinking about the right encoding in many different
places (loading a Hebrew module, displaying German ...).
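
As a rough idea of what such a translation step could look like, here is a
sketch that converts UTF-8 to ISO-8859-1 (Latin-1) for a display that cannot
handle Unicode.  The function name utf8_to_latin1 and the '?' fallback are
assumptions made for this example; nothing like it is claimed to exist in
SWORD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-8 text to ISO-8859-1 (Latin-1).
// Characters above U+00FF and malformed bytes are replaced by `fallback`.
// For brevity this sketch does not reject overlong encodings.
std::string utf8_to_latin1(const std::string& in, char fallback = '?') {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t len;
        unsigned long cp;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else { out += fallback; ++i; continue; }  // stray or invalid lead byte

        if (i + len > in.size()) { out += fallback; break; }  // truncated input

        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out += fallback; ++i; continue; }

        // Latin-1 covers exactly U+0000..U+00FF, one byte each.
        out += (cp <= 0xFF) ? static_cast<char>(cp) : fallback;
        i += len;
    }
    return out;
}
```

German text survives this conversion unchanged in meaning, while a Hebrew
letter comes out as '?', so the lossy step is confined to one place at the
edge of the program.
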

Hope that helps
Uwe

-- 
mailto:koloska@rcs.urz.tu-dresden.de
http://rcswww.urz.tu-dresden.de/~koloska/
--                                    --
right now the web page is in German only
but this will change as time goes by ;-)