[sword-devel] IBM ICU

Chris Little sword-devel@crosswire.org
17 Sep 2001 20:13:01 -0700

I've mentioned a couple of times that I was looking into the IBM
International Components for Unicode (ICU).  I've now got ICU's low-level
(common) and high-level (i18n) code building along with Sword.  This is
almost the entire library.  I haven't yet added the data portions of
ICU, of which we will only need some of the transliteration databases
and a bit of the locale data.

I would like to proceed with this, committing virtually all of the ICU
code to our CVS.  However... it adds 3 MB to the CVS and about 5 MB
to the resulting library on my x86 Linux machine (it goes from ~12 MB to
~17 MB).  I don't see a problem with this, because resulting executables
won't grow at all until they start integrating ICU functions.

Also, use of the ICU will always be optional.  The Sword library can be
compiled to exclude ICU stuff.  Front-end authors may or may not choose
to maintain ICU as a toggleable option, but Troy wanted to make sure that
it is an option for more limited platforms like handhelds.
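
To illustrate what "optional" could look like in code, here is a minimal
sketch of a compile-time toggle.  It is not taken from the Sword tree:
the macro name SWORD_WITH_ICU and the helpers greekToLatin() and
icuEnabled() are hypothetical, and the ICU calls inside the guarded branch
assume a reasonably recent ICU C++ API.  Built without the macro, nothing
from ICU is referenced, so neither the headers nor the library are needed.

```cpp
#include <string>

// SWORD_WITH_ICU is a hypothetical macro name used only for this sketch;
// it would be defined by the build when ICU support is wanted.
#ifdef SWORD_WITH_ICU
#include <memory>
#include <unicode/translit.h>
#include <unicode/unistr.h>
#endif

// Reports whether this build was compiled with the (hypothetical) ICU switch.
bool icuEnabled() {
#ifdef SWORD_WITH_ICU
    return true;
#else
    return false;
#endif
}

// Transliterates UTF-8 Greek text to Latin script when ICU is compiled in.
// Without ICU, or if ICU cannot create the transliterator, the input is
// returned unchanged so callers never have to care which build they got.
std::string greekToLatin(const std::string &utf8) {
#ifdef SWORD_WITH_ICU
    UErrorCode status = U_ZERO_ERROR;
    std::unique_ptr<icu::Transliterator> translit(
        icu::Transliterator::createInstance("Greek-Latin", UTRANS_FORWARD,
                                            status));
    if (U_FAILURE(status) || !translit) {
        return utf8;  // fall back to the untouched text
    }
    icu::UnicodeString text = icu::UnicodeString::fromUTF8(utf8);
    translit->transliterate(text);  // rewrites the string in place
    std::string out;
    text.toUTF8String(out);
    return out;
#else
    return utf8;  // ICU excluded: pass the text through as-is
#endif
}
```

A front-end could call icuEnabled() to decide whether to show a
transliteration option at all, and a handheld build would simply leave
the macro undefined.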

What does ICU provide us?
- transliteration (between Latin, Cyrillic, Greek, Hebrew, Arabic, Hangul,
  & Kana), which we can use for input & output
- normalization (composing & decomposing ligatures, characters with
  accents, etc.)
- collation (for sorting & string compares)
- localization data
- basic string transformations & conversions (UTF-8/16/32, SCSU, & BOCU)
- word boundary analysis

Please let's discuss this and whether the ICU stuff should be
incorporated in full.