[sword-devel] IBM ICU

David Overcash sword-devel@crosswire.org
Mon, 17 Sep 2001 23:35:10 -0500

Well, seeing as how it will mainly be used in the frontends, I think there
would be a benefit to having two different frontends...  One meant more for
standard English and the English alphabet, and another for those users who
will view in other languages...

Or it could be used in a frontend for the teacher/student looking to
search in more depth, down to the original Hebrew word, etc.

Definitely, it should be used in some way or another because it would be a
shame to not take advantage of all its uses...

-Dave Overcash

-----Original Message-----
From: owner-sword-devel@crosswire.org
[mailto:owner-sword-devel@crosswire.org]On Behalf Of Chris Little
Sent: Monday, September 17, 2001 10:13 PM
To: sword-devel@crosswire.org
Subject: [sword-devel] IBM ICU

I've mentioned a couple of times that I was looking into the IBM
International Components for Unicode.  I've now got ICU's low-level
(common) and high-level (i18n) code building along with Sword.  This is
almost the entire library.  I haven't yet added the data portions of the
ICU, of which we will only need some of the transliteration databases
and a bit of the locale data.

I would like to proceed with this, committing virtually all of the ICU
code to our CVS.  However... it adds 3 MB to the CVS and about 5 MB
to the resulting library on my x86 Linux machine (it goes from ~12 MB to
~17 MB).  I don't see a problem with this, because resulting executables
won't grow at all until they start integrating ICU functions.

Also, use of the ICU will always be optional.  The Sword library can be
compiled to exclude ICU stuff.  Front-end authors may or may not choose
to maintain ICU as a toggleable option, but Troy wanted to make sure that
it is an option for more limited platforms like handhelds.
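
A compile-time toggle of the kind described above might look roughly like
the sketch below.  The macro name SWORD_USE_ICU and the helpers toLatin(),
icuCompiledIn() and usageExample() are hypothetical names made up for
illustration, not anything in the Sword tree, and the ICU calls are written
against the present-day ICU C++ API (icu:: namespace), which may differ from
the version discussed here.

```cpp
#include <cassert>
#include <string>

// SWORD_USE_ICU is a hypothetical macro for this sketch; the real build
// switch may be named differently. Build with -DSWORD_USE_ICU to enable it.
#ifdef SWORD_USE_ICU
#include <unicode/translit.h>
#include <unicode/unistr.h>
#endif

// Hypothetical helper: reports whether the ICU code path was compiled in.
bool icuCompiledIn() {
#ifdef SWORD_USE_ICU
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: transliterates UTF-8 text to Latin script when ICU
// is compiled in, and returns the input unchanged otherwise.
std::string toLatin(const std::string &utf8) {
#ifdef SWORD_USE_ICU
    UErrorCode status = U_ZERO_ERROR;
    icu::Transliterator *t = icu::Transliterator::createInstance(
        icu::UnicodeString("Any-Latin"), UTRANS_FORWARD, status);
    if (U_FAILURE(status) || t == nullptr) {
        return utf8;  // fall back to the untouched text on any ICU error
    }
    icu::UnicodeString text = icu::UnicodeString::fromUTF8(utf8);
    t->transliterate(text);
    delete t;
    std::string out;
    text.toUTF8String(out);
    return out;
#else
    return utf8;  // limited platforms: no ICU, text passes through as-is
#endif
}

// Usage sketch: without the macro, text comes back unchanged.
void usageExample() {
#ifndef SWORD_USE_ICU
    assert(toLatin("logos") == "logos");
#endif
}
```

Keeping the #ifdef inside one helper means a frontend calls toLatin() the
same way on a desktop build and on a handheld build that leaves ICU out.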

What does ICU provide us?
  - transliteration (between Latin, Cyrillic, Greek, Hebrew, Arabic,
    Hangul, & Kana) which we can use for input & output
  - normalization (composing & decomposing ligatures, characters with
    accents, etc.)
  - collation (for sorting & string compares)
  - localization data
  - basic string transformations & conversions (UTF-8/16/32, SCSU, & BOCU)
  - word boundary analysis
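
To give a feel for what the UTF-8/32 conversion in the list above involves,
here is a minimal hand-written decoder.  It does not use ICU at all, and the
names utf8ToUtf32() and checkAleph() are made up for this sketch; ICU's own
converters validate input far more thoroughly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode UTF-8 bytes into UTF-32 code points. Sketch only: a bad lead byte
// or a missing/invalid continuation byte yields U+FFFD, and overlong forms
// and surrogate values are NOT rejected.
std::vector<std::uint32_t> utf8ToUtf32(const std::string &in) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size()) { ok = false; break; }  // truncated input
            const unsigned char c = static_cast<unsigned char>(in[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
            ++i;
        }
        out.push_back(ok ? cp : 0xFFFD);
    }
    return out;
}

// Usage sketch: Hebrew aleph (U+05D0) is the two bytes D7 90 in UTF-8.
void checkAleph() {
    const std::vector<std::uint32_t> cps = utf8ToUtf32("\xD7\x90");
    assert(cps.size() == 1 && cps[0] == 0x05D0);
}
```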

Please let's discuss this and whether the ICU stuff should be
incorporated in full.