[sword-devel] Re: Westcott-Hort
Sun, 04 Apr 2004 23:15:05 -0700
Troy A. Griffitts wrote:
> A few comments...
> Costas Stergiou wrote:
>> Hi David/Troy,
>> looking at the texts, I think there is some work to be done:
>> - remove any combining diacriticals & process everything as precomposed.
> I think this is backwards. From my limited understanding and from
> reading recent posts on sword-devel from people with much more knowledge
> than me, I think the text should be stored with no precomposed
> characters. If the renderer needs to send precomposed characters to the
> display control, then it (sword can do this with an ICU filter, I think)
> can precompose them.
In terms of combining characters vs. precomposed, all you really need to
do is to remember to use a single normalization form. Unicode sort of
informally suggests that NFC is best. W3C specifically recommends using
NFC (see http://www.w3.org/TR/charmod-norm/). Roughly, NFC
normalization consists of taking a string, canonically decomposing all
characters, then recombining any sequences that have a canonical
precomposed form. Characters that have only a compatibility
decomposition are left alone (folding those is what NFKC does). The way to
ensure that a string is NFC normalized is to just normalize it with
something like the uconv program I mentioned.
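
For anyone without uconv handy, here is a minimal sketch of the same idea
using Python's standard unicodedata module. This is only an illustration,
not what Sword or ICU does internally; the helper name to_nfc and the
sample strings are my own assumptions.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


# Greek small alpha followed by a combining acute accent: two codepoints.
decomposed = "\u03b1\u0301"

# NFC recombines the pair into a single precomposed codepoint.
composed = to_nfc(decomposed)

# The "fi" ligature has only a compatibility decomposition,
# so NFC leaves it untouched while NFKC folds it to plain "fi".
ligature = "\ufb01"
ligature_nfc = to_nfc(ligature)
ligature_nfkc = unicodedata.normalize("NFKC", ligature)

if __name__ == "__main__":
    print([hex(ord(c)) for c in decomposed])
    print([hex(ord(c)) for c in composed])
    print(ligature_nfc == ligature, ligature_nfkc)
```

Normalizing an already-normalized string is a no-op, so it is safe to run
this over text whose current form is unknown.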
I really don't know whether the Greek Extended block is NFC or not, so
the last step before creating the Sword module should be normalization.
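
A rough way to check a text before building the module is sketched below,
again with Python's unicodedata rather than anything Sword-specific. The
function names is_nfc and changed_by_nfc are hypothetical. Note that the
per-character report only catches single codepoints that NFC rewrites; a
base letter followed by a combining mark is caught by is_nfc but not
listed individually.

```python
import unicodedata
from typing import List, Tuple


def is_nfc(text: str) -> bool:
    """True if normalizing the whole string to NFC would not change it."""
    return unicodedata.normalize("NFC", text) == text


def changed_by_nfc(text: str) -> List[Tuple[int, str, str]]:
    """List (index, original, replacement) for single codepoints NFC rewrites."""
    report = []
    for index, char in enumerate(text):
        normalized = unicodedata.normalize("NFC", char)
        if normalized != char:
            report.append((index, char, normalized))
    return report


# U+1F00 (alpha with psili) survives NFC unchanged.
# U+1F71 (alpha with oxia) is canonically equivalent to U+03AC, so NFC replaces it.
sample = "\u1f00\u1f71"

if __name__ == "__main__":
    print(is_nfc(sample))
    for index, char, normalized in changed_by_nfc(sample):
        print(index, hex(ord(char)), [hex(ord(c)) for c in normalized])
```

So at least some Greek Extended codepoints do change under NFC, which is
one more reason to normalize as the final step instead of assuming the
source text is already in that form.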