[sword-devel] Re: Westcott-Hort

Chris Little sword-devel@crosswire.org
Sun, 04 Apr 2004 23:15:05 -0700


Troy A. Griffitts wrote:
> Costas,
>     A few comments...
> 
> Costas Stergiou wrote:
> 
>> Hi David/Troy,
>> looking at the texts, I think there is some work to be done:
>> - remove any combining diacriticals & process everything as precomposed.
> 
> 
> I think this is backwards.  From my limited understanding and from 
> reading recent posts on sword-devel from people with much more knowledge 
> than me, I think the text should be stored with no precomposed 
> characters.  If the renderer needs to send precomposed characters to the 
> display control, then it (sword can do this with an ICU filter, I think) 
> can precompose them.

In terms of combining characters vs. precomposed, all you really need to 
do is to remember to use a single normalization form.  Unicode sort of 
informally suggests that NFC is best.  W3C specifically recommends using 
NFC (see http://www.w3.org/TR/charmod-norm/).  Roughly, NFC 
normalization consists of taking a string, decomposing all characters, 
then combining any codepoints that can be combined, provided the 
precombined codepoints are not compatibility codepoints.  The way to 
ensure that a string is NFC normalized is to just normalize it with 
something like the uconv program I mentioned.
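
As a rough illustration of what NFC does, here is a minimal sketch that 
assumes Python's standard unicodedata module in place of uconv.  The 
helper name to_nfc is hypothetical and only there for the example.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


# alpha followed by a combining acute accent (two codepoints)
decomposed = "\u03b1\u0301"
# GREEK SMALL LETTER ALPHA WITH TONOS (one codepoint)
precomposed = "\u03ac"

# NFC composes the combining sequence into the precomposed codepoint
assert to_nfc(decomposed) == precomposed
# Text that is already NFC comes back unchanged
assert to_nfc(precomposed) == precomposed
# A compatibility codepoint (the "fi" ligature) is left alone by NFC;
# only the NFKC/NFKD forms would replace it
assert to_nfc("\ufb01") == "\ufb01"
```

Whichever tool is used, the point is the same: pick one form and run 
every text through it.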

I really don't know whether text using the Greek Extended block is 
already NFC or not.  So, to be safe, the last step before creating the 
Sword module should be normalization.
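
One way to find out before building a module is to check the text 
directly.  The sketch below assumes Python 3.8 or later (for 
unicodedata.is_normalized); the function non_nfc_lines and the sample 
lines are hypothetical, not part of Sword or of any existing import tool.  
It also suggests that at least some Greek Extended codepoints are not NFC: 
the "with oxia" letters canonically map to the "with tonos" letters in the 
basic Greek block.

```python
import unicodedata


def non_nfc_lines(lines):
    """Yield (line_number, line) for each line that is not already NFC."""
    for number, line in enumerate(lines, start=1):
        if not unicodedata.is_normalized("NFC", line):
            yield number, line


# Hypothetical sample input, one string per line of source text
sample = [
    # starts with U+1F00 (alpha with psili), which is stable under NFC
    "\u1f00\u03c1\u03c7\u03b7",
    # contains U+1F79 (omicron with oxia), which NFC replaces with U+03CC
    "\u03bb\u1f79\u03b3\u03bf\u03c2",
]

# Only the second line needs normalizing
flagged = list(non_nfc_lines(sample))
assert flagged == [(2, sample[1])]

# Normalizing swaps in U+03CC (omicron with tonos)
fixed = unicodedata.normalize("NFC", sample[1])
assert fixed == "\u03bb\u03cc\u03b3\u03bf\u03c2"
```

A check like this only reports which lines would change; normalizing 
everything as the final step is still the simplest policy.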


--Chris