[sword-devel] New Accented Greek NT with Morph

DM Smith dmsmith555 at yahoo.com
Thu Apr 21 05:25:21 MST 2005

I did a few more experiments, this time within Java, and the results were very 
different. Everything depended on which font was chosen. By and large, the 
text looked better in the decomposed form. If a font was chosen that did not 
support Greek Unicode, I got boxes for the accents: with decomposed text the 
words were still readable, but with composed text every accented letter was a 
box. With a good Unicode font (I tried Java's Dialog and Serif, Arial Unicode 
MS, Code2000, Lucida Sans Unicode, Gentium, and Gentium Alt) the results were 
fairly consistent. Decomposed had better support overall (i.e. no boxes); 
some fonts (Dialog, Lucida Sans Unicode) showed boxes for composed text. 
However, when a font supported both, the composed form looked better, with 
the accents positioned more precisely. Overall, Dialog looked best for 
decomposed text, though some accents on capital letters were not readable.
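The box test can be automated. Here is a minimal sketch, assuming java.awt.Font's canDisplayUpTo method and java.text.Normalizer (the latter arrived in Java 6, after this mail; in 2005 ICU4J's Normalizer would be the equivalent). The font names are the ones tried above; results will of course vary with the fonts installed on the machine:

```java
import java.awt.Font;
import java.text.Normalizer;

public class GreekFontCheck {
    public static void main(String[] args) {
        // "agape" with precomposed accented letters (U+1F00 is in Greek Extended)
        String composed = "\u1F00\u03B3\u03AC\u03C0\u03B7";
        // The same word as base letters plus combining marks
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        for (String name : new String[] {"Dialog", "Serif", "Lucida Sans Unicode"}) {
            Font font = new Font(name, Font.PLAIN, 12);
            // canDisplayUpTo returns -1 when every char has a glyph, otherwise
            // the index of the first char that would render as a box
            System.out.printf("%-18s composed ok: %-5b decomposed ok: %b%n", name,
                    font.canDisplayUpTo(composed) == -1,
                    font.canDisplayUpTo(decomposed) == -1);
        }
    }
}
```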

The other issues dealt with the nature of the fonts themselves: scaling, 
anti-aliasing, font size, ASCII support, and so on. Some fonts looked good 
for the Greek but horrible for ASCII. Some looked better anti-aliased, some 
looked worse. Some font sizes scaled badly. And so forth. But none of these 
showed any difference between the composed and decomposed forms.

To summarize the composed vs. decomposed trade-offs:
    o Many fonts support unaccented Greek letters, but few support accents 
in either form, and more fonts supported decomposed than composed.
    o When a font did not support accents, decomposed text remained 
readable; composed text was entirely unreadable.
    o Fonts that looked good with composed Greek may not look good for 
ASCII (e.g. margin notes).
    o With ICU we can go from composed to decomposed or decomposed to 
composed, though the trip is not lossless: some decompositions can't be 
recomposed. E.g. 1/4 will not compose to ¼, but ¼ will decompose to 1/4.
    o Composed makes sense for storage. I don't know how well the module 
compresses when decomposed, but it is large when it is not compressed.
    o Decomposed makes sense for display in Java. We currently don't use 
ICU4J, and adding it would nearly triple the size of JSword, so shipping a 
composed module and decomposing it with ICU is a big proposition.
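The lossy round trip is easy to demonstrate. A sketch using java.text.Normalizer (which postdates this mail; ICU4J exposes the same NFC/NFD/NFKD forms). Note that ¼ only comes apart under the compatibility form NFKD, and that there is a purely Greek case too: the Greek Extended oxia characters recompose to the tonos forms in the main Greek block, not back to themselves:

```java
import java.text.Normalizer;

public class RoundTrip {
    public static void main(String[] args) {
        // GREEK SMALL LETTER ALPHA WITH OXIA, from the "Greek Extended" block
        String oxia = "\u1F71";
        String nfd = Normalizer.normalize(oxia, Normalizer.Form.NFD); // U+03B1 U+0301
        String nfc = Normalizer.normalize(nfd, Normalizer.Form.NFC);  // U+03AC, not U+1F71
        // Recomposition lands on GREEK SMALL LETTER ALPHA WITH TONOS (U+03AC),
        // so the original code point is lost
        System.out.printf("start=%04X nfc=%04X%n",
                (int) oxia.charAt(0), (int) nfc.charAt(0));

        // VULGAR FRACTION ONE QUARTER decomposes only under NFKD ("1<U+2044>4"),
        // and NFC will not put it back together
        String quarter = Normalizer.normalize("\u00BC", Normalizer.Form.NFKD);
        System.out.println(
                Normalizer.normalize(quarter, Normalizer.Form.NFC).equals("\u00BC")); // false
    }
}
```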

My Conclusion: There is no best answer. My guess is that each delivery 
platform will need to use ICU to produce the results it needs.
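For example, the search mismatch Troy raises below (a composed text never raw-matching a decomposed query) can be handled either way he describes. A sketch of both options, with hypothetical helper names, assuming java.text.Normalizer (ICU4J's Normalizer offers the same forms):

```java
import java.text.Normalizer;

public class GreekSearch {
    // Option (a): strip all accents before searching. Decompose, then delete
    // combining marks (Unicode category M) with a regex.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Option (b): normalize both the search string and the text to NFC
    // before comparing.
    static boolean nfcEquals(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String textComposed = "\u03AC";          // alpha with tonos, one code point
        String queryDecomposed = "\u03B1\u0301"; // alpha + combining acute
        System.out.println(textComposed.equals(queryDecomposed));        // false
        System.out.println(nfcEquals(textComposed, queryDecomposed));    // true
        System.out.println(stripAccents(textComposed).equals("\u03B1")); // true
    }
}
```

Option (a) additionally lets an unaccented query match accented text, which may be what most users typing on a Latin keyboard expect.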

Troy A. Griffitts wrote:

> Hey guys,
>         I've spent some time cleaning up a module submitted by David 
> (dnr at crosswire dot org) which uses the base Westcott-Hort Accented 
> GNT from CCEL and merges in the morphology tags from Maurice 
> Robinson's WHNU text (our WHNU module).  The result is an OSIS module 
> that is fully UTF8 Accented Greek NT with Morphology.  I'm really 
> excited about this and it has taken me way too long to process this 
> work (sorry guys).  The only thing keeping this module from being the 
> ULTIMATE replacement for our WHNU module is the lack of 
> Nestle-Aland/UBS variants against the WH (the 'NU' part of our current 
> WHNU module).  Without these variants, we still cannot produce the 
> Greek text which is the predominant base text used for all modern 
> Bible translation work.
> But it's still really cool! :)
> Now, having said all this, we still have problems with the current 
> module.
>     o Oddly, Unicode Greek encoding is not very standard.  With 
> Hebrew, everyone expected the extra work to compose consonants and 
> vowels and accents, etc. They've already done the work (well, 
> mostly).  With Greek, there is a whole "Greek Extended" Unicode range 
> defined containing precomposed characters.  Some renderers desire 
> characters precomposed, others like to do the composing themselves.
>     This issue makes things a little problematic.  Most resources 
> (including the ICU Unicode library) claim that canonical normal form 
> is precomposed for Greek, and my firefox browser under linux looks 
> great showing precomposed characters.  IE running on _stock_ XP looks 
> horrible.  If one webpage has Greek precomposed characters, and 
> someone enters a search string in decomposed characters, they 
> obviously will not match, unless someone behind the curtain is being 
> smart about things-- we have the necessary filters in place to handle 
> this, but we need to think about the best choices: a) strip all 
> accents before searching; b) NFC both the search string and the text 
> before searching
>     I've spent some time making 3 Bibles available on our site: 1) 
> unaccented; 2) accented precomposed; 3) accented decomposed
>     Here is a link which should show all 3 in parallel (you can click 
> on words for definitions if you'd like :)   ).
> http://crosswire.org/study/parallelstudy.jsp?add=WHNU&add=WHAC&add=WHACD
>     We've specified in the HTML that the encoding is UTF-8 so all 
> browsers have a fighting chance :)
>     If you have a chance, could you please spend some time trying this 
> link with your browser and report your results and configuration AND 
> ANYTHING YOU DO (with fonts or otherwise) that improves your viewing 
> of the accented Bibles.
>     Thanks to everyone who has contributed and I'm excited about this 
> new work!
>         -Troy.
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
