[sword-devel] New Accented Greek NT with Morph

Wed Apr 20 16:54:53 MST 2005

Cool!
On WinXP SP2 under Firefox, the decomposed form shows some problems in
positioning the accent over the letter, but the accent is displayed. The
composed form looks great.
On WinXP SP2 under IE, the decomposed form shows lots of accent
rendering problems, but each letter is there. In the composed form, the
accented letters are replaced with a box. I wonder whether specifying
Arial Unicode MS and a few others (code2000, ...) ahead of the font that
the page is using would help.

I'll look at it under Linux FC3 later.

Can we download these modules? I'd like to see what they look like under 
JSword.

Personally, I can never get the accents right. And composing Greek on my 
keyboard is a pain. I generally resort to cut and paste to do searching 
against Greek. But sometimes accents make all the difference. So I would 
want the ability to search using accents. I would love to be able to 
type in "photos" and get all the places that "light" is found in Greek. 
And I want the computer to figure out what I mean.

I have given some thought to how JSword ought to implement indexing 
accented text. Here is what I came up with.
1) The text needs to be indexed in multiple forms, and either the user
needs to be able to indicate which form to search or the software needs
to detect it. The forms I was planning on indexing were with accents,
without accents, and transliterated into a-z. When indexing with accents
I was planning on using a decomposed form.
2) When a user submits a search with accents, it would be normalized
into the same form as held in the index. The user could also specify
that they wished to look for the words without regard to accents.
3) For display, the text would be normalized to whatever works best for
the platform, either composed or decomposed. I would hope that the form
in the module would be such that accents could be readily stripped. I
think that is decomposed, but I have not looked at it yet.

The other thought that I had was, hey, while we are at it, why not store
the verse in the index and also index the OSIS canonical reference for
the work? Then the index would form a complete module. (And with the
verse, store the boundaries for well-formedness... ;) And perhaps have
an index per testament.
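
To make that idea a little more concrete, here is a toy sketch (Python
again, all names hypothetical, not an existing SWORD or JSword
structure). It assumes one index object per testament, each holding both
the word postings and the verse text keyed by OSIS reference.

```python
from collections import defaultdict


class VerseIndex:
    """Toy index that also stores the verse text, keyed by OSIS reference."""

    def __init__(self):
        self.verses = {}                   # OSIS reference -> verse text
        self.postings = defaultdict(set)   # word -> set of OSIS references

    def add(self, osis_ref, text):
        """Store the verse and index each whitespace-separated word."""
        self.verses[osis_ref] = text
        for word in text.split():
            self.postings[word].add(osis_ref)

    def lookup(self, osis_ref):
        """Fetch a verse directly by its OSIS reference."""
        return self.verses.get(osis_ref)

    def search(self, word):
        """Return (reference, text) pairs for verses containing the word."""
        refs = sorted(self.postings.get(word, ()))
        return [(ref, self.verses[ref]) for ref in refs]


# One index per testament, as suggested above.
indexes = {"OT": VerseIndex(), "NT": VerseIndex()}
indexes["NT"].add("John.1.1", "εν αρχη ην ο λογος")
indexes["NT"].add("John.1.4", "εν αυτω ζωη ην")
```

Since the text comes back from the same structure that answers the
search, nothing else is needed to display a hit.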

Troy A. Griffitts wrote:

> Hey guys,
>         I've spent some time cleaning up a module submitted by David 
> (dnr at crosswire dot org) which uses the base Westcott-Hort Accented 
> GNT from CCEL and merges in the morphology tags from Maurice 
> Robinson's WHNU text (our WHNU module).  The result is an OSIS module 
> that is fully UTF8 Accented Greek NT with Morphology.  I'm really 
> excited about this and it has taken me way too long to process this 
> work (sorry guys).  The only thing keeping this module from being the 
> ULTIMATE replacement for our WHNU module is the lack of 
> Nestle-Aland/UBS variants against the WH (the 'NU' part of our current 
> WHNU module).  Without these variants, we still cannot produce the 
> Greek text which is the predominant base text used for all modern 
> Bible translation work.
>
> But it's still really cool! :)
>
> Now, having said all this, we still have problems with the current 
> module.
>
>     o Oddly, Unicode Greek encoding is not very standard.  With 
> Hebrew, everyone expected the extra work to compose consonants and 
> vowels and accents, etc. They've already done the work (well, 
> mostly).  With Greek, there is a whole "Greek Extended" Unicode range 
> defined containing precomposed characters.  Some renderers desire 
> characters precomposed, others like to do the composing themselves.
>
>     This issue makes things a little problematic.  Most resources 
> (including the ICU Unicode library) claim that canonical normal form 
> is precomposed for Greek, and my firefox browser under linux looks 
> great showing precomposed characters.  IE running on _stock_ XP looks 
> horrible.  If one webpage has Greek precomposed characters, and 
> someone enters a search string in decomposed characters, they 
> obviously will not match, unless someone behind the curtain is being 
> smart about things-- we have the necessary filters in place to handle 
> this, but we need to think about the best choices: a) strip all 
> accents before searching; b) NFC both the search string and the text 
> before searching
>
>     I've spent some time making 3 Bibles available on our site: 1) 
> unaccented; 2) accented precomposed; 3) accented decomposed
>
>     Here is a link which should show all 3 in parallel (you can click 
> on words for definitions if you'd like :)   ).
>
> http://crosswire.org/study/parallelstudy.jsp?add=WHNU&add=WHAC&add=WHACD
>
>     We've specified in the HTML that the encoding is UTF-8 so all 
> browsers have a fighting chance :)
>
>     If you have a chance, could you please spend some time trying this 
> link with your browser and report your results and configuration AND 
> ANYTHING YOU DO (with fonts or otherwise) that improves your viewing 
> of the accented Bibles.
>
>     Thanks to everyone who has contributed, and I'm excited about this
> new work!
>
>         -Troy.