[sword-devel] Greek UTF-8/Unicode

DM Smith dmsmith555 at yahoo.com
Sun Jan 20 07:06:00 MST 2008


Robin,

There are a few other issues relating to getting this to work.

Unicode allows a decorated character to be represented either as a
single code point, called a precomposed character, or as multiple code
points: the letter followed by its decorations (combining marks).

The normalization forms that produce these are called NFC and NFD,
respectively. There are two other normalization forms, NFKC and NFKD,
which additionally replace compatibility characters. For a good
description see: http://unicode.org/reports/tr15/ and http://unicode.org/faq/normalization.html
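
To make the difference concrete, here is a minimal sketch using Python's
standard unicodedata module (not anything from Sword). The choice of
U+03AC, alpha with tonos, as the example character is my own assumption;
any accented Greek letter with a canonical decomposition behaves the same way.

```python
import unicodedata

# Precomposed form: GREEK SMALL LETTER ALPHA WITH TONOS, one code point.
composed = "\u03ac"
# Decomposed form: GREEK SMALL LETTER ALPHA followed by COMBINING ACUTE ACCENT.
decomposed = "\u03b1\u0301"


def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]


# NFC composes the two code points into one; NFD splits the one into two.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

print(code_points(nfc))
print(code_points(nfd))
# The UTF-8 bytes differ too, which is what ends up in the module files.
print(composed.encode("utf-8").hex(), decomposed.encode("utf-8").hex())
```

Both strings look identical on screen when the font and renderer handle
combining marks, but they are different byte sequences on disk.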

At CrossWire, we have settled on NFC. This appears to be the
recommendation of the W3C. See: http://www.crosswire.org/pipermail/sword-devel/2007-September/025896.html

At this time it is the module encoder's responsibility to encode the  
module correctly. Later osis2mod (and perhaps some of the other  
filters) will be changed to force the text to NFC.

Basically, you need to first run the text through a filter that
does Canonical Decomposition and then through one that does Canonical
Composition. (The Sword filter utf8nfc.cpp does this.)
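
If you want to do this step outside of Sword, the following is a rough
sketch in Python. It relies on the standard unicodedata module, whose NFC
normalization performs canonical decomposition followed by canonical
composition; it is not the Sword filter itself. The function names and the
file names in the usage note are hypothetical, and it assumes the input
file is already valid UTF-8.

```python
import unicodedata


def to_nfc(text):
    """Return text in NFC (canonical decomposition, then canonical composition)."""
    return unicodedata.normalize("NFC", text)


def normalize_file(src_path, dst_path):
    """Read a UTF-8 file, convert its text to NFC, and write it out as UTF-8.

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding="utf-8", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(to_nfc(text))
```

For example, normalize_file("XXX.imp", "XXX.nfc.imp") would write a
normalized copy that can then be fed to the import tool in place of the
original.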

Once that is done, make the module as you have always done.

The next step is to ensure that you have a font that can handle the  
text. On Windows, I believe Arial should work. However, SIL has a  
bunch of open source fonts which are excellent. See: http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=silfontlist

If it doesn't work in BibleCS, try BibleDesktop, Firefox and IE. (You
should be able to open the dat file if the module is not compressed.)
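
Another way to look at the dat file is to check its bytes directly. The
sketch below is only an illustration under my own assumptions: that the
uncompressed dat file is plain UTF-8 text throughout, and that a
hypothetical helper named inspect_utf8 is good enough as a first check.
It reports whether the text is already NFC and which combining marks, if
any, are still present.

```python
import unicodedata


def inspect_utf8(data):
    """Decode bytes as UTF-8 and report on their normalization state.

    Returns (is_nfc, combining), where is_nfc is True when NFC would leave
    the text unchanged and combining is a sorted list of 'U+XXXX' strings
    for every combining mark found. Raises UnicodeDecodeError when the
    bytes are not valid UTF-8.
    """
    text = data.decode("utf-8")
    is_nfc = unicodedata.normalize("NFC", text) == text
    combining = sorted(
        {"U+%04X" % ord(ch) for ch in text if unicodedata.combining(ch)}
    )
    return is_nfc, combining
```

Usage would be along the lines of inspect_utf8(open("XXX.dat", "rb").read()).
A decode error suggests the source was saved in some other encoding; a
False result with combining marks listed suggests the text was not
normalized to NFC before the module was built.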

Hope that helps.

In His Service,
	DM

On Jan 19, 2008, at 11:47 PM, RLRANDALLX at aol.com wrote:

>
> DM, Sebastien,
>
> Thanks for your references on encoding. I have read about encodings  
> and Now I need a practical example of making it thru the module  
> process. Let's say I have an alpha with an accent (I use PSPad to  
> get the codes in right) and I have this in XXX.imp.  When I bring  
> this into NotePad with UTF-8 as encoding type (Format is also Greek  
> script) it looks just fine.  Then I run it thru "imp2ld XXX.imp XXX  
> 2" to get XXX.dat and XXX.idx.  No errors, no problems.  The  
> XXX.conf  file has Encoding=UTF-8.  But when I fire up BibleCS and  
> look at XXX in the LD pane I see a box where I am expecting an  
> accented alpha. Unfortunately I know of no accented Greek text that  
> I can reverse engineer to see where I am going wrong. Without clear
> answers at this point, I have resigned myself to including only
> unaccented Greek text. If there are
> better tools out there to ensure I am on the right track please let  
> me know.
>
> In His Grace,
> Robin
>
> >On Jan 18, 2008, at 5:24 AM, Sebastien Koechlin wrote:
>
> >> On Thu, Jan 17, 2008 at 11:58:10PM -0500, RLRANDALLX at aol.com wrote:
> >>>>> I'm trying to display Unicode Greek in RawLD ThML with 1.5.9
> >>>>> BibleCS.
> >>>>> Does anyone know what the .conf file should look like?
> >>>>> "Encoding=Unicode" or "Encoding=UNICODE" does NOT work. I just
> >>>>> get open squares where the letters should have accents.
> >>>
> >>>> Should be UTF-8, "unicode" is usually for internal representation
> >>>> only and "unicode" in itself is ambiguous.
> >>
> >> Unicode is not an encoding.
> >>
> >> As encoding is a common source of problems, I tried to write a small
> >> text about it. As English is not my native language, someone should
> >> probably correct it.
> >>
> >> http://www.crosswire.org/wiki/index.php/Encoding
> >
> >I've added links to your excellent page from http://www.crosswire.org/wiki/index.php/DevTools:Modules
> > both in the section on Encoding and in Related Links.
> E-mail: RLRandallX at aol.com
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
