[sword-devel] Greek UTF-8/Unicode
dmsmith555 at yahoo.com
Sun Jan 20 07:06:00 MST 2008
There are a few other issues relating to getting this to work.
Unicode allows a decorated character to be a single code point,
called a composed character, or multiple code points: the letter
followed by its decorations.
The normalization forms that produce these are called NFC and NFD,
respectively. There are two other normalization forms for Unicode
text, called NFKC and NFKD. For a good
description see: http://unicode.org/reports/tr15/ and http://unicode.org/faq/normalization.html
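To make the difference concrete, here is a small sketch in Python (my choice for illustration; it is not part of Sword) using the standard unicodedata module. It shows one accented alpha in both forms; the variable names are just for this example.

```python
import unicodedata

# One code point: GREEK SMALL LETTER ALPHA WITH TONOS (composed form).
composed = "\u03ac"
# Two code points: GREEK SMALL LETTER ALPHA followed by COMBINING ACUTE ACCENT.
decomposed = "\u03b1\u0301"

# NFC composes where it can, NFD decomposes where it can.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

# Both spellings render as the same letter but are different byte
# sequences once encoded as UTF-8.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print([hex(ord(c)) for c in nfc], len(nfc_bytes))
print([hex(ord(c)) for c in nfd], len(nfd_bytes))
```

The two strings look identical on screen but compare unequal, which is why a module and its front end need to agree on one form.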
At CrossWire, we have settled on NFC. This appears to be the
recommendation of the W3C. See: http://www.crosswire.org/pipermail/sword-devel/2007-September/025896.html
At this time it is the module encoder's responsibility to encode the
module correctly. Later osis2mod (and perhaps some of the other
filters) will be changed to force the text to NFC.
Basically, you need to first run the text through a filter that
does Canonical Decomposition and then through one that does Canonical
Composition. (The Sword filter utf8nfc.cpp does this.)
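If you want to do this outside of Sword before building the module, something like the following Python sketch should work. It is only a stand-in for the idea, not the utf8nfc.cpp filter itself; the function names to_nfc and normalize_file are made up for this example, and it assumes the source file is already valid UTF-8.

```python
import unicodedata


def to_nfc(text):
    """Return text in NFC: canonical decomposition, then canonical composition."""
    # Python's "NFC" form already performs both steps internally.
    return unicodedata.normalize("NFC", text)


def normalize_file(src_path, dst_path):
    """Read a UTF-8 file, convert it to NFC and write the result as UTF-8.

    Returns True if normalization changed the text, i.e. the input was
    not already in NFC.
    """
    # newline="" keeps the file's original line endings untouched.
    with open(src_path, "r", encoding="utf-8", newline="") as src:
        original = src.read()
    normalized = to_nfc(original)
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(normalized)
    return normalized != original
```

For example, normalize_file("XXX.imp", "XXX.nfc.imp") would write a normalized copy (both file names are placeholders) that can then be fed to imp2ld as usual.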
Once that is done, make the module as you have always done.
The next step is to ensure that you have a font that can handle the
text. On Windows, I believe Arial should work. However, SIL has a
bunch of open source fonts which are excellent. See: http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=silfontlist
If it doesn't work in BibleCS, try BibleDesktop, Firefox and IE. (You
should be able to open the .dat file if the module is not compressed.)
Hope that helps.
In His Service,
On Jan 19, 2008, at 11:47 PM, RLRANDALLX at aol.com wrote:
> DM, Sebastien,
> Thanks for your references on encoding. I have read about encodings
> and now I need a practical example of making it thru the module
> process. Let's say I have an alpha with an accent (I use PSPad to
> get the codes in right) and I have this in XXX.imp. When I bring
> this into NotePad with UTF-8 as encoding type (Format is also Greek
> script) it looks just fine. Then I run it thru "imp2ld XXX.imp XXX
> 2" to get XXX.dat and XXX.idx. No errors, no problems. The
> XXX.conf file has Encoding=UTF-8. But when I fire up BibleCS and
> look at XXX in the LD pane I see a box where I am expecting an
> accented alpha. Unfortunately I know of no accented Greek text that
> I can reverse engineer to see where I am going wrong. Without clear
> answers at this point, I have resigned myself to including only
> unaccented Greek text. If there are
> better tools out there to ensure I am on the right track please let
> me know.
> In His Grace,
> >On Jan 18, 2008, at 5:24 AM, Sebastien Koechlin wrote:
> >> On Thu, Jan 17, 2008 at 11:58:10PM -0500, RLRANDALLX at aol.com wrote:
> >>>>> I'm trying to display Unicode Greek in RawLD ThML with 1.5.9
> >>>>> BibleCS.
> >>>>> Does anyone know what the .conf file should look like?
> >>>>> "Encoding=Unicode" or "Encoding=UNICODE" does NOT work. I just
> >>>>> get open
> >>>>> squares where the letters should have accents.
> >>>> Should be UTF-8; "unicode" is usually for internal use
> >>>> only,
> >>>> and "unicode" in itself is ambiguous.
> >> Unicode is not an encoding.
> >> As encoding is a common source of problems, I tried to write a
> >> text
> >> about it. As English is not my native language, someone should
> >> probably
> >> correct it.
> >> http://www.crosswire.org/wiki/index.php/Encoding
> >I've added links to your excellent page from http://www.crosswire.org/wiki/index.php/DevTools:Modules
> > both in the section on Encoding and in Related Links.
> E-mail: RLRandallX at aol.com