[sword-devel] Devanagari text displays different in SWORD than in the source IMP file
tchase at maf.org
Tue Sep 8 10:31:57 MST 2009
Another related issue with regard to ZWJ and ZWNJ is how these characters
are handled by the front-end applications when searching for words. Why
would it make a difference? Some words in Devanagari script can be spelled
with or without the ZWJ character and still be valid. I did a small test to
see how SPW and BPB handle searching. For BPB it made no difference whether
the ZWJ was included or not: in both cases I got 33 hits for the word in my
test module of Mark in Nepali. For SPW I got zero hits without the ZWJ and 9
hits when I included the ZWJ, which is how the word was keyed in the source.
BPB brings up hits that are not exact matches but very close, while SPW will
only bring up exact matches that include the ZWJ.
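One way a front end could make searching insensitive to ZWJ/ZWNJ is to strip both characters from the query and from the text before comparing. The Python below is a minimal sketch under that assumption only. It is not how BPB or SWORD actually implement search, and the names `fold_joiners` and `count_hits` are hypothetical.

```python
import re

# U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER
JOINERS = re.compile("[\u200c\u200d]")


def fold_joiners(text: str) -> str:
    """Hypothetical helper: remove ZWJ/ZWNJ so that spellings with and
    without them compare equal."""
    return JOINERS.sub("", text)


def count_hits(query: str, verses: list[str], ignore_joiners: bool) -> int:
    """Hypothetical helper: count verses containing `query` as a substring,
    optionally ignoring ZWJ/ZWNJ in both the query and the verse text."""
    if ignore_joiners:
        query = fold_joiners(query)
        verses = [fold_joiners(v) for v in verses]
    return sum(1 for v in verses if query in v)


# Text keyed with a ZWJ between virama and SSA, searched without it.
verses = ["abc \u0915\u094d\u200d\u0937 def", "other"]
print(count_hits("\u0915\u094d\u0937", verses, ignore_joiners=False))  # 0, like SPW
print(count_hits("\u0915\u094d\u0937", verses, ignore_joiners=True))   # 1, like BPB
```

The trade-off of folding is that two words differing only by ZWJ/ZWNJ become indistinguishable in search, which matches the behaviour observed in BPB and Google but loses the exact-match precision of SPW.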
I found an interesting web page that shows how various search engines handle
the ZWJ character. Google finds these words with or without the ZWJ.
I thought it might add something to the discussion.
BPB - BP Bible
SPW - SWORD Project for Windows
From: Chris Little [mailto:chrislit at crosswire.org]
Sent: Tuesday, September 08, 2009 2:05 PM
To: SWORD Developers' Collaboration Forum
Subject: Re: [sword-devel] Devanagari text displays different in SWORD than
in the source IMP file
Before anyone starts making authoritative statements about ZWJ or ZWNJ
in various modules and their reflexes in front ends, I would like to see
some sort of proof that this is even relevant to the problem.
If ZWNJ is present in the module, it isn't being changed by Sword or by
BibleCS. If you copy text from BibleCS and paste it into an editor that
renders things correctly, such as BabelPad or Notepad, you get back the
correct rendering, so it's not inserting, deleting, or changing code points.
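The copy-and-paste check above can be made explicit by listing the code points of the text as keyed in the IMP source and as pasted back out of the front end, then comparing them. The sketch below assumes both strings have already been obtained (for example, saved as UTF-8 and read in); `describe` and `first_difference` are hypothetical helper names, while `unicodedata.name` is the standard-library lookup.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each code point as 'U+XXXX NAME', so ZWJ/ZWNJ become visible."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def first_difference(a: str, b: str):
    """Return the index of the first differing code point, or None if the
    two strings are identical code point for code point."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one string is a prefix of the other
    return None


source = "\u0915\u094d\u200c\u0937"  # as keyed in the IMP file (assumed sample)
pasted = "\u0915\u094d\u200c\u0937"  # as copied back out of the front end
for line in describe(source):
    print(line)
print(first_difference(source, pasted))  # None means nothing was altered
```

If `first_difference` returns `None`, the ZWJ/ZWNJ characters survived intact, and a rendering difference must come from the renderer or font rather than from the stored text.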
My own feeling is that the problem lies in the renderer used by various
front ends. And specific to BibleCS, I suspect we can fix the issue by
compiling in a more recent version of C++Builder (which I'll try to do,
when I get a chance, unless Troy beats me to it).
Font choice is important. You have to use a font with the correct font
tables. (Graphite tables would work, but OpenType tables are entirely
sufficient for this kind of Indic application.) However, the fonts named
in the initial post and my testing further to that report demonstrate
that even fonts with good OT tables won't render correctly in BibleCS's
David Haslam wrote:
> Zero Width Joiner and non-Joiner
> We should gather the evidence we have collected about zwj & zwnj (in e.g.
> Devanagari) by adding a new row in the table on this wiki page.
> Ideally, the new row should be below the one for Complex Scripts. This
> would help clarify the situation for everyone. A single footnote should
> provide the background and give the explanation.
> -- David Haslam
sword-devel mailing list: sword-devel at crosswire.org