[sword-devel] The poor man's interlinear

Peter von Kaehne refdoc at gmx.net
Mon Sep 17 13:31:29 MST 2012

While I would never wish to stop anyone from creating more work for
themselves than necessary (as long as they do not take my taxes or
tithes), I remain in awe of the work created here and described as
necessary, when it is in fact entirely unnecessary. And it prejudices me
heavily against ever working with the organisation that enquired of you.

There is an existing conversion route, and it is called the legacy font.

It converts a bizarre binary into something readable.

Every custom font item has specific rules which lead to its creation. 

Every custom font item has one adequate and correct Unicode
representation. If there is more than one graphical representation
(like Cyrillic and Latin 'a'), this is made irrelevant by the language
being assigned a certain area in Unicode chosen for it (Latin, Cyrillic,
Arabic, whatever).

So the rule which leads to the selection of a custom font item can
select a single Unicode item without risk of error, duplication or
indeed "corner cases".
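To make the point concrete: the same rule that picks a glyph in the legacy font can pick exactly one Unicode character. Below is a minimal Python sketch, assuming a hypothetical 8-bit legacy font that draws Cyrillic letters at Latin byte positions. The byte values, the table name LEGACY_TO_UNICODE and the helpers check_table and convert are all illustrative, not any real font or tool.

```python
# Hypothetical legacy font: Cyrillic glyphs drawn at Latin byte positions.
# Byte values and glyph assignments are illustrative assumptions only.
LEGACY_TO_UNICODE: dict[int, str] = {
    0x61: "\u0430",  # byte 'a' shows Cyrillic a -> CYRILLIC SMALL LETTER A
    0x62: "\u0431",  # -> CYRILLIC SMALL LETTER BE
    0x76: "\u0432",  # -> CYRILLIC SMALL LETTER VE
    0x20: " ",       # space stays a space
}

CYRILLIC_BLOCK = range(0x0400, 0x0500)  # U+0400..U+04FF

def check_table(table: dict[int, str]) -> None:
    """Each legacy byte must yield exactly one code point in the script's block (or space)."""
    for byte, target in table.items():
        if len(target) != 1:
            raise ValueError(f"0x{byte:02X} maps to {len(target)} code points")
        cp = ord(target)
        if cp != 0x20 and cp not in CYRILLIC_BLOCK:
            raise ValueError(f"0x{byte:02X} -> U+{cp:04X} is outside the Cyrillic block")

def convert(data: bytes, table: dict[int, str]) -> str:
    """Replace every legacy byte by its single Unicode character."""
    try:
        return "".join(table[b] for b in data)
    except KeyError as e:
        raise ValueError(f"unmapped legacy byte 0x{e.args[0]:02X}") from None

check_table(LEGACY_TO_UNICODE)
print(convert(b"ab va", LEGACY_TO_UNICODE))  # prints "аб ва"
```

The block check is what makes the Latin/Cyrillic 'a' ambiguity irrelevant: once the language is fixed, a target outside its Unicode area is simply flagged as an error.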

There may be a small number of well-defined exceptions to the above
rules:

1) Some custom font items may have several conversion rules leading to
them. This is only relevant if the custom font incorporated compromises
which can now be undone.
2) Some custom font items may, after all, not have a Unicode equivalent.
This is unlikely, but in odd languages not impossible. Language-specific
ligatures, language-specific diacritics etc. are the likely candidates in
"normal" scripts. These can be assigned empty Unicode code points (such
as those in the Private Use Area) and then displayed with a custom font
after all.
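Both exceptions can be handled inside the same rule table rather than by comparing texts. The following is a minimal Python sketch under stated assumptions: the byte values, the overstruck-acute compromise and the ligature at 0x80 are hypothetical, and convert_with_rules, many_to_one and pua_code_points are illustrative names. Longest-match-first lets a multi-byte compromise collapse to the same character as a precomposed glyph (exception 1), and an item without a Unicode equivalent goes to the BMP Private Use Area, U+E000..U+F8FF (exception 2).

```python
from collections import defaultdict

# Hypothetical conversion rules for a legacy font (byte values are illustrative).
# Keys are legacy byte sequences; values are the Unicode text they stand for.
RULES: dict[bytes, str] = {
    b"e": "e",
    b"\xe9": "\u00e9",    # precomposed e-acute glyph
    b"e\xb4": "\u00e9",   # exception 1: old compromise, 'e' + overstruck acute
    b"\x80": "\ue000",    # exception 2: ligature with no Unicode equivalent -> PUA
}

PUA = range(0xE000, 0xF900)  # BMP Private Use Area, U+E000..U+F8FF
MAX_KEY = max(len(k) for k in RULES)

def convert_with_rules(data: bytes) -> str:
    """Convert legacy bytes, always trying the longest matching rule first."""
    out, i = [], 0
    while i < len(data):
        for n in range(min(MAX_KEY, len(data) - i), 0, -1):
            chunk = data[i:i + n]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += n
                break
        else:  # no rule matched, not even a single byte
            raise ValueError(f"no rule for byte 0x{data[i]:02X} at offset {i}")
    return "".join(out)

def many_to_one(rules: dict[bytes, str]) -> dict[str, list[bytes]]:
    """Targets reached by more than one rule: the compromises worth reviewing."""
    by_target = defaultdict(list)
    for src, dst in rules.items():
        by_target[dst].append(src)
    return {dst: srcs for dst, srcs in by_target.items() if len(srcs) > 1}

def pua_code_points(text: str) -> list[str]:
    """Characters that still need a custom font because they sit in the PUA."""
    return [f"U+{ord(c):04X}" for c in text if ord(c) in PUA]
```

Here many_to_one(RULES) returns {"é": [b"\xe9", b"e\xb4"]}: that is a short, reviewable list for the "careful analysis", with no eyeball comparison of whole texts.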

All of these matters require careful analysis; none requires wholesale
text comparison by eye.


On Mon, 2012-09-17 at 11:29 -0700, David Haslam wrote:
> Having pressed the matter further with my good friend at MissionAssist, here
> is his response:
> ---
> This sums up what NRSI told me when I began to look at machine checking of
> old vs new:
> "In doing automated checking, one has to be careful not to rely on processes
> which give a false impression of accuracy. For example, some people have
> proposed converting a file to Unicode, and then converting it back to
> legacy, and comparing the original legacy file to the final version. But
> that only tells you that the conversion table is reversible, not that it is
> accurate. A comparison which relies on the same mapping table as was used to
> do the conversion will only tell you whether the rules of the mapping table
> were applied as written. In general, comparison of two data sets is useful
> only if the two data sets were created by independent paths."
> ---
> My remarks follow:
> Having met one of the programmers (during the EMDC) who works for SIL's NRSI
> (on implementing the Graphite Engine), I hold them in the highest regard for
> their technical knowledge and skills. 
> So yes, the interlinear arrangement originally requested does serve only one
> purpose: to provide /additional/ confirmation that the visual appearance of
> text in the Unicode version matches that of the original with the legacy
> font. 
> Some details have been omitted in this reply.
> Aside:  DM himself should recognize the truth of the statement "In general,
> comparison of two data sets is useful only if the two data sets were created
> by independent paths." That's precisely the background and underlying
> philosophy for the KJV2006 project.
> David
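The NRSI remark about round-trip checks can be illustrated with a minimal Python sketch. The two-entry table is hypothetical and contains a deliberate error (two entries swapped). The "independent" transcription stands in for a text produced without the mapping table. FORWARD, BACKWARD, to_unicode and to_legacy are illustrative names only.

```python
# Hypothetical two-letter table with a deliberate error: the entries for
# bytes 0x61 and 0x62 are swapped (they should map to U+0430 and U+0431).
FORWARD = {0x61: "\u0431", 0x62: "\u0430"}
BACKWARD = {char: byte for byte, char in FORWARD.items()}

def to_unicode(data: bytes) -> str:
    return "".join(FORWARD[b] for b in data)

def to_legacy(text: str) -> bytes:
    return bytes(BACKWARD[c] for c in text)

original = b"ab"

# Round trip legacy -> Unicode -> legacy: passes, because the table is reversible.
round_trip_ok = to_legacy(to_unicode(original)) == original

# Independent path: a transcription made without the table (hypothetical here).
independent = "\u0430\u0431"
accurate = to_unicode(original) == independent

print(round_trip_ok, accurate)  # prints "True False"
```

The round trip only confirms that the table inverts itself. The mistake surfaces only against data that did not pass through the same table.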
