[sword-devel] Locale differences

Peter von Kaehne refdoc at gmx.net
Wed Sep 12 21:59:06 MST 2012

On 12/09/12 22:49, DM Smith wrote:
> A few years ago I scraped all the modules (GBF, ThML and OSIS; Bibles
> and Commentaries) for their references. I then ran them through SWORD
> and through JSword and compared what they interpreted the reference
> to be.
> When they differed, I took a look at the input to understand the
> differences. IIRC, in every difference the input was ambiguous (e.g.
> v10 for this chapter's 10th verse or Jud found in the book of
> Judith).
> Looking further at this list, the book names have always been
> English. Probably because a reference of "1 Moses 2:2" in a module
> might work only on occasion.
> BTW, GBF doesn't and ThML might not have OSISrefs. (ThML can, but
> there's no guarantee.)
> And even if we converted all of our modules to have clean references,
> there's no requiring a user to upgrade to the newer module. And some
> modules may not be currently available, but are in users'
> collections. Who knows what they contain?
> So far it has been easiest to have a single routine.

OK, this illuminates something I raised a few years ago. I can not at
the moment find the thread, otherwise I would raise it.

Basically we have/had a fair bunch of buggy modules with references. The
references were lazily formed and various frontends have various ways of
coping with this.

In OSIS these look like <reference>Some_arbitrary_reference</reference>
(or whatever the ThML/GBF equivalents are).

No module of this kind should be accepted - and no module of this kind
is currently getting submitted. A couple of years ago I wrote xreffix.pl
to deal with this.

A decent reference looks like

<reference osisRef="[osisref]">Some_arbitrary_reference</reference>.

In ThML, decent references should also look different from the above:


The "lazy" version works sometimes. It usually works in English, but
fails there too when references are ambiguous, e.g. when they need a
scope that has to be known (like "v.10"). Such references necessarily
break when different punctuation is used as separators, and they fail
totally with much non-English material (mostly due to
punctuation/separators).
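
To make the failure modes concrete, here is a minimal Python sketch. It is
a toy and not what SWORD or JSword actually do: the patterns, the tiny
BOOKS table and the resolve() function are all made up for illustration.
It shows an English-only parser that cannot resolve "v.10" without a known
scope and that gives up on a German-style "1. Mose 2,2".

```python
import re

# Toy English-only pattern "<Book> <chapter>:<verse>" (an assumption for
# illustration, not the pattern any real frontend uses).
FULL_REF = re.compile(
    r"^(?P<book>[1-3]?\s?[A-Za-z]+)\.?\s+(?P<chapter>\d+):(?P<verse>\d+)$"
)
# Scope-dependent form such as "v.10" or "v. 10".
VERSE_ONLY = re.compile(r"^v\.?\s*(?P<verse>\d+)$")

# Hypothetical, deliberately tiny book table.
BOOKS = {"gen": "Gen", "genesis": "Gen", "john": "John"}


def resolve(text, scope=None):
    """Return an osisRef-like string, or None if the text is unresolvable.

    scope is an optional (book, chapter) pair for the enclosing verse.
    """
    text = text.strip()
    m = FULL_REF.match(text)
    if m:
        book = BOOKS.get(m.group("book").lower().replace(" ", ""))
        if book is None:
            return None
        return f"{book}.{m.group('chapter')}.{m.group('verse')}"
    m = VERSE_ONLY.match(text)
    if m and scope is not None:
        book, chapter = scope
        return f"{book}.{chapter}.{m.group('verse')}"
    return None


print(resolve("Gen 1:1"))                    # full English reference
print(resolve("v.10"))                       # no scope known
print(resolve("v.10", scope=("John", 3)))    # scope supplied by the caller
print(resolve("1. Mose 2,2"))                # German punctuation
```

Only the first and third calls yield a reference; the other two return
None, which is the situation a frontend is in when it meets such text in a
module.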

The "lazy" version should not be tolerated in our repos. Over the years,
once I realised the problem, I have resubmitted all modules I had
submitted with poorly formed references. Others should also be updated.

For OSIS modules the solution is there: xreffix.pl will parse and
rewrite OSIS XML and create good references, including those that require
a scope to be unambiguous. I would expect that ThML modules can be
similarly treated. And GBF should be burned anyway.
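
The kind of rewrite meant here can be sketched roughly as follows. This is
not xreffix.pl (which is Perl and does far more), just an illustrative
Python sketch under loud assumptions: the function name add_osis_refs is
made up, the input is a namespace-free fragment with container-style
<verse osisID="..."> elements, and only the scope-dependent "v.N" form is
handled.

```python
import re
import xml.etree.ElementTree as ET

# Only the scope-dependent "v.10" / "v. 10" form is handled in this sketch.
VERSE_ONLY = re.compile(r"^v\.?\s*(\d+)$")


def add_osis_refs(xml_text):
    """Add osisRef to bare <reference> elements, using the enclosing verse
    as the scope. References that cannot be resolved are left untouched."""
    root = ET.fromstring(xml_text)
    for verse in root.iter("verse"):
        parts = verse.get("osisID", "").split(".")
        if len(parts) != 3:
            continue  # no usable Book.Chapter.Verse scope
        book, chapter, _ = parts
        for ref in verse.iter("reference"):
            if ref.get("osisRef"):
                continue  # already well formed
            m = VERSE_ONLY.match((ref.text or "").strip())
            if m:
                ref.set("osisRef", f"{book}.{chapter}.{m.group(1)}")
    return ET.tostring(root, encoding="unicode")


sample = (
    '<div><verse osisID="John.3.1">See <reference>v.10</reference>, '
    '<reference osisRef="Gen.1.1">Gen 1:1</reference> and '
    '<reference>somewhere</reference>.</verse></div>'
)
print(add_osis_refs(sample))
```

The point of doing this once, at module-build time, is that the scope is
known there and the result can be checked by a human, which is not true
when each frontend guesses at display time.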

In general, I think none of the programmes should try and parse free
text references in modules. Unlike failed attempts at parsing user
input, which get corrected by the user who learns to construct his
references properly, parsing free text references makes our modules look

