[sword-devel] Locale differences
dmsmith at crosswire.org
Thu Sep 13 06:12:24 MST 2012
On Sep 13, 2012, at 12:59 AM, Peter von Kaehne <refdoc at gmx.net> wrote:
> On 12/09/12 22:49, DM Smith wrote:
>> A few years ago I scraped all the modules (GBF, ThML and OSIS; Bibles
>> and Commentaries) for their references. I then ran them through SWORD
>> and through JSword and compared what they interpreted the reference
>> to be.
>> When they differed, I took a look at the input to understand the
>> differences. IIRC, in every difference the input was ambiguous (e.g.
>> v10 for this chapter's 10-th verse or Jud found in the book of
>> Looking further at this list, the book names have always been
>> English. Probably because a reference of "1 Moses 2:2" in a module
>> might work only on occasion.
>> BTW, GBF doesn't and ThML might not have OSISrefs. (ThML can, but
>> there's no guarantee.)
>> And even if we converted all of our modules to have clean references,
>> there's no way to require a user to upgrade to the newer module. And some
>> modules may not be currently available, but are in users'
>> collections. Who knows what they contain.
>> So far it has been easiest to have a single routine.
> OK, this illuminates something I raised a few years ago. I can not at
> the moment find the thread, otherwise I would raise it.
> Basically we have/had a fair bunch of buggy modules with references. The
> references were lazily formed and various frontends have various ways of
> coping with this.
> In OSIS: <reference>Some_arbitrary_reference</reference> (or whatever the
> ThML/GBF equivalents are).
> No module of this kind should be accepted - and no module of this kind
> is currently getting submitted. A couple of years ago I wrote xreffix.pl
> to deal with this.
> A decent reference looks like
> <reference osisRef="[osisref]">Some_arbitrary_reference</reference>.
> In ThML decent references should also look different from the above:
> The "lazy" version works sometimes. It works usually in English, but
> fails there too when references are ambiguous, e.g. when they require a
> scope which needs to be known (like "v.10"). They do not work, by
> necessity, with different punctuation as separators, and they totally fail
> with much non-English stuff (mostly due to punctuation/separators).
> The "lazy" version should not be tolerated in our repos. Over the years,
> once I realised the problem, I have resubmitted all modules I
> had submitted with poorly formed references. Others should also be updated.
> For OSIS modules the solution is there - xreffix.pl will parse and
> rewrite OSIS xml and create good references, including those that require
> a scope to be unambiguous. I would expect that ThML modules can be
> similarly treated. And GBF should be burned anyway.
> In general, I think none of the programmes should try and parse free
> text references in modules. Unlike failed attempts at parsing user
> input, which get corrected by the user who learns to construct his
> references properly, parsing free text references makes our modules look
I wholeheartedly agree. However, the engines will still have to handle old modules.
It'd be good to have a module checker that would identify references that are not good enough (whatever that is).
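As a rough illustration of what such a checker might look like for OSIS input, here is a minimal sketch. It assumes that "not good enough" simply means a <reference> element with no non-empty osisRef attribute, which is the distinction Peter draws above. The function name find_lazy_references and the sample fragment are hypothetical, not part of SWORD, JSword or xreffix.pl, and a real checker would also need to validate the osisRef value itself.

```python
import xml.etree.ElementTree as ET


def find_lazy_references(osis_xml):
    """Return the text of every <reference> lacking a non-empty osisRef.

    Hypothetical helper: "lazy" here only means "no osisRef attribute";
    it does not check whether an existing osisRef is valid.
    """
    root = ET.fromstring(osis_xml)
    lazy = []
    for elem in root.iter():
        # Compare local names so the check works with or without
        # an XML namespace on the document.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name != "reference":
            continue
        if not elem.get("osisRef", "").strip():
            lazy.append("".join(elem.itertext()).strip())
    return lazy


# Made-up fragment: one reference with an osisRef, one free-text reference.
sample = """<div>
  <p>See <reference osisRef="Gen.2.2">Gen 2:2</reference> and
  <reference>v.10</reference>.</p>
</div>"""

print(find_lazy_references(sample))
```

Run against the sample, only the free-text "v.10" reference is reported; the one carrying an osisRef is passed over.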
Regarding the burning of GBF, by my last count 32 out of 351 modules in the CrossWire repository were GBF. I don't know how many have cross references, but the RX ... Rx is handled by the SWORD code.