[sword-devel] osis2mod output (Bisaya-Inunhan NT)

Jonathan Marsden jmarsden at fastmail.fm
Wed Apr 29 12:38:14 MST 2009


DM Smith wrote:

>>   re-versified III John 1:15      as III John 1:13

> This is informational, perhaps a warning. It is saying that III John 
> 1:15 is not in the versification that you chose and that 3 John 1:13 was 
> the last verse that it wrote to the module that was before 3 John 1:15.
> If your input had 3 John 1:14, as an individual verse, then I'd say you 
> have hit a bug. I found this bug in the latest RC and fixed it in head, 
> revision 2353. The bug made it report not the right verse, but the one before it.

Hmm, the XML input file has both 3John.1.14 and 3John.1.15 in it.  This 
is XML generated by my own scripts from the MS Word document originals, 
and those scripts have had very limited testing, so it might easily be 
my fault...

No, there really *is* a 3John 1:15 in the source Word document.  Looks 
like it splits what KJV and NIV consider 3 John 14 into two verses, at 
the sentence break.  Could this be a translator error??  If not, what 
versification schemes do have that verse, but are otherwise compatible 
with KJV versification?

More generally, how can a module developer find out which versification 
schemes have a verse X Y:Z for any given values of book X, chapter Y and 
verse Z?

I "cheated" and looked at include/canon*.h, and there are only 2 
versifications (KJV and Leningrad) available to me in SWORD 1.6.0RC2. 
Of those two, Leningrad has no NT at all, so by definition I have to use 
KJV for an NT input document -- right?

This makes the whole idea that I am using an incorrect versification 
scheme seem much less likely, to me.  There just aren't enough choices 
available (yet!) for it to be *possible* for me to make that mistake :)

> I recommend building from head (maybe there'll be another release 
> candidate soon)

OK.  Something seems slightly odd if my role as module creator requires 
newer versions of SWORD than my role as packager does, though :)

>>   Appending entry: Jude:

> This is also informational, perhaps a warning. It pairs with the 
> "re-versified ..." message and should not have read "Jude" under any 
> circumstance. It should have read "Appending entry: III John 1:14" This 
> too was fixed in revision 2353.

OK, thanks.

>>   re-versified Revelation of John 12:18   as Revelation of John 12:16
>>   Appending entry: Rev.13:

This is the same kind of deal, I think.  The original MS Word doc really 
does have a Rev.12.18 in it, which KJV and NIV do not.

>> (2) Enhancement request #2: osis2mod does not seem to exit with an 
>> exit code that reflects the worst issue found during the run -- I'm 
>> seeing an exit code of zero despite these messages.

> I'm not sure about this one. If a fatal error is encountered, it is 
> definitely appropriate to have a documented, non-zero exit code. If 
> there were "recoverable" problems (e.g. re-versification), I don't know 
> that it is appropriate to have a non-zero exit code. I have always 
> thought it was the UNIX tradition to use a non-zero exit code when the 
> results of the run could not be trusted.

Yes, so it comes down to whether the re-versification "can be trusted" 
or not.  My sense is that for most input data it cannot: seeing any such 
messages at all suggests that a mismatching versification has been 
selected, and that osis2mod should be rerun with a different one.

See http://www.catb.org/~esr/fetchmail/fetchmail-man.html for an example 
of a well known Unix utility that generates a wide variety of exit 
codes, not all of which indicate total failure or an untrustworthy 
result (exit codes 1 and 9 in particular).
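
As a rough illustration of the graded exit codes discussed above, here is a minimal Python sketch of a wrapper-side check. It assumes the "re-versified X as Y" message format quoted earlier in this thread; the function names and the value 2 for "completed, but re-versified something" are hypothetical and are not anything osis2mod itself defines.

```python
import re

# Hypothetical convention for a wrapper script, not an osis2mod feature:
# 0 = clean run, 2 = run completed but at least one verse was re-versified.
EXIT_REVERSIFIED = 2

# Matches lines such as:
#   re-versified III John 1:15      as III John 1:13
REVERSIFIED = re.compile(r"^\s*re-versified\s+(.+?)\s+as\s+(.+?)\s*$")


def find_reversified(output):
    """Return (original, stored_as) pairs for every re-versified message."""
    pairs = []
    for line in output.splitlines():
        match = REVERSIFIED.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def exit_code_for(tool_status, output):
    """Keep a failing status from the tool; otherwise flag re-versification."""
    if tool_status != 0:
        return tool_status
    return EXIT_REVERSIFIED if find_reversified(output) else 0
```

A wrapper could capture the tool's output with subprocess.run and then call sys.exit(exit_code_for(proc.returncode, proc.stdout)), so that a build script can tell "clean" from "completed with re-versification" without parsing messages itself.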

>> (3) Enhancement request #3: If these messages could include a line 
>> number from the original OSIS input file, or a line or two of it at 
>> the point of the info/warning/error, that would really help.

> Giving context to the error is fine.
> 
> Input files are not required to have line breaks. Some have very long 
> lines and because of the potential richness of the markup the input 
> might be difficult to read.
> 
> I'm not sure how to give a better context.  Ideas welcomed.

Line number and character offset into that line, maybe?  Short and 
unambiguous:

WARNING: Line 1234: Offset 42: something unexpected happened
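
To show that line and offset remain usable even for input with very long lines, here is a small Python sketch that records where each element carrying an osisID starts. It uses Python's expat bindings purely for illustration; osis2mod has its own parser, so this is an assumption about how positions could be gathered, not a description of its code. The helper names are hypothetical.

```python
import xml.parsers.expat


def locate_osis_ids(xml_text):
    """Map each osisID to the (line, offset) where its element starts.

    Lines are 1-based and offsets are 0-based, as reported by expat.
    """
    positions = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        osis_id = attrs.get("osisID")
        if osis_id is not None:
            positions[osis_id] = (parser.CurrentLineNumber,
                                  parser.CurrentColumnNumber)

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return positions


def format_warning(line, offset, message):
    """Render a message in the format proposed above."""
    return "WARNING: Line %d: Offset %d: %s" % (line, offset, message)
```

With a file that has no line breaks at all, every element reports line 1 and the offset alone pinpoints the spot.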

> The messages sometimes give "normalized" verse references and other 
> times give osisIDs. If the references were all osisIDs (this is OSIS 2 
> Module after all) then it would be trivial to find the offending spot 
> with something like:
> egrep 'osisID="[^"]*Matt.1.5' input.xml
> 
> Would that work?

Probably, now the off-by-one issue in the messages is fixed.
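
One caveat with the egrep pattern quoted above is that its dots are unescaped and it has no end anchor, so a search for Matt.1.5 also hits Matt.1.50. A stricter search can be sketched in Python as follows; the function names are hypothetical, and the sketch assumes osisID values are plain, space-separated IDs without a work prefix.

```python
import re


def osis_id_pattern(osis_id):
    """Match osis_id as a whole, space-separated token inside osisID="..."."""
    return re.compile(
        r'osisID="(?:[^"]*\s)?' + re.escape(osis_id) + r'(?:\s[^"]*)?"'
    )


def find_lines(text, osis_id):
    """Return the 1-based numbers of the lines that carry exactly this osisID."""
    pattern = osis_id_pattern(osis_id)
    return [number
            for number, line in enumerate(text.splitlines(), 1)
            if pattern.search(line)]
```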

>> BOTTOM LINE: Is re-versifying these things "bad"?

> No. It means that the versification (probably the KJV) does not include 
> those two verses as indexable in the index, but to be faithful to the 
> text (not the versification) they are going to be appended to the 
> nearest, prior verse.
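
The "append to the nearest, prior verse" behaviour described above can be modelled with a toy Python sketch. It is not osis2mod's implementation: the table holds only the KJV last-verse numbers for the two chapters in this thread, and the function name is hypothetical.

```python
# Last verse per chapter in the KJV, for the two chapters discussed here only.
KJV_LAST_VERSE = {("3John", 1): 14, ("Rev", 12): 17}


def reversify(book, chapter, verse, last_verse=KJV_LAST_VERSE):
    """Return the reference an entry is stored under.

    A verse that exists in the versification keeps its own slot; a verse
    past the end of the chapter is appended to the chapter's last verse.
    """
    last = last_verse[(book, chapter)]
    return (book, chapter, min(verse, last))
```

Under this model 3John 1:15 lands in 3John 1:14 and Rev 12:18 lands in Rev 12:17, which matches the corrected "Appending entry" message described earlier.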

> At this time, they are an indication that another versification might be 
> better. I don't know if merely trying each of the available 
> versifications is a good idea or not. Asking here which to use based on 
> the output of the program might be a way to find the best versification.

OK.  I'll also punt this whole issue back to David Haslam, and see what 
he knows about the versification used in the original text.

>> Do I need to get osis2mod to run with no output before I can consider 
>> the module "clean"?

> No, but you need to understand them and decide whether you can live with 
> the results. If there were a lot of re-versification messages then it 
> would be best to find a better v11n scheme.

Since the original documents have the "extra" verses, but only a couple, 
my suspicion is that either they are a mistake in the original, or 
they are using a versification no-one else uses.

In either case, for real faithfulness to the original text, presumably 
one "should" create a versification scheme to exactly match that 
document, and use that.  Right now (as far as I know!) the printed copy 
of this particular NT has 3John 1:15 -- so for fidelity to that 
original, the SWORD module should have a 3John 1:15.  Logically it 
should not force things and stuff that verse into 3John 1:14, since that 
is not what the original text does.

But this kind of thing would need the ability for the library to accept 
dynamically generated custom versifications, which at the moment it does 
not seem able to handle?  Is this something planned for a future 
release?  Or am I looking at this "all wrong"?

>> Is there a Wiki page or other documentation that explains these 
>> messages that I should have read?

> The wiki page, www.crosswire.org/wiki/Osis2mod, should have it. That's a 
> great idea. I am using the wiki as the man page for the program.

> If I don't get to it soon, feel free to add to the page. Perhaps a 
> section on Errors, Warnings and Informational Messages with an example 
> message and another on Return Codes. I'll be glad to fill in the details 
> and keep it up to date.

I think   grep -C3 cout utilities/osis2mod.cpp   should find all the 
possible error message outputs, and then ignoring all the ones that are 
#ifdefed out unless you are debugging, it should be doable to get a list 
of all possible messages.  Maybe something to try tonight.

> Two other pages that might be of interest are:
> www.crosswire.org/wiki/OSIS_Bibles - gives best practices
> and
> www.crosswire.org/wiki/Alternate_Versification - explains av11n, though 
> this page is in its early stages.

Yes, those I found (and read) earlier.  The OSIS_Bibles page in 
particular was useful when I was creating my scripts that convert the 
Word documents into OSIS on Monday night.

[Aside: I think it would be good if mkswordtar grabbed the CrossWire 
wiki contents, and the SWORD API primer, and included them both in the 
release source tarball... would a patch from me that did this have a 
good chance of being accepted, if not for 1.6 then for 1.6.1?]

One more minor request: osis2mod currently does not seem to accept XML 
from stdin, either with just a module name parameter, or if you specify 
- as the input filename.  It would be good if one or the other of those 
worked, for people like me who create scripts that generate XML on 
stdout and want to pipe it into things, e.g. xmllint or osis2mod :)

Thanks,

Jonathan


