[sword-devel] osis2mod output (Bisaya-Inunhan NT)

DM Smith dmsmith at crosswire.org
Wed Apr 29 14:46:37 MST 2009

Jonathan Marsden wrote:
> DM Smith wrote:
>>>   re-versified III John 1:15      as III John 1:13
>> This is informational, perhaps a warning. It is saying that III John 
>> 1:15 is not in the versification that you chose and that 3 John 1:13 
>> was the last verse that it wrote to the module that was before 3 John 
>> 1:15.
>> If your input had 3 John 1:14, as an individual verse, then I'd say 
>> you have hit a bug. I found this bug in the latest RC and fixed it in 
>> head, revision 2353. The bug finds not the right verse, but the one 
>> before it.
> Hmm, the XML input file has both 3John.1.14 and 3John.1.15 in it.  
> This is XML generated by my own scripts from the MS Word document 
> originals, and those scripts have had very limited testing, so it 
> might easily be my fault...
> No, there really *is* a 3John 1:15 in the source Word document.  Looks 
> like it splits what KJV and NIV consider 3 John 14 into two verses, at 
> the sentence break.  Could this be a translator error??
No, it is not a translator's error. It is correct.
>   If not, what versification schemes do have that verse, but are 
> otherwise compatible with KJV versification?
AFAICT, this differs from the KJV in having 3 John 1:15 and Revelation 

The OSIS manual gives the NRSVA as the standard (it is NRSV "with 
Apocrypha"). Here is our copy of it:

> More generally, how can a module developer find out which 
> versification schemes have a verse X Y:Z for any given values of book 
> X, chapter Y and verse Z?
> I "cheated" and looked at include/canon*.h, and there are only 2 
> versifications (KJV and Leningrad) available to me in SWORD 1.6.0RC2. 
> Of those two, Leningrad has no NT at all, so by definition I have to 
> use KJV for an NT input document -- right?
Right. NRSV and NRSVA are not there yet. Don't know if they ever will be. 
As you noted, the KJV is the only one available for you now.
> This makes the whole idea that I am using an incorrect versification 
> scheme seem much less likely, to me.  There just aren't enough choices 
> available (yet!) for it to be *possible* for me to make that mistake :)


>> I recommend building from head (maybe there'll be another release 
>> candidate soon)
> OK.  Something seems slightly odd if my role as module creator 
> requires newer versions of SWORD than my role as packager does, though :)
>>>   Appending entry: Jude:
>> This is also informational, perhaps a warning. It pairs with the 
>> "re-versified ..." message and should not have read "Jude" under any 
>> circumstance. It should have read "Appending entry: III John 1:14" 
>> This too was fixed in revision 2353.
> OK, thanks.
>>>   re-versified Revelation of John 12:18   as Revelation of John 12:16
>>>   Appending entry: Rev.13:
> This is the same kind of deal, I think.  The original MS Word doc 
> really does have a Rev.12.18 in it, which KJV and NIV do not.
>>> (2) Enhancement request #2: osis2mod does not seem to exit with an 
>>> exit code that reflects the worst issue found during the run -- I'm 
>>> seeing an exit code of zero despite these messages.
>> I'm not sure about this one. If a fatal error is encountered, it is 
>> definitely appropriate to have a documented, non-zero exit code. If 
>> there were "recoverable" problems (e.g. re-versification), I don't 
>> know that it is appropriate to have a non-zero exit code. I have 
>> always thought it was the UNIX tradition to use a non-zero exit code 
>> when the results of the run could not be trusted.
> Yes, so it comes down to whether the re-versification "can be 
> trusted", or not, and my sense is that for most input data it cannot, 
> because seeing any such messages at all indicates that an incorrect 
> (mismatching) versification has been selected, and so osis2mod should 
> be rerun with a different versification?
> See http://www.catb.org/~esr/fetchmail/fetchmail-man.html for an 
> example of a well known Unix utility that generates a wide variety of 
> exit codes, not all of which indicate total failure or an 
> untrustworthy result (exit codes 1 and 9 in particular).

I'll start by documenting the exit codes that we have. And the messages 
that are produced.

When it runs to completion, it might encounter several different kinds 
of conditions. For a non-zero exit, I'm inclined to use a bit map, if we 
go this route. Troy and Chris are primary users of it and I'd like their 
feedback to see if they script anything that depends upon the return codes.
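
A minimal sketch of what such a bit map could look like, in case it helps 
the discussion. The flag names and the RunStatus class are hypothetical 
and are not taken from osis2mod. Each kind of condition gets one bit, the 
run ORs in every condition it meets, and the exit code is the combined 
value, so zero still means a run with nothing to report.

```cpp
// Hypothetical condition flags; these names are illustrative only and do
// not exist in osis2mod. POSIX keeps only the low 8 bits of an exit
// status, so a bit map has room for at most eight independent conditions.
enum Condition : unsigned {
    kReversified  = 1u << 0,  // a verse was appended to a prior verse
    kUnknownBook  = 1u << 1,  // hypothetical condition
    kMalformedXml = 1u << 2   // hypothetical condition
};

// Accumulates the conditions seen during one run.
class RunStatus {
public:
    // Record that a condition occurred; repeats are harmless.
    void note(Condition c) { bits_ |= c; }

    // True if the condition was recorded at least once.
    bool has(Condition c) const { return (bits_ & c) != 0; }

    // Value to hand back from main(); 0 means nothing was recorded.
    int exitCode() const { return static_cast<int>(bits_ & 0xFFu); }

private:
    unsigned bits_ = 0;
};
```

A calling script could then test individual bits of the exit status to 
decide which conditions it is willing to accept.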

>>> (3) Enhancement request #3: If these messages could include a line 
>>> number from the original OSIS input file, or a line or two of it at 
>>> the point of the info/warning/error, that would really help.
>> Giving context to the error is fine.
>> Input files are not required to have line breaks. Some have very long 
>> lines and because of the potential richness of the markup the input 
>> might be difficult to read.
>> I'm not sure how to give a better context.  Ideas welcomed.
> Line number and character offset into that line, maybe?  Short and 
> unambiguous:
> WARNING: Line 1234: Offset 42: something unexpected happened
The loop reads one character at a time. I can count the number of 
newlines in the file and the number of characters read since the last 
newline, and that will be close.

I have a question, though: does anyone have a portable C++ way to 
identify a new line? Mac, Windows and Unix use different combinations of 
\n and \r. At this time we don't care how the lines are ended in the 
file. As far as osis2mod is concerned, a line ending is just a character 
for which isspace(ch) returns true, that is, whitespace.
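
One possible approach, sketched under the assumption that the loop sees 
every byte of the input in order. The PositionTracker class is 
hypothetical and is not part of osis2mod. It treats "\n", "\r" and "\r\n" 
each as a single line break by remembering whether the previous character 
was a carriage return.

```cpp
#include <cstddef>

// Tracks line number and offset while a parser reads one character at a
// time. Hypothetical helper, not part of osis2mod.
class PositionTracker {
public:
    // Call once for every character read, in order.
    void advance(char ch) {
        if (ch == '\n') {
            // The LF of a CRLF pair was already counted at the CR.
            if (!lastWasCR_) newLine();
            lastWasCR_ = false;
        } else if (ch == '\r') {
            newLine();
            lastWasCR_ = true;
        } else {
            ++offset_;
            lastWasCR_ = false;
        }
    }

    // 1-based number of the line currently being read.
    std::size_t line() const { return line_; }

    // Number of non-newline chars read so far on the current line.
    std::size_t offset() const { return offset_; }

private:
    void newLine() { ++line_; offset_ = 0; }

    std::size_t line_ = 1;
    std::size_t offset_ = 0;
    bool lastWasCR_ = false;
};
```

The offset counts chars, so in UTF-8 input it is a byte offset rather than 
a character offset, which fits "that will be close". If the file is opened 
in text mode on Windows the runtime already turns "\r\n" into "\n", and 
the same logic still gives the right line count.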

>> The messages sometimes give "normalized" verse references and other 
>> times give osisIDs. If the references were all osisIDs (this is OSIS 
>> 2 Module after all) then it would be trivial to find the offending 
>> spot with something like:
>> egrep 'osisID="[^"]*Matt.1.5' input.xml
>> Would that work?
> Probably, now the off-by-one issue in the messages is fixed.
>>> BOTTOM LINE: Is re-versifying these things "bad"?
>> No. It means that the versification (probably the KJV) does not 
>> include those two verses as indexable in the index, but to be 
>> faithful to the text (not the versification) they are going to be 
>> appended to the nearest, prior verse.
>> At this time, they are an indication that another versification might 
>> be better. I don't know if merely trying each of the available 
>> versifications is a good idea or not. Asking here which to use based 
>> on the output of the program might be a way to find the best 
>> versification.
> OK.  I'll also punt this whole issue back to David Haslam, and see 
> what he knows about the versification used in the original text.
>>> Do I need to get osis2mod to run with no output before I can 
>>> consider the module "clean"?
>> No, but you need to understand them and decide whether you can live 
>> with the results. If there were a lot of re-versification messages 
>> then it would be best to find a better v11n scheme.
> Since the original documents have the "extra" verses, but only a 
> couple,  my suspicion is that either they are a mistake in the 
> original, or they are using a versification no-one else uses.
> In either case, for real faithfulness to the original text, presumably 
> one "should" create a versification scheme to exactly match that 
> document, and use that.  Right now (as far as I know!) the printed 
> copy of this particular NT has 3John 1:15 -- so for fidelity to that 
> original, the SWORD module should have a 3John 1:15.  Logically it 
> should not force things and stuff that verse into 3John 1:14, since 
> that is not what the original text does.
> But this kind of thing would need the ability for the library to 
> accept dynamically generated custom versifications, which at the 
> moment it does not seem able to handle?  Is this something planned for 
> a future release?  Or am I looking at this "all wrong"?

Peter gave a good response to this. The problem with a dynamically 
generated custom versification is that it won't be mappable. Mapping is 
important to parallel display and to using references generated against 
one versification with a Bible using another.

>>> Is there a Wiki page or other documentation that explains these 
>>> messages that I should have read?
>> The wiki page, www.crosswire.org/wiki/Osis2mod, should have it. That's 
>> a great idea. I am using the wiki as the man page for the program.
>> If I don't get to it soon, feel free to add to the page. Perhaps a 
>> section on Errors, Warnings and Informational Messages with an 
>> example message and another on Return Codes. I'll be glad to fill in 
>> the details and keep it up to date.
> I think   grep -C3 cout utilities/osis2mod.cpp   should find all the 
> possible error message outputs, and then ignoring all the ones that 
> are #ifdefed out unless you are debugging, it should be doable to get 
> a list of all possible messages.  Maybe something to try tonight.


>> Two other pages that might be of interest are:
>> www.crosswire.org/wiki/OSIS_Bibles - gives best practices
>> and
>> www.crosswire.org/wiki/Alternate_Versification - explains av11n, 
>> though this page is in its early stages.
> Yes, those I found (and read) earlier.  The OSIS_Bibles page in 
> particular was useful when I was creating my scripts that convert the 
> Word documents into OSIS on Monday night.
> [Aside: I think it would be good if mkswordtar grabbed the CrossWire 
> wiki contents, and the SWORD API primer, and included them both in the 
> release source tarball... would a patch from me that did this have a 
> good chance of being accepted, if not for 1.6 then for 1.6.1?]
> One more minor request: osis2mod currently does not seem to accept XML 
> from stdin, either with just a module name parameter, or if you 
> specify - as the input filename.  It would be good if one or other of 
> those worked (for people like me who create scripts that generate XML 
> on stdout, and want to pipe it into things, e.g. xmllint or 
> osis2mod :)

This would be good. Not sure if it is portable to Windows. I'm thinking 
that if a '-' is found where the input file name is expected it would 
read standard input. I used to know how to do this and it will take me a 
bit of time to remember. A patch or a snippet of code would be great!

(I coded C++ from 1.0 up until 3.0 on a regular basis, but it has been 
too long since then, so help is appreciated!)

Jonathan, again thanks. It is new users who help make the process easier.

I have entered your 4 suggestions into Jira at:

That way I won't forget them.

In Him,
