[sword-devel] USFM conformance in usfm2osis.py

Chris Little chrislit at crosswire.org
Wed Aug 1 23:54:53 MST 2012

On 08/01/2012 01:11 AM, David Haslam wrote:
> Three poorly defined uses of USFM verse tags that we often encounter are as
> follows:
> Spanned verses (translators use a verse range), thus
> \v 7-11 Text for these five verses
> or (worse still, a naughty use of the comma delimiter)
> \v 3,4 Text for two consecutive verses
> Split verses (translators using composite verse tags with parts a and b of
> the text), e.g.
> \v 19a Text for the first part of verse nineteen
> \v 19b Text for the second part of verse nineteen

Based on my experience encoding USX, I believe these are all invalid.
They're definitely invalid from the perspective of USX, where the verse
values must be sequential numerals. But we need to treat them all as
valid, because a reasonable reading of the USFM documentation gives no
indication that they should be invalid.
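
As a rough sketch of what accepting all three forms could look like
(the regex and the parse_verse_tag name are illustrative assumptions,
not code taken from usfm2osis.py):

```python
import re

# Hypothetical pattern, not from usfm2osis.py. It accepts a plain verse
# number, a range (7-11), a comma pair (3,4), or a lettered part (19a).
VERSE_TAG = re.compile(
    r'^\\v\s+(\d+)([a-z]?)(?:\s*([-,])\s*(\d+)[a-z]?)?(?:\s+(.*))?$'
)

def parse_verse_tag(line):
    """Return the verse numbers, part letter, and text of one \\v line.

    Returns None when the line is not a \\v tag this sketch recognises.
    """
    m = VERSE_TAG.match(line)
    if m is None:
        return None
    start, part, sep, end, text = m.groups()
    first = int(start)
    if sep == '-':
        # Spanned verses: every verse from first to last, inclusive.
        verses = list(range(first, int(end) + 1))
    elif sep == ',':
        # Comma form: only the two verses that are actually named.
        verses = [first, int(end)]
    else:
        verses = [first]
    return {'verses': verses, 'part': part or None, 'text': text or ''}

result = parse_verse_tag(r'\v 7-11 Text for these five verses')
```

The choice here is to normalise every form to a list of verse numbers
plus an optional part letter, so that later stages do not need to care
which notation the translator used.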

> btw. Peter and I have collected a substantial body of real world USFM suites
> which you could probably use for testing your conversion script.

That's likely to be quite helpful soon, when I get to the point of 
writing regression tests. I've currently got the Open Bible Translation, 
WEB, and RV happily running through the script and generating valid 
OSIS. There are still 19 tags that usfm2osis.pl handles which I haven't 
addressed in usfm2osis.py. So I'd like to add handling for all of those 
tags, so that the new utility is at least as capable as the old. Then I 
may run it on Michael's collection of documents and complete coverage of 
its set of tags.

Then, I'll begin collecting markup samples, verifying that the script 
generates valid & reasonable output against those samples, and writing 
tests against the generated output so that we can be certain that future 
changes to the script do not break it.
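
The sample-based testing described above might be sketched roughly as
follows. Everything named here is hypothetical: run_regression,
fake_convert, and the SAMPLES data are placeholders, and fake_convert
only stands in for the real converter, whose entry point is not shown
in this thread.

```python
def run_regression(convert, samples):
    """Return the names of samples whose output differs from the record."""
    failures = []
    for name, (usfm, expected) in sorted(samples.items()):
        if convert(usfm) != expected:
            failures.append(name)
    return failures

def fake_convert(usfm):
    # Placeholder only: upper-cases its input so the harness has
    # something deterministic to compare. It does not produce OSIS.
    return usfm.upper()

# Each entry pairs a markup sample with the output recorded for it.
SAMPLES = {
    'range': ('\\v 7-11 text', '\\V 7-11 TEXT'),
    'split': ('\\v 19a text', 'deliberately stale record'),
}

failed = run_regression(fake_convert, SAMPLES)
```

Keeping the recorded output next to each sample means a later change
to the script shows up as a named failure rather than a silent
difference.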

Somewhere down the line I may incorporate xreffix.pl-type functionality 
as an option for those who have installed Sword bindings.

I'll definitely save the example USFM markup you included here and would 
welcome additional examples of potentially problematic markup.

