[sword-devel] Why is OSIS preferred? Was Re: usfm2osis.pl

Chris Little chrislit at crosswire.org
Tue Jul 1 06:06:09 MST 2008

>>    ** ThML is xml, but is layered upon HTML. It does not separate
>> presentation from content. Cross-references are ad-hoc.
> ThML is also still (I think) used by the greatest percentage of our
> modules (though that may be changed in the future).
> Separating presentation from content is a nice idea, but I'm not
> convinced that it is good in all cases.  What happens with OSIS when a
> Bible publisher wants to insist that certain constructs in their Bible
> are formatted in certain ways?

First, content labeled as ThML is often *not* XML--but ThML from CCEL 
probably is validated against their DTD. ThML is based on the Voyager 
Strict HTML DTD with a few TEI-inspired elements added, but naturally 
hardly anyone ever validates against the DTD.

ThML remains the markup of a large percentage of our content, but that 
percentage is declining. New Bibles will always be OSIS (or plain). New 
commentaries will always be OSIS. New Dictionaries will probably be TEI 
(sometimes OSIS). New GenBooks will preferably be OSIS or TEI, but 
might appear in ThML.

The OSIS TC answer to the question of mandated rendering with particular 
markup is: use a stylesheet. The CrossWire answer is to use <hi/> for 
styling or put information in type/subType to indicate rendering. But 
the issue hasn't ever actually come up.
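
Here is a minimal Python sketch of how a front end might turn <hi type="..."> into HTML, to make the <hi/> approach concrete. The HI_TO_HTML table and render_hi function are hypothetical and are not SWORD's actual filter code. The type values are ones OSIS defines for <hi>, but the HTML chosen for each is only one possible rendering. All other markup is flattened to its text.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical rendering table: OSIS <hi> type values -> HTML open/close tags.
HI_TO_HTML = {
    "bold": ("<b>", "</b>"),
    "italic": ("<i>", "</i>"),
    "underline": ("<u>", "</u>"),
    "small-caps": ('<span style="font-variant: small-caps">', "</span>"),
    "super": ("<sup>", "</sup>"),
    "sub": ("<sub>", "</sub>"),
}


def render_hi(elem: ET.Element) -> str:
    """Flatten elem to HTML, wrapping each <hi> according to its type attribute.

    Other elements contribute only their text; unknown hi types are left unstyled.
    """
    out = [html.escape(elem.text or "")]
    for child in elem:
        inner = render_hi(child)
        # Compare local names so a namespaced OSIS document also works.
        if child.tag.rsplit("}", 1)[-1] == "hi":
            start, end = HI_TO_HTML.get(child.get("type", ""), ("", ""))
            inner = start + inner + end
        out.append(inner)
        out.append(html.escape(child.tail or ""))
    return "".join(out)
```

With this design the markup only records *what* is highlighted, and each front end decides *how*. That is the separation the OSIS TC answer ("use a stylesheet") is after.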

>> * OSIS is a growing, maturing standard, addressing the short-comings of
>> other popular formats.
> And adding some of its own (its complexity comes to mind here, though
> possibly that is intrinsic given what it is trying to cover).
> In my view adding milestoning and so forth left the path of strictly
> hierarchical XML.  It's still valid XML, but it's not really what XML
> was intended to do.  I don't know enough to comment on whether this
> was really necessary or if there is a better way to do it, but it does
> mean that valid OSIS XML may not be valid OSIS (this is true of most
> XML formats, in fact - OSIS just carries it further than most).

Simple things are simple to encode. Complex things are more difficult.

If you look at Bibles encoded in ThML, GBF, or Zefania, it is absolutely 
trivial to perform the conversion. You can probably encode an OSIS Bible 
from any of these formats using 1:1 element substitution.
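
Here is a minimal Python sketch of that 1:1 substitution for a Zefania-style Bible. It assumes the usual XMLBIBLE/BIBLEBOOK/CHAPTER/VERS elements with bnumber/cnumber/vnumber attributes. BOOK_IDS is a deliberately tiny, hypothetical lookup table. Only plain verse text is copied (notes or styling inside VERS are dropped), and the output is just an osisText body. A real OSIS document also needs the <osis> root, its namespace, and a <header>.

```python
import xml.etree.ElementTree as ET

# Hypothetical, partial map from Zefania book numbers to OSIS book IDs.
BOOK_IDS = {1: "Gen", 43: "John", 66: "Rev"}


def zefania_to_osis(zef_xml: str) -> str:
    """Convert a simple Zefania-style Bible to OSIS by element substitution.

    BIBLEBOOK -> <div type="book">, CHAPTER -> <chapter>, VERS -> <verse>;
    osisIDs are built from the book/chapter/verse numbers.
    """
    src = ET.fromstring(zef_xml)
    osis_text = ET.Element("osisText")
    for book in src.iter("BIBLEBOOK"):
        book_id = BOOK_IDS[int(book.get("bnumber"))]  # KeyError for unmapped books
        div = ET.SubElement(osis_text, "div", type="book", osisID=book_id)
        for chap in book.iter("CHAPTER"):
            chap_id = f"{book_id}.{chap.get('cnumber')}"
            chapter = ET.SubElement(div, "chapter", osisID=chap_id)
            for vers in chap.iter("VERS"):
                verse = ET.SubElement(
                    chapter, "verse", osisID=f"{chap_id}.{vers.get('vnumber')}"
                )
                verse.text = vers.text  # plain text only
    return ET.tostring(osis_text, encoding="unicode")
```

Every source element maps to exactly one OSIS element, which is why conversions from these simpler formats are so mechanical.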

OSIS' improvement over these formats is in its ability to encode much 
more complex Bibles as well. Milestoning is a necessity to encode 
multiple, overlapping hierarchies, such as are present in Bibles. What 
do you do with a Bible where Rev 12:17 begins in Rev 12 and ends in Rev 
13? In OSIS, you encode it as:

<chapter osisID="Rev.12">
...
<verse osisID="Rev.12.17" sID="Rev.12.17"/>...first part of the verse...
</chapter>
<chapter osisID="Rev.13">
<title>Chapter 13</title>
<div type="section">
...rest of the verse...<verse eID="Rev.12.17"/>
<verse osisID="Rev.13.1" sID="Rev.13.1"/>...
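
Milestones still let software recover each verse as a unit. The minimal Python sketch below walks the tree in document order and credits each piece of text to every verse whose sID has been seen but whose matching eID has not. The function names are hypothetical, and the sketch handles only milestoned verses, not container <verse> elements. It also skips <title> text, which is a simplifying assumption rather than a rule taken from the OSIS spec.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix, e.g. '{http://...}verse' -> 'verse'."""
    return tag.rsplit("}", 1)[-1]


def milestoned_verse_texts(osis_xml: str) -> dict[str, str]:
    """Collect the text of each milestoned <verse>, even across chapter/div boundaries."""
    root = ET.fromstring(osis_xml)
    open_verses: dict[str, str] = {}  # sID -> osisID of verses currently open
    parts: dict[str, list[str]] = {}  # osisID -> collected text fragments

    def emit(text):
        # Attribute a text node to every verse that is currently open.
        if text:
            for osis_id in open_verses.values():
                parts[osis_id].append(text)

    def walk(elem):
        name = local_name(elem.tag)
        if name == "title":
            return  # headings are not verse text; the caller still emits elem.tail
        if name == "verse" and elem.get("sID"):
            osis_id = elem.get("osisID", elem.get("sID"))
            open_verses[elem.get("sID")] = osis_id
            parts.setdefault(osis_id, [])
        elif name == "verse" and elem.get("eID"):
            open_verses.pop(elem.get("eID"), None)
        emit(elem.text)
        for child in elem:
            walk(child)
            emit(child.tail)

    walk(root)
    return {osis_id: "".join(chunks) for osis_id, chunks in parts.items()}
```

Applied to a well-formed version of the Rev 12:17 example, the two halves of the verse, split by the chapter boundary and the section <div>, come back as one string under "Rev.12.17".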

In other formats, you have to compromise the text. The cost of complex 
textual structure is complex markup.
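
Part of that cost is the concern quoted above: schema-valid OSIS is not necessarily correct OSIS. Schema validation alone may not catch an sID that never receives its matching eID. A minimal, hypothetical pairing check in Python might look like this:

```python
import xml.etree.ElementTree as ET


def unmatched_milestones(osis_xml: str, tag: str = "verse") -> tuple[list[str], list[str]]:
    """Return (sIDs with no later eID, eIDs with no earlier sID) for one element type."""
    root = ET.fromstring(osis_xml)
    open_ids: list[str] = []
    orphan_ends: list[str] = []
    for elem in root.iter():  # iter() visits elements in document order
        if elem.tag.rsplit("}", 1)[-1] != tag:
            continue
        if elem.get("sID"):
            open_ids.append(elem.get("sID"))
        elif elem.get("eID"):
            eid = elem.get("eID")
            if eid in open_ids:
                open_ids.remove(eid)
            else:
                orphan_ends.append(eid)
    return open_ids, orphan_ends
```

A result of ([], []) means every milestone of that type is paired; anything else points at the IDs to fix.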


More information about the sword-devel mailing list