[sword-devel] Food for thought regarding OSIS and some of its alternatives...

Kahunapule Michael Johnson kahunapule at mpj.cx
Tue Feb 7 05:46:32 MST 2006

Troy A. Griffitts wrote:
> MPJ,
>     Hello my friend.  It's good to hear from you.  It seems like
> 2/3rds of your issues with OSIS have to do with <q>.  May I
> suggest patience to review what comes out of the last OSIS meeting
> back in December.  We had a good hard look at practical uses of <q>,
> and believe me, your concerns have been heard.
Maybe, but my questions have not been answered, and the delay has been
exceedingly long. Even if those issues have fully been addressed, I
doubt the solution is even as good as USFX. In any case, OSIS will never
be simple enough to be as elegant, reliable, and accepted by programmers
as a common Bible file format standard should be. If it were made that
way, it wouldn't really be compatible with old OSIS texts. It would be
something else.
>     I agree OSIS allows too many legal ways to mark up the same text.
>     I disagree that OSIS HAS TO BE too complex for people to use, or
> cannot fully capture any other Bible markup.
I'm not sure what you mean, here. OSIS is more complex than it should be
to represent all of the information that it can represent. It is more
complex than it needs to be, yet it still cannot (as of the latest
PUBLISHED, DOCUMENTED version) capture the most common Bible markup in
the world without losing information. There is a difference between
complexity and flexibility. You can have a flexible standard without
making it as complex as OSIS.
>   Worse case, you always have <seg> and x- attribute values.
Worst case, indeed. There is no guarantee that if you use them for
anything important that they will be used as intended.
>     I disagree that OSIS has slowed down development here.  It COULD
> HAVE slowed down development here if we tried to actually work on our
> osis2mod converter to handle a broader range of legal OSIS markup, but
> up to now, we pretty much encode our OSIS texts the way we want and
> that pretty much is defined by what our OSIS importer expects.
Yes, but you miss the point... you aren't making a fully valid OSIS
reader. If you did, it almost certainly would have slowed you down. As
it is now, you really can't be sure that you can read any OSIS text
anyone else might generate after reading the schema and documentation
themselves. This is hardly a good situation for a "standard" for
Scripture text markup. Sounds pretty subjective to me... maybe your file
will import. Maybe not. Sure, it is OSIS, but not our dialect...

You see, I did my best encoding OSIS texts a long time ago, and the
efforts were rejected as not fully conformant by programmers writing
OSIS readers that were also not fully conformant. It seems to me that
this fully conformant OSIS is a difficult mythological beast to find.
>     I think you hit the nail on the head when you talked about XML
> catering to a tree view, which is NOT how written documents, like
> Bibles, are marked up.  This has posed the largest problem, in my
> opinion, for OSIS as an XML schema.
>     Basically, to sum up and offer a challenge:  How, in legal XML, do
> you mark up multiple overlapping hierarchies like:
> paragraph markers
> verse markers
> chapter markers
> quote markers (nested)
> poetry lines
> linguistic annotation
> critical source apparatus
> We struggled with this and came up with what you suggest in your paper:
> milestones.  And I think we've tried our best to make the milestoning
> syntax straightforward:
> <verse osisID="Jn.1.1">
> In the beginning...
> </verse>
> or:
> <verse sID="uniqueID" osisID="Jn.1.1" />
> In the beginning...
> <verse eID="uniqueID" />
OSIS milestones are overly complex and prone to error when manually
coded. That is compounded by the fact that OSIS waffles on what is
primary (book/chapter/verse or book/section/paragraph or poetry
stanza/verse) in terms of XML encoding, allowing the choice of milestone
or container at the whim of the encoder. The decoder must deal with a
multitude of possibilities, including bizarre situations like
overlapping verses that don't actually happen in any Bible text I'm
aware of. (I'm not talking about alternate verse marking systems applied
to the same text, but one verse marking system overlapping its own
verses.) It all looks like an afterthought patchwork to me. I can (and
did) do better than OSIS.
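
To make the decoder's burden concrete, here is a minimal Python sketch (not taken from any OSIS tool) of a reader that has to accept both verse forms quoted above. It assumes un-namespaced elements, whereas real OSIS files declare an XML namespace, and it assumes every verse start carries an osisID. The function name verse_texts is made up for illustration.

```python
import xml.etree.ElementTree as ET


def verse_texts(osis_fragment):
    """Map osisID -> verse text for container AND milestone verses.

    Hypothetical helper: assumes no XML namespace and that every
    verse start (container or sID milestone) has an osisID attribute.
    """
    root = ET.fromstring(osis_fragment)
    open_verses = {}  # key -> osisID for every verse currently open
    collected = {}    # osisID -> list of text pieces seen while open

    def add_text(text):
        # Text belongs to whichever verses are open at this point.
        if text:
            for osis_id in open_verses.values():
                collected[osis_id].append(text)

    def walk(elem):
        key = None
        if elem.tag == "verse":
            if "sID" in elem.attrib:      # milestone start
                open_verses[elem.attrib["sID"]] = elem.attrib["osisID"]
                collected.setdefault(elem.attrib["osisID"], [])
            elif "eID" in elem.attrib:    # milestone end
                open_verses.pop(elem.attrib["eID"], None)
            else:                         # ordinary container
                key = ("container", id(elem))
                open_verses[key] = elem.attrib["osisID"]
                collected.setdefault(elem.attrib["osisID"], [])
        add_text(elem.text)
        for child in elem:
            walk(child)
            add_text(child.tail)
        if key is not None:
            del open_verses[key]          # container closes with its element

    walk(root)
    # Joining pieces with a space is a simplification for this sketch.
    return {osis_id: " ".join(" ".join(pieces).split())
            for osis_id, pieces in collected.items()}


container = '<div><verse osisID="Jn.1.1">In the beginning...</verse></div>'
milestone = ('<div><verse sID="v1" osisID="Jn.1.1"/>In the beginning...'
             '<verse eID="v1"/></div>')
print(verse_texts(container) == verse_texts(milestone))
```

Even this toy reader needs two code paths and a table of open verses just to pull out one verse, and it still ignores chapters, quotes, and poetry lines, each of which may also be either a container or a milestone.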
> I agree that you lose much of the advantage of XML processing tools
> that depend on a DOM tree hierarchy.
> I don't use XSLT, so I don't really care :)
> I do, however, use Java and C++ to parse OSIS just fine-- at least the
> VALID OSIS that we use here.  And I'm quite happy with it.  In fact
> we've built some pretty amazing tools to publish these feature-rich
> documents online, e.g. http://crosswire.org/study/
> We STILL support GBF and ThML, but I'm still on the OSIS bandwagon
> because I believe collaboration toward a common markup is invaluable.
> I'd rather support 1 complex markup than 3 different markups that all
> handle the easy stuff 3 different ways, and punt on the harder issues.
I'd rather support a simple markup that handles complex situations when
necessary, but makes all the easy stuff truly easy. Sure, a common
markup is valuable, but not when it is not the best available markup.
Persisting in promoting one markup that hasn't really caught on because
of its problems when a better one is available won't necessarily make it
more likely that you will get a widely-accepted standard. Indeed, you
could kill off a better standard only to see your favorite pet die of
its own birth defects. It is hard to look dispassionately and logically
at your "baby" project and decide to go with something better...
> I appreciate and share your practical spirit.
Thank you.

> Kahunapule Michael Johnson wrote:
>>   Why Use OSIS When USFM and USFX Work Better?
>> By (Kahunapule) Michael Johnson, http://kahunapule.org/
>> 6 February 2006
>>     Conclusion
>> OSIS is a poor choice for a standard Scripture archiving, authoring,
>> and interchange format for members of the Forum of Bible Translation
>> Agencies. Its inadequacies can be patched, but it probably can't be
>> made truly good without violating backward compatibility constraints.
>> Using OSIS is likely to make software development efforts more costly
>> and slower than necessary. OSIS is not better than USFM, overall. I
>> present a viable XML alternative, below. It is likely that other
>> options that are better than OSIS exist or could be created. In the
>> meantime, OSIS should be considered experimental, and not used for
>> production uses like drafting, checking, publishing, or archiving of
>> Scripture unless USFM equivalents are kept up-to-date and stored
>> alongside the OSIS texts.
