[osis-users] Unambiguous and Consistent OSIS for Interchange: Stand-off Markup

Troy A. Griffitts scribe at crosswire.org
Wed Jan 20 00:19:52 MST 2010


Weston Ruter wrote:
> ... Troy, as you've said before, you can't 
> actually use OSIS as your raw data format at CrossWire because an OSIS 
> document can be authored in many different ways and so there is much 
> more programming logic that is needed to handle all of the possible OSIS 
> styles.

Hey Weston,

Hope to have time for a thoughtful response to more of your suggestions, 
but just wanted to clear a couple things up first:

I hope I never implied that we can't/don't use OSIS internally as our 
primary markup standard.

I did say that since OSIS allows different ways to mark the same 
structure, we have an importer which attempts to accept any valid OSIS 
doc and _normalizes_ that doc into a form of OSIS we find easiest for 
our engine to process.  It is still OSIS, just a form of OSIS with all 
structures represented in a single way.
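To make the idea of normalizing concrete: OSIS lets a verse be encoded either as a container element wrapping its text or as an empty sID/eID milestone pair, which is the form used in Weston's example quoted further down. The following is a hypothetical sketch of one such normalization step, not CrossWire's actual importer. It uses Python's standard xml.etree.ElementTree to rewrite container verses as milestone pairs. For brevity it assumes un-namespaced markup (real OSIS documents use the OSIS namespace) and verses that do not nest. The function name milestone_verses is made up for illustration.

```python
import xml.etree.ElementTree as ET


def milestone_verses(root: ET.Element) -> None:
    """Rewrite container-form <verse> elements as sID/eID milestone pairs, in place.

    Hypothetical sketch: assumes un-namespaced markup and non-nested verses.
    """
    # Collect targets first so the tree is not mutated while it is iterated.
    targets = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.tag == "verse" and "osisID" in child.attrib
        and "sID" not in child.attrib and "eID" not in child.attrib
    ]
    for parent, verse in targets:
        osis_id = verse.get("osisID")
        index = list(parent).index(verse)
        # Start milestone keeps the verse's attributes; the text that opened
        # the container now follows the milestone as its tail.
        start = ET.Element("verse", dict(verse.attrib, sID=osis_id))
        start.tail = verse.text
        # End milestone carries whatever text followed the container.
        end = ET.Element("verse", {"eID": osis_id})
        end.tail = verse.tail
        replacement = [start, *verse, end]
        parent.remove(verse)
        for offset, node in enumerate(replacement):
            parent.insert(index + offset, node)


if __name__ == "__main__":
    doc = ET.fromstring('<p><verse osisID="Ex.1.1">Then <w>he</w> said</verse> tail</p>')
    milestone_verses(doc)
    print(ET.tostring(doc, encoding="unicode"))
```

Verses that are already milestones (they carry sID or eID) are left alone, so running the pass twice is harmless. A real importer would need many more passes like this one, for quotes, line groups, divisions and so on.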

Even so, we still don't use any plain text format as our "raw data 
format".  We typically compress and index documents when they are 
imported into our engine.  You can ask our engine for OSIS, HTML, RTF, 
GBF, ThML, or plaintext and it will do its best to give you the data in 
the requested format.

None of this is to argue against your point: OSIS has multiple ways to
encode a single structure in a document.

The real answer to this is not technical.  I too am frustrated with 
this.  But many people working at many organizations were consulted when 
developing the OSIS specification.  They gave great insights into how they
work.  Sometimes they even made demands with an ultimatum that they 
would absolutely not use the specification if a certain feature was not 
added to the spec.

OSIS could have been technically finished in less than a year.  It took 
us 3 years to get buy-in from all the participating organizations.

In the end, the purpose of OSIS was to build collaboration between 
organizations.  We could have developed a much easier-to-use technical 
specification which no one would have used, or we could concede to 
demands to gain buy-in and augment the specification with a 'best 
practices' doc which recommends a single specific method for encoding 
OSIS.  We chose the latter.

Implementing code against the spec now, we find the importer a pain in 
the butt to write, but in the end we get what we want: a single OSIS 
style that our engine knows how to work with, and multiple supporting 
organizations producing OSIS documents.


Troy.



> If we could define a single document structure, however, one
> that is a subset of the freedom that OSIS provides (perhaps taking cues 
> from OXES), we could then have an XML format for scripture that would be 
> suited for efficient interchange and application traversal.
> 
> Currently we have the problem of two overlapping hierarchies: BSP 
> (book/section/paragraph) and BCV (book/chapter/verse). However, there 
> could potentially be multiple versification systems, so there could be 
> even more than two overlapping hierarchies; this is probably why the 
> <p> element isn't currently milestonable. To get around the
> problem of overlapping hierarchies, what if we introduced stand-off 
> markup into the equation? The words of scripture themselves could all be 
> located in a flat structure as siblings; then in the header there could 
> be multiple CONCUR sections (views) that list out the elements which 
> belong to the various parts of the hierarchies.
> 
> For example, the current approach:
> 
> <p>
>     <verse osisID="Example.1.1" sID="Example.1.1" />
>     <w id="w1">Then</w>
>     <w id="w2">he</w>
>     <w id="w3">said</w><w id="p1">,</w>
>     <q marker="“" sID="Example.1.1.q1" />
>         <w id="w4">Let</w>
>         <w id="w5">us</w>
>         <w id="w6">go</w><w id="p2">...</w>
> </p>
> <p>
>     <w id="w7">but</w>
>     <verse eID="Example.1.1" />
>     <verse osisID="Example.1.2" sID="Example.1.2"/>
>     <w id="w8">don't</w>
>     <w id="w9">forget</w>
>     <w id="w10">your</w>
>     <w id="w11">backpack</w><w id="p3">.</w>
>     <q marker="”" eID="Example.1.1.q1" />
>     <verse eID="Example.1.2" />
> </p>
> 
> 
> 
> Could instead appear as (I'm making up these element names):
> 
> <concur>
>     <view type="verse" osisID="Example.1.1" xpointer="range(#w1, #w7)" />
>     <view type="verse" osisID="Example.1.2" xpointer="range(#w8, #q2)" />
>     <view type="quote" xpointer="range(#q1, #q2)" />
>     <view type="para"  xpointer="range(#w1, #p2)" />
>     <view type="para"  xpointer="range(#w7, #q2)" />
> </concur>
> <content>
>     <w id="w1">Then</w>
>     <w id="w2">he</w>
>     <w id="w3">said</w><w id="p1">,</w>
>     <w id="q1">“</w><w id="w4">Let</w>
>     <w id="w5">us</w>
>     <w id="w6">go</w><w id="p2">...</w>
>     <w id="w7">but</w>
>     <w id="w8">don't</w>
>     <w id="w9">forget</w>
>     <w id="w10">your</w>
>     <w id="w11">backpack</w><w id="p3">.</w><w id="q2">”</w>
> </content>   
> 
> By structuring a document like this, multiple overlapping hierarchies 
> can be cleanly defined. They are separated from the underlying 
> content, but this provides the benefit of clearing up the confusion 
> as to where the <verse>, <p>, and <q> elements should be placed: in the 
> concur section, they can each share references to the same content 
> elements, so their boundaries are specified at exactly the same 
> location. This means that XML processors would be able to consistently 
> handle each of the hierarchies as they interweave throughout the content 
> data.
> 
> Efraim Feinstein and James Tauber introduced me to this approach to 
> structuring markup. See also: 
> http://www.tei-c.org/Guidelines/P4/html/NH.html#NHCO
> 
> Weston
> 
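To make the stand-off proposal concrete, here is a minimal sketch in Python (standard library only) that resolves each <view> in the example above against the flat <content> token list. The <doc> wrapper element is a hypothetical addition so the fragment parses as one document. The regex understands only the proposal's made-up range(#start, #end) shorthand, so this is not an XPointer implementation. Each view becomes an inclusive slice of the token sequence, so overlapping verses, quotes, and paragraphs coexist without any nesting conflicts.

```python
import re
import xml.etree.ElementTree as ET

# The proposal's example, wrapped in a hypothetical <doc> root so it parses.
SAMPLE = """<doc>
<concur>
  <view type="verse" osisID="Example.1.1" xpointer="range(#w1, #w7)" />
  <view type="verse" osisID="Example.1.2" xpointer="range(#w8, #q2)" />
  <view type="quote" xpointer="range(#q1, #q2)" />
  <view type="para"  xpointer="range(#w1, #p2)" />
  <view type="para"  xpointer="range(#w7, #q2)" />
</concur>
<content>
  <w id="w1">Then</w><w id="w2">he</w><w id="w3">said</w><w id="p1">,</w>
  <w id="q1">“</w><w id="w4">Let</w><w id="w5">us</w><w id="w6">go</w><w id="p2">...</w>
  <w id="w7">but</w><w id="w8">don't</w><w id="w9">forget</w><w id="w10">your</w>
  <w id="w11">backpack</w><w id="p3">.</w><w id="q2">”</w>
</content>
</doc>"""

# Matches only the made-up "range(#start, #end)" shorthand, not real XPointer.
RANGE = re.compile(r"range\(#([\w.-]+),\s*#([\w.-]+)\)")


def resolve_views(doc: ET.Element) -> list[tuple[dict, list[str]]]:
    """Return (view attributes, covered token texts) for every <view>."""
    tokens = doc.find("content").findall("w")
    position = {w.get("id"): i for i, w in enumerate(tokens)}
    resolved = []
    for view in doc.find("concur").findall("view"):
        pointer = view.get("xpointer", "")
        match = RANGE.fullmatch(pointer)
        if match is None:
            raise ValueError(f"unsupported pointer: {pointer!r}")
        start, end = position[match.group(1)], position[match.group(2)]
        # Ranges are inclusive at both ends, as the proposal uses them.
        resolved.append((dict(view.attrib), [w.text for w in tokens[start:end + 1]]))
    return resolved


if __name__ == "__main__":
    for attrs, words in resolve_views(ET.fromstring(SAMPLE)):
        label = attrs.get("osisID", "")
        print(f"{attrs['type']:5} {label:12} {' '.join(words)}")
```

Running it shows that the first verse view ends at "but" while the second paragraph view starts there. That shared boundary is exactly the overlap that forces milestones in the inline form, yet here each hierarchy is just an independent pair of token references.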



