[osis-users] Unambiguous and Consistent OSIS for Interchange: Stand-off Markup

Weston Ruter westonruter at gmail.com
Tue Jan 19 23:45:05 MST 2010


To follow up on my previous response on "OSIS with TEI and XHTML5 (was Fwd:
Standardizing on a Web Infrastructure and Web Service API for Scripture)":
http://www.crosswire.org/pipermail/osis-users/2010-January/000015.html

On Mon, Jan 18, 2010 at 12:01 PM, Troy A. Griffitts <scribe at crosswire.org> wrote:

> OSIS, though in some ways unwieldy, gives me a finite set of tags to handle
> when writing software to parse OSIS.  If OSIS were to import TEI tags
> directly, I am sure there are plenty of global attributes and other aspects
> of the TEI specification regarding those attributes that I would have to
> handle as a software engineer, even though they will likely never be used or
> worth allowing for use against the time it would take me to implement code
> to handle all aspects of those TEI tags and their children/attributes.  And
> per my first question above, do you have a particular use case where it
> would be advantageous that the tags were actually imported from the TEI
> specification?
>

A finite set of tags, and the benefits it brings, also raises the need for
an unambiguous document structure, or rather a single document structure.
Troy, as you've said before, you can't actually use OSIS as your raw data
format at CrossWire because an OSIS document can be authored in many
different ways, so much more programming logic is needed to handle all of
the possible OSIS styles. If we could define a single document structure,
however, one that is a subset of the freedom that OSIS provides (perhaps
taking cues from OXES), we would then have an XML format for scripture
suited to efficient interchange and application traversal.

Currently we have the problem of two overlapping hierarchies: BSP
(book-section-paragraph) and BCV (book-chapter-verse). However, there could
potentially be multiple versification systems, so there could be even more
than two overlapping hierarchies, which is probably why the <p> element
isn't currently milestonable. To get around the problem of overlapping
hierarchies, what if we introduced stand-off markup into the equation? The
words of scripture themselves could all be located in a flat structure as
siblings; then in the header there could be multiple CONCUR sections
(views) that list the elements belonging to the various parts of the
hierarchies.

For example, the current approach:

<p>
    <verse osisID="Example.1.1" sID="Example.1.1" />
    <w id="w1">Then</w>
    <w id="w2">he</w>
    <w id="w3">said</w><w id="p1">,</w>
    <q marker="“" sID="Example.1.1.q1" />
        <w id="w4">Let</w>
        <w id="w5">us</w>
        <w id="w6">go</w><w id="p2">...</w>
</p>
<p>
    <w id="w7">but</w>
    <verse eID="Example.1.1" />
    <verse osisID="Example.1.2" sID="Example.1.2"/>
    <w id="w8">don't</w>
    <w id="w9">forget</w>
    <w id="w10">your</w>
    <w id="w11">backpack</w><w id="p3">.</w>
    <q marker="”" eID="Example.1.1.q1" />
    <verse eID="Example.1.2" />
</p>



Could instead appear as (I'm making up these element names):

<concur>
    <view type="verse" osisID="Example.1.1" xpointer="range(#w1, #w7)" />
    <view type="verse" osisID="Example.1.2" xpointer="range(#w8, #q2)" />
    <view type="quote" xpointer="range(#q1, #q2)" />
    <view type="para"  xpointer="range(#w1, #p2)" />
    <view type="para"  xpointer="range(#w7, #q2)" />
</concur>
<content>
    <w id="w1">Then</w>
    <w id="w2">he</w>
    <w id="w3">said</w><w id="p1">,</w>
    <w id="q1">“</w><w id="w4">Let</w>
    <w id="w5">us</w>
    <w id="w6">go</w><w id="p2">...</w>
    <w id="w7">but</w>
    <w id="w8">don't</w>
    <w id="w9">forget</w>
    <w id="w10">your</w>
    <w id="w11">backpack</w><w id="p3">.</w><w id="q2">”</w>
</content>

By structuring a document like this, multiple overlapping hierarchies can
be cleanly defined, although they are separated from the underlying
content. This separation, however, clears up the confusion as to where the
<verse>, <p>, and <q> elements should be placed: in the concur section they
can each share references to the same content elements, so their boundaries
are specified at exactly the same location. This means that XML processors
would be able to handle each of the hierarchies consistently as they
interweave throughout the content data.
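
To make the processing side concrete, here is a minimal Python sketch of how
an application might resolve each view back to its run of content elements.
It is only an illustration under assumptions: the <doc> wrapper element, the
resolve_views function, and the regular expression for the made-up
"range(#start, #end)" syntax are all hypothetical, and the ranges are treated
as inclusive at both ends.

```python
import re
import xml.etree.ElementTree as ET

# The stand-off example from this message, wrapped in a hypothetical <doc>
# root so that it parses as a single XML document.
DOC = """<doc>
<concur>
    <view type="verse" osisID="Example.1.1" xpointer="range(#w1, #w7)" />
    <view type="verse" osisID="Example.1.2" xpointer="range(#w8, #q2)" />
    <view type="quote" xpointer="range(#q1, #q2)" />
    <view type="para"  xpointer="range(#w1, #p2)" />
    <view type="para"  xpointer="range(#w7, #q2)" />
</concur>
<content>
    <w id="w1">Then</w>
    <w id="w2">he</w>
    <w id="w3">said</w><w id="p1">,</w>
    <w id="q1">“</w><w id="w4">Let</w>
    <w id="w5">us</w>
    <w id="w6">go</w><w id="p2">...</w>
    <w id="w7">but</w>
    <w id="w8">don't</w>
    <w id="w9">forget</w>
    <w id="w10">your</w>
    <w id="w11">backpack</w><w id="p3">.</w><w id="q2">”</w>
</content>
</doc>"""

# Assumed shape of the made-up pointer syntax: range(#startId, #endId).
RANGE = re.compile(r"range\(#([^,\s)]+),\s*#([^,\s)]+)\)")


def resolve_views(xml_text):
    """Return (type, osisID, [token text, ...]) for every view, in order."""
    root = ET.fromstring(xml_text)
    tokens = root.findall("./content/w")
    # Map each id to its position among the flat sibling tokens.
    position = {w.get("id"): i for i, w in enumerate(tokens)}
    resolved = []
    for view in root.findall("./concur/view"):
        match = RANGE.fullmatch(view.get("xpointer", ""))
        if match is None:
            raise ValueError("unrecognised pointer: %r" % view.get("xpointer"))
        start, end = (position[token_id] for token_id in match.groups())
        # Ranges are assumed to include both endpoints.
        span = [w.text for w in tokens[start:end + 1]]
        resolved.append((view.get("type"), view.get("osisID"), span))
    return resolved


if __name__ == "__main__":
    for kind, osis_id, span in resolve_views(DOC):
        print(kind, osis_id or "-", " ".join(span))
```

Because every hierarchy points into the same flat list, the same lookup
serves verses, quotes, and paragraphs alike, and no hierarchy needs to know
about the others.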

Efraim Feinstein and James Tauber introduced me to this approach to
structuring markup. See also:
http://www.tei-c.org/Guidelines/P4/html/NH.html#NHCO

Weston