[osis-users] Unambiguous and Consistent OSIS for Interchange: Stand-off Markup

Weston Ruter westonruter at gmail.com
Sun Jan 24 14:53:22 MST 2010


To follow up again, here is the Open Siddur project's writeup on the XML
schema they came up with (JLPTEI) and why they didn't go with OSIS. The
problem of concurrent hierarchies was a major concern:

> The primary question then becomes: which structure should be encoded? Prose
> can be divided into paragraphs and sentences, poetic text can be divided
> into line groups and verse lines, lists into items and lists, etc. Many
> parts of the *siddur* have more than one structure on the same text! XML
> assumes that a document has a pure hierarchical tree structure. This
> suggests that XML is not an appropriate encoding technology for the
> *siddur*. At the same time, XML encoding is nearly universally standard and
> more software tools support XML-based formats than other encoding formats.
> One of the primary innovations of JLPTEI is its particular encoding of
> concurrent structural hierarchies. While the idea is not novel, the
> implementation is. The potential for the existence of concurrent structure
> is a guiding force in JLPTEI design.
>
> The disadvantage of JLPTEI's encoding solutions is that the archival form
> of the text is not immediately consumable by humans. We are forced to rely
> extensively on processing software to make the format editable and
> displayable. The disadvantage, however, is balanced by the encoding format's
> extensibility and conservation of human labor.
>
> The Open Siddur intends to work within open standards whenever possible. In
> choosing a basis for our encoding, we searched for available encoding
> standards that would suit our purposes. We seriously considered using Open
> Scripture Information Standard <http://bibletechnologies.net/> (OSIS), an
> XML format used for encoding bibles. It was quickly discovered that
> representations of some of the more advanced features required to encode the
> liturgy (such as those discussed above) would have to be "hacked" on top of
> the standard. The Text Encoding Initiative <http://www.tei-c.org/> (TEI)
> XML format is a de-facto standard within the digital humanities community.
> It is also specified in well-documented texts, is actively supported by
> tools, and has a large community built around its use and development.
> Further, the standard is deliberately extensible using a relatively simple
> mechanism. The TEI was therefore a natural choice as a basis for our
> encoding.
>
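
To make the concurrent-hierarchy problem concrete, here is a minimal sketch
of the general stand-off idea in Python. It is not JLPTEI's actual encoding
(the excerpt above does not spell that out); the sample text, the tag names
"s" and "l", and all function names are made up for illustration. The text is
kept once as a flat token list, and each structure is a separate list of
spans pointing into it.

```python
from xml.sax.saxutils import escape

# Hypothetical sample text; the token list is the single shared base layer.
TOKENS = ["Blessed", "are", "you", "who", "gives", "life.",
          "Great", "is", "your", "name."]

# Each hierarchy is a list of (tag, start, end) spans over token indices,
# with end exclusive. Tag names are illustrative, not taken from any schema.
SENTENCES = [("s", 0, 6), ("s", 6, 10)]
VERSE_LINES = [("l", 0, 4), ("l", 4, 8), ("l", 8, 10)]


def overlaps(a, b):
    """Return True if spans a and b partially overlap.

    Partial overlap means they share tokens but neither contains the other,
    which is exactly what a single XML tree cannot express with nesting.
    """
    _, a0, a1 = a
    _, b0, b1 = b
    intersect = max(a0, b0) < min(a1, b1)
    nested = (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1)
    return intersect and not nested


def conflicts(h1, h2):
    """List every pair of spans from two hierarchies that partially overlap."""
    return [(a, b) for a in h1 for b in h2 if overlaps(a, b)]


def render(tokens, spans):
    """Serialize one hierarchy on its own as well-formed XML elements."""
    return "".join(
        "<{0}>{1}</{0}>".format(tag, escape(" ".join(tokens[start:end])))
        for tag, start, end in spans
    )


if __name__ == "__main__":
    print(conflicts(SENTENCES, VERSE_LINES))
    print(render(TOKENS, SENTENCES))
    print(render(TOKENS, VERSE_LINES))
```

Each hierarchy renders cleanly by itself, but the sentence and verse-line
spans cross each other, so they cannot both be nested elements in one tree.
That is also why a format built this way depends on processing software to
produce anything readable.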

