No subject


Sun Jan 3 17:45:15 MST 2010


On Sun, Jan 24, 2010 at 12:37 AM, Weston Ruter <westonruter at gmail.com> wrote:

> Attached is an example of what the ESV could look like as the result of a
> web service API response for 1 John 5:7-8, including virtual elements and
> stand-off markup. The relevant fragment:
>
> <concurrent>
>     <!--
>     @virtual can be "start", "end", "both", or "none" (default)
>     target attribute used by Open Siddur; Efraim Feinstein notes range()
>     is a TEI-defined XPointer scheme:
>     http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#SATS
>     Alternative would be to use @sID and @eID
>     -->
>     <p virtual="both" target="#range(w6200500701, w6200500812)" /><!--sID="w6200500701" eID="w6200500706b"-->
>     <verse osisID="1John.5.7" target="#range(h6200500601, p6200500706)" /><!--sID="w6200500701" eID="p6200500706"-->
>     <verse osisID="1John.5.8" target="#range(w6200500801, p6200500812)" /><!--sID="w6200500801" eID="p6200500812"-->
> </concurrent>
> <content><!-- isn't @scope="1John.5.7-1John.5.8" redundant here? -->
>     <title ID="h6200500601" canonical="false" virtual="true">Testimony Concerning the Son of God</title>
>     <w ID="w6200500701">For</w>
>     <w ID="w6200500702">there</w>
>     <w ID="w6200500703">are</w>
>     <w ID="w6200500704">three</w>
>     <w ID="w6200500705">that</w>
>     <w ID="w6200500706">testify</w><w ID="p6200500706">:</w>
>     <w ID="w6200500801">the</w>
>     <w ID="w6200500802">Spirit</w>
>     <w ID="w6200500803">and</w>
>     <w ID="w6200500804">the</w>
>     <w ID="w6200500805">water</w>
>     <w ID="w6200500806">and</w>
>     <w ID="w6200500807">the</w>
>     <w ID="w6200500808">blood</w><w ID="p6200500808">;</w>
>     <w ID="w6200500809">and</w>
>     <w ID="w6200500810">these</w>
>     <w ID="w6200500811">three</w>
>     <w ID="w6200500812">agree</w><w ID="p6200500812">.</w>
> </content>
>
>
>
>
> On Thu, Jan 21, 2010 at 9:40 AM, Weston Ruter <westonruter at gmail.com> wrote:
>
>> Troy:
>>
>>> I did say that since OSIS allows different ways to mark the same
>>> structure, we have an importer which attempts to accept any valid OSIS doc
>>> and _normalizes_ that doc into a form of OSIS we find easiest for our engine
>>> to process.  It is still OSIS, just a form of OSIS with all structures
>>> represented in a single way.
>>>
>>
>> Thank you for clarifying this, and also for sharing some of this history
>> behind the development of OSIS.
>>
>>> [We chose to] augment the specification with a 'best practices' doc which
>>> recommends a single specific method for encoding OSIS.
>>>
>>
>> I don't think I have seen this best practices doc. Is this something you
>> use internally at CrossWire as part of your importer script? Could you
>> direct me to it? I like the approach you took, allowing varying OSIS
>> encodings but recommending only one of them. This is similar to the
>> development of XHTML 1.0 dialects, where you are allowed to use the
>> Transitional doctype, but the Strict doctype is recommended. Doing this for
>> OSIS could answer the need for an unambiguous single markup language. The
>> best practices document would need to contain the practices that are
>> endorsed by at least the majority of players; the others could abstain and
>> still use their preferred (deprecated) dialect of OSIS. Along with this best
>> practices doc, an official normalizer script that translates OSIS into the
>> recommended encoding would be great.
>>
>> I look forward to your thoughts about stand-off markup encoding of OSIS,
>> especially how well it might serve as the new recommended way to
>> unambiguously encode OSIS.
>>
>> Thanks!
>> Weston
>>
>>
>> 2010/1/19 Troy A. Griffitts <scribe at crosswire.org>
>>
>> Weston Ruter wrote:
>>>
>>>> ... Troy, as you've said before, you can't actually use OSIS as your raw
>>>> data format at CrossWire because an OSIS document can be authored in many
>>>> different ways and so there is much more programming logic that is needed to
>>>> handle all of the possible OSIS styles.
>>>>
>>>
>>> Hey Weston,
>>>
>>> Hope to have time for a thoughtful response to more of your suggestions,
>>> but just wanted to clear a couple things up first:
>>>
>>> I hope I never implied that we can't/don't use OSIS internally as our
>>> primary markup standard.
>>>
>>> I did say that since OSIS allows different ways to mark the same
>>> structure, we have an importer which attempts to accept any valid OSIS doc
>>> and _normalizes_ that doc into a form of OSIS we find easiest for our engine
>>> to process.  It is still OSIS, just a form of OSIS with all structures
>>> represented in a single way.
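
One variation of the kind described above is that a verse can be written either as a container element or as a pair of sID/eID milestones. The following is a minimal sketch of normalizing the container form into the milestone form. It is not CrossWire's importer; the function names `to_milestones` and `normalize` are made up for illustration, and the sketch assumes un-namespaced element names and only looks at <verse>.

```python
import xml.etree.ElementTree as ET


def to_milestones(parent):
    """Rewrite container-style <verse> children of `parent` as milestone pairs, in place."""
    new_children = []
    for child in list(parent):
        to_milestones(child)  # normalize nested elements first
        osis_id = child.get("osisID")
        is_container = (
            child.tag == "verse"
            and osis_id is not None
            and child.get("sID") is None
            and child.get("eID") is None
        )
        if not is_container:
            new_children.append(child)
            continue
        start = ET.Element("verse", {"osisID": osis_id, "sID": osis_id})
        end = ET.Element("verse", {"eID": osis_id})
        # Text that led the container now follows the start milestone;
        # text that followed the container now follows the end milestone.
        start.tail = child.text
        end.tail = child.tail
        new_children.extend([start, *child, end])
    parent[:] = new_children


def normalize(osis_xml):
    """Parse a fragment, convert container verses to milestones, and serialize it again."""
    root = ET.fromstring(osis_xml)
    to_milestones(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(normalize('<p><verse osisID="Example.1.1"><w>Then</w><w>he</w></verse></p>'))
```

Input that already uses milestones passes through unchanged, so the function can be run on documents in either style. A <verse> that is itself the root of the fragment is left alone, since only children are rewritten.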
>>>
>>> Even so, we still don't use any plain text format as our "raw data
>>> format".  We typically compress and index documents when they are imported
>>> into our engine.  You can ask our engine for OSIS, HTML, RTF, GBF, ThML, or
>>> plaintext and it will do its best to give you the data in the requested
>>> format.
>>>
>>> None of this to argue against your point: OSIS has multiple ways to
>>> encode a single structure in a document.
>>>
>>> The real answer to this is not technical.  I too am frustrated with this.
>>> But many people working at many organizations were consulted when
>>> developing the OSIS specification.  They gave great insights to how they
>>> work.  Sometimes they even made demands with an ultimatum that they would
>>> absolutely not use the specification if a certain feature was not added to
>>> the spec.
>>>
>>> OSIS could have been technically finished in less than a year.  It took
>>> us 3 years to get buy-in from all the participating organizations.
>>>
>>> In the end, the purpose of OSIS was to build collaboration between
>>> organizations.  We could have developed a much easier to use technical
>>> specification which no one would have used, or conceded to demands to gain
>>> buy-in, and augmented the specification with a 'best practices' doc which
>>> recommends a single specific method for encoding OSIS.  We chose the latter.
>>>
>>> Implementing code against the spec now, it makes our importer a pain in
>>> the butt to write, but in the end, we get what we want: a single OSIS style
>>> that our engine knows how to work with, and multiple supporting
>>> organizations producing OSIS documents.
>>>
>>>
>>> Troy.
>>>
>>>
>>>
>>>
>>>> If we could define a single document structure, however, one
>>>> that is a subset of the freedom that OSIS provides (perhaps taking cues
>>>> from OXES), we could then have an XML format for scripture that would be
>>>> suited for efficient interchange and application traversal.
>>>>
>>>> Currently we have the problem of two overlapping hierarchies: BSP and
>>>> BCV. However, there could be potentially multiple versification systems, so
>>>> there could be even more than two overlapping hierarchies, probably why the
>>>> <p> element isn't currently milestonable. To get around the problem of
>>>> overlapping hierarchies, what if we introduced stand-off markup into the
>>>> equation? The words of scripture themselves could all be located in a flat
>>>> structure as siblings; then in the header there could be multiple CONCUR
>>>> sections (views) that list out the elements which belong to the various
>>>> parts of the hierarchies.
>>>>
>>>> For example, the current approach:
>>>>
>>>> <p>
>>>>    <verse osisID="Example.1.1" sID="Example.1.1" />
>>>>    <w id="w1">Then</w>
>>>>    <w id="w2">he</w>
>>>>    <w id="w3">said</w><w id="p1">,</w>
>>>>    <q marker="“" sID="Example.1.1.q1" />
>>>>        <w id="w4">Let</w>
>>>>        <w id="w5">us</w>
>>>>        <w id="w6">go</w><w id="p2">...</w>
>>>> </p>
>>>> <p>
>>>>    <w id="w7">but</w>
>>>>    <verse eID="Example.1.1" />
>>>>    <verse osisID="Example.1.2" sID="Example.1.2"/>
>>>>    <w id="w8">don't</w>
>>>>    <w id="w9">forget</w>
>>>>    <w id="w10">your</w>
>>>>    <w id="w11">backpack</w><w id="p3">.</w>
>>>>    <q marker="”" eID="Example.1.1.q1" />
>>>>    <verse eID="Example.1.2" />
>>>> </p>
>>>>
>>>>
>>>>
>>>> Could instead appear as (I'm making up these element names):
>>>>
>>>> <concur>
>>>>    <view type="verse" osisID="Example.1.1" xpointer="range(#w1, #w7)" />
>>>>    <view type="verse" osisID="Example.1.2" xpointer="range(#w8, #q2)" />
>>>>    <view type="quote" xpointer="range(#q1, #q2)" />
>>>>    <view type="para"  xpointer="range(#w1, #p2)" />
>>>>    <view type="para"  xpointer="range(#w7, #q2)" />
>>>> </concur>
>>>> <content>
>>>>    <w id="w1">Then</w>
>>>>    <w id="w2">he</w>
>>>>    <w id="w3">said</w><w id="p1">,</w>
>>>>    <w id="q1">“</w><w id="w4">Let</w>
>>>>    <w id="w5">us</w>
>>>>    <w id="w6">go</w><w id="p2">...</w>
>>>>    <w id="w7">but</w>
>>>>    <w id="w8">don't</w>
>>>>    <w id="w9">forget</w>
>>>>    <w id="w10">your</w>
>>>>    <w id="w11">backpack</w><w id="p3">.</w><w id="q2">”</w>
>>>> </content>
>>>>
>>>> By structuring a document like this, multiple overlapping hierarchies
>>>> can be cleanly defined, although they are separated from the underlying
>>>> content. This, however, provides the benefit of clearing up the confusion
>>>> as to where the <verse>, <p>, and <q> elements should be placed: in the
>>>> concur section, they can each share references to the same content
>>>> elements, so their boundaries are specified at exactly the same location.
>>>> This means that XML processors would be able to handle each of the
>>>> hierarchies consistently as they interweave throughout the content data.
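
As a rough illustration of how a processor might consume such a document, here is a minimal sketch that resolves each view's range against the flat word list. It uses the made-up <concur>, <view>, and <content> names from the example above; the <doc> wrapper element, the `resolve_views` function, and the simplified `range(#a, #b)` parsing are assumptions for the sketch and not a full XPointer implementation.

```python
import re
import xml.etree.ElementTree as ET

# The example document from above, wrapped in a hypothetical <doc> root
# so that it parses as a single tree.
DOC = """
<doc>
  <concur>
    <view type="verse" osisID="Example.1.1" xpointer="range(#w1, #w7)" />
    <view type="verse" osisID="Example.1.2" xpointer="range(#w8, #q2)" />
    <view type="quote" xpointer="range(#q1, #q2)" />
    <view type="para" xpointer="range(#w1, #p2)" />
    <view type="para" xpointer="range(#w7, #q2)" />
  </concur>
  <content>
    <w id="w1">Then</w>
    <w id="w2">he</w>
    <w id="w3">said</w><w id="p1">,</w>
    <w id="q1">&#8220;</w><w id="w4">Let</w>
    <w id="w5">us</w>
    <w id="w6">go</w><w id="p2">...</w>
    <w id="w7">but</w>
    <w id="w8">don't</w>
    <w id="w9">forget</w>
    <w id="w10">your</w>
    <w id="w11">backpack</w><w id="p3">.</w><w id="q2">&#8221;</w>
  </content>
</doc>
"""

# Simplified parser for the "range(#start, #end)" form used in the example.
RANGE_RE = re.compile(r"range\(#(\w+),\s*#(\w+)\)")


def resolve_views(xml_text):
    """Return (type, osisID, [token text, ...]) for every view, in document order."""
    root = ET.fromstring(xml_text)
    words = root.findall("./content/w")
    # Map each content id to its position in the flat sibling list.
    position = {w.get("id"): i for i, w in enumerate(words)}
    resolved = []
    for view in root.findall("./concur/view"):
        match = RANGE_RE.fullmatch(view.get("xpointer", ""))
        if match is None:
            raise ValueError("unsupported pointer: %r" % view.get("xpointer"))
        start, end = (position[ref] for ref in match.groups())
        tokens = [w.text for w in words[start:end + 1]]  # range is inclusive
        resolved.append((view.get("type"), view.get("osisID"), tokens))
    return resolved


if __name__ == "__main__":
    for kind, osis_id, tokens in resolve_views(DOC):
        print(kind, osis_id, " ".join(tokens))
```

Because every view points into the same flat list, the verse, quote, and paragraph ranges can overlap freely without any of them having to nest inside another.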
>>>>
>>>> Efraim Feinstein and James Tauber introduced me to this approach to
>>>> structuring markup. See also:
>>>> http://www.tei-c.org/Guidelines/P4/html/NH.html#NHCO
>>>>
>>>> Weston
>>>>
>>>>
>>>
>>
>

To follow up again, here is the Open Siddur project's writeup on the XML
schema they came up with (JLPTEI) and why they didn't go with OSIS. The
problem of concurrent hierarchies was a major concern:

> The primary question then becomes: which structure should be encoded?
> Prose can be divided into paragraphs and sentences, poetic text can be
> divided into line groups and verse lines, lists into items and lists,
> etc. Many parts of the siddur have more than one structure on the same
> text!  XML assumes that a document has a pure hierarchical tree
> structure.  This suggests that XML is not an appropriate encoding
> technology for the siddur.  At the same time, XML encoding is nearly
> universally standard and more software tools support XML-based formats
> than other encoding formats.  One of the primary innovations of JLPTEI
> is its particular encoding of concurrent structural hierarchies.  While
> the idea is not novel, the implementation is.  The potential for the
> existence of concurrent structure is a guiding force in JLPTEI design.
>
> The disadvantage of JLPTEI's encoding solutions is that the archival
> form of the text is not immediately consumable by humans. We are forced
> to rely extensively on processing software to make the format editable
> and displayable.  The disadvantage, however, is balanced by the encoding
> format's extensibility and conservation of human labor.
>
> The Open Siddur intends to work within open standards whenever possible.
> In choosing a basis for our encoding, we searched for available encoding
> standards that would suit our purposes.  We seriously considered using
> Open Scripture Information Standard <http://bibletechnologies.net/>
> (OSIS), an XML format used for encoding bibles.  It was quickly
> discovered that representations of some of the more advanced features
> required to encode the liturgy (such as those discussed above) would
> have to be "hacked" on top of the standard.  The Text Encoding
> Initiative <http://www.tei-c.org/> (TEI) XML format is a de-facto
> standard within the digital humanities community.  It is also specified
> in well-documented texts, is actively supported by tools, and has a
> large community built around its use and development.  Further, the
> standard is deliberately extensible using a relatively simple mechanism.
> The TEI was therefore a natural choice as a basis for our encoding.

From <http://wiki.jewishliturgy.org/JLPTEI>



More information about the osis-users mailing list