[osis-core] Segmentation.

Steve DeRose osis-core@bibletechnologieswg.org
Wed, 19 Jun 2002 15:26:20 -0400


At 12:36 PM -0500 06/05/02, Todd Tillinghast wrote:
>There are several types/classes of hierarchies that could be segmented
>using our schema.  Stepping back and looking at the big picture, it
>seems to me that we need to determine which types/classes of hierarchies
>can be segmented and which elements within each logical hierarchy can be
>segmented.
>
>Assuming that there can be more than one hierarchy segmented
>simultaneously, there need to be clear guidelines that detail which
>elements go together to reconstitute the logical elements that were
>originally segmented.  And it would be helpful if there were "best
>practices" regarding the identifiers used for xxxID, next, and previous
>attributes of the segmented elements.

Yup; the rules for reconstituting are actually a really interesting 
current research problem -- so I'd like to punt on it if we can for 
now.

I like the best-practice idea though -- how about this:

A) All parts of a broken verse get the same verse ID.

B) The next and previous chain will use the verseID, but with "_a",
"_b", and so on tacked on, in document order.

C) Other element types that must be broken will label their next/prev chain by
the nearest verse identifier, "_", the element type, another "_", and a letter.

Just the first thing that came to mind....
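
Rules B and C above can be sketched in a few lines of Python. This is
only an illustration, not part of any OSIS specification: the function
names, the "Matt.1.1" identifier, and the 26-part limit (what should
follow "_z" is not defined above) are all assumptions made for the
example.

```python
import string

LETTERS = string.ascii_lowercase


def _suffixes(parts):
    # Assumption: at most 26 parts, since the scheme above does not say
    # what comes after "_z".
    if not 1 <= parts <= len(LETTERS):
        raise ValueError("parts must be between 1 and 26")
    return LETTERS[:parts]


def verse_chain_labels(verse_id, parts):
    """Rule B: the verse ID plus "_a", "_b", ... in document order."""
    return [f"{verse_id}_{letter}" for letter in _suffixes(parts)]


def element_chain_labels(nearest_verse_id, element_type, parts):
    """Rule C: nearest verse ID, "_", element type, "_", and a letter."""
    return [
        f"{nearest_verse_id}_{element_type}_{letter}"
        for letter in _suffixes(parts)
    ]


# "Matt.1.1" is a hypothetical verse identifier used only for illustration.
print(verse_chain_labels("Matt.1.1", 3))
# ['Matt.1.1_a', 'Matt.1.1_b', 'Matt.1.1_c']
print(element_chain_labels("Matt.1.1", "q", 2))
# ['Matt.1.1_q_a', 'Matt.1.1_q_b']
```

Under rule A, every part of the broken verse would still carry the same
verseID; only the next/prev chain uses the suffixed labels.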

>
>The trickiest piece seems to be the lowest-level container of "actual"
>scripture text.  If we say that "actual" scripture text must always be
>directly contained by <verse> (or within <abbr>, <foreign>,
><inscription>, <name>, or a simple <q> contained within <verse>) then
><verse> elements will ALWAYS hold the identifiers that allow us to
>reconstitute "pure" verses that were segmented.  However, as it stands
>it is POSSIBLE and even NATURAL to encode "actual" scripture text in
><lineGroup>/<line>, <q>, <list>/<item>, <p>, and <blockQuote> without
>any <verse> elements at all (or with a mixture including some <verse>
>elements).
>
>If we identify a "role" for elements that is "lowest level container of
>'actual' scripture", then when reconstituting the text into logical
>verses, elements acting in this "role" could be identified INDEPENDENT of
>their element name.  This would allow any element acting in this
>"role" to act the same as a <verse> element for the purpose of
>identification.  In fact that is what we have said we would like to do
>with <p> when it is exactly one verse.  This would eliminate the COMMON
>cases where you see.

Interesting.... Are there some elements that would only *sometimes* 
be the lowest, though? Hmmm. So if we allowed verseID on a lot of 
things, people could save the double markup....

What do people think on this? Patrick also called it interesting; did 
I miss any other replies?

This would be a new way of using markup -- in effect, any element 
that had a non-empty "verseID" attribute would be held to be a verse. 
Kind of like architectural forms, except that it goes by having the 
attribute name, rather than by value. I'm slightly inclined not to go 
for it, but mainly for non-technical reasons like minimizing 
last-minute changes, and the fact that it is a pretty novel 
construct. It would certainly save some markup for people doing this 
by hand....
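
To make the "any element with a non-empty verseID is a verse" idea
concrete, here is a rough Python sketch using the standard
xml.etree.ElementTree module. The sample markup, the wrapping <div>, and
the function name are assumptions for illustration only; nothing here is
taken from the schema itself.

```python
import xml.etree.ElementTree as ET


def verse_like_elements(root):
    """Return, in document order, every element that has a non-empty
    verseID attribute, regardless of its element name."""
    return [el for el in root.iter() if el.get("verseID")]


# Hypothetical sample: a <p> acting as a verse, a conventional <verse>
# inside a <line>, a <p> with an empty verseID, and a plain <p>.
SAMPLE = """<div>
  <p verseID="v1">...</p>
  <line><verse verseID="v2">...</verse></line>
  <p verseID="">...</p>
  <p>...</p>
</div>"""

root = ET.fromstring(SAMPLE)
found = [(el.tag, el.get("verseID")) for el in verse_like_elements(root)]
print(found)
# [('p', 'v1'), ('verse', 'v2')]
```

The test is on the presence of a non-empty attribute, not on the element
name, which is what would make this different from ordinary
element-based processing.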

>
><line><verse verseID="...">...</verse></line>
>and
><p><verse verseID="...">...</verse></p>
>
>replace them with
><line verseID="...">...</line>
>and
><p verseID="...">...</p>
>
>but does not prevent
><p>
>	<verse verseID="a">...</verse>
>	<verse verseID="b">...</verse>
>	<verse verseID="c">...</verse>
>	<verse verseID="d">...</verse>
></p>
>
>This does not PRECLUDE the more complicated cases where there are
>multiple hierarchies segmented simultaneously.
>
><p pID="s" next="t">
>	<verse verseID="x">...</verse>
>	<verse verseID="a" verseNext="b">...</verse>
></p>
><p pID="t" prev="s" verseID="b" versePrev="a">...</p>
>
>For an element to take on this proposed "role" it would simply assign
>a value to its "verseID" attribute and the appropriate "verseNext" and
>"versePrev" attributes.  If the same element were segmented through
>its participation in another logical hierarchy, then the
>element-specific xID attribute and next/prev attributes would be assigned
>appropriate values.
>
>SUMMARY:  There are a lot of elements that naturally take on the "role"
>of "lowest level container of 'actual' scripture".  In order to simplify,
>allow a discrete set of elements to all perform the same role as
><verse>.  When reconstituting, simply go to the next element with an
>attribute verseID with a value equal to the current node's verseNext and
>a versePrev with a value that matches the current node's verseID.  Other
>segmentation would require the element name to be the same as its other
>parts.  (This makes a special case out of <verse> which simplifies
>encoding and element construction.)
>
>PROPOSAL:  Create an abstract type that defines the attributes and
>possible child elements of an element acting in the proposed "role".
>Derive all elements that can act in this role from this type.
>
>Todd
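
The reconstitution rule in Todd's summary (follow verseNext to the
element whose verseID matches, and check that its versePrev points back)
might look roughly like this in Python. It is a sketch under several
assumptions: the wrapping <div>, the placeholder text, the function name,
and the choice to raise an error on a mismatched versePrev are all mine,
and verseID values are assumed unique within the document, as in Todd's
example.

```python
import xml.etree.ElementTree as ET


def reconstitute(root, start_id):
    """Follow the verseNext chain from start_id, ignoring element names."""
    by_id = {el.get("verseID"): el for el in root.iter() if el.get("verseID")}
    chain, seen = [], set()
    current = by_id.get(start_id)
    while current is not None:
        cid = current.get("verseID")
        if cid in seen:
            raise ValueError(f"cycle in chain at {cid!r}")
        seen.add(cid)
        chain.append(current)
        # A missing verseNext, or one naming an unknown ID, ends the chain.
        nxt = by_id.get(current.get("verseNext"))
        if nxt is not None and nxt.get("versePrev") != cid:
            raise ValueError(f"versePrev of next segment does not match {cid!r}")
        current = nxt
    return chain


# Todd's two-hierarchy example, wrapped in a hypothetical <div> with
# placeholder text so that it parses as a single document.
SAMPLE = """<div>
<p pID="s" next="t">
  <verse verseID="x">whole verse</verse>
  <verse verseID="a" verseNext="b">first part, </verse>
</p>
<p pID="t" prev="s" verseID="b" versePrev="a">second part</p>
</div>"""

root = ET.fromstring(SAMPLE)
segments = reconstitute(root, "a")
tags = [el.tag for el in segments]
text = "".join("".join(el.itertext()) for el in segments)
print(tags, repr(text))
# ['verse', 'p'] 'first part, second part'
```

Note that the chain crosses from a <verse> to a <p>, which is exactly
the case the proposed "role" is meant to allow.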


-- 

Steve DeRose -- http://www.stg.brown.edu/~sjd
Chair, Bible Technologies Group -- http://www.bibletechnologies.net
Email: sderose@speakeasy.net
Backup email: sderose@mac.com, sjd@stg.brown.edu