[osis-core] references and self-ids Part 3 - Proposal

Todd Tillinghast osis-core@bibletechnologieswg.org
Wed, 10 Jul 2002 12:42:04 -0600


Comments below.

Todd

> -----Original Message-----
> From: owner-osis-core@bibletechnologieswg.org [mailto:owner-osis-
> core@bibletechnologieswg.org] On Behalf Of Steve DeRose
> Sent: Wednesday, July 10, 2002 11:21 AM
> To: osis-core@bibletechnologieswg.org
> Subject: Re: [osis-core] references and self-ids Part 3 - Proposal
> 
> >The following is a list of proposed ideas related to references:
> >(Each item is largely independent from the others.)
> >
> >1) I think declaring in the header the reference systems or reference
> >system/work pairs used in a document is a great idea.  (This is not
> >really a proposal.)
> 
> I rather like it too; though by 'declare' here, I would mean declare
> what one is in use, rather than define the entire system right there;
> I think we should have a separate schema and conventions for
> explicitly defining reference systems.

Yes.  By 'declare' I mean all the necessary information to describe
which reference system or reference system/work pair is in use later in
the document.

> 
> >
> >2) I would like to propose that when creating a reference or a self-id
> >that work ALWAYS be a part of the attribute that contains the
> >reference itself if not defaulted.  Since we are relying on techniques
> >other than valuators for validating the rest of the reference why not
> >make the work and the reference itself a single attribute?  This will
> >leave room for validation based on "derived" schema, since a regular
> >expression can be written for a single data type but can not be written
> >to act contingently based on the values of two (or more) attributes.
> 
> There's a point there; I think we separated it out in order to
> provide separate defaulting, and to minimize the complexity of the
> (already-messy) attribute value. But now that we've moved the
> complexity of 'work' to the header, I don't see much problem in
> packing the name of the work into the attribute. Maybe with a colon
> after it?
> 
> Thoughts?

Yes to the colon separator.  If the colon separator is not present then
the default reference system as indicated in the header is implied.  
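A minimal sketch of that defaulting rule, in Python purely for illustration.  The function name split_reference and the default_system parameter are hypothetical, and I am assuming the prefix separator can only occur before any "@grain" part, so a colon inside a grain is not mistaken for it.

```python
def split_reference(value, default_system):
    """Return (reference system, rest of the reference) for one attribute value."""
    # Only look for the separator before any grain, so a colon inside
    # a grain such as "@char:44(This)" is left alone.
    head, at, tail = value.partition("@")
    prefix, colon, ref = head.partition(":")
    if colon:
        return prefix, ref + at + tail
    # No colon: the default declared in the header is implied.
    return default_system, value
```

For example, split_reference("KJV:Gen.1.1", "Bible.NRSV") yields the KJV prefix, while a bare "Gen.1.1" falls back to the header default.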

> 
> >
> >Although having the work as a separate attribute provides the opportunity
> >to match occurrences with matching definitions in the header, the
> >validation is weak and the opportunity for richer validation for the
> >entire reference outweighs the value of simple work validation.
> 
> I'm not sure I buy that particular argument, but since I buy the
> conclusion that doesn't bother me.
> 
> >
> >
> >3) I would like to propose that grain exist as an option for all
> >references AND self-ids.
> 
> Why for self-ids? So far, grain has by definition meant a
> machine-processable pointer of finer granularity than identifiers
> could express. Surely it wouldn't make sense to tag points in the
> text as being "5 code points into the verse" or "the first match of
> 'the' in the verse", would it? What sort of cases do you have in mind?

The primary case is when a verse is segmented.  If a verse is split into
three pieces for example they should each have a unique id.  In the
examples of Matt.13 I have adopted a convention of having the first
piece carry the non-grain adorned reference (osisID="Matt.13.14").  The
second verse piece would have an id that includes a grain of some form
(osisID="Matt.13.14@char:44(This)").  Naturally the next and prev
attributes would then include the appropriate and unique identifiers.
This convention provides a reliable mechanism to match up the pieces of
a segmented verse and also allows for the location of the start of the
verse by the non-grain adorned identifier.  
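To illustrate how software could match up the pieces under this convention, here is a rough Python sketch.  The helper name group_segments is hypothetical, and it assumes that everything before the first "@" is the non-grain adorned identifier.

```python
from collections import defaultdict

def group_segments(osis_ids):
    """Group osisID values by the identifier that precedes any grain."""
    groups = defaultdict(list)
    for osis_id in osis_ids:
        # "Matt.13.14@char:44(This)" and "Matt.13.14" share the base "Matt.13.14".
        base = osis_id.split("@", 1)[0]
        groups[base].append(osis_id)
    return dict(groups)
```

The first entry in each group is the piece that starts the verse, provided the ids are passed in document order.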

> 
> >
> >4) I would like to propose that all references AND self-ids be allowed
> >to optionally be ranges.  This is needed to self-id verses that are not
> >broken into distinct verses in the translation being encoded and a
> >single <verse> element represents several logical verses in the defined
> >reference system.  (If we do not adopt this concept for self-ids then we
> >will be forced to create a reference system for each minor variation
> >that individual translations take on the basic reference systems.)
> 
> It sounds to me as if your goal is to make identifiers and references
> be the same -- why is that useful?

Although I do think it is useful, it is neither the basis of my reasoning
nor my goal.

> 
> I see why it would be handy to identify a block-oriented translation
> at a looser (range-like) level; but it does complicate verse-finding
> code substantially, doesn't it? You have to do a fairly complicated
> range-intersection algorithm to be able to know that
> Matt.1.3-Matt.1.7 relates to Matt.1.6-Matt.1.12; indeed, I'm not sure
> the result is well-defined (mathematical intersection is, but the
> user's intent may vary in whether open vs. close-ended intersections
> count, and so on for the other 13 or so cases that come up....)
> 
There are two cases:
Case 1) Multiple verses as defined by the reference system are
translated into a single verse element.  This case is indeed
troublesome.  The options are:
	a) We define a new reference system for this translation and
define a new verse id, Matt.1.6-12, which maps to several
verses in the more common reference system it seeks to largely comply
with.
	b) We allow osisID to be a list of identifiers that are not
ranges.  This would allow for exact match searching without the complex
range-intersection algorithm, but might also lead to abuses.  It is
possible that encoders would try to support multiple reference systems
by including multiple ids on every verse.  
	c) Allow the osisID to be a range and force the use of
range-intersection algorithms.
	d) Force the use of the first verse identifier to be the osisID.
The verses after the first one would not be found when searched for.  It
is possible to have "shadow" child elements that carry the remaining
verseIDs as an osisID that allow the verse to be found by id and then
back up to the parent for the actual verse.
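For a sense of what option (c) costs, here is a minimal Python sketch of the simplest closed-ended intersection test.  It assumes fully spelled-out endpoints of the form Book.chapter.verse (so "Matt.1.6-Matt.1.12", not the abbreviated "Matt.1.6-12"), no grains, and ranges that stay within one book; the function names are hypothetical.  The open-ended and cross-book cases Steve mentions are not covered.

```python
def parse_verse(ref):
    """Turn 'Matt.1.3' into ('Matt', (1, 3))."""
    book, chapter, verse = ref.split(".")
    return book, (int(chapter), int(verse))

def parse_range(value):
    """Turn 'Matt.1.3-Matt.1.7' (or a single verse) into (book, start, end)."""
    start, dash, end = value.partition("-")
    first = parse_verse(start)
    last = parse_verse(end) if dash else first
    if first[0] != last[0]:
        raise ValueError("cross-book ranges are outside this sketch")
    return first[0], first[1], last[1]

def ranges_overlap(a, b):
    """Closed-ended test: True if the two ranges share at least one verse."""
    book_a, a_start, a_end = parse_range(a)
    book_b, b_start, b_end = parse_range(b)
    if book_a != book_b:
        return False
    # (chapter, verse) tuples compare in reading order within a book.
    return a_start <= b_end and b_start <= a_end
```

Even this narrow version needs a parser and an ordering on verses, whereas option (b) needs only an exact string match.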


Case 2) Higher level elements that contain verses, like <div>,
<lineGroup>, <p>, <q>, etc.  For elements that hold whole
chapters and whole books the non-range reference works fine, but for
the rest of the cases it is not possible.  I don't have a problem with
allowing osisID only on verses, but I thought that encoding an entire
Bible without any verse elements was meant to be an alternative.  We would
allow <p> to have an osisID with the same meaning as on <verse>, and also a
range-allowable identifier in the same fashion as <div>.  This would
mean that <div> would only have the range-allowable identifier.

(Given the options I favor a combination of 1.b and what is proposed in
case 2.  How does xpath deal with finding a single token within an
attribute that is a list?)
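As far as I know, XPath 1.0 has no list-aware attribute test; the usual workaround is a predicate such as contains(concat(' ', normalize-space(@osisID), ' '), ' Matt.1.7 ').  Outside XPath the same lookup is a whitespace split.  A small Python sketch using the standard library parser follows; the function name find_by_token is hypothetical, and it assumes the list in osisID is whitespace-separated.

```python
import xml.etree.ElementTree as ET

def find_by_token(root, token):
    """Return elements whose osisID list contains token as a whole item."""
    return [
        element
        for element in root.iter()
        # split() breaks the attribute on whitespace, so "Matt.1.7"
        # does not match inside "Matt.1.70".
        if token in element.get("osisID", "").split()
    ]
```

Matching whole tokens matters here, since a plain substring search for Matt.1.7 would also hit Matt.1.70.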

Thoughts?

> I still think we can avoid a fair amount of mess by insisting that
> identifiers are butt-simple, and only references can get more
> sophisticated. The only cost is to the encoders of those loose
> versions, and that can be easily hidden by having software expand the
> identifiers as needed.
> 
> >
> >This could allow for self-ids for higher level containers like <p> and
> ><div>.  (I am "Matt.13.10-Matt.13.17" as seen in the Matt.13 encoding I
> >sent out Sunday.)
> >
> >5) I would like to propose that we adopt one of the two following
> >options for ALL references and self-ids:
> >
> >5a) OPTION 1:
> >reference or self-ID: [referencePrefix:]ref[@grain][-ref[@grain]]
> >where referencePrefix is defined in the header and behaves like a
> >namespace prefix that defines a reference system or a reference system
> >and a specific work.
> >
> >
> >Example:
> ><references>
> >	<referencePrefixDefault referenceSystem="Bible.NRSV"
> >work="Bible.TEV"/>
> >	<referencePrefix refID="KJV" referenceSystem="Bible.KJV.1612"/>
> >	<referencePrefix refID="NRSV" referenceSystem="Bible.NRSV"/>
> >	<referencePrefix refID="FrTEV"
> >referenceSystem="Bible.FrenchReferenceSystem" work="Bible.TEV"/>
> >	<referencePrefix refID="TEV" referenceSystem="Bible.NRSV"
> >work="Bible.TEV"/>
> >	<referencePrefix refID="sorted"
> >referenceSystem="Bible.Todd.SequentiallyIDedSortedByTextValueOfVersesOfTEV"
> >work="Bible.TEV"/>
> ></references>
> >...
> ><verse osisID="Gen.1.1">...</verse> <!--implies
> >"Bible.NRSV(Bible.TEV):Gen.1.1" -->
> >
> ><reference ref="KJV:Gen.1.1">Gen 1:1 using the KJV 1612 reference system
> >but not specifying any work.</reference>
> >
> ><reference ref="Gen.1.1">a reference to the default reference system and
> >work if specified.</reference>
> >
> ><reference ref="sorted:16">a reference to the sixteenth verse from the
> >TEV when the verses are sorted based on the text value of the
> >verses.</reference>
> >
> ><reference ref="Bible.FrenchReferenceSystem(Bible.TEV):Ps.55.22">a
> >reference expressed using the French reference system AND specifying the
> >TEV version.</reference>
> >
> >5b) OPTION 2:
> >reference or self-ID:
> >[referenceSystem[(work)]:]ref[@grain][-ref[@grain]]
> >
> >Provide a richer syntax for "references" and require the full reference
> >system or reference system and work if not the default.
> >
> >Example:
> ><references>
> >	<referenceSystemDefault
> >referenceSystem="Bible.NRSV(Bible.TEV)"/>
> >	<referenceSystem referenceSystem="Bible.KJV.1612"/>
> >	<referenceSystem referenceSystem="Bible.NRSV"/>
> >	<referenceSystem referenceSystem="Bible.NRSV(Bible.TEV)"/>
> >	<referenceSystem
> >referenceSystem="Bible.FrenchReferenceSystem(Bible.TEV)"/>
> >	<referenceSystem
> >referenceSystem="Bible.Todd.SequentiallyIDedSortedByTextValueOfVersesOfTEV(Bible.TEV)"/>
> ></references>
> >...
> ><verse osisID="Gen.1.1">...</verse> <!--implies
> >"Bible.NRSV(Bible.TEV):Gen.1.1" -->
> >
> ><reference ref="Bible.KJV.1612:Gen.1.1">Gen 1:1 using the KJV 1612
> >reference system but not specifying any work.</reference>
> >
> ><reference ref="Gen.1.1">a reference to the default reference system and
> >work if specified.</reference>
> >
> ><reference
> >ref="Bible.Todd.SequentiallyIDedSortedByTextValueOfVersesOfTEV(Bible.TEV):16">a
> >reference to the sixteenth verse from the TEV when the verses
> >are sorted based on the text value of the verses.</reference>
> >
> ><reference ref="Bible.FrenchReferenceSystem(Bible.TEV):Ps.55.22">a
> >reference expressed using the French reference system AND specifying the
> >TEV version.</reference>
> 
> 
> 5b seems better in principle, since it breaks things apart more and
> is less likely to require patching later. I'm still uncomfortable with
> making identifiers as complicated as references, though. What's the
> main motivating factor?

Since we need to allow self-ids from more than one reference system then
we must specify the reference system on the self-ids that are not from
the default reference system.  As a result we must have a self-id that
contains reference system in the same fashion as a reference.  Might as
well use the same syntax.

> >
> >
> >6) I would like to propose that the FORM but NOT the VALUES within a
> >reference AND self-id be validated in the core.
> 
> 
> Yup; I'd actually thought we had already settled on that, but in any
> case I'd like to now. (hey, we have well-formed vs. valid
> references....)
> 
> >
> >Example:
> >number: any integer
> >XMLName: a valid XML name
> >basicOSISIDStructure: one or more XML name separated by periods
> >referenceSystem: basicOSISIDStructure
> >work: basicOSISIDStructure
> >ref: basicOSISIDStructure
> >grain: char.number | enum.XMLName | word.number
> >referenceStructure: [referenceSystem[(work)]:]ref[@grain][-ref[@grain]]
> >self-id: referenceStructure
> >reference: referenceStructure
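
As an illustration of validating FORM only, the structure above can be approximated with one regular expression.  This is a sketch in Python, not a proposed schema type.  Two assumptions are mine: each period-separated segment may start with a digit (otherwise "Gen.1.1" would fail a strict XML-name test), and grain follows the char.number | enum.XMLName | word.number shape listed above rather than the colon style used in the earlier Matt.13.14 example.

```python
import re

# One period-separated segment; digits allowed first so "Gen.1.1" passes (assumption).
SEGMENT = r"[A-Za-z0-9_]+"
DOTTED = rf"{SEGMENT}(?:\.{SEGMENT})*"
GRAIN = rf"(?:char\.\d+|word\.\d+|enum\.{SEGMENT})"
POINT = rf"{DOTTED}(?:@{GRAIN})?"

# [referenceSystem[(work)]:]ref[@grain][-ref[@grain]]
REFERENCE = re.compile(
    rf"(?:{DOTTED}(?:\({DOTTED}\))?:)?{POINT}(?:-{POINT})?"
)

def is_well_formed(value):
    """Check the form of a reference or self-id, not its values."""
    return REFERENCE.fullmatch(value) is not None
```

A value like "Bible.NRSV(Bible.TEV):Gen.1.1" passes, and so would a nonexistent book name, which is exactly the form-versus-values split being proposed.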
> >
> >7) remove osisWork, cite, and outCite from globalAttributes (leaving
> >osisID to be self-id as described in #6)
> 
> I think so, but I'm not feeling quite sure (the terms are getting
> muddled for me a bit at this point).
> 
> >
> >8) Make osisID of some type other than xs:string.
> 
> Right.
> 
> 
> >
> >9) Use the reference element for all other pointing needs.  (using the
> >reference structure described in #6)
> 
> what does this proposal mean? I don't think I get it.
> 

Elements and attributes within elements other than <reference> only
self-identify.  If we want to say that an element is related to some
reference then a child reference element would need to be created with
an informative type or type/subtype pair to indicate what its purpose
is.  

If the element is annotating then create a reference with
type="annotation".

Whatever the need that brought us to add the cite attribute to
globalAttributes, add a reference element as a child instead.

 
> >
> >Todd
> 
> 
> --
> 
> Steve DeRose -- http://www.stg.brown.edu/~sjd
> Chair, Bible Technologies Group -- http://www.bibletechnologies.net
> Email: sderose@speakeasy.net
> Backup email: sderose@mac.com, sjd@stg.brown.edu