[osis-core] references and self-ids Part 3 - Proposal

Steve DeRose osis-core@bibletechnologieswg.org
Wed, 10 Jul 2002 13:20:35 -0400


>The following is a list of proposed ideas related to references:
>(Each item is largely independent from the others.)
>
>1) I think declaring in the header the reference systems or reference
>system/work pairs used in a document is a great idea.  (This is not
>really a proposal.)

I rather like it too, though by 'declare' here I would mean declaring 
which one is in use, rather than defining the entire system right there; 
I think we should have a separate schema and conventions for 
explicitly defining reference systems.

>
>2) I would like to propose that when creating a reference or a self-id
>that work ALWAYS be a part of the attribute that contains the
>reference itself if not defaulted.  Since we are relying on techniques
>other than valuators for validating the rest of the reference why not
>make the work and the reference itself a single attribute.  This will
>leave room for validation based on "derived" schema, since a regular
>expression can be written for a single data type but can not be written
>to act contingently based on the values of two (or more) attributes.

There's a point there; I think we separated it out in order to 
provide separate defaulting, and to minimize the complexity of the 
(already-messy) attribute value. But now that we've moved the 
complexity of 'work' to the header, I don't see much problem in 
packing the name of the work into the attribute. Maybe with a colon 
after it?
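
As a rough sketch of what the colon convention would mean for software 
(Python here only for illustration; the function name split_work and the 
default_work parameter are made up, not anything we have agreed on):

```python
def split_work(value, default_work=None):
    """Split 'work:ref' into (work, ref).

    If there is no colon, the work falls back to default_work,
    which stands in for whatever the header declares as the default.
    """
    work, sep, ref = value.partition(":")
    if not sep:
        # No prefix present, so the whole value is the reference.
        return default_work, value
    return work, ref


print(split_work("KJV:Gen.1.1"))
print(split_work("Gen.1.1", default_work="TEV"))
```

This assumes the colon never appears inside the reference part itself.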

Thoughts?

>
>Although having the work as a separate attribute provides the opportunity
>to match occurrences with matching definitions in the header, the
>validation is weak and the opportunity for richer validation for the
>entire reference outweighs the value of simple work validation.

I'm not sure I buy that particular argument, but since I buy the 
conclusion that doesn't bother me.

>
>
>3) I would like to propose that grain exist as an option for all
>references AND self-ids.

Why for self-ids? So far, grain has by definition meant a 
machine-processable pointer of finer granularity than identifiers 
could express. Surely it wouldn't make sense to tag points in the 
text as being "5 code points into the verse" or "the first match of 
'the' in the verse", would it? What sort of cases do you have in mind?

>
>4) I would like to propose that all references AND self-ids be allowed
>to optionally be ranges.  This is needed to self-id verses that are not
>broken into distinct verses in the translation being encoded and a
>single <verse> element represents several logical verses in the defined
>reference system.  (If we do not adopt this concept for self-ids then we
>will be forced to create a reference system for each minor variation
>that individual translations take on the basic reference systems.)

It sounds to me as if your goal is to make identifiers and references 
be the same -- why is that useful?

I see why it would be handy to identify a block-oriented translation 
at a looser (range-like) level; but it does complicate verse-finding 
code substantially, doesn't it? You have to do a fairly complicated 
range-intersection algorithm to be able to know that 
Matt.1.3-Matt.1.7 relates to Matt.1.6-Matt.1.12; indeed, I'm not sure 
the result is well-defined (mathematical intersection is, but the 
user's intent may vary in whether open- vs. closed-ended intersections 
count, and so on for the other 13 or so cases that come up....)
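
To make the point concrete, here is a minimal sketch of the easiest case 
only: both ranges in one book, with IDs of the form book.chapter.verse. 
The function names and the 'closed' flag are hypothetical. The flag is 
there only to show that someone has to decide whether sharing a single 
endpoint counts as a match.

```python
def parse_id(osis_id):
    """Turn 'Matt.1.3' into ('Matt', 1, 3) so IDs compare in reading order."""
    book, chapter, verse = osis_id.split(".")
    return book, int(chapter), int(verse)


def parse_range(text):
    """Turn 'Matt.1.3-Matt.1.7' into a (first, last) pair of parsed IDs."""
    start, _, end = text.partition("-")
    first = parse_id(start)
    last = parse_id(end) if end else first
    return first, last


def overlap(a, b, closed=True):
    """Return the shared (first, last) span of two ranges, or None."""
    (a1, a2), (b1, b2) = parse_range(a), parse_range(b)
    if a1[0] != a2[0] or b1[0] != b2[0] or a1[0] != b1[0]:
        raise ValueError("sketch only handles ranges within one book")
    lo, hi = max(a1, b1), min(a2, b2)
    # With closed=False, ranges that merely touch at one verse do not match.
    if lo < hi or (closed and lo == hi):
        return lo, hi
    return None


print(overlap("Matt.1.3-Matt.1.7", "Matt.1.6-Matt.1.12"))
```

Ranges that cross books, or that mix reference systems, need an ordering 
supplied by the reference system and are not handled here.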

I still think we can avoid a fair amount of mess by insisting that 
identifiers are butt-simple, and only references can get more 
sophisticated. The only cost is to the encoders of those loose 
versions, and that can be easily hidden by having software expand the 
identifiers as needed.
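
A sketch of that kind of expansion, assuming the range stays inside one 
chapter (crossing chapters would need verse counts from the reference 
system). The name expand_range is hypothetical:

```python
def expand_range(range_id):
    """Expand 'Matt.13.10-Matt.13.17' into one simple ID per verse."""
    start, _, end = range_id.partition("-")
    if not end:
        # Already a simple identifier.
        return [start]
    book, chapter, first = start.split(".")
    end_book, end_chapter, last = end.split(".")
    if (book, chapter) != (end_book, end_chapter):
        raise ValueError("sketch only handles ranges within one chapter")
    return [f"{book}.{chapter}.{v}" for v in range(int(first), int(last) + 1)]


print(expand_range("Matt.13.10-Matt.13.17"))
```

The encoder writes the range once; the software hands simple identifiers 
to everything downstream.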

>
>This could allow for self-ids for higher level containers like <p> and
><div>.  (I am "Matt.13.10-Matt.13.17" as seen in the Matt.13 encoding I
>sent out Sunday.)
>
>5) I would like to propose that we adopt one of the two following
>options for ALL references and self-ids:
>
>5a) OPTION 1:
>reference or self-ID: [referencePrefix:]ref[@grain][-ref[@grain]]
>where referencePrefix is defined in the header and behaves like a
>namespace prefix that defines a reference system or a reference system
>and a specific work.
>
>
>Example:
><references>
>	<referencePrefixDefault referenceSystem="Bible.NRSV"
>work="Bible.TEV"/>
>	<referencePrefix refID="KJV" referenceSystem="Bible.KJV.1612"/>
>	<referencePrefix refID="NRSV" referenceSystem="Bible.NRSV"/>
>	<referencePrefix refID="FrTEV"
>referenceSystem="Bible.FrenchReferenceSystem" work="Bible.TEV"/>
>	<referencePrefix refID="TEV" referenceSystem="Bible.NRSV"
>work="Bible.TEV"/>
>	<referencePrefix refID="sorted"
>referenceSystem="Bible.Todd.SequentiallyIDedSortedByTextValueOfVersesOfTEV"
>work="Bible.TEV"/>
></references>
>...
><verse osisID="Gen.1.1">...</verse> <!--implies
>"Bible.NRSV(Bible.TEV):Gen.1.1" -->
>
><reference ref="KJV:Gen.1.1">Gen 1:1 using the KJV 1612 reference system
>but not specifying any work.</reference>
>
><reference ref="Gen.1.1">a reference to the default reference system and
>work if specified.</reference>
>
><reference ref="sorted:16">a reference to the sixteenth verse from the
>TEV when the verses are sorted based on the text value of the
>verses.</reference>
>
><reference ref="Bible.FrenchReferenceSystem(Bible.TEV):Ps.55.22">a
>reference expressed using the French reference system AND specifying the
>TEV version.</reference>
>
>5b) OPTION 2:
>reference or self-ID:
>[referenceSystem[(work)]:]ref[@grain][-ref[@grain]]
>
>Provide a richer syntax for "references" and require the full reference
>system or reference system and work if not the default.
>
>Example:
><references>
>	<referenceSystemDefault
>referenceSystem="Bible.NRSV(Bible.TEV)"/>
>	<referenceSystem referenceSystem="Bible.KJV.1612"/>
>	<referenceSystem referenceSystem="Bible.NRSV"/>
>	<referenceSystem referenceSystem="Bible.NRSV(Bible.TEV)"/>
>	<referenceSystem
>referenceSystem="Bible.FrenchReferenceSystem(Bible.TEV)"/>
>	<referenceSystem
>referenceSystem="Bible.Todd.SequentiallyIDedSortedByTextValueOfVersesOfTEV(Bible.TEV)"/>
></references>
>...
><verse osisID="Gen.1.1">...</verse> <!--implies
>"Bible.NRSV(Bible.TEV):Gen.1.1" -->
>
><reference ref="Bible.KJV.1612:Gen.1.1">Gen 1:1 using the KJV 1612
>reference system but not specifying any work.</reference>
>
><reference ref="Gen.1.1">a reference to the default reference system and
>work if specified.</reference>
>
><reference
>ref="Bible.Todd.SequentiallyIDedSortedByTextValueOfVersesOfTEV(Bible.TEV):16">a
>reference to the sixteenth verse from the TEV when the verses
>are sorted based on the text value of the verses.</reference>
>
><reference ref="Bible.FrenchReferenceSystem(Bible.TEV):Ps.55.22">a
>reference expressed using the French reference system AND specifying the
>TEV version.</reference>


5b seems better in principle, since it breaks things apart more and 
is less likely to require patching later. I'm still uncomfortable with 
making identifiers as complicated as references, though. What's the 
main motivating factor?

>
>
>6) I would like to propose that the FORM but NOT the VALUES within a
>reference AND self-id be validated in the core.


Yup; I'd actually thought we had already settled on that, but in any 
case I'd like to now. (Hey, we have well-formed vs. valid 
references....)

>
>Example:
>number: any integer
>XMLName: a valid XML name
>basicOSISIDStructure: one or more XML names separated by periods
>referenceSystem: basicOSISIDStructure
>work: basicOSISIDStructure
>ref: basicOSISIDStructure
>grain: char.number | enum.XMLName | word.number
>referenceStructure: [referenceSystem[(work)]:]ref[@grain][-ref[@grain]]
>self-id: referenceStructure
>reference: referenceStructure
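
For what it's worth, a form-only check along these lines can be sketched 
as a single regular expression. This is only an illustration in Python, 
not a proposed schema pattern. One loud assumption: a segment here is 
any run of letters, digits, and underscores, which is looser than a real 
XML name. A strict XML name cannot start with a digit, so read literally 
the grammar above would reject the "1" segments in Gen.1.1. The names 
SEG, BASIC, GRAIN, REF, and is_well_formed are made up.

```python
import re

# Assumption: a segment is letters, digits, or underscores (not a strict XML name).
SEG = r"[A-Za-z0-9_]+"
# basicOSISIDStructure: one or more segments separated by periods.
BASIC = rf"{SEG}(?:\.{SEG})*"
# grain: char.number | enum.XMLName | word.number
GRAIN = rf"(?:char\.\d+|word\.\d+|enum\.{SEG})"
# ref with an optional @grain
REF = rf"{BASIC}(?:@{GRAIN})?"
# referenceStructure: [referenceSystem[(work)]:]ref[@grain][-ref[@grain]]
REFERENCE = re.compile(rf"(?:{BASIC}(?:\({BASIC}\))?:)?{REF}(?:-{REF})?")


def is_well_formed(value):
    """Check the FORM of a reference or self-id, not whether its values exist."""
    return REFERENCE.fullmatch(value) is not None


for sample in ("Gen.1.1",
               "Bible.FrenchReferenceSystem(Bible.TEV):Ps.55.22",
               "Matt.13.10-Matt.13.17",
               "Gen..1"):
    print(sample, is_well_formed(sample))
```

Note that this accepts Gen.99.99 happily, which is the point: form is 
checked in the core, values are left to something that knows the 
reference system.
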
>
>7) remove osisWork, cite, and outCite from globalAttributes (leaving
>osisID to be self-id as described in #6)

I think so, but I'm not feeling quite sure (the terms are getting 
muddled for me a bit at this point).

>
>8) Make osisID of some type other than xs:string.

Right.


>
>9) Use the reference element for all other pointing needs.  (using the
>reference structure described in #6)

What does this proposal mean? I don't think I get it.

>
>Todd


-- 

Steve DeRose -- http://www.stg.brown.edu/~sjd
Chair, Bible Technologies Group -- http://www.bibletechnologies.net
Email: sderose@speakeasy.net
Backup email: sderose@mac.com, sjd@stg.brown.edu