[osis-core] references and self-ids Part 1 - Assumptions/statements

Steve DeRose osis-core@bibletechnologieswg.org
Wed, 10 Jul 2002 13:03:57 -0400


At 09:57 PM -0600 07/08/02, Todd Tillinghast wrote:
>I have read through Patrick's most recent posting as well as several
>of the other postings.
>
>Just to make sure we are all on the same page I put together the
>following statements which I believe are TRUE and represent our intent
>with respect to references.  If you don't believe any of these
>statements to be true please post a response.
>(I am using the term reference in the broadest sense to mean either self
>identification or to mean an external reference that is either work
>specific or work abstract.)
>1) The simple part of the reference (ex Gen.1.1) has no meaning outside
>of the context of at least a reference system.

If you really mean "reference system" here, as in, we can't know that 
this is not the essay on Gennesee Gin, part 1, poem 1, then sure. If 
you mean versification scheme, then I'd say it's meaningful, but not 
fully specified -- most of the time not knowing the v. scheme won't 
matter.

>2) Whether validated or not a reference system defines the set of valid
>references.

OK

>2) There may be zero or more specific works that are compliant with and
>use a given reference system.

Yes.

>3) In some cases a reference system and a specific work are equivalent,
>either because a specific work defines the reference system or because
>there is only one instance of the work.

If by equivalent you mean, predictable from each other, or in a 
one-to-one relationship, then yes.

>4) It is possible to create a reference to a specific work using a
>reference system that it does not support.  (The
>translation/transformation would be left to software to resolve.)

Yes (actually, probably left to a mapping table to specify, and then 
for software to resolve).

>5) Although a reference system is required, the specific work is
>optional.

Right.

>
>
>It seems that we desire to do the following:
>A) Self-identify text (mainly verses but also ranges of text, chapters,
>and books).
>B) Have the self-identifying identifiers be tied to a reference system
>OR a reference system and a work without having to explicitly state the
>reference system and work with each identifier.
>C) Be able to self-identify text from more than one reference system
>and/or work within a single document.
>D) Create references (not self-identification). 
>E) Create a reference to a range of text.
>F) Describe a reference at greater granularity than the reference system
>defines.  (grain)

I'd agree with all those. I'd also pin down a few more details, 
something like:

E1) A range may start and end at any reference-system-specified unit, 
or any grain within one.
E2) A range must be confined to a single work.
E3) A range (or for that matter any reference) may become meaningless 
when mapped from one edition of a work to another, for various 
reasons, such as the referenced text not being included in some 
editions, or a range's ends being re-ordered.
F1) Grains are not expected to map across editions (unless they are 
very close, such as successive minor edits of the same translation, 
like NIV-US-1999 vs. NIV-US-2001); thus in general any reference with 
a grain should also specify a particular work, and any reference 
mapped to another version should ignore a grain specification (or 
perhaps offer it with a warning or something like that).
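
To make E3 and F1 concrete, here is a rough sketch of mapping a 
reference from one edition to another. Everything in it is an 
assumption for illustration: the function name map_reference, the 
plain-dict mapping table, and the opaque grain string are all 
hypothetical, and nothing here is a settled OSIS mechanism.

```python
import warnings


def map_reference(identifier, grain, table):
    """Map an identifier into another edition via a mapping table.

    `table` is a hypothetical dict from source identifiers to target
    identifiers. Returns (target_identifier, None), or None when the
    reference has no counterpart in the target edition (E3).
    """
    target = table.get(identifier)
    if target is None:
        # E3: the reference is meaningless in the target edition.
        return None
    if grain is not None:
        # F1: grains are not expected to survive mapping, so drop the
        # grain and warn rather than silently pointing at the wrong text.
        warnings.warn(
            "grain %r dropped when mapping %s to %s" % (grain, identifier, target)
        )
    return (target, None)
```

A caller that gets None back would have to decide for itself whether 
to report the reference as unresolvable or to fall back to a coarser 
unit.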

I guess I'd also add:

A work is properly an abstract notion, roughly corresponding to a 
unique author/title pair, and may exist in many concrete editions, 
varying in language, writing system for that language, translator, 
edition, transcription, and so on.

Reference systems are assumed to be hierarchical, such that it is 
meaningful and reasonable to fall back by deleting tokens from the 
right (for example, dropping back from verses to chapters, or lines 
to pages, etc).

(I can imagine non-hierarchical systems, but allowing for them would 
either prevent such fallback, or require us to always flag which 
systems are and aren't. I'd rather make the assumption and loosen it 
someday later if we have to).
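
As an illustration of the fallback idea, a minimal sketch, assuming 
identifiers are dot-separated strings like Gen.1.1 and that the set of 
known identifiers is simply a Python set (the names fallbacks and 
resolve are hypothetical):

```python
def fallbacks(identifier):
    """Yield the identifier, then each shorter form obtained by
    deleting one token from the right."""
    tokens = identifier.split(".")
    while tokens:
        yield ".".join(tokens)
        tokens.pop()


def resolve(identifier, known):
    """Return the most specific form of `identifier` found in `known`,
    or None if not even the leftmost token is known."""
    for candidate in fallbacks(identifier):
        if candidate in known:
            return candidate
    return None
```

For example, resolving Gen.1.1 against a set that only knows Gen.1 and 
Gen would drop back to Gen.1.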


Also, I think we'll need consistent terminology for the various bits 
and pieces of all this. How about:

Reference: Data that specifies a contiguous location in a work or 
some version of a work. Logically, this includes work, starting 
identifier and grain, and ending identifier and grain.

Work: A title or version.

Title: An abstract work of literature, which subsumes all editions, 
translations, transcriptions, and other forms of the work.

Version: A particular concrete instantiation of a title, in a 
particular language, a particular translation, transcription, 
edition, and so on.

Identifier: A canonical reference to a location in a work, specified 
as a series of dot-separated tokens that name successive hierarchical 
divisions, such as book/chapter/verse; volume/page/line; 
act/scene/speech; and so on.
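
A small sketch of how such identifiers might be handled, assuming 
nothing beyond "dot-separated tokens" (the function names are 
hypothetical, and no particular token vocabulary is implied):

```python
def parse_identifier(text):
    """Split a dot-separated identifier into a tuple of tokens.

    Raises ValueError if any token is empty (e.g. "Gen..1").
    """
    tokens = tuple(text.split("."))
    if any(token == "" for token in tokens):
        raise ValueError("empty token in identifier: %r" % text)
    return tokens


def contains(outer, inner):
    """True if `inner` names `outer` itself or a division inside it.

    Comparison is token by token, so Gen.1 contains Gen.1.1 but does
    not contain Gen.10.1, which a plain string-prefix test would get
    wrong.
    """
    outer_tokens = parse_identifier(outer)
    inner_tokens = parse_identifier(inner)
    return inner_tokens[: len(outer_tokens)] == outer_tokens
```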

Grain: A machine-interpretable string that specifies a location 
smaller than any identifier can specify for a given work. A grain 
location is specified relative to the nearest identifiable location, 
in terms of counting Unicode code points, searching for a string 
match, or other generic means (generic in the sense that the 
interpretation of a grain specification does *not* depend on the 
particular reference system in use).
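
To show what "generic" could mean in practice, here is a sketch that 
resolves a grain against the text of the nearest identifiable unit. 
The grain syntax has not been defined, so the (kind, value) tuple used 
here, with kinds "cp" and "str", is purely a stand-in for 
illustration.

```python
def resolve_grain(unit_text, grain):
    """Return the code-point offset within `unit_text` that `grain` names.

    `grain` is a hypothetical (kind, value) pair:
      ("cp", n)   -> the offset n, counted in Unicode code points
      ("str", s)  -> the offset of the first occurrence of s
    Python str indexing already counts code points, so no decoding
    step is needed here.
    """
    kind, value = grain
    if kind == "cp":
        if not 0 <= value <= len(unit_text):
            raise ValueError("offset %d is outside the unit" % value)
        return value
    if kind == "str":
        offset = unit_text.find(value)
        if offset < 0:
            raise ValueError("string %r not found in the unit" % value)
        return offset
    raise ValueError("unknown grain kind: %r" % kind)
```

Note that neither branch looks at the reference system at all; it only 
needs the text of the unit. The string-match failure case is also one 
way a grain can stop working when the wording differs between 
editions.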

Unit reference: A reference to a work as a whole, a single identified 
unit in a work, or a particular grain within such a unit.

Range reference: A reference that is not a unit reference, and so 
must be specified by its starting and ending identifier and grain.
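
Putting the terms together, one possible data structure, offered only 
as a sketch: the class and field names are hypothetical, grains are 
kept as opaque strings, and a single work field is used so that a 
range cannot span two works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    work: str                          # a title or a version
    start_id: Optional[str] = None     # None means the work as a whole
    start_grain: Optional[str] = None  # opaque grain specification
    end_id: Optional[str] = None       # set only for range references
    end_grain: Optional[str] = None

    def is_unit(self):
        """A unit reference names a whole work, one identified unit,
        or one grain within a unit, so it has no ending part."""
        return self.end_id is None and self.end_grain is None

    def is_range(self):
        """A range reference is any reference that is not a unit
        reference."""
        return not self.is_unit()
```

With this shape, Reference("Bible") is a unit reference to the whole 
work, and Reference("Bible", "Gen.1.1", end_id="Gen.1.5") is a range 
reference. The work name "Bible" is just a placeholder here.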

>Todd


-- 

Steve DeRose -- http://www.stg.brown.edu/~sjd
Chair, Bible Technologies Group -- http://www.bibletechnologies.net
Email: sderose@speakeasy.net
Backup email: sderose@mac.com, sjd@stg.brown.edu