[osis-core] Repost of Notes from Dallas meetings 1/2002

Steven J. DeRose osis-core@bibletechnologieswg.org
Wed, 27 Aug 2003 18:17:02 -0400


(This has an early bit about ref sys mapping files.)

------------------------------

Virtually entirely completely unorganized notes from BTG meetings, 
Dallas, Jan 2002.




Types of identifier components

Matthew
5
5a
A
iv
greek/hebrew letters?
unnumbered items (psalm superscriptions, etc.).
derive all from XML Schema datatypes

A component shall be an XML Schema datatype.



Enumerated list of names

Integer (range)

Letter (range)
   No case distinctions

Sequence of the above


1-5, 5a, 5b, 6-50, A-F
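
A minimal sketch of how a value list like the one above could be 
expanded, assuming comma-separated items, hyphen as the range 
delimiter, and letter ranges folded to upper case (no case 
distinctions). The name expand_values is hypothetical; nothing here 
was decided at the meeting.

```python
import re

def expand_values(spec):
    """Expand a spec such as '1-5, 5a, 5b, 6-50, A-F' into an ordered list."""
    values = []
    for item in (part.strip() for part in spec.split(",")):
        int_range = re.fullmatch(r"(\d+)-(\d+)", item)
        letter_range = re.fullmatch(r"([A-Za-z])-([A-Za-z])", item)
        if int_range:
            lo, hi = int(int_range.group(1)), int(int_range.group(2))
            values.extend(str(n) for n in range(lo, hi + 1))
        elif letter_range:
            # Assumption: letter ranges are case-insensitive, so fold to upper case.
            lo, hi = (ord(c.upper()) for c in letter_range.groups())
            values.extend(chr(c) for c in range(lo, hi + 1))
        elif item:
            values.append(item)  # a single value such as '5a', kept verbatim
    return values

values = expand_values("1-5, 5a, 5b, 6-50, A-F")
```

Single values are kept as written; only ranges are enumerated.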



<component-definition>
    <component-name>
       <short>bbn</short>
       <descr>Bible Book Names</descr>
    </component-name>
</component-definition>



Steps to defining the Biblical stuff:



Define components:

OSIS-works (Bible, Josephus, Plato-Phaedrus,...)

Bible-book-names
   (layered Heb -> Cath -> Orth -> Prot?)

Chapter-numbers
   (1, 1a, A,...)

Verse-numbers
   (1, 1a, A,...)

Bible versification schemes
   (prot, cath, heb, niv, nasb....)



-------


Ref system definition (change scheme to system) consists of:

Name/ID of edition (possibly abstract)
    (heb, prot, cath, orth, niv, nasb...)

Display names (by language)

Description

declarations of predefined component types used
    (later decided components will all be OSIS-defined)

derivations of new component types used
    (not)

declaration of the aggregation (canonical identifier) form:

    List of component types
       (global: separated by dots)
       name, and whether optional

    where to get default for missing components
       Inherit down tree via attribute osis:component-name='value'?
       <?OSIS work=bible edition=NIV book=Genesis ?>
          in header
          latest preceding PI?


declaration of scheme this one is based on


Provide OSIS-space attributes for each component, that inherit
    (down the tree)
    (through milestone-pairs)

...

(attr per component is grody; single attr is ugly for inheritance)

maybe have just two attrs:

    osis:work     Bible.NIV
    osis:ref      Gen.1.1

osis:work level is globally defined by us, and typically inherited 
like xml:lang

osis:ref is the in-document locator, defined by the work's 
reference-system-dcl. This has to be given starting at the top; no 
defaulting.
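
A rough sketch of the two-attribute idea, assuming osis:work is 
inherited from the nearest ancestor (like xml:lang) while osis:ref is 
always given in full. The namespace URI, the element names, and the 
Josephus work and ref values are placeholders for illustration only.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://example.org/osis"  # placeholder namespace URI (assumption)
WORK = "{%s}work" % OSIS_NS
REF = "{%s}ref" % OSIS_NS

def resolve_refs(elem, inherited_work=None, out=None):
    """Collect (work, ref) pairs; osis:work inherits down the tree."""
    if out is None:
        out = []
    # Use this element's own osis:work if present, else the inherited one.
    work = elem.get(WORK, inherited_work)
    ref = elem.get(REF)
    if ref is not None:
        out.append((work, ref))
    for child in elem:
        resolve_refs(child, work, out)
    return out

doc = ET.fromstring(
    '<text xmlns:osis="http://example.org/osis" osis:work="Bible.NIV">'
    '<p><ref osis:ref="Gen.1.1"/></p>'
    '<note osis:work="Josephus.Antiquities"><ref osis:ref="1.2.3"/></note>'
    '</text>'
)
pairs = resolve_refs(doc)
```

The first reference picks up Bible.NIV from the root; the one inside 
the note picks up the overriding work declared on the note.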

This has the advantage that you can define element types per work, that 
default the osis:work attribute so that you can be terse: 
<josephus-ref> vs. <bible-ref>


Don't try to validate number ranges via refsysdcl; somebody else's problem.

RSD can declare list of OSIS-defined component types, plus min level 
to be specified.


Mapping:

<corr from=ref to=ref/>

Anything not listed is assumed to match up.

Do we treat anything as being ordered? Like for mapping ps.1.h-1 to ps.1.1

Allow
    <corr from="Ps.1.1-Ps.1.20" to="Ps.1.2-Ps.1.21">

    This does stupid lexical iteration over the range.

    This applies if they renumbered the same stuff.

    If they re-order text and numbers together, that's just rendering.

    If they re-assign text to different numbers, then *that's* a change.

    Can't do this across chapter boundaries, since wouldn't know last verse num

    Special case for fine-grain stuff like word/char?

(Prob if not special-cased in spec: lower level(s) require software 
counting; may need to distinguish in the ref system dcl file.)
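
The "stupid lexical iteration" over a range correspondence might look 
roughly like this. It is a sketch under these assumptions: both ends 
of a range are written in full (Ps.1.1-Ps.1.20, not Ps.1.1-20), the 
last component is a plain integer, and hyphen is the range delimiter. 
The function names are hypothetical.

```python
def expand_range(rng):
    """Expand 'Ps.1.1-Ps.1.20' into ['Ps.1.1', ..., 'Ps.1.20']."""
    start, _, end = rng.partition("-")
    if not end:
        return [start]  # a single reference, not a range
    s_prefix, _, s_last = start.rpartition(".")
    e_prefix, _, e_last = end.rpartition(".")
    if s_prefix != e_prefix:
        # Crossing a chapter boundary would need the last verse number.
        raise ValueError("range must stay within one chapter: %r" % rng)
    return ["%s.%d" % (s_prefix, n) for n in range(int(s_last), int(e_last) + 1)]

def expand_corr(src, dst):
    """Pair up two equally long ranges, item by item."""
    a, b = expand_range(src), expand_range(dst)
    if len(a) != len(b):
        raise ValueError("ranges differ in length")
    return dict(zip(a, b))

shift = expand_corr("Ps.1.1-Ps.1.20", "Ps.1.2-Ps.1.21")
```

Ranges that cross a chapter boundary are rejected rather than guessed 
at, matching the note above.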

<ident-dcl min-level="book">
    <component name="book"    type="OSIS:bbn"/>
    <component name="chapter" type="int"/>
    <component name="verse"   type="int_letter"/>
    <component name="word"    type="int"   intrinsic="wordtoken"/>   <!-- ??? -->
</ident-dcl>
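
A sketch of loose validation against such a declaration, assuming 
dot-separated components and checking only shape, not numeric ranges. 
The regular expressions are guesses at what the type names mean, and 
the book-name pattern is a stand-in for a real enumerated list.

```python
import re

# Assumed patterns for the component types named in the declaration.
TYPE_PATTERNS = {
    "OSIS:bbn": r"[A-Za-z0-9]+",      # stand-in for an enumerated book-name list
    "int": r"[0-9]+",
    "int_letter": r"[0-9]+[A-Za-z]?",
}

IDENT_DCL = [
    ("book", "OSIS:bbn"),
    ("chapter", "int"),
    ("verse", "int_letter"),
    ("word", "int"),
]

def validate_ref(ref, dcl=IDENT_DCL, min_level="book"):
    """Check that ref has at least min_level components, each of the right type."""
    parts = ref.split(".")
    names = [name for name, _ in dcl]
    required = names.index(min_level) + 1
    if not (required <= len(parts) <= len(dcl)):
        return False
    return all(
        re.fullmatch(TYPE_PATTERNS[type_name], part) is not None
        for (_, type_name), part in zip(dcl, parts)
    )
```

For instance, validate_ref("Gen.1.5a") passes, while 
validate_ref("Gen.x.1") fails because the chapter is not an integer.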

<map from='NIV'> <!-- to is myself -->
    <corr from='Ps'       to='Ps'/>                        <!-- implies corr of all below -->
    <corr from='Ps.1.H'   to='Ps.1.1'/>
    <corr from='Ps.1.1-Ps.1.30'    to='Ps.1.2-Ps.1.31'/>   <!-- insert/shift -->
    <corr from='Mk.16.8-Mk.16.30'  to='NIL'/>              <!-- delete       -->
    <corr from='Gen.1.1-Gen.1.3'   to='Gen.1.1-3'/>        <!-- merge        -->
    <corr from='NIL'               to='Gen.1.4a'/>         <!-- insert       -->
    <!-- This doesn't say *where* inserted. Do we have to care? -->
    <!-- Anything not stated is assumed to correspond. -->
</map>
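
A sketch of how software might consult such a map, assuming a 
well-formed file, single-reference entries only, and NIL meaning "no 
counterpart". The sample map is cut down for illustration (the Mk.16.9 
entry is made up), and load_map and map_ref are hypothetical names.

```python
import xml.etree.ElementTree as ET

MAP_XML = """
<map from="NIV">
    <corr from="Ps.1.H" to="Ps.1.1"/>
    <corr from="Mk.16.9" to="NIL"/>
</map>
"""

def load_map(xml_text):
    """Read <corr> entries into a from -> to dictionary."""
    root = ET.fromstring(xml_text)
    return {corr.get("from"): corr.get("to") for corr in root.iter("corr")}

def map_ref(ref, table):
    """Return the target ref, None if deleted, or ref itself when unlisted."""
    target = table.get(ref, ref)  # anything not stated is assumed to correspond
    return None if target == "NIL" else target

table = load_map(MAP_XML)
```

With this table, map_ref("Gen.1.1", table) falls through unchanged, 
since Genesis is not mentioned in the map at all.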


(probably shouldn't use hyphen for range delim, conflicts with 
page-range, merged verse idents like in TEV, etc.)

For other languages, they can define their own (say) book names, and 
map to us, but we don't register a new set of booknames for them. For 
blind interchange everybody uses the normative names.

What about numbers? Tibetan digits (possibly not even base 10?)

Could do this as a localization hack:
    name to name
    number to number
    digit to digit

<lang-map lang1='EN' lang2='FR'>
    <element    l1='parafo'    l2='p'/>
    <attribute  of-element='*' l1='typo' l2='type'/>
    <attr-token of-element='*' of-attribute='*' l1='kjhhhk' l2='Genesis'/>
    <attr-digit l1='i' l2='1'/>
</lang-map>
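
For the digit-to-digit case, a sketch using Python's Unicode database 
rather than a hand-written table. It assumes the local digits are 
decimal and positional (so it does not answer the "not even base 10" 
question); normalize_digits is a hypothetical name.

```python
import unicodedata

def normalize_digits(text):
    """Replace every Unicode decimal digit with its ASCII equivalent."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            # decimal() gives the numeric value 0-9 of a decimal digit character.
            out.append(str(unicodedata.decimal(ch)))
        else:
            out.append(ch)
    return "".join(out)

# Tibetan digits one, two, three are U+0F21, U+0F22, U+0F23.
normalized = normalize_digits("Gen.\u0f21.\u0f22\u0f23")
```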

Identifier spaces:

We provide a list of works, e.g. journal titles
They define types for identifiers (year, issue, date, page-range)
We can then validate loosely, but not strictly for numeric ranges.


How do we deal with distinctions:

Author/work: Josephus Antiquities (and edition?)
Bible/Edition/Book


------------------------------------------------------------------------------------------


2002-01-26 --

Can validate the whole punctuated strings, so can punctuate any way.

How to punctuate?

a) require dots everywhere -- familiar/easy
b) require spaces everywhere -- easier to validate, conforms to NMTOKENS
c) let rsd declare punctuation between components -- flexible but seems excessive
d) anything goes, non-name, non-dot, non-hyphen are delimiter -- way flexible
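
One reading of the "anything goes" option, sketched below: any run of 
characters that are not name characters, dots, or hyphens separates 
one reference from the next. This is an interpretation, not a 
decision from the meeting, and split_refs is a hypothetical name.

```python
import re

def split_refs(text):
    """Split on anything that is not a name character, dot, or hyphen."""
    return [token for token in re.split(r"[^\w.\-]+", text) if token]

refs = split_refs("Gen.1.1; Gen.1.2,Ps.1.1-Ps.1.20")
```

Here the semicolon, space, and comma all act as delimiters, while the 
dots and the range hyphen stay inside the tokens.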


For async elements (chapter and verse and any future),
    provide container elements and recommend but not require using them
    when possible
    provide start/end pair as well.
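
A sketch of why the start/end pair is workable: software can recover 
the content of each verse from a flat event stream even when the 
verse does not nest inside one element. The event tuples are an 
invented representation for illustration, not a proposed format.

```python
def text_by_verse(events):
    """Group text under every verse id that is open when the text occurs."""
    open_ids = []
    collected = {}
    for kind, value in events:
        if kind == "start":
            open_ids.append(value)
            collected.setdefault(value, [])
        elif kind == "end":
            open_ids.remove(value)
        else:  # kind == "text"
            for verse_id in open_ids:
                collected[verse_id].append(value)
    return {verse_id: " ".join(parts) for verse_id, parts in collected.items()}

events = [
    ("start", "Gen.1.1"),
    ("text", "In the beginning"),
    ("text", "God created"),
    ("end", "Gen.1.1"),
    ("start", "Gen.1.2"),
    ("text", "And the earth"),
    ("end", "Gen.1.2"),
]
verses = text_by_verse(events)
```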


do we need DIVs for anything besides linegroups (and OT/NT)? Probably not.

include TEIform everywhere we can

(note types: see other mail)

add word-level annotation element with attributes:
    lemma
    strong's number
    part-of-speech
    morphology
       Include discourse markup?
       what about discontiguous lemmas (LOOK the word UP)? 
Contractions? "Functioning as"
    gloss lang...

    <word x-schemename:POS='N-NM-S'>
    problem: we're sort of constraining all namespaces;
      but we have the right to disallow extended attrs on our elements
    Let them use namespaces, or put schemename inside value?

    later: norm/reg, sic/corr, abbr/expan, translit

should we provide a way to indicate canonical status:
   canonical, apocryphal/deuterocanonical, OT vs. NT pseudepigrapha


should we represent somehow, what portion of the Bible is included in 
the document?

must it include at least one book?

Should we somehow indicate it's just a NT + Psalms?


*** Other projects we should do:

* A collection (and public call for) objects in real Bibles that we 
haven't covered. (cf)

* A collection of markup anomalies in Bible text (cf)

Accessibility information???


must identify the reference system(s) this text supports.
    refsDecl in header to identify the one (for now) used here.

Within notes (cf TEV Gen 1.1) like "The phrase 'in the beginning' 
refers....." Should this be XSEM's <refText> (which is documented as 
where the text should be generated, but their TEV actually encloses 
the text), or just be a type of q?

    Types (source?):
       NOTEREF   -- quoting the very text this note is about
       DOCUMENT  -- quoting elsewhere in this same document
       BIBLE     -- Bible Text in some other version
          (OT quoted in NT may want special formatting)
       READING   -- potential bible text alternative
       OTHER     -- non-Biblical stuff

Should this have a REF attribute on it, or should we embed a REF?
Put on same element to avoid scope ambiguity, and to express that 
this is a special kind of quote, not a coincidence.

should we do salute?
what's like it:
    salutation, poem, closing, letter, hymn refrain in hymn,


5 layout conventions:

    Italics for words not in original

    Divine name issues
       L small caps ORD in all caps if translated from yhwh
       NRSV distinctions of how written:

    When story was written in historical present, translated into past 
tense, they mark all
       verbs with a star in front.

    OT quotes in NT typically italicized, set as indented blocks

    OT quote + inserted word, different italics. (NKJV Matt 13:15 word "Their")

Should we treat these via TEI <supplied> (not quite the same) and 
other elements, or as types or reasons of <em>?


-- 
*****
        Please note new email: sderose@speakeasy.net
        Backup email address:  sderose@mac.com
*****