[osis-core] Thursday notes

Patrick Durusau osis-core@bibletechnologieswg.org
Thu, 22 May 2003 18:40:40 -0400


Guys,

The notes so far on Thursday.

Patrick

--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Co-Editor, ISO 13250, Topic Maps -- Reference Model


Society of Biblical Literature

OSIS Schema & Best Practices Issues




Contents

   1. Schema Bugs, Errors, Fixes
     1.1. Dead Elements - Removal Suggested
        1.1.1. <cell>
        1.1.2. milestones
        1.1.3. Chapter and verse - allow variant to have section and
        paragraph?
   2. Content Model Issues
     2.1. <div> attributes
     2.2. Insert <divineName> in <catchWord>
     2.3. lang/script/ews
     2.4. <table> in <p> and <speech>
     2.5. osisID as list, pointing at with osisRef with grain
     2.6. osisRef as list?
     2.7. xml:lang?
   3. Best Practices
     3.1. Major Issues
        3.1.1. Levels of Encoding
        3.1.2. Milestones: Start and Stop
        3.1.3. Predominant hierarchy
        3.1.4. Quotes
        3.1.5. Text in Verses
        3.1.6. Verse splits
     3.2. Lesser Issues
        3.2.1. blockQuote vs. Speech
        3.2.2. Book Titles
        3.2.3. catchWord
        3.2.4. Complex or discontinuous text
        3.2.5. Continuing Paragraph
        3.2.6. Copyright pages
        3.2.7. Cross-References in <title>
        3.2.8. Dictionary
        3.2.9. <div> following <osisText>
        3.2.10. Dublin Core
        3.2.11. endings, multiple
        3.2.12. Footnotes
        3.2.13. Identifier with element
        3.2.14. Identity of books, works
        3.2.15. Introduction
        3.2.16. Introduction content
        3.2.17. Lines within a line group
        3.2.18. Major and minor divisions
        3.2.19. Matthew text example
        3.2.20. Milestone Pairs
        3.2.21. Milestone - remove x- extension attribute to milestone
        3.2.22. Misc. but common structures
        3.2.23. Non-canonical text and speech
        3.2.24. Notes
        3.2.25. Parallel passages
        3.2.26. Poetry
        3.2.27. Presentation Punctuation in References
        3.2.28. Reference encoding
        3.2.29. Reference to entire work
        3.2.30. Special Information
        3.2.31. Split
        3.2.32. Stanza
        3.2.33. Title Page
        3.2.34. Translator practices
        3.2.35. Verse value
        3.2.36. All text in verse element?
        3.2.37. Work related practices

1. Schema Bugs, Errors, Fixes

1.1. Dead Elements - Removal Suggested

1.1.1. <cell>

Has attributes of rows and columns (delete).

Decision: delete.

1.1.2. milestones

osisMilestoneSE, milestoneSE, osisMilestonePt, and milestonePt should be
deleted.

Decision: milestoneStart and milestoneEnd should be globalWithoutType, and
their type attribute value should be milestoneSE (which does not combine
with osisMilestoneSE to allow extension).

   Applications must support BSP (BCV optional).
   Documents (incl. Bible texts) must include osisIDs.
   Add chapter to milestoneSE "type" (incl. x-).
   Keep the <verse> container.
   Add a global canonical attribute: on verse it defaults to yes, on note
   it defaults to no; it is inherited like lang; if true, it requires an
   osisID; no other element has a default for canonical; it is not on
   milestoneEnd; milestonePt has canonical.
   Sections (divs) are recursive.
   Create C elements, add c to milestoneSE type, remove c from div type.
   Chapter must break in case of conflict: if an element crosses a
   chapter boundary, break the chapter.
   Chapter is splittable; to split a chapter across a paragraph, use a
   milestone. In case of conflicts, BCV gives way to milestones.
   div does not have a splitID.
   osisText cannot be split, nor can div or head.* (everything in the
   header becomes non-splittable, but only on head).

1.1.3. Chapter and verse - allow variant to have section and paragraph?

Should you have to pick one or the other and use it consistently
throughout the document?

Should be consistent unless there is some reason to change, for example
chapters as containers and chapters as milestones within a single
document.

2. Content Model Issues

2.1. <div> attributes

Add section, front, body, back, titlePage, introduction, index, preface,
afterword, colophon, coverPage, concordance, gazetteer, commentary, maps,
entry (as in dictionary), and imprimatur to the type attribute on <div>.

Todd: add majorSection, minorSection, plus the ones above.

2.2. Insert <divineName> in <catchWord>

The <divineName> element is meant to control imposition of styling. If it
is not allowed in <catchWord>, encoders must use ad hoc methods, leading to
inconsistent encoding and more complex stylesheets.

Add q and divineName (put all of verse in).

2.3. lang/script/ews

The lang vs. xml:lang issue is already identified. I think we should also
consider adding a script attribute at the same places where lang currently
is. (Plenty of use cases exist, Cyrillic vs. Latin for Serbian being the
most recognizable.) I think I recall TEI having a similar facility for
identifying script.

In terms of best practices for these attributes:

lang should be specified as RFC 3066 (currently the only mention of a
language RFC in the schema, in the language element, is a reference to
RFC 1766, which RFC 3066 obsoletes).

In addition, we should specify best practices for languages not covered by
ISO 639.  x-E-... was suggested previously as a best practice for
identifying languages included in the Ethnologue, but common practice at
SIL and according to LINGUIST List, seems to be to use x-SIL-...

Additionally, I would recommend we specify LINGUIST List's codes for
languages absent from ISO 639 and Ethnologue, using something like
x-LING-.... Their codes are available here:

http://saussure.linguistlist.org/cfdocs/new-website/LL-WorkingDirs/forms/langs/GetListOfAncientLgs.cfm
http://saussure.linguistlist.org/cfdocs/new-website/LL-WorkingDirs/forms/langs/GetListOfConstructedLgs.cfm
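To make the RFC 3066 recommendation and the competing private-use prefixes concrete, here is a minimal Python sketch that checks a lang value against the RFC 3066 tag syntax (a primary subtag of 1 to 8 letters, then optional subtags of 1 to 8 letters or digits) and reports which of the conventions above it follows. The function name classify_lang and its return labels are hypothetical, and sample values such as x-SIL-ABC are placeholders, not real Ethnologue codes.

```python
import re

# RFC 3066 syntax: a primary subtag of 1-8 letters, then zero or more
# "-"-separated subtags of 1-8 letters or digits.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# Private-use prefixes discussed in these notes (labels are hypothetical).
PRIVATE_PREFIXES = [("x-sil-", "x-SIL"), ("x-ling-", "x-LING"), ("x-e-", "x-E")]


def classify_lang(tag: str) -> str:
    """Return which convention a lang value follows; raise if it is not RFC 3066."""
    if RFC3066_TAG.fullmatch(tag) is None:
        raise ValueError(f"not an RFC 3066 language tag: {tag!r}")
    lowered = tag.lower()  # RFC 3066 tags are case-insensitive
    for prefix, label in PRIVATE_PREFIXES:
        if lowered.startswith(prefix):
            return label
    if lowered.startswith("x-"):
        return "other private use"
    primary = lowered.split("-", 1)[0]
    # Two- and three-letter primary subtags come from ISO 639.
    return "ISO 639" if len(primary) in (2, 3) else "other registered"


for value in ["sr", "en-US", "x-SIL-ABC", "x-LING-abc"]:
    print(value, "->", classify_lang(value))
```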

If we choose to add a script attribute, ISO 15924 would be the appropriate
standard to follow, but it is not final. Its codes follow one of two
patterns, [A-Z][a-z]{3} or [0-9]{3} (codes can be found here:
http://www.evertype.com/standards/iso15924/document/dis15924.pdf).
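A script attribute could be checked against the two code shapes quoted from the draft with a single regular expression. This Python sketch validates only the form of a value, not whether ISO 15924 actually assigns it; is_script_code is a hypothetical helper name.

```python
import re

# Shapes quoted from the ISO 15924 draft: an uppercase letter followed by
# three lowercase letters, or exactly three digits.
SCRIPT_CODE = re.compile(r"[A-Z][a-z]{3}|[0-9]{3}")


def is_script_code(value: str) -> bool:
    """True if value has the shape of an ISO 15924 code (assignment not checked)."""
    return SCRIPT_CODE.fullmatch(value) is not None


# Serbian text could then pair lang="sr" with either of two script values.
for candidate in ["Cyrl", "Latn", "220", "latn", "CYRL", "12"]:
    print(candidate, is_script_code(candidate))
```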

I still don't know why ews is necessary, but it should at least be
confined to some set of standard values if such a thing exists.

2.4. <table> in <p> and <speech>

Should table be allowed in <p> and <speech>?

table and list in p and speech

try to combine speech and q together.

2.5. osisID as list, pointing at with osisRef with grain

Example: given osisID="Matt.1.2 Matt.1.3 Matt.1.4", would
osisRef="Matt.1.2@ch14" be the SAME as osisRef="Matt.1.3@ch14"?

The problem is that the grain reference has to have a certain starting
point, and that can only be the first osisID. The other issue is blind
pointing: how do I know if the author has used osisID as a list?
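The ambiguity can be made concrete with a small Python sketch. It assumes, hypothetically, that "@chN" means a character offset of N counted from the start of the element carrying the osisID list; the ELEMENTS table and the resolve function are illustrations, not part of OSIS. Under that assumption every ID in the list names the same starting point, so the two references in the example land on the same character.

```python
import re
from typing import Tuple

# Hypothetical stand-in for a parsed document: (osisID list, element text).
ELEMENTS = [
    ("Matt.1.2 Matt.1.3 Matt.1.4", "(text of Matt.1.2-Matt.1.4)"),
]

# An osisRef with an optional character grain, e.g. "Matt.1.2@ch14".
GRAIN_REF = re.compile(r"(?P<ref>[^@]+)(?:@ch(?P<offset>\d+))?")


def resolve(osis_ref: str) -> Tuple[int, int]:
    """Return (element index, character offset) for an osisRef.

    Assumption: the offset is counted from the start of the element,
    whichever ID in its osisID list the reference names.
    """
    match = GRAIN_REF.fullmatch(osis_ref)
    if match is None:
        raise ValueError(f"unparseable osisRef: {osis_ref!r}")
    ref = match.group("ref")
    offset = int(match.group("offset") or 0)
    for index, (osis_id, _text) in enumerate(ELEMENTS):
        if ref in osis_id.split():
            return index, offset
    raise KeyError(ref)


print(resolve("Matt.1.2@ch14"), resolve("Matt.1.3@ch14"))
```

A pointer that does not know the target carries a list has no way to tell that Matt.1.3@ch14 is not 14 characters into verse 3 itself, which is the blind-pointing problem.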

2.6. osisRef as list?

Should osisRef be allowed to be a list, like osisID? That would allow
<note> to be applied to discontinuous material, avoiding Todd's annotation
extension.

2.7. xml:lang?

Should we replace lang with xml:lang?

3. Best Practices

3.1. Major Issues

3.1.1. Levels of Encoding

Category A: Bibles with just the scripture text and no notion of paragraphs
and organized with a book/chapter/verse hierarchy.

Category B: Bibles with paragraphs but no sections, with the paragraphs
held by chapter <div> elements.

Category C: Bibles with sections and paragraphs, where sections as <div
type="x-section"> elements contain paragraphs.
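The three categories can be told apart mechanically. The Python sketch below is one possible test, assuming chapters are <div type="chapter"> and sections are <div type="x-section"> as elsewhere in these notes; encoding_level is a hypothetical helper. It only checks for the presence of <p> and section divs rather than validating how they nest, so a document with sections but no paragraphs simply falls back to "A".

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def encoding_level(xml_text: str) -> str:
    """Classify a Bible as category "A", "B", or "C" by the markup it uses."""
    has_paragraphs = False
    has_sections = False
    for element in ET.fromstring(xml_text).iter():
        name = local_name(element.tag)
        if name == "p":
            has_paragraphs = True
        elif name == "div" and element.get("type") == "x-section":
            has_sections = True
    if has_paragraphs and has_sections:
        return "C"
    if has_paragraphs:
        return "B"
    return "A"


category_a = ('<osisText><div type="chapter" osisID="Gen.1">'
              '<verse osisID="Gen.1.1">...</verse></div></osisText>')
print(encoding_level(category_a))
```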

3.1.2. Milestones: Start and Stop

The use of milestone start and end elements

3.1.3. Predominant hierarchy

What is the best hierarchy to use for biblical texts?

3.1.4. Quotes

Strategy for quotes (not likely to be the predominant hierarchy)

3.1.5. Text in Verses

Strategy to put ALL scripture text within a <verse> element.

3.1.6. Verse splits

<verse> elements split across <lg>, <list>, and <table>: should the schema
be changed?

Chris: (Personally I think splitIDs are a bad thing in every circumstance
where I've been forced to use them. They force text to be encoded in an
extremely unnatural manner.)

Allowing <l> inside of <verse> and allowing <l> to not require <lg> seems
like it would solve the line-related part of the problem.

It seems that issue 3.2.32. Stanza was the reason <lg> was created, wasn't
it?

Isn't <lg> just a special version of <p> for lines?

3.2. Lesser Issues

3.2.1. blockQuote vs. Speech

Guidelines on usage?

3.2.2. Book Titles

For <title> elements, use the type attributes "short" for the short title
like "Matthew", "mainTitle" for the main title, and "subTitle" for any
subtitles for the book. (The same could be applied to testaments and book
groups.)

3.2.3. catchWord

Catchword (unbalanced quotes, <divineName>, etc.). Chris: Also consider
inserting <hi> in <catchWord>. This issue comes up in the TEV.

3.2.4. Complex or discontinuous text

Marking AddEsther, where chapters interrupt other chapters and alternate
reference systems are present.

3.2.5. Continuing Paragraph

How best to encode a continuing paragraph after a block quote or line
group.

3.2.6. Copyright pages

How to deal with the copyright page and the related <work> element

3.2.7. Cross-References in <title>

Should cross-references following a <title> be placed in a child <title>
element?

Example:

<div type="x-section" osisRef="Matt.3.1-Matt.3.12">
       <title type="section">The Preaching of John the Baptist
              <title type="cross-ref"><reference osisRef="Mark.1.1-Mark.1.8"
              type="parallelPassage">Mark 1.1-8</reference>
              <reference osisRef="Luke.3.1-Luke.3.18"
              type="parallelPassage">Luke 3.1-18</reference>
              <reference osisRef="John.1.19-John.1.28"
              type="parallelPassage">John 1.19-28</reference></title></title>

3.2.8. Dictionary

How to encode a dictionary and other content at the end of a Bible

3.2.9. <div> following <osisText>

How to best organize top level structure for introduction sections,
mini-dictionaries, glossaries, maps

3.2.10. Dublin Core

What should the DC elements in <work> look like for a document that is a
portion of the entire work (i.e., a single book, single chapter, set of
books, range of verses, several sections from different books)?

3.2.11. endings, multiple

Multiple endings (marking container elements and osisIDs)

3.2.12. Footnotes

Footnotes (<rdg> and superscripted numbers)

3.2.13. Identifier with element

The question I have is how to associate an identifier with an element. For
example, if I wanted to say that a paragraph or other block of text is
about "anger".

The best I can come up with is a <note> element with an osisRef to the
desired text (similar to a cross-reference).

This will work with scripture text with a well-defined reference system,
but for non-Biblical text that often does not have osisIDs this becomes an
issue. (This would likely be resolved if XPath/XPointer-like syntax were
allowed in a reference or reference-like element.) (Todd)

3.2.14. Identity of books, works

How to identify a book of Esther (Esther vs. Additions to Esther vs. Greek
Esther).  (Goes with 3.2.4. Complex or discontinuous text, somewhat.)

How to identify books of Ezra/Nehemiah/Esdras.

Depending on these... potentially, how to identify 1-2Kgs/1-2Chr vs.
1-4Kgdms.

How to identify books that occur multiply within a work (e.g. Esther in
NRSVA & others; Psalms in Vulgate; Joshua, Judges, Daniel, & Tobit in
Rahlfs')

3.2.15. Introduction

Should the introduction to a book be marked as a <> or a <list>?

3.2.16. Introduction content

The text found at the front of a bible, testament, book group, or book.
Contain this type of content in a <div type="x-introduction"> element.

3.2.17. Lines within a line group

Use type="q", type="q2" (or similar type names) and a set of other
standardized types to indicate the specific nature of the <l> element.

3.2.18. Major and minor divisions

Major and minor divisions in the text.

3.2.19. Matthew text example

Should Matt.1.2-Matt.1.6a be encoded as osisID="Matt.1.2 Matt.1.3 Matt.1.4
Matt.1.5 Matt.1.6 Matt.1.6a" or osisID="Matt.1.2 Matt.1.3 Matt.1.4 Matt.1.5
Matt.1.6"?  The logic being that "a" is simply a TYPOGRAPHIC mechanism to
indicate that there are two blocks of text with Matt.1.6 in them!  I
believe the latter form is CORRECT even though I have argued for the
alternative in the past.
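If the latter form is adopted, a converter could normalize lists that still carry the typographic "a" marker. This Python sketch assumes a sub-verse part is always a single lowercase letter appended to the verse number (as in Matt.1.6a); normalize_osis_ids is a hypothetical name.

```python
import re
from typing import List

# Assumption: a sub-verse part is one lowercase letter after the verse number.
SUBPART = re.compile(r"(?P<base>.+\.\d+)[a-z]")


def normalize_osis_ids(osis_id: str) -> str:
    """Drop sub-verse letters and duplicates, keeping the original order."""
    result: List[str] = []
    for token in osis_id.split():
        match = SUBPART.fullmatch(token)
        base = match.group("base") if match else token
        if base not in result:
            result.append(base)
    return " ".join(result)


print(normalize_osis_ids("Matt.1.2 Matt.1.3 Matt.1.4 Matt.1.5 Matt.1.6 Matt.1.6a"))
```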

3.2.20. Milestone Pairs

Some guidelines on how to use milestone pairs for chapters, verses, and
quotes and how to be consistent in the use of milestones vs elements.

3.2.21. Milestone - remove x- extension attribute to milestone

To make it consistent with the new milestone elements.

3.2.22. Misc. but common structures

glossary, map, mini-dictionary, Thompson Chain Reference (Todd), see
http://www.zondervan.com/media/pdfs/0310912229.pdf for example (I have a
local copy for the meeting. pld)

3.2.23. Non-canonical text and speech

How to encode non-canonical text associated with the start of a speech.
(<seg type="speechStart">She Speaks</seg>)

3.2.24. Notes

A "best practices" guideline for type attributes that indicate the type of
note. Chris: Add cross-reference osisNotes type (unless there seems to be a
better practice)

3.2.25. Parallel passages

Use a <div type="parallelPassage"> with strictly a set of <reference>
elements and the related display text as children.

3.2.26. Poetry

How to encode lines of poetry:  Line breaks, multiple translator specified
line splitting alternatives.  What is presentation and what is data?

3.2.27. Presentation Punctuation in References

The best way to encode a series of references with various presentation
punctuation.

3.2.28. Reference encoding

In OSIS documents that are not Bibles, it is common to see a quote of
scripture text followed by the reference.

Does it make sense to add an optional osisRef attribute to <q> and
<milestoneStart> to accommodate this frequent issue?

Example: <q osisRef="Matt.20.28">The Son of Man did not come to be served,
but to serve . . . </q>

rather than <q>The Son of Man did not come to be served, but to serve
. . .<reference osisRef="Matt.20.28">Matthew 20:28</reference></q> (Todd)
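One way to weigh the two encodings is to ask what a processor must do to recover the cited passage. In this Python sketch (quote_source is hypothetical, and namespaces are ignored), the attribute form is a single lookup, while the child-element form requires searching the quote's content for a <reference>.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def quote_source(q_xml: str) -> Optional[str]:
    """Return the osisRef cited by a <q>, from either encoding style."""
    q = ET.fromstring(q_xml)
    # Proposed style: the reference is an attribute on <q> itself.
    if q.get("osisRef") is not None:
        return q.get("osisRef")
    # Current style: look for a <reference> child inside the quote.
    child = q.find("reference")
    return child.get("osisRef") if child is not None else None


attribute_form = ('<q osisRef="Matt.20.28">The Son of Man did not come to be '
                  'served, but to serve . . . </q>')
child_form = ('<q>The Son of Man did not come to be served, but to serve . . .'
              '<reference osisRef="Matt.20.28">Matthew 20:28</reference></q>')
print(quote_source(attribute_form), quote_source(child_form))
```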

3.2.29. Reference to entire work

The introduction to the CEV has references to entire works. How should
these be encoded? (Todd)

3.2.30. Special Information

How to preserve special information related to verse numbers that cannot
be represented with an osisID ("*1") at Bible.CEV.Gen.49.1. (Is this really
a note?)

3.2.31. Split

Only split AND only use the attribute "splitID" for the following elements:
<verse>, <div type="chapter">, <div type="x-section">, and <p>.
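The rule can be checked mechanically. This Python sketch flags any element that carries splitID but is not one of the four listed; split_violations is a hypothetical helper, and namespace prefixes are stripped so the check works with or without an OSIS namespace.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def split_violations(xml_text: str) -> List[str]:
    """List, in document order, elements that use splitID outside the allowed set."""
    violations = []
    for element in ET.fromstring(xml_text).iter():
        if element.get("splitID") is None:
            continue
        name = local_name(element.tag)
        if name in ("verse", "p"):
            continue
        if name == "div" and element.get("type") in ("chapter", "x-section"):
            continue
        violations.append(name)
    return violations


sample = ('<osisText><p splitID="p1">a</p><div type="chapter" splitID="c1"/>'
          '<lg splitID="lg1"/><div type="book" splitID="b1"/></osisText>')
print(split_violations(sample))
```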

3.2.32. Stanza

How to best encode "stanza" in the Psalms.

3.2.33. Title Page

How to deal with the title page

3.2.34. Translator practices

How to encode a break introduced by the translator that leads to a blank
line being rendered? (This would be more spacing than normally exists
between two paragraphs, to emphasize a shift in thought.)

3.2.35. Verse value

Use the n attribute to indicate the verse value to present when more than
one value is present in an osisID (e.g., <verse n="1-2" osisID="Matt.1.1
Matt.1.2">).
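Deriving the n value from the osisID list is simple when the IDs are consecutive verses of one chapter. This Python sketch assumes exactly that; verse_label is a hypothetical name, and lists spanning chapters or containing gaps would need a richer rule.

```python
def verse_label(osis_id: str) -> str:
    """Build an n value such as "1-2" from an osisID list.

    Assumption: the IDs are consecutive verses of one chapter, in order.
    """
    verses = [token.rsplit(".", 1)[-1] for token in osis_id.split()]
    if len(verses) == 1:
        return verses[0]
    return f"{verses[0]}-{verses[-1]}"


print(verse_label("Matt.1.1 Matt.1.2"))  # the example from the notes
```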

3.2.36. All text in verse element?

Should all canonical text be encoded in a verse element?

3.2.37. Work related practices

Work related practices for different scenarios. (I am thinking primarily
about the "thisWork" element.)

Society of Biblical Literature | SBL

© Society of Biblical Literature.