[osis-core] Very rough notes from meetings of 2004-01-30

Steven J. DeRose osis-core@bibletechnologieswg.org
Fri, 30 Jan 2004 18:53:26 -0500


On fonts --

Note: Current doc says XML declaration must be provided "exactly as 
shown" -- this is an error, because there was no intent to limit 
encodings to UTF-8.
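
[Illustration, not from the meeting: a rough sketch of why the declaration need not be fixed. A conforming parser reads the encoding from the declaration itself, so the same content comes out whichever encoding is named. This uses Python's standard xml.etree.ElementTree; the <note> element and the sample text are made up and are not OSIS.]

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; <note> is not an OSIS element.
TEXT = "caf\u00e9"


def make_doc(encoding: str) -> bytes:
    """Serialize the same tiny document, declared and encoded as `encoding`."""
    xml = f'<?xml version="1.0" encoding="{encoding}"?><note>{TEXT}</note>'
    return xml.encode(encoding)


def read_text(doc: bytes) -> str:
    """Parse raw bytes; the parser honours the encoding named in the declaration."""
    return ET.fromstring(doc).text


# The bytes differ per encoding, but the parsed text should be identical.
decoded = {enc: read_text(make_doc(enc)) for enc in ("UTF-8", "ISO-8859-1")}
```

The bytes on disk differ between the two documents, but the parsed character data is the same, which is the sense in which document markup and character encoding are separate concerns.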

Separate two concerns strongly:

	Document Encoding/markup

	Character Encoding/Fonts

Font problems:

    Old texts with unknown encodings, such as hacked upper-half 8-bit fonts

    Texts in languages requiring non-Unicode characters

Much Japanese (and other?) software just defaults the PUA to its own 
choice, and just won't let you add any characters that will actually 
ever get rendered.

UBS and SIL have a committee to develop a standardized set of PUA additions.
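
[Illustration, not from the meeting: before agreeing on PUA assignments, it may help to audit which private-use code points a text actually uses. A minimal sketch, assuming the text is already decoded to Unicode; the function name and sample string are hypothetical. It relies on the Unicode general category "Co" (private use), as reported by Python's unicodedata module.]

```python
import unicodedata
from typing import List, Tuple


def find_private_use(text: str) -> List[Tuple[int, str]]:
    """Return (offset, 'U+XXXX') for every private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = Other, private use
    ]


# Made-up sample: one BMP private-use character and one from plane 15.
sample = "abc\ue000def\U000f0000"
hits = find_private_use(sample)
```

A report like this says nothing about what each code point is supposed to mean; that still depends on the documentation the joint committee produces.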

Question: Does it suffice to have a font(s) for that encoding?

Todd: UBS moving toward requiring Uniscribe implementations of 
OpenType for everything.

Kees: Not really

Largely a documentation issue -- which is what the joint committee does

pld: We have a format element in the header. We can state that best 
practice is that the "format" element must be written in Latin-1.

[sjd: Problem: getting right glyph, vs. processing the glyph right (kern etc.)]

Dennis:

(caveat: doesn't speak for all SIL, exists diversity; but speaks for NRSI).

SIL commitment to Unicode; but hard road/issues.

Big concern over empowering minority languages:

	Need data that lasts (e.g., future revisions)

	Uniformity over ~1200 projects

Developing a font (due May) = Doulos SIL -- font meant to handle all 
current char needs.

Want to move these chars out of PUA eventually.

scripts.sil.org: search for PUA; see article by Peter Constable

Exist problems w/ rendering PUA chars -- ultraXML (3B2 branch) has potential

Graphite purposes: flexibility that Uniscribe doesn't have 
(hard-coded script knowledge).

Both Graphite and OpenType can co-exist in font info.

Also provides prototype of how to deal with minority languages

Linux needs to learn a lot -- they're so open that minority languages 
end up not supported.

What are the specific Uniscribe problems:

Hard-codes behavior by language

"gsub table" per language; but for some they hard-coded. also wasn't 
enabled for Roman scripts.

similar gpos for placing diacritics

Uniscribe also has limited power -- have to enumerate all cases, 
can't just write an algorithm.

Some minority languages use a majority writing system -- almost. But 
they've used it in a unique way for centuries.

Graphite has conditional expressions, loops, etc. rather than enumerated cases.

OpenType depends on the second layer, that Uniscribe provides. 
InDesign has a replacement for Uniscribe.

Urdu updrifting placement

Question: do the PUA chars include diacritics?

Exists a Graphite-based Mozilla

Can OSIS help?

Talk to John Hirst about getting Bible Forum involved in PUA 
agreement. Add to doc.

Joint meeting for printing issues? Informative, short.... 
Documentation and disclosure

Can OSIS endorse this process?
*** sjd to write one-page endorsement letter

How do we get people to move on to Unicode/PUA?
*** sjd & pld to write couple page case for going this way, for 
translators/users



Task: pld/sjd write proposal to add, if possible, a small subset of TEI 
app crit stuff, to do variant readings.
Cover cases 1, 2, 3, 5, 6 as types of rdgGrp/rdg.

Punt on 4? Or add some kind of "effective-editions" attribute; or 
overload rdg (hopefully not)

Special 'poetry': qh/qc/qr/qm/qs/qcs

If you can identify cause, do it as type on l:
    Add enum types for selah, discourse-marker,

If you can't identify reliably, use <seg type='x-...'> to maintain 
the ambiguity.

NIV seems to have discourse-markers distinguished by center vs. right-just.
We had considered it at last meetings -- pushed off to LAWG (since 
discourse is LA)?

Maybe just add an element? <discourse type=''>  <discourse-marker 
type=''> <opener>/<closer>
<boundary-marker>  <dbm type=''> (discourse-boundary-marker) open/close/selah

Todd: possibility of adding "u-" types for quasi-official q types 
(center/right...).

Advantage: encourage people to keep the distinction around, but 
without fully dignifying it. Vs. <hi> or <seg>. Usually set as lines, 
but RSV runs them in.

Add <l type='selah'> ? Or leave as seg with some enum types. Do we 
ever kill it?

seg now has OTpassage.

Task: Add <dbm type='open|close|x-'> Caveat: not for true linguistic 
discourse markup, for which see LAWG.

And that's what he said; Thus saith the God; Joel 3:21. How do we 
know when we see it?

Should we add div type=epistle for letters?

Selah: Psalms, Lam 1:15, Habakkuk 3.... all in line groups -- 
proposal: create <selah>, or <l type='selah'>; allow in lg and l. 
Could have something in content other than "selah".


--- quaker polls on discourse markers and selah (see notes from Kirk)

Versification systems:

"Bible" as "the abstract system"

There is no one abstract system, only choices among existing ones 
(or making a new one).

First: must add a document genre to yesterday's list for refsysdcl.

sjd: Don't call it Bible, we'll confuse people between refsys and bible.

Who do we pick? UBS has a set of 6.

Need to produce machine readables, but meantime can

Task: Kees send list of the 6 names, and the files listing what each contains.

pld: Add those names to the manual

pld: Add referencesystem genre

The UBS: original, lxx, vulgate, english, russian prot, russian orth, 
dutch trad.

sjd: figure out a file syntax.


Question on SFM formatting info: have helped with the <l 
type='us...'> additions.

But Jim A sent a bunch of formatting cases.

EG: XSem had ability to mark what a blockquote was: letter/prayer/....

Where do we record this? type/subtype? Or div? but then can't go in paragraph.

Task: work with Jim A on the syntactic choices

Multiple versification systems overlapping: Allow, but state that 
verses from the same system must not overlap. Add example to manual.
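
[Illustration, not from the meeting: the rule above can be checked mechanically. A minimal sketch, assuming each verse has already been reduced to a character span (start, end) with end exclusive; the system names, verse ids and offsets are invented for the example and the function is hypothetical.]

```python
from collections import defaultdict


def overlapping_verses(verses):
    """verses: iterable of (system, verse_id, start, end), end exclusive.

    Returns (system, earlier_id, later_id) for verses of the SAME system
    whose spans overlap. Overlap across different systems is allowed.
    """
    by_system = defaultdict(list)
    for system, verse_id, start, end in verses:
        by_system[system].append((start, end, verse_id))

    clashes = []
    for system, spans in by_system.items():
        spans.sort()  # by start offset
        prev_end, prev_id = None, None
        for start, end, verse_id in spans:
            # A span that starts before the furthest end seen so far overlaps.
            if prev_end is not None and start < prev_end:
                clashes.append((system, prev_id, verse_id))
            if prev_end is None or end > prev_end:
                prev_end, prev_id = end, verse_id
    return clashes


# Two systems covering the same text with different verse boundaries: allowed.
ok = [
    ("sysA", "Ps.3.1", 0, 50), ("sysA", "Ps.3.2", 50, 90),
    ("sysB", "Ps.3.1", 0, 20), ("sysB", "Ps.3.2", 20, 60),
]
# Adding a sysA verse that starts inside Ps.3.2 breaks the rule.
bad = ok + [("sysA", "Ps.3.3", 80, 120)]

ok_clashes = overlapping_verses(ok)
bad_clashes = overlapping_verses(bad)
```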

verse/chapter starts: add to schema

footnote consolidation: 2 problems:

1: not duplicating footnote text

2: putting same footnote ref number at several ref points, all 
pointing to same footnote in footnote area.

1 can be handled for free with XML entities

2 is a rendering issue?

Could facilitate 2 with more workup (like in example).

Didn't seem to be enthusiasm right now to address this.
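
[Illustration, not from the meeting: if problem 2 is treated as a rendering issue, a formatter could consolidate on output. A minimal sketch, assuming footnotes count as "the same" when their text is identical; the function name and sample note texts are made up.]

```python
def consolidate_footnotes(notes):
    """notes: footnote texts in the order their reference points occur.

    Returns (numbers, footnotes): numbers[i] is the marker to print at
    reference point i; footnotes is the de-duplicated list for the
    footnote area, in first-occurrence order.
    """
    number_for = {}
    numbers, footnotes = [], []
    for text in notes:
        if text not in number_for:
            footnotes.append(text)
            number_for[text] = len(footnotes)  # 1-based marker
        numbers.append(number_for[text])
    return numbers, footnotes


# Made-up notes: the first and third reference points share a footnote.
refs = ["Or spirit", "Some mss omit", "Or spirit"]
numbers, footnotes = consolidate_footnotes(refs)
```

Here the first and third reference points both get marker 1 and the footnote area holds two notes, so the source markup does not need to change.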

Title in work is required, but needs to be maxOccurs=unbounded

For title parts: nest 'em

Should we deprecate title type=sub in favor of just using nesting?

subtitle can be orthogonal to title nesting, so keep both.

continued title? pld: try to find why we ever did this. intervening stuff.

3 remaining Nathan issues:

index entries: got that, we think.

way to tag biblical book names -- mainly in book intros; also "and 
lo, are not the rest of his deeds written in the book of the kings of 
Israel?"; also "somewhere in the scroll of Isaiah" --- tentative: 
use reference.

highlighting "nd" of 2nd.... <hi type='ordinal'> -- add super sub to 
<hi> types.
-- 

Steve DeRose -- http://www.derose.net
Chair, Bible Technologies Group -- http://www.bibletechnologies.net
Email: sderose@acm.org  or  steve@derose.net