[osis-core] DOM? PLEASE READ.

Thu, 23 May 2002 15:09:14 -0400

Troy,

Replies below:

Troy A. Griffitts wrote:

<snip>

> Right, the point I didn't make too well was that I believe (who am I?) 
> that the current DOM definition is messed up for representing text 
> documents.  Sorry; can I still be a member of osis-core?
>
Of course!!! I just publicly announced a frontal assault on the tree 
assumption of ISO 8879 as something that should be reformed out of the 
standard! Can't get much more disenchanted with the conventional tree 
than I am! I still consider myself a legitimate member of the team and 
certainly you are as well! (In addition to being a very gifted friend!)

>
>>> At one point you all had me convinced that almost everything should
>>> be a milestone.  Now you're trying to convince me that we should have
>>> NO milestones.  I'm so distraught. :)
>>>
>> You were present when we discussed this change in Rome, but let me try 
>> to reconstruct some of the discussion. I think it was Eric who 
>> pointed out that while they used milestones (a lot) in XSEM, the 
>> times when boundaries are actually crossed are quite few 
>> in comparison with the number of times that they aren't. Therefore, we 
>> had this elaborate structure of opening and closing milestones to 
>> deal with a small number of cases. Steve pointed out that the 
>> prev/next solution from TEI could easily be used with XSLT to 
>> generate the required presentation. Eric suggested that we use the 
>> key/keyRef mechanism to validate the references between the two parts 
>> (more on that follows). I thought it was a fairly elegant solution 
>> that reduced the number of elements and would be easier to teach to 
>> users for the small number of cases where it is actually an issue.
>
>
> Honestly (though it was probably my fault for sleeping, or zoning out, 
> or something), I have no remembrance of any of this.  Are you sure it 
> wasn't after Sunday afternoon?

I don't remember exactly when we discussed it, but I could probably 
reconstruct that from the notes.
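To make the prev/next idea a little more concrete, here is a rough illustration. It is a minimal Python sketch, not the draft schema: the `verse` element and its `id`, `prev`, and `next` attributes are hypothetical stand-ins loosely modeled on TEI-style linking. A verse that crosses a paragraph boundary is written as two fragments that point at each other. `check_links` does roughly what a key/keyref constraint would do at validation time (every `prev`/`next` value must name an existing fragment), and `join_verses` follows the chain to rebuild whole verses, the kind of thing an XSLT stylesheet would do for presentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: verse 2 crosses a paragraph boundary, so it is
# split into two fragments linked by next/prev (names are illustrative).
SAMPLE = """<div>
  <p>
    <verse id="v1">In the beginning.</verse>
    <verse id="v2a" next="v2b">And the earth</verse>
  </p>
  <p>
    <verse id="v2b" prev="v2a">was without form.</verse>
    <verse id="v3">Let there be light.</verse>
  </p>
</div>"""


def check_links(fragments):
    """Return prev/next values that name no fragment (a key/keyref-style check)."""
    errors = []
    for frag_id, frag in fragments.items():
        for attr in ("prev", "next"):
            target = frag.get(attr)
            if target is not None and target not in fragments:
                errors.append(f"{frag_id}/@{attr} -> missing {target}")
    return errors


def join_verses(root):
    """Validate the links, then follow next-links from each chain head."""
    fragments = {v.get("id"): v for v in root.iter("verse")}
    problems = check_links(fragments)
    if problems:
        raise ValueError("; ".join(problems))
    verses = []
    for frag in root.iter("verse"):
        if frag.get("prev") is not None:
            continue  # a continuation; it is reached through its head's next-link
        parts, seen, current = [], set(), frag
        # The `seen` set guards against a malformed next-chain that loops.
        while current is not None and current.get("id") not in seen:
            seen.add(current.get("id"))
            parts.append(current.text or "")
            next_id = current.get("next")
            current = fragments[next_id] if next_id else None
        verses.append(" ".join(parts))
    return verses


if __name__ == "__main__":
    for verse in join_verses(ET.fromstring(SAMPLE)):
        print(verse)
```

Only the rare boundary-crossing verse needs the extra attributes; every other verse stays an ordinary element, which is the point Eric was making about how seldom boundaries are actually crossed.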

>
>
>
>>> I don't think a good argument is that it makes things a little easier
>>> when using standard XML transformation tools.  REASON:  None of our
>>> real users will use standard XML transformation tools for a whole
>>> Bible.  They currently can't, unless they own a Cray with 5 gigs of
>>> memory.  And even if they could, we're not helping simplify the
>>> problem; instead, all we seem to be doing is making the simple case
>>> easier, while adding more complexity to the fully marked up cases.
>>>
>> None of our users write filters to transform hundreds of texts from 
>> text format into various presentation formats either. ;-) Well, only 
>> a small minority of them. ;-)
>
>
> Actually, in my mind, our users ARE exactly these people.  They are 
> Bob, and me, and the sfm-to-OSIS-converter-guy at Wycliffe, and anyone 
> else writing software to deal with texts in this markup.  In fact, I 
> specifically remember Bob backing and defending our decision to use 
> milestones.

If you look at Steve's notes, you will see that Bob wanted shadow 
milestones, not ones with start and end semantics. In other words, for 
XPointer, HyTime, etc., you want to mark the beginning of the stream and 
then use pointer mechanisms to say how much of it to get.
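A rough illustration of the shadow-milestone idea, again as a minimal Python sketch under stated assumptions rather than real XPointer or HyTime syntax: the empty `ms` elements and the (milestone id, word count) pointer are hypothetical. A milestone records only where something begins; a separate pointer says how much of the word stream to take from there, so the extent of verse 2 can run across the start of a new paragraph without either one having to contain the other. The sketch only handles milestones that are direct children of the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: empty <ms/> elements are "shadow" milestones that
# mark only where something starts (verse 2, then a new paragraph inside it).
SAMPLE = ('<text>In the beginning God created the heaven and the earth. '
          '<ms id="v2"/>And the earth was without form, and void; '
          '<ms id="p2"/>and darkness was upon the face of the deep.</text>')


def flatten(root):
    """Return the word stream and the word offset at which each milestone sits.

    Assumes milestones are direct children of the root element.
    """
    words = (root.text or "").split()
    offsets = {}
    for marker in root:
        offsets[marker.get("id")] = len(words)
        words.extend((marker.tail or "").split())
    return words, offsets


def resolve(words, offsets, start_id, count):
    """Resolve a hypothetical pointer: `count` words starting at a milestone."""
    start = offsets[start_id]
    return " ".join(words[start:start + count])


if __name__ == "__main__":
    words, offsets = flatten(ET.fromstring(SAMPLE))
    # Verse 2 runs across the paragraph milestone p2; nothing has to nest.
    print(resolve(words, offsets, "v2", 17))
```

Resolving `("v2", 17)` returns all of verse 2 as sampled here, even though paragraph `p2` starts partway through it.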

>
>
>>> When we start adding modules like translator markup, analytical markup,
>>> etc., we're gonna have total hacks all over to get around this
>>> "crossing containment" problem.
>>>
>> Actually not, at least in my opinion. First of all, the crossing 
>> containment situation is the minority (I would guess less than 5% 
>> of the cases, and probably far less than that).
>
>
> I'm trying to imagine the finished picture here-- once we start adding 
> more modules.
>
> Imagine a base text marked up simply, with osis core.  OK.
> Then add my <w> tags for strongs numbers and morph.  Probably still OK.
> Then add translators comments.
> Then Kirk's linguistic annotation
> Then publisher preferences (display 'hints'?)
> What else are we gonna add?...
>
> Think NON-XML here: with containment which can cross boundaries, this 
> doesn't scare me.
>
>
>> Second, the translator and analytic markup cases will probably rely 
>> more upon pointing mechanisms since they layer onto the text as 
>> encoded, rather than being part of the text layer itself. (Yes, 
>> commentary is part of the text (or tree) but as you and Todd have 
>> often pointed out, they are not part of the "biblical text" that we 
>> are concerned with encoding. I may be pointing to text that crosses 
>> boundaries but that is a different issue and not one that requires a 
>> "hack" to solve.)
>
>
> External markup pointing into the base doc?  I understand your 
> argument.  I think it might be a good workaround for the containment 
> problem in XML.  I don't think anyone will know how to use such a 
> mechanism for quite some time, and perhaps never will, if the XML spec 
> finally gets adjusted or extended to support such content. 

No, actually I was thinking more along the lines of putting a note in 
the text that refers to the material in the text to which it applies. 
You don't normally find linguistic notes inline in the text, but rather 
somewhere else in the document, with a note reference in the text that 
points to the relevant material. We (myself included) tend to overlook 
that sort of pointing in a printed text because we have been taught (or 
have learned) to resolve it automatically and transparently. We don't 
even realize that is what is happening unless we try hard to notice it.

>>> I don't like our new approach, currently, and want to be convinced
>>> otherwise, if you guys will indulge my inquiry.
>>>
>> Assuming that we are going to finish an XML schema, I am not sure 
>> what other approach reduces the complexity of our markup,
>
>
> Well, the complexity issue isn't different with milestones.
>
>
>> deals with the minority of cases by requiring more markup only for 
>> those cases (a good thing in my opinion,
>
>
> Agreed, if indeed you still feel this is the minority of the cases in 
> a fully marked up OSIS text.
>
Annotating the text does not really add any hierarchies to the base 
text, which is where overlap is a problem. Overlap is a problem when I 
am trying to impose varying hierarchies within one text, the biblical 
text as you and Todd have referred to it, as opposed to non-biblical 
materials. A footnote, for example, would never have an overlapping 
structure, since we don't normally write footnotes that way. 
(Excluding medieval commentaries here, but I can think of ways to deal 
with those as well.)

>
>> harder cases are the only ones that require more work), and 
>> represents the artificial boundaries that are familiar to readers and 
>> others that are desired by translators.
>
>
> Didn't understand 'artificial'?  I would think anything we want to 
> allow an author to contain is a logical container.  Not sure of your 
> point here.

Sorry, that was a reference to the modern trend to use paragraphs, which 
is probably closer to the reading of the original but different from the 
customary verse structure. It would be easier if we could just drop the 
verses and use some other reference mechanism, but I suspect verses are 
too firmly embedded for any system to leave them out and still be 
popular with users.

>
>
> Thanks again for humoring me.  I understand it might be too late to 
> even think about these things, but it seems kind of odd that our entire 
> base approach to markup has changed in the 3 weeks since the OSIS 
> conference.  This is our foundation for the spec.  I'm worried about 
> what we're building on.

It has taken 3 weeks because I simply lacked the time to crank out the 
latest version. The milestones were removed while we were still in Rome 
(in the versions on my laptop). I plead guilty to not getting things 
out quicker and not keeping the group better informed.

Will try to resolve the issues that Chris raised in his posts as well.

Note that Jonathan Robie approached me about doing a paper at Extreme 
Markup on using XQuery to resolve the markup I have proposed into 
standard verses with standard XML software. (I am also doing a 
late-breaking news paper on concurrent markup, and if both are accepted, 
I will have two papers on it at the conference.)

I will be trying to send the schema later today (for me) and hope that 
before I get back to the States you will have something you can test 
with real texts.

Don't feel bad about disagreeing over the best approach! One of the 
reasons why TEI is almost unusable is that, rather than resolve such 
disagreements, they simply put every requested approach into the 
standard. Not the best approach, in my opinion.

Keep those comments coming in!

Patrick

>
>     -Troy.

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu