[osis-core] Element Review: <div>

Harry Plantinga osis-core@bibletechnologieswg.org
Wed, 19 Jun 2002 11:37:22 -0400


Patrick,

It occurred to me last evening that the <div> model I was
proposing probably wouldn't be workable in the OSIS schema.
<div> is probably going to be used for multiple purposes --
one, to divide up a document into books, chapters, sections,
and similar structural divisions, like a table of contents,
and two, as a generic <p>-level container, used for example
to make a bunch of paragraphs centered. The latter type of
use wouldn't work with the content model I proposed.

I don't think normal table-of-contents-like usage would
allow a situation like this:

  <div type="chapter">
     <title>My First Chapter</title>
        <p>some content</p>
            <div>
                <title>Section 1</title>
                    <p>some more content</p>
            </div>
        <p>back to first div</p>
  </div>

In a normal print Table of Contents, if you had entries

Chapter 1.
  Section 1.
  Section 2.

you wouldn't be able to have some paragraphs after the start
of Section 2 that belong to chapter 1 but not section 2. Once
there is a new TOC entry, all previous entries are closed. I think
that the tail-recursion model models accurately the way tables
of contents work.
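
To make the rule concrete, here is a rough Python sketch of a check for
the tail-recursion model. The function name follows_tail_recursion is
made up for illustration and is not part of any OSIS tool; it only looks
at <div>s nested directly inside <div>s.

```python
import xml.etree.ElementTree as ET


def follows_tail_recursion(div):
    """Return True if nothing but <div>s follows the first nested <div>,
    in this <div> and in every <div> reached through <div> children."""
    seen_div = False
    for child in div:
        if child.tag == "div":
            seen_div = True
            if not follows_tail_recursion(child):
                return False
            # Loose text after a nested </div> is trailing content too.
            if child.tail and child.tail.strip():
                return False
        elif seen_div:
            # A <p>, <title>, etc. after a nested <div> breaks the model.
            return False
    return True


permitted = ET.fromstring(
    '<div type="chapter"><title>My First Chapter</title><p>some content</p>'
    '<div><title>Section 1</title><p>some more content</p></div></div>'
)
rejected = ET.fromstring(
    '<div type="chapter"><title>My First Chapter</title><p>some content</p>'
    '<div><title>Section 1</title><p>some more content</p></div>'
    '<p>back to first div</p></div>'
)
print(follows_tail_recursion(permitted), follows_tail_recursion(rejected))
```

The first document passes and the second (with the trailing paragraph)
does not.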

ThML splits those two functions -- generic paragraph-level
containers and modeling the TOC structure -- with <div>
elements and also <div1>, <div2>, <div3>, etc., which perform
the table-of-contents-like function. That way one can be a
little more strict about the way the table-of-contents
structure containers are used.

If you prefer unbounded recursion to a fixed number of TOC
levels, the table-of-contents function could also be handled
with a new container, e.g. <section>, which could contain a
random mix of paragraphs and <div>s and use the tail recursion
content model.

Of course, adding a whole new element at this point is an
unattractive option.  If it were not added, we could get by.
I think what I would do if I were spec'ing an OpenOffice Writer
OSIS helper is to reserve special div types for table of contents
entries.  E.g.

<div type="toc1" divTitle="Chapter 1. Hello world">
 <div type="toc2" divTitle="Section 1. Hello">...</div>
 <div type="toc2" divTitle="Section 2. World">...</div>
</div>
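
As a sketch only, this is roughly how a helper could turn a flat list of
table-of-contents entries into nested <div>s of that shape. It assumes
the entries arrive as (level, title) pairs in document order; the name
divs_from_toc is hypothetical, and body paragraphs are left out to keep
it short.

```python
import xml.etree.ElementTree as ET


def divs_from_toc(entries):
    """Build nested <div type="tocN"> elements from (level, title) pairs.

    Only start indicators are needed: a new entry closes every open
    entry at the same or a deeper level.
    """
    top = []    # finished top-level divs
    stack = []  # (level, div) pairs for the divs that are still open
    for level, title in entries:
        while stack and stack[-1][0] >= level:
            stack.pop()
        div = ET.Element("div", {"type": f"toc{level}", "divTitle": title})
        if stack:
            stack[-1][1].append(div)
        else:
            top.append(div)
        stack.append((level, div))
    return top


chapters = divs_from_toc([
    (1, "Chapter 1. Hello world"),
    (2, "Section 1. Hello"),
    (2, "Section 2. World"),
])
print(ET.tostring(chapters[0], encoding="unicode"))
```

Because each new entry closes the open ones at its level or below, the
output can only ever have nested <div>s last, which is the tail
recursion model.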

In saving a Writer document, I would use toc entries to generate
<div>s like this (the content model would de facto end up
matching the tail recursion model).  In reading an OSIS
document into Writer, if there were <div type="tocn"> elements,
I'd have to split them into pieces to make them conform to the
tail recursion model.
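
One possible way to do that splitting, sketched in Python under my own
assumptions: any content that follows a nested <div> gets wrapped in a
new <div> of its own. The function name split_trailing_content and the
type value "continuation" are invented for this example; neither is
defined by OSIS.

```python
import xml.etree.ElementTree as ET


def split_trailing_content(div, wrapper_type="continuation"):
    """Rewrite div in place so only <div>s follow the first nested <div>.

    Content found after a nested <div> is moved into a new wrapper <div>.
    The wrapper type is a made-up placeholder, not an OSIS value.
    """
    new_children = []
    seen_div = False
    wrapper = None  # wrapper div currently collecting trailing content
    for child in list(div):
        if child.tag == "div":
            split_trailing_content(child, wrapper_type)
            seen_div = True
            wrapper = None
            new_children.append(child)
            if child.tail and child.tail.strip():
                # Loose text after </div> starts a wrapper of its own.
                wrapper = ET.Element("div", {"type": wrapper_type})
                wrapper.text = child.tail
                child.tail = None
                new_children.append(wrapper)
        elif seen_div:
            if wrapper is None:
                wrapper = ET.Element("div", {"type": wrapper_type})
                new_children.append(wrapper)
            wrapper.append(child)
        else:
            new_children.append(child)
    # Replace the old child list with the regrouped one.
    for child in list(div):
        div.remove(child)
    div.extend(new_children)
    return div


chapter = ET.fromstring(
    '<div type="chapter"><title>My First Chapter</title><p>some content</p>'
    '<div><title>Section 1</title><p>some more content</p></div>'
    '<p>back to first div</p></div>'
)
split_trailing_content(chapter)
print([child.tag for child in chapter])
```

Note that this changes the structure: the trailing paragraph ends up in
a sibling section rather than directly in the chapter, which is exactly
the loss of information the tail recursion model implies.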

SUMMARY. I believe that the tail-recursion model is the right
one for the table-of-contents function, but it would not allow
for use of <div> as a generic paragraph container. The table-of-
contents function might be important enough to have its own tag,
but that function could also be squeezed into the existing
<div> as described above by overloading with special type attributes.

(That's the nice thing about a type attribute. You can model anything
with it.  Need tables?  How about <div type="table"> <div type="tr">
<div type="td">...)

-Harry

-----Original Message-----
From: owner-osis-core@bibletechnologieswg.org
[mailto:owner-osis-core@bibletechnologieswg.org]On Behalf Of Patrick
Durusau
Sent: Wednesday, June 19, 2002 9:36 AM
To: osis-core@bibletechnologieswg.org
Subject: Re: [osis-core] Element Review: <div>


Harry,

Harry Plantinga wrote:

>Patrick,
>
>You are right.  I didn't see the div option in the choice
>element. I would remove it.  What I am trying to accomplish
>is to have a model like this:
>
><div>
>  [title, head, paragraph-like stuff]
>  [zero or more divs]
></div>
>
>This gives us what a computer scientist would call tail
>recursion, which is easier to handle.
>
Another attempt to see if I understand the model! ;-)

<div type="chapter">
    <title>My First Chapter</title>
        <p>some content</p>
            <div>
                <title>Section 1</title>
                    <p>some more content</p>
            </div>
</div>

would be permitted, but:

<div type="chapter">
    <title>My First Chapter</title>
        <p>some content</p>
            <div>
                <title>Section 1</title>
                    <p>some more content</p>
            </div>
        <p>back to first div</p>
</div>

would not?

Assuming this is a correct representation of the proposed model (please
correct if not!), I am not sure that it is an adequate model for
encoding a wide range of texts. If we were authoring the texts, sure,
that model would work but I am not certain that we don't have texts that
exhibit the behavior seen in the second example.

Also not certain that tail recursion has carried the day in terms of
processing XML files. See
www.xmlpitstop.com/XMLJournal/Article6-November2001/ November2001.pdf
 for a brief discussion of tail recursion. Beyond that, it is also not
clear that the usefulness of particular parsing techniques should drive
our markup practices, although from an implementer's standpoint (and as
a user) that is something that should be kept in mind.

Comments?

Patrick

>
>-Harry
>
>-----Original Message-----
>From: owner-osis-core@bibletechnologieswg.org
>[mailto:owner-osis-core@bibletechnologieswg.org]On Behalf Of Patrick
>Durusau
>Sent: Tuesday, June 18, 2002 4:50 PM
>To: osis-core@bibletechnologieswg.org
>Subject: Re: [osis-core] Element Review: <div>
>
>
>Harry,
>
>Thanks for the note because that was not what I understood you to be
>suggesting.
>
>Harry Plantinga wrote:
>
>>Just to make sure the proposal is clear, what I am proposing is
>>to change the schema from
>>
>><xs:element name="div">
>><xs:complexType mixed="true">
>> <xs:sequence>
>>  <xs:element ref="div" minOccurs="0" maxOccurs="unbounded"/>
>>  <xs:choice minOccurs="0" maxOccurs="unbounded"> [lots of stuff here]
>></xs:choice>
>> </xs:sequence>
>>
>>to
>>
>><xs:element name="div">
>><xs:complexType mixed="true">
>> <xs:sequence>
>>  <xs:choice minOccurs="0" maxOccurs="unbounded"> [lots of stuff here]
>></xs:choice>
>>  <xs:element ref="div" minOccurs="0" maxOccurs="unbounded"/>
>> </xs:sequence>
>>
>Not sure how the suggested solution differs from:
>
><xs:element name="div">
>  <xs:complexType mixed="true">
>    <xs:choice minOccurs="0" maxOccurs="unbounded">
>        Lots of stuff but certainly:
>        <xs:element ref="div"/>
>    </xs:choice>
></xs:element>
>
>If you are going to have div as an optional element inside of the first
>choice, I am not sure what we are gaining from the second optional div?
>Or am I missing something real obvious?
>
>I am assuming that we allow divs to be siblings in the osisText
>container. (or was a sibling relationship the intent of your second
>optional div?)
>
>osisText then becomes
>
><xs:choice>
><xs:element ref="header" minOccurs="0"/>
><xs:element ref="div" minOccurs="1" maxOccurs="unbounded"/>
></xs:choice>
>
>Patrick
>
>
>>
>>-----Original Message-----
>>From: owner-osis-core@bibletechnologieswg.org
>>[mailto:owner-osis-core@bibletechnologieswg.org]On Behalf Of Patrick
>>Durusau
>>Sent: Tuesday, June 18, 2002 3:46 PM
>>To: osis-core@bibletechnologieswg.org
>>Subject: Re: [osis-core] Element Review: <div>
>>
>>
>>Chris, Harry, Guys,
>>
>>So, instead of forcing:
>>
>><div>
>><div>
>>
>>The voiced preference is for:
>>
>><div>
>>stuff (to use the technical term) ;-)
>><div>
>>
>>?
>>
>>Looks like a problem between keyboard and chair to me. ;-)
>>
>>Unless someone submits killer analysis between now and tomorrow morning,
>>consider it done! Look for it in osisCore_1test13.xsd.
>>
>>BTW,
>>
>>Comments on Harry's echoing of Todd's proposal to collapse
>>front/body/back (they all have the same content models) into div and
>>direct people to use the type attribute?
>>
>>I must confess a vague unease with the proposal but I have no principled
>>argument to make against it. I suspect my reluctance is due to long
>>association with more complex encodings where that distinction makes a
>>difference in content models. Since ours does not, hard to see a reason
>>to not collapse into div other than as syntactic sugar. Not a
>>particularly compelling reason, even to me.
>>
>>Show of hands? (Sans any stones please!)
>>
>>Patrick
>>
>>Harry Plantinga wrote:
>>
>>>I would like to second this comment -- I think the content
>>>model should be changed to have the other stuff, followed
>>>by optional nested <div>s.  I consider this to be an
>>>important change.
>>>
>>>The proposed content model, with the nested <div>s as the last
>>>elements in a <div>, is very commonly used in real texts,
>>>as in Chris's example below. On the other hand, I can't think
>>>of any situations where you'd really need to have a nested
>>>div followed by some <p>s, a <divineName>, a <title>, etc.
>>>
>>>Second, (and here's my ulterior motive,) if any nested <div>s
>>>have to be last in a <div>, the content model is easier to
>>>parse.  You can parse it with start indicators only, no end
>>>indicators needed. If end indicators are omitted, and you are
>>>using the current content model, parsing problems arise in a
>>>situation like this:
>>>
>>><div divTitle="Chapter 1">
>>><div divTitle="Section 1">
>>>  <p> some stuff here</p>
>>>  <p>I want this paragraph to be in Chapter 1 but not
>>>   in Section 1. How do I do it? I can't!
>>>
>>>Why do we care about omit-endtag in XML, you ask? I'm glad you
>>>asked.  We care because we are hoping to edit these documents with
>>>a non-XML application such as OpenOffice Writer. With the proposed
>>><div> content model, you can use the built-in Table of Contents
>>>facility -- or paragraphs of style "div" -- to identify the
>>>start of each div. With the current content model, you'd also
>>>have to insert indicators for the end of each div, which would
>>>be less intuitive and more difficult.
>>>
>>>-Harry
>>>
>>>
>>>-----Original Message-----
>>>From: owner-osis-core@bibletechnologieswg.org
>>>[mailto:owner-osis-core@bibletechnologieswg.org]On Behalf Of Chris
>>>Little
>>>Sent: Monday, June 17, 2002 2:15 AM
>>>To: osis-core@bibletechnologieswg.org
>>>Subject: Re: [osis-core] Element Review: <div>
>>>
>>>
>>>I'm frequently finding text portions that would best be marked as:
>>>
>>><div>
>>>	<title>Some title</title>
>>>	<div>
>>>		<title>sub-section 1 title</title>
>>>			some CDATA
>>>	</div>
>>>	<div>
>>>		<title>sub-section 2 title</title>
>>>			some CDATA
>>>	</div>
>>></div>
>>>
>>>However, because all divs must precede any other element in a div, this
>>>is illegal.  I don't understand why this is the case and it seems like
>>>div should just be added to the choice in div, and the sequence be
>>>removed.
>>>
>>>--Chris
>>>
>>>
>>>Patrick Durusau wrote:
>>>
>>>><div> has: <blockQuote>, <div>, <figure>, <lineGroup>, <list>,
>>>><milestone>, <p>, <q>, <seg>, <speech>, <title>, <verse>.
>>>>
>>>>Question: So we don't have phrase level markup in a <div>? That is how I
>>>>see the content model for <div>, with things like <abbr>, <speaker> and
>>>><w> (to take only three examples) as always occurring in larger
>>>>elements.
>>>>
>>>>Patrick
>>>>
>>--
>>Patrick Durusau
>>Director of Research and Development
>>Society of Biblical Literature
>>pdurusau@emory.edu
>>
>>

--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu