[sword-devel] HELP! Need your feedback on XML Markup Language

Mike Sangrey sword-devel@crosswire.org
Fri, 17 Aug 2001 12:17:34 -0400


Regarding the problem of overlapping hierarchies:

Yes, indeedy!  This is a problem.

This is, at least from a linguist's viewpoint, one of the major issues
with Sword and/or the XML markup being used.  It is VERY verse
oriented.  Please don't read that as being harsh.  I don't intend
that.  And, since I'm a Software Engineer, I understand how tricky
getting different Biblical texts and associated Bible tools lined up
with each other can be.

However, the XML markup--
   <verse>...</verse>

is very unfortunate.  But why?

From a linguistic viewpoint, verse structure (except where the
original Biblical text exhibits verse structure, e.g., Psalms) conveys
semantic information which simply is not there.  What that means is
that a person studying their Bible will derive meaning information
from the text (because of the verse structure) which was not there.
Yes!  That's right.  I'm saying the versification has added meaning to
the Bible.  This happens because people process meaning in "packages,"
so to understand the text correctly, the meaning of the original must
be packaged up correctly.  I can get
into this more, but, linguistically, versifying--IF IT STRUCTURES THE
TEXT--is a big problem.  (BTW, if you want to try something
interesting, pull the text of a small epistle like Ephesians into a
word processor, remove chapter and verse numbers, paragraph according
to the Greek NT, print it out and read it.  It reads more smoothly.)

Now, back to the problem.  For example, we run into problems like
this:
   <para>
      <verse>
         stuff
   </para>
   <para>
         stuff
      </verse>
      <verse>
         stuff
      </verse>
   </para>

Which, of course, doesn't work.  Quotes are similar.
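
For anyone who wants to see the failure concretely, here is a minimal
sketch using Python's standard xml.etree.ElementTree parser.  The
<root> wrapper and the helper name is_well_formed are my own
assumptions for illustration, not anything from Sword.  It only shows
that a verse which opens in one paragraph and closes in the next is
rejected by a conforming XML parser.

```python
import xml.etree.ElementTree as ET

# A verse that opens in one paragraph and closes in the next.
OVERLAPPING = """
<para>
  <verse>
    stuff
</para>
<para>
    stuff
  </verse>
</para>
"""

# The same tags, properly nested, for comparison.
NESTED = "<para><verse>stuff</verse></para>"


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses once wrapped in a single root."""
    try:
        # <root> is a hypothetical wrapper so the fragment has one top element.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("overlapping:", is_well_formed(OVERLAPPING))
    print("nested:     ", is_well_formed(NESTED))
```

The parser gives up at the first </para>, because the <verse> inside
it is still open.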

What should be done?

The XML schema must be modeled after how the data really works.  Or,
to put it differently, a decision needs to be made regarding the
structure of the data--that is, which markup expresses hierarchical
relationships and which markup simply tags data.  Semantically, these
are two different things.

In effect, this means an XML schema must, IMO, constrain markup
to look something like this:

   <para>
      <verse chap="01" verse="03"/>
        stuff
   </para>
   <para>
        stuff
      <verse chap="01" verse="04"/>
        stuff
   </para>

A markup with <versestart/> and <verseend/> is also possible.
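
To show that nothing is lost with the empty-element form, here is a
rough sketch (again Python's standard xml.etree.ElementTree) that
walks the paragraphs and regroups the text by verse.  The <text>
wrapper, the sample wording, and the function name text_by_verse are
assumptions of mine, and it assumes each <para> holds only text and
<verse/> markers.

```python
import xml.etree.ElementTree as ET

# <text> is a hypothetical wrapper element; the sample wording is made up.
MILESTONES = """
<text>
   <para>
      <verse chap="01" verse="03"/>
        first part of verse three
   </para>
   <para>
        rest of verse three
      <verse chap="01" verse="04"/>
        verse four
   </para>
</text>
"""


def text_by_verse(xml_text):
    """Map (chapter, verse) to its text, following verse milestones."""
    root = ET.fromstring(xml_text)
    verses = {}
    current = None  # no milestone seen yet
    for para in root.iter("para"):
        # Text before the first marker belongs to the verse already in progress.
        chunks = [(None, para.text)]
        for child in para:
            marker = child if child.tag == "verse" else None
            chunks.append((marker, child.tail))
        for marker, text in chunks:
            if marker is not None:
                current = (int(marker.get("chap")), int(marker.get("verse")))
            words = (text or "").split()
            if words and current is not None:
                verses.setdefault(current, []).extend(words)
    return {ref: " ".join(words) for ref, words in verses.items()}


if __name__ == "__main__":
    for ref, text in text_by_verse(MILESTONES).items():
        print(ref, text)
```

Verse 1:3 comes back as one piece even though it straddles two
paragraphs, so the paragraph hierarchy and the verse references no
longer fight each other.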

Why this markup structure?

Because verse markup is nothing more than tagging a location.  It
carries no information regarding the actual structure of the data.
The XML markup must keep separate the markup which expresses the
semantics of the structure of the text itself FROM the markup which
expresses the semantics of the presentation.  The former are things
like `sentence', `paragraph', `section' and `subsection', and
`discourse'.  The latter encompasses things like the versification
scheme and chapter markup, hints for location of margin notes, etc.

Lastly, for quotes, you will need to do something like <quotestart/>
and <quoteend/> since quotes that span paragraphs are handled
differently in different languages.  For example, English prefixes
each paragraph with a double quote character but the end quote
character appears at the end of the last paragraph.  The semantics of
quotes, therefore, are `quotestart' and `quoteend' (possibly
recursive) and the rendering engine needs to figure out what the
presentation looks like from that semantic information.
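
As a sketch of what "the rendering engine needs to figure out" could
mean for English, the following uses Python's standard
xml.etree.ElementTree.  The <text> wrapper, the sample sentences, and
the function name render_english are my assumptions.  It is also a
simplification: nested quotes just alternate double and single marks,
and a continued paragraph only reopens the outer double quote.

```python
import xml.etree.ElementTree as ET

# <text> is a hypothetical wrapper element; the sample wording is made up.
SAMPLE = (
    "<text>"
    "<para>He said, <quotestart/>first paragraph of speech.</para>"
    "<para>Second paragraph of speech.<quoteend/> Then he left.</para>"
    "</text>"
)


def render_english(xml_text):
    """Render each <para> as a string using English quotation conventions."""
    root = ET.fromstring(xml_text)
    depth = 0  # how many quotes are currently open
    rendered = []
    for para in root.iter("para"):
        # English convention: a paragraph that begins inside a quote reopens it.
        parts = ['"' if depth > 0 else "", (para.text or "").lstrip()]
        for child in para:
            if child.tag == "quotestart":
                depth += 1
                parts.append('"' if depth % 2 else "'")
            elif child.tag == "quoteend":
                parts.append('"' if depth % 2 else "'")
                depth -= 1
            parts.append(child.tail or "")
        rendered.append("".join(parts).strip())
    return rendered


if __name__ == "__main__":
    for line in render_english(SAMPLE):
        print(line)
```

A renderer for another language would keep the same markup and change
only the rules inside the loop.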

Well, that's a lot to digest.  I hope this helps in some way.  I want
to help reveal what I believe to be the real issues without fogging
things up in the process.

Has anyone looked at XSEM.xml as the solution?  It was done by SIL.
It solves all these problems.

May the Lord be benefitted.


-- 
Mike Sangrey
msangrey@BlueFeltHat.org
Landisburg, Pa.
                        "The first one last wins."
            "A net of highly cohesive details reveals the truth."