[sword-devel] XML idea: modular spec

Trevor Jenkins sword-devel@crosswire.org
Fri, 12 Oct 2001 08:49:13 +0000 (GMT)

On Thu, 11 Oct 2001, David Burry <dburry@tagnet.org> wrote:

[Keep in mind throughout my comments below that I was a member of the ISO
committee that developed SGML.]

> I've been thinking for a long time about how to provide a reasonable
> storage/index mechanism, and still give the end user interface designer
> access to the complete Bible in a variety of XML ways depending on the
> needs of the application.  There has been previous discussion on this list
> regarding this, I called it looking at the data in different "slices" and
> Patrick Durusau called it "concurrent markup"
> (http://www.sbl-site2.org/Extreme2001/Concur.html).

I don't remember this discussion because to me "concurrent markup" means
the use of SGML's CONCUR feature; something that is sadly omitted in most
SGML implementations. Given that XML exists solely to make programmers'
lives easy and CONCUR is hard to do, I don't expect to see it anytime soon
(read: ever) in XML.

> What I mean is that, suppose the Bible were stored in a binary/text
> compressed and/or indexed format, but available for query _as_if_ it were
> in this kind of format:
> <version name="kjv">

If you draw your examples from <version name="cev"> or <version
name="tev"> then this nice regular structure breaks down. Both these
translations have occasions where several verses appear together. These
verses are printed with references like 12--13 or 25--28.  In one of the
long lists of names in, I think, 1 Chronicles there's even a 28--60.
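
A minimal sketch of what that means for any software reading the text: a
verse reference has to be treated as a range, not a single number. The
function name and the accepted label formats below are my own assumptions
for illustration, not part of any existing format.

```python
import re

def parse_verse_label(label):
    """Turn a printed verse label such as "7" or "12--13" into (first, last).

    Hypothetical helper: accepts a single number, or two numbers joined
    by one or two hyphens.
    """
    m = re.fullmatch(r"\s*(\d+)\s*(?:-{1,2}\s*(\d+))?\s*", label)
    if m is None:
        raise ValueError(f"unrecognised verse label: {label!r}")
    first = int(m.group(1))
    # A lone number is a range of one verse.
    last = int(m.group(2)) if m.group(2) else first
    if last < first:
        raise ValueError(f"verse range runs backwards: {label!r}")
    return first, last
```

A single verse comes back as a range of length one, so the caller never
needs to distinguish the two cases.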

>    <book name="genesis">
>      <chapter>
>        <verse><paragraphmarker/>contents of verse 1</verse>

I suggest that there be an attribute on these elements for chapter and
verse numbers as appropriate. Granted, an SGML/XML application could
count them for us, but some translations, e.g. The New English Bible or the
Complete Jewish Bible, have moved some verses around; I think most occur in
Isaiah but I can't remember the exact passages.
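
As a rough sketch of the difference, assuming a hypothetical attribute
called "n" on <chapter> and <verse> (the attribute name, the sample data
and the function are mine, purely for illustration), an application can
read the numbers from the markup instead of inferring them from position:

```python
import xml.etree.ElementTree as ET

# Made-up sample: verse 3 is deliberately placed before verse 2 to mimic
# a translation that has moved a verse. Counting elements would get the
# numbers wrong here; reading the attribute does not.
SAMPLE = """
<version name="example">
  <book name="genesis">
    <chapter n="1">
      <verse n="1"><paragraphmarker/>contents of verse 1</verse>
      <verse n="3">contents of verse 3</verse>
      <verse n="2">contents of verse 2</verse>
    </chapter>
  </book>
</version>
"""

def verse_references(xml_text):
    """Return (book, chapter, verse) tuples in document order."""
    root = ET.fromstring(xml_text)
    refs = []
    for book in root.iter("book"):
        for chapter in book.iter("chapter"):
            for verse in chapter.iter("verse"):
                refs.append(
                    (book.get("name"), int(chapter.get("n")), int(verse.get("n")))
                )
    return refs
```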

>        <verse>contents of verse 2</verse>
>        <verse><paragraphmarker/>etc</verse>
>         ...
>      </chapter>

If the translation has been produced following the philosophy of "formal
equivalence" such markup would be usable. But for translations produced
under "meaning-based translation" such a scheme may not work. The
CEV, which by the way is my translation of choice, is one such
meaning-based translation.

> (Notice I didn't put paragraphs inside chapters because in fact paragraphs
> can occasionally straddle chapter boundaries.)

This is where SGML's CONCUR feature would be ideal. It could also cope with
the implicit problem of sentences that straddle more than one verse and
verses that straddle more than one sentence.
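
To make the straddling problem concrete, here is a small sketch. It is
not CONCUR; it simply assumes each structure (verses, sentences) is
recorded as half-open character ranges over one copy of the text, and
reports the pairs that overlap without nesting. Those are exactly the
pairs a single element tree cannot hold. All names and offsets are
hypothetical.

```python
def overlaps_without_nesting(a, b):
    """True when half-open spans a and b overlap but neither contains the other."""
    (a0, a1), (b0, b1) = a, b
    intersect = a0 < b1 and b0 < a1
    a_in_b = b0 <= a0 and a1 <= b1
    b_in_a = a0 <= b0 and b1 <= a1
    return intersect and not (a_in_b or b_in_a)

def straddling_pairs(verses, sentences):
    """List every (verse, sentence) pair that cannot be nested in one tree."""
    return [
        (v, s)
        for v in verses
        for s in sentences
        if overlaps_without_nesting(v, s)
    ]

# Made-up offsets: the first sentence runs past the end of the first verse.
verses = [(0, 10), (10, 25)]
sentences = [(0, 18), (18, 25)]
conflicts = straddling_pairs(verses, sentences)
```

With these offsets the only conflict is the second verse against the
first sentence; every other pair either nests cleanly or does not touch.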

> You can see I'm proposing that the entire thing be duplicated 2 times for
> the simple example above, but it only has to be "virtually" duplicated,

If the world were to reject the XML heresy and stick with the true faith
of SGML then this duplication would be unnecessary. :-))))))

> ... perhaps
> someone else has even already thought of and done stuff like this.  Anyone
> know of any?

I'm currently looking at XSEM (XML Scripture Encoding Model) from
SIL/Wycliffe to see whether it addresses the issues that the absence of
"CONCUR" raises. XSEM might address all your requirements and more, given
that it comes from a group of professional Bible translators who are working
with the text 24/7. Anyone interested in this project can find more at
http://www.sil.org/computing/xsem/index.htm and by following the links can
download their DTD and other materials. They have some interesting ideas
but I'm not sure that I'll sit in church following the Bible reading on my
WAP phone. :-)

Regards, Trevor

British Sign Language is not inarticulate handwaving; it's a living language.
Support the campaign for formal recognition by the British government now!
Details at http://www.fdp.org.uk/ or http://www.bsl-march.co.uk/


<>< Re: deemed!