[sword-devel] Standardizing on a Web Infrastructure and Web Service API for Scripture

Steven DeRose sderose at speakeasy.net
Mon Dec 21 08:02:48 MST 2009

Much more I could say on this, and no doubt others will jump in; but let
me answer one key question that affects all the others:

OSIS *does* use a pre-existing XML vocabulary: OSIS is almost entirely a
pure subset of TEI. The extensions are tiny, and very specific to
Biblical materials (for example, a very specific encoding for Biblical

TEI has many millions of dollars, more than 20 years, and many thousands
of expert hours of labor invested in it. It is almost universally used
for serious encoding of literary, linguistic, and historical texts. This you can
easily verify via Google or at your local university. If someone wants a
grant to encode some important work, say from the National Endowment for
the Humanities, the Mellon Foundation, or other large-scale funders,
using anything *but* TEI is so unusual that they need to make a specific
case for it in their proposals. (That can certainly be done in a very
specialized case; but TEI has proven so valuable and so effective that
it had better be a very specialized case before one gives up its huge
advantages.) Countless projects use TEI throughout, so there are lots of
tools and expertise available.

Also, a lot of the TEI data is data that has important connections to
the data OSIS people care about -- the collected works of important
theologians, historians, and philosophers; the Greek and Latin classics;
English and other literature that explores Biblical themes (Dostoevsky
and Milton, to name two of the most obvious examples). Few if any
serious projects relating to any of this use HTML or XHTML for their
data. Of course most everybody delivers HTML to browsers; but it's
trivial to convert TEI to HTML or XHTML, and extremely non-trivial to go
the other way.
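
To illustrate why the TEI-to-HTML direction is the easy one, here is a
minimal sketch in Python. The element names (head, lg, l, note) are
TEI-style, but the mapping table, the class names, and the sample text
are assumptions made up for this illustration; they are not taken from
any TEI or OSIS tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from TEI-style element names to (HTML tag, class).
TAG_MAP = {
    "div": ("div", None),
    "head": ("h2", None),
    "p": ("p", None),
    "lg": ("div", "poem"),    # line group (a stanza or poem)
    "l": ("div", "line"),     # a single poetic line
    "note": ("span", "note"),
}

def to_html(elem):
    """Recursively copy a TEI-style element into an HTML element tree."""
    # Unknown elements fall back to a <span> carrying the original name.
    tag, css = TAG_MAP.get(elem.tag, ("span", elem.tag))
    out = ET.Element(tag)
    if css:
        out.set("class", css)
    out.text = elem.text
    for child in elem:
        converted = to_html(child)
        converted.tail = child.tail  # keep text that follows the child
        out.append(converted)
    return out

SOURCE = (
    "<div><head>Psalm 23</head>"
    "<lg><l>The Lord is my shepherd;</l><l>I shall not want.</l></lg>"
    "<p>A psalm <note>attributed to David</note>.</p></div>"
)

html = ET.tostring(to_html(ET.fromstring(SOURCE)), encoding="unicode")
print(html)
```

Going the other way, a converter would have to guess from class="poem"
or from a bare div what the author meant, and every site picks its own
class names.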

XHTML5 is a fine thing, obviously far better than HTML itself. But it
gives you no rules about the things that OSIS specifies. It gives you
almost no semantics for the things it defines (other than layout). And
it lacks tons of specific things: poetic markup, epistolary units of all
kinds, Biblical and other formal reference schemes. TEI and OSIS
provide all this kind of stuff.

If you go with "XHTML5", you will inevitably find yourself re-inventing
OSIS-like conventions: What names/abbrevs will you use for books,
translations, and the like? How will you punctuate references? What
syntax will you use for range references? How will you represent the
various kinds of notes, and where will you place them? What will you do
when verses and paragraphs overlap? How will you distinguish the
canonical texts from notes, headings, and so on?
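
As one illustration of how much there is to decide, here is a minimal
sketch of a parser for dotted references and hyphenated ranges, in the
style OSIS uses (for example John.3.16 and John.3.16-John.3.18). The Ref
type and the function names are hypothetical, and this handles only the
simplest forms; it is not a full implementation of OSIS reference
syntax.

```python
from typing import NamedTuple, Optional, Tuple

class Ref(NamedTuple):
    """A single reference: book, optional chapter, optional verse."""
    book: str
    chapter: Optional[int] = None
    verse: Optional[int] = None

def parse_ref(text: str) -> Ref:
    """Parse 'Book', 'Book.Chapter', or 'Book.Chapter.Verse'."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 3 or not parts[0]:
        raise ValueError(f"bad reference: {text!r}")
    # int() raises ValueError for a non-numeric chapter or verse.
    numbers = [int(p) for p in parts[1:]]
    return Ref(parts[0], *numbers)

def parse_range(text: str) -> Tuple[Ref, Ref]:
    """Parse 'Start-End'; a single reference is a range of length one."""
    start, sep, end = text.partition("-")
    first = parse_ref(start)
    last = parse_ref(end) if sep else first
    return first, last

print(parse_range("John.3.16-John.3.18"))
```

Even this toy version has to commit to a separator, a range syntax, and
a set of book abbreviations, which is exactly the set of decisions a
shared standard makes once for everybody.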

Countless such questions arise, and if you go with XHTML5 (or
XHTML349.2, for that matter), you will have to make up your own answer
to each. At that point, it shouldn't surprise you that every other
project comes up with a slightly different set of answers. And that
means that every time you pass data from project A to B, the developers
of either A or B (or both) have to write converters. Sounds like a waste
of time (= poor stewardship) to me. At the very start of a project many
of these questions may seem trivial or irrelevant; but as your project
grows they'll all arise, and you'll either make a decision or decide not
to decide -- which is itself a decision against consistency,
portability, and verifiability.

It seems to me inaccurate to say that there is some massive range of
tools for XHTML but not for XML. There are lots of HTML tools, but if
you look at their output you'll find that they almost all produce HTML
so messy (often invalid, seldom XHTML, and sometimes not even
well-formed) that you'll either end up with data that can't be used in
much of anything *except* browsers, or you'll end up writing all that
conversion/cleanup code again. If I were a wagering man, I'd wager a lot
of money that you've already had to do some of that. If you've got the
development skills to modify open-source XHTML tools (which were you
thinking of?) to support your own extensions, then you could modify them
to do OSIS with little more work (and if you use XML tools, you get most
of that support for free with XML Schema, Schematron, etc.). 
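
A quick way to see how messy a tool's output is: check whether it is
even well-formed XML. The following is a minimal sketch using only
Python's standard library; the function name and the sample strings are
made up for illustration. It checks well-formedness only; checking
validity against a schema needs a validating tool.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# Typical HTML habits that XML parsers reject:
samples = [
    "<p>one<br/>two</p>",        # fine: the empty element is closed
    "<p>one<br>two</p>",         # unclosed <br>
    "<p><b><i>x</b></i></p>",    # improperly nested tags
]
for sample in samples:
    print(is_well_formed(sample), sample)
```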

Is there any XHTML5 tool out there that can't deal with arbitrary XML?
Not many, because it would be a silly move on the developers' part to
make one: the incremental work is trivial. If you already support
styling tag X a certain way when X is a member of the fixed list of
XHTML tags, you already know how to support styling tag X when X is
*not* a member of that fixed list. There is also a vast range of general
XML tools out there, and in general they provide far more functionality
than HTML or XHTML tools (simply because they have to in order not to be
laughed out of the XML marketplace).

Steve DeRose

On Sat, 2009-12-19 at 00:22 -0800, Weston Ruter wrote:

> Thank you so much, Stephen. Your historical information is extremely
> helpful.
> Is anyone able to address the current state of OSIS and future plans
> for the standard? Namely, how is it currently addressing Stephen's
> points:
>      1. OSIS not being designed for delivery of partial documents,
>      2. Its large metadata overhead,
>      3. Ability to include “virtual” elements, as is required for
>         partial documents.
> Furthermore:
>         For the ESV Study Bible in 2008, we again considered using
>         OSIS as the primary XML format for the notes and quickly
>         decided to go with XHTML5 instead. There are so many more
>         tools for dealing with HTML designed to solve real-world
>         problems; it was more efficient to use HTML even though it
>         didn't map perfectly to our domain.
> This identifies a concern I have about OSIS and how it relates to
> other XML vocabularies, namely XHTML5. OSIS defines many elements (a,
> abbr, figure, header, table, date, div, hi, list, p, q, etc.) which
> are already assigned rich semantics and presentational logic in the
> XHTML namespace: why not reuse existing XML vocabularies instead of
> independently (re)defining them? If OSIS depended on XHTML:
>      1. It would make OSIS able to be directly embedded into (X)HTML
>         web pages and be properly understood by the browser: Bible
>         websites could extend their existing HTML websites with OSIS
>         markup to make them more semantically rich, readable both to
>         machines and web browsers. 
>      2. Existing WYSIWYG HTML editors could be more easily extended to
>         support the additional OSIS-specific markup. 
>      3. Having OSIS rely on XHTML would also greatly reduce the size
>         of the OSIS specification, and new authors would require much
>         less time to get up to speed because the spec would only
>         define the elements unique to scriptural markup.
> So I wonder if an OSIS 3.0 could then explicitly reference the
> relevant elements from other XML vocabularies, especially XHTML5?
> Thoughts?
> Is there anyone currently active at the Bible Technologies Group?
> Blessings,
> Weston
> 2009/12/16 Stephen Smith <stephen.smith at gmail.com>
>         There are several reasons why Crossway's XML differs from
>         OSIS:
>         1. As David Eyk notes, we created the existing XML documents
>         in May-June 2002, when OSIS was still in flux. In particular,
>         the milestoning process was much more complicated.
>         2. We were working from initial XML files provided by a vendor
>         and didn't want to change them too much.
>         3. OSIS is paragraph-based, rather than verse-based, making it
>         difficult to meet our immediate need--loading the data into a
>         relational database.
>         4. At the time, OSIS had some mandatory structural elements
>         that we weren't able to create.
>         5. I was hoping that someone else would take the XML from the
>         web service and write an XSLT to transform it into OSIS so we
>         didn't have to.
>         6. OSIS wasn't designed for delivery of partial documents: it
>         wasn't immediately clear to me how to structure the metadata
>         in a response when someone is only looking at, say, John 3:16.
>         Further, the metadata overhead in such a request, as compared
>         to the desired content, was prohibitive. Partial documents
>         also require the use of "virtual" elements--you need to add
>         beginning and ending paragraph tags if you're looking at a
>         verse that appears in the middle of a paragraph, for example,
>         and open/close quotes properly. I don't believe that OSIS has
>         a handy facility for including these kinds of elements.
>         As for mapping the Crossway XML onto OSIS, it should be
>         straightforward. Everything we did with the ESV we did with
>         the goal of producing a world-class OSIS ESV by 2012; I tried
>         to do one big project per year to create metadata required by
>         OSIS. Between 2002 and 2007, we created metadata and evolved
>         the schema to map cleanly to OSIS--upgrading the quotation
>         system, classifying footnotes, adding catchwords, categorizing
>         names, identifying speakers of quotes. All this metadata uses
>         OSIS vocabulary where possible. (Most of this metadata isn't
>         available through the API.) Even after this work, it will
>         still take many more hours to produce a document that fully
>         conforms to OSIS at the "Scholarly" level defined in the spec.
>         The goal has always been to move away from the Crossway XML to
>         a compliant OSIS document. I just never felt we could produce
>         documents that conformed to the Scholarly OSIS Document /
>         Trusted Quality requirements. I saw no point in releasing
>         anything at a lower conformance level unless, as I mentioned,
>         someone wanted to create an interim XSLT. Further, as nearly
>         all consumption of the ESV API was through the HTML format,
>         there wasn't a lot of demand for the XML.
>         For the ESV Study Bible in 2008, we again considered using
>         OSIS as the primary XML format for the notes and quickly
>         decided to go with XHTML5 instead. There are so many more
>         tools for dealing with HTML designed to solve real-world
>         problems; it was more efficient to use HTML even though it
>         didn't map perfectly to our domain.
>         I hope that answers your historical questions.
>         Stephen
>         On Dec 16, 4:02 am, Weston Ruter <westonru... at gmail.com>
>         wrote:
>         > Greetings Crossway, CrossWire, the Bible Technologies Group,
>         > SBL, and esteemed members of the Bible+Tech community:
>         >
>         > I am researching data formats used to represent
>         > scripture—including XML vocabularies, DB schemas, and
>         > *ad hoc* text file formats—with the hope of contributing
>         > towards the development of a standard API that is able to
>         > commonly represent all of the constructs used by each. With
>         > such a standard API, the hope is that (web) developers would
>         > be able to access scriptural data from the array of Bible
>         > societies (e.g. Bible.org) using one standardized web
>         > service interface (i.e. that mashups of multiple
>         > translations from different sources would become easy to
>         > implement, for example:
>         > <http://pixelfaith.com/bible/#Luke/2>).
>         >
>         > I have been studying the Crossway XML format and I am
>         > curious as to why Crossway didn't use OSIS. Were there any
>         > limitations in OSIS that caused you to develop your own XML
>         > vocabulary? Furthermore, why has development of OSIS seemed
>         > to have ceased with the last revision being over three years
>         > ago (6 March 2006)? Moving forward, has any discussion
>         > happened regarding merging Crossway XML into an OSIS 3.0?
>         >
>         > More to the crux of my inquiry, has Crossway considered any
>         > collaboration to standardize an API such as you provide to
>         > access the ESV? Or is anyone aware of any such effort
>         > currently being worked on? I am aware through Troy Griffitts
>         > of the web service API the CrossWire Bible Society has
>         > developed in coordination with the development of OSIS, and
>         > I am in no way wanting to supplant their excellent work. But
>         > I am interested in looking at what a Web-centric API would
>         > look like built from the ground up using the latest Internet
>         > standards with an eye for Ajax applications, web mashups,
>         > and (most importantly) semantically Linked Data. (I would
>         > hope any efforts in this area simply flow back into
>         > CrossWire's efforts for the next version of their API, which
>         > could perhaps then be more widely adopted.)
>         >
>         > What OSIS seeks to do for markup, I would like to see done
>         > with an API to give developers a standard way of accessing
>         > the data in the texts on the Web. In other words and in
>         > short, I am interested in the development of a standardized
>         > web service API and Document Object Model (DOM) for OSIS.
>         >
>         > I am presenting this topic at the BibleTech:2010 Conference.
>         >
>         > Obviously, any such standardization effort would have to be
>         > a joint effort by all of us. Looking forward to hearing from
>         > you!
>         >
>         > Blessings and Merry Christmas!
>         > Weston Ruter
>         > OpenScriptures.org