[sword-devel] XML idea: modular spec

Troy A. Griffitts sword-devel@crosswire.org
Thu, 30 Aug 2001 19:28:00 -0700

Thank you again!  You and Patrick
(http://www.sbl-site2.org/Extreme2001/Concur.html) seem to be on the
same page.  This facet of the implementation seems to benefit in many
ways from your approach.

The other facet, the WHAT (which your approach also lets change
dynamically), is probably where the Bible domain experts' interests lie
and where they can contribute.

As techies, we find ourselves at home with the HOW, and you have
some excellent suggestions.  You should consider joining one or more
working groups!

For OSIS to achieve its goals, and for us to realize its benefit (to
provide interchangeable solutions to organizations that meet the
challenges of producing Bible-related texts), at least the base 'what'
must also be defined; otherwise all we've done is extend XML
methodologies generally to all domains (which isn't a bad thing either).


David Burry wrote:
> okey dokey...
> I kind of already mentioned this idea before, but here're some more detailed thoughts and explanations on it:
> XML is ideally suited to representing hierarchical slices of very complex data for exchange with other programs or display engines etc, but not necessarily to storing that extra complex data efficiently in its complete form.  By "slices" I mean, let me illustrate:  suppose you have a cube of 3-dimensional data, and you want to be able to access any of that data.  The final rendering of the data will be 2D (say, for a flat picture), so you can take many different views of that data, all different, but none of them will store that data completely (and efficiently!!) without duplicating your whole 2D spatial grid a bazillion times along that 3rd dimension!  And even then, diagonal/rotated slices could only be approximated from that source data, or else the whole thing would have to be done another bazillion times for each possible rotation!  (Rotation of the slice being almost like a 4th dimension.)
> Ok, that's the visual/mathematical idea; now here's how it applies to us:  If we **need** book/chapter/verse granularity for a particular end user application, it's best to give that app an XML representation that does exactly that, no more and no less.  No need for it to bother with verb tenses and historical contexts and fancy quoting dealies, etc, unless the app actually needs them and can handle them.  Likewise, suppose the application reads the text aloud using computer-synthesized voices: you may want book/chapter & quote granularity instead, because that's what you need in that case.  Using the information about who's saying what, it could read each speaker's words in a different voice/accent, and offer the ability to jump to any chapter but not to any verse.  Another application that does complicated word analysis (e.g. verbs/nouns/predicates/tenses/etc) would want much more detailed info about each individual word, and it may require book/paragraph/sentence granularity above the word level, **not** chapter/verse, even though it may still want chapter/verse markers (this time not part of the nested tree structure) so it can give the end user visual indicators of where they are.
> Above I've listed 3 different example slice types, and they do not necessarily mix well together in one single cohesive rigid XML format, but each **does** work quite well as its own independent XML representation of the same general data underneath.
> So the question is... are you trying to define one single rigid everything-for-everyone-forever XML format, or a more modular, extensible approach that can represent the data in whatever way is needed at the moment?  You can probably see I'd very much prefer the latter  ;o)  Let the engine/library decide what it wants to use for ultimate storage underneath.  It may not even be XML (it could be XML if it wants, or it could instead be some custom binary format optimized for speed and/or space on disk or in memory).
> But if the library can transform this stored data on the fly into whatever XML format is required, that would be really cool, as the end user app would only need light, robust, already-built tools to mine out the data it needs, since the "picture" of the data it's getting is already suited to its needs.  For instance, a simple XSLT stylesheet could be used to create any HTML rendition you want.
> The only drawback might be the lack of a way for the app to declare to the library what format it needs.  Perhaps pass it a DTD or something?  XPointer and XPath don't seem suitable for that...  It's just that a DTD seems overkill to me, but perhaps not if the standard is kept simple...  but the app should also be passing the library reference ranges and search queries, so...  hmm
> Anyway, that's where I'm at with this idea right now.  You can see in the last paragraph that the idea isn't complete yet, but I think it's enough of a start that it has merit.
> If this idea doesn't make it into SWORD, I will likely put it into my own existing side project eventually anyway (sample URL was sent earlier), probably with my own evolving XML spec if others don't agree with me.  Not that doing my own is bad: it's a natural part of the evolution of these things for someone to jump in and try it first, and then, if it works well, for others to come on board and eventually agree upon a better-designed spec created by the community at large.  That's especially true for radical things that go against tradition! ;o)
> Dave
> At 05:07 PM 8/30/2001 -0700, Troy A. Griffitts wrote:
> >        But, since you seem to be so xml proper :) ...  WE NEED YOUR
> >(everyone's) FEEDBACK as to what tags should go into an XML markup
> >standard.
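
The "slices" idea in the quoted message can be made concrete with a small sketch. It is only an illustration under stated assumptions, not SWORD's API or any OSIS design: the underlying store is assumed to be a flat list of hypothetical `Token` records (in practice it could be a binary index or database, as Dave suggests), each view is an ordinary function that builds one XML slice on demand, and the app picks a slice by a hypothetical view name (`bcv`, `audio`, `analysis`) rather than by passing a DTD. All element and attribute names (`speech`, `w`, `ref`, etc.) are made up for the example, and the sample text is Genesis 1:3 plus the start of 1:4 with invented part-of-speech tags.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Dict, List, Optional

@dataclass(frozen=True)
class Token:
    """One word plus every coordinate a view might slice on (hypothetical model)."""
    text: str
    book: str
    chapter: int
    verse: int
    paragraph: int
    sentence: int
    pos: str                        # part of speech; illustrative tags only
    speaker: Optional[str] = None   # who is speaking, if inside a quotation

def bcv_view(tokens: List[Token]) -> ET.Element:
    """Book/chapter/verse nesting with plain verse text, nothing else."""
    root = ET.Element("text")
    for book, b_toks in groupby(tokens, key=lambda t: t.book):
        b_el = ET.SubElement(root, "book", name=book)
        for ch, c_toks in groupby(b_toks, key=lambda t: t.chapter):
            c_el = ET.SubElement(b_el, "chapter", n=str(ch))
            for v, v_toks in groupby(c_toks, key=lambda t: t.verse):
                v_el = ET.SubElement(c_el, "verse", n=str(v))
                v_el.text = " ".join(t.text for t in v_toks)
    return root

def audio_view(tokens: List[Token]) -> ET.Element:
    """Book/chapter/speech nesting for a text-to-speech reader; no verses."""
    root = ET.Element("text")
    for book, b_toks in groupby(tokens, key=lambda t: t.book):
        b_el = ET.SubElement(root, "book", name=book)
        for ch, c_toks in groupby(b_toks, key=lambda t: t.chapter):
            c_el = ET.SubElement(b_el, "chapter", n=str(ch))
            # Consecutive words with the same speaker become one <speech> run.
            for who, s_toks in groupby(c_toks, key=lambda t: t.speaker):
                s_el = ET.SubElement(c_el, "speech", who=who or "narrator")
                s_el.text = " ".join(t.text for t in s_toks)
    return root

def analysis_view(tokens: List[Token]) -> ET.Element:
    """Book/paragraph/sentence/word nesting; verses become empty milestones."""
    root = ET.Element("text")
    for book, b_toks in groupby(tokens, key=lambda t: t.book):
        b_el = ET.SubElement(root, "book", name=book)
        last_ref = None
        for p, p_toks in groupby(b_toks, key=lambda t: t.paragraph):
            p_el = ET.SubElement(b_el, "p", n=str(p))
            for s, s_toks in groupby(p_toks, key=lambda t: t.sentence):
                s_el = ET.SubElement(p_el, "s", n=str(s))
                for t in s_toks:
                    if (t.chapter, t.verse) != last_ref:
                        # Marks where a verse starts without nesting words inside it.
                        ET.SubElement(s_el, "verse", ref=f"{t.book}.{t.chapter}.{t.verse}")
                        last_ref = (t.chapter, t.verse)
                    w_el = ET.SubElement(s_el, "w", pos=t.pos)
                    w_el.text = t.text
    return root

# The app names the slice it wants instead of passing a DTD (hypothetical scheme).
VIEWS: Dict[str, Callable[[List[Token]], ET.Element]] = {
    "bcv": bcv_view,
    "audio": audio_view,
    "analysis": analysis_view,
}

def render(view: str, tokens: List[Token]) -> str:
    """Build the requested XML slice on the fly from the same underlying tokens."""
    return ET.tostring(VIEWS[view](tokens), encoding="unicode")

def sample_tokens() -> List[Token]:
    """Genesis 1:3 and the start of 1:4 (KJV wording, no punctuation), made-up tags."""
    rows = [  # (word, verse, sentence, pos, speaker)
        ("And", 3, 1, "conj", None), ("God", 3, 1, "noun", None),
        ("said", 3, 1, "verb", None), ("Let", 3, 1, "verb", "God"),
        ("there", 3, 1, "adv", "God"), ("be", 3, 1, "verb", "God"),
        ("light", 3, 1, "noun", "God"), ("and", 3, 1, "conj", None),
        ("there", 3, 1, "adv", None), ("was", 3, 1, "verb", None),
        ("light", 3, 1, "noun", None), ("And", 4, 2, "conj", None),
        ("God", 4, 2, "noun", None), ("saw", 4, 2, "verb", None),
        ("the", 4, 2, "det", None), ("light", 4, 2, "noun", None),
    ]
    return [Token(w, "Gen", 1, v, 1, s, pos, who) for w, v, s, pos, who in rows]
```

For example, `render("analysis", sample_tokens())` produces sentences made of `<w>` elements with empty `<verse ref="Gen.1.3" />` milestones at verse boundaries, while `render("audio", sample_tokens())` contains no verse markup at all, only runs of `<speech who="...">`. Keeping the storage flat and pushing all hierarchy into the views is what lets these three incompatible nestings coexist over the same data.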