[sword-devel] XML idea: modular spec

David Burry sword-devel@crosswire.org
Fri, 12 Oct 2001 00:11:33 -0700

By the way, it has occurred to me in re-reading Patrick's page that he 
probably said the same thing I just did, only he said so much in such a way 
it's hard to absorb....  ;o)


At 11:55 PM 10/11/2001 -0700, David Burry wrote:
>I've been thinking for a long time about how to provide a reasonable 
>storage/index mechanism, and still give the end user interface designer 
>access to the complete Bible in a variety of XML ways depending on the 
>needs of the application.  There has been previous discussion on this list 
>regarding this; I called it looking at the data in different "slices" and 
>Patrick Durusau called it "concurrent markup".
>However!!! <light goes on>  I just thought of a great idea today about 
>this (I think, you tell me)....  What if the Bible were stored in 
>compressed and/or indexed form on disk, yet "virtually" 
>available/queryable as a large repetitious XML type object, from which you 
>could extract just the portion/format you need, with say, an XPath or 
>XQuery statement.
>What I mean is that, suppose the Bible were stored in a binary/text 
>compressed and/or indexed format, but available for query _as_if_ it were 
>in this kind of format:
><version name="kjv">
>   <book name="genesis">
>     <chapter>
>       <verse><paragraphmarker/>contents of verse 1</verse>
>       <verse>contents of verse 2</verse>
>       <verse><paragraphmarker/>etc</verse>
>        ...
>     </chapter>
>     ...
>     <paragraph><chaptermarker/><versemarker/>contents of verse 1<versemarker/>contents of verse 2</paragraph>
>     <paragraph><versemarker/>etc</paragraph>
>     ...
>   </book>
>   ...
></version>
>(Notice I didn't put paragraphs inside chapters because in fact paragraphs 
>can occasionally straddle chapter boundaries.)
>You can see I'm proposing that the entire thing appear twice in 
>the simple example above, but it only has to be "virtually" duplicated, 
>not actually recorded twice anywhere on disk or in memory.  It allows you 
>to specify an XPath of 
>"/version[@name='kjv']/book[@name='genesis']/chapter/verse" to grab the 
>contents of all the verses in genesis in a verse-by-verse fashion with 
>paragraph markers, but 
>"/version[@name='kjv']/book[@name='genesis']/paragraph" to grab the same 
>contents in a paragraph-by-paragraph fashion with chapter/verse 
>markers.  It's great because a properly extended thing like this could 
>allow you to query the Bible and get your results in many different 
>chapter/verse/paragraph/sentence/word/etc forms!
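
As a rough illustration of the two queries above, here is a minimal sketch using Python's standard xml.etree.ElementTree. This is an assumption for illustration only, not anything Sword provides. ElementTree supports only a limited XPath subset and resolves paths relative to the root element, so the leading /version[@name='kjv'] step is checked separately. The verse text is the placeholder text from the example.

```python
import xml.etree.ElementTree as ET

# Toy document mirroring the example above, written out in full here.
# In the proposal this tree would only exist "virtually".
DOC = """
<version name="kjv">
  <book name="genesis">
    <chapter>
      <verse><paragraphmarker/>contents of verse 1</verse>
      <verse>contents of verse 2</verse>
      <verse><paragraphmarker/>etc</verse>
    </chapter>
    <paragraph><chaptermarker/><versemarker/>contents of verse 1<versemarker/>contents of verse 2</paragraph>
    <paragraph><versemarker/>etc</paragraph>
  </book>
</version>
"""

root = ET.fromstring(DOC.strip())  # root is the <version> element
assert root.get("name") == "kjv"   # stands in for /version[@name='kjv']

# Verse-by-verse slice (paragraph markers are kept as child elements).
verses = root.findall("./book[@name='genesis']/chapter/verse")

# Paragraph-by-paragraph slice (chapter/verse markers are kept as child elements).
paragraphs = root.findall("./book[@name='genesis']/paragraph")

# Flatten each element to its text, ignoring the empty marker elements.
verse_texts = ["".join(v.itertext()) for v in verses]
paragraph_texts = ["".join(p.itertext()) for p in paragraphs]

print(verse_texts)
print(paragraph_texts)
```

Both queries return the same underlying text, just cut along different boundaries.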
>This would mean that we'd have to glue an XPath or XQuery parser onto our 
>data store in a way it probably wasn't originally designed for, so that we 
>can interpret the query first and then reconstitute only the requested XML 
>from our data store, without building the entire extended, duplicated XML 
>tree.  But it's certainly possible, and as more and more of this kind of 
>software becomes modular it can only get easier to do over 
>time... perhaps someone else has even already thought of and done something 
>like this.  Anyone know of any?
>thoughts?  comments?
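
To make the "virtual duplication" idea concrete, here is a small sketch in Python under some loud assumptions: STORE, verse_view, paragraph_view, and query are all hypothetical names, the store is a plain list rather than a compressed or indexed file, and a lookup on a view name stands in for a real XPath/XQuery parser. The point is only that each verse's text is stored once and either XML shape is built on demand.

```python
import xml.etree.ElementTree as ET

# Hypothetical compact store: one record per verse, text stored only once.
# Record layout (an assumption): (chapter, verse, starts_paragraph, text)
STORE = [
    (1, 1, True, "contents of verse 1"),
    (1, 2, False, "contents of verse 2"),
    (1, 3, True, "etc"),
]

def verse_view(store):
    """Yield <verse> elements, with <paragraphmarker/> where a paragraph starts."""
    for _chapter, _verse, starts_par, text in store:
        el = ET.Element("verse")
        if starts_par:
            marker = ET.SubElement(el, "paragraphmarker")
            marker.tail = text  # text follows the empty marker element
        else:
            el.text = text
        yield el

def paragraph_view(store):
    """Yield <paragraph> elements containing chapter and verse markers."""
    current = None
    last_chapter = None
    for chapter, _verse, starts_par, text in store:
        if starts_par or current is None:
            if current is not None:
                yield current  # close the previous paragraph
            current = ET.Element("paragraph")
        if chapter != last_chapter:
            ET.SubElement(current, "chaptermarker")
            last_chapter = chapter
        marker = ET.SubElement(current, "versemarker")
        marker.tail = text
    if current is not None:
        yield current

# Stand-in for query parsing: map the tail of the requested path to a view.
VIEWS = {"chapter/verse": verse_view, "paragraph": paragraph_view}

def query(store, view):
    """Build only the requested slice and return it as XML strings."""
    return [ET.tostring(el, encoding="unicode") for el in VIEWS[view](store)]

for xml in query(STORE, "paragraph"):
    print(xml)
```

Because paragraphs are assembled from the flat verse list rather than nested inside chapters, a paragraph that straddles a chapter boundary simply picks up a chaptermarker partway through.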