[sword-devel] virtual modules

DM Smith dmsmith555 at yahoo.com
Sat Jan 21 15:22:54 MST 2006


Chris Little wrote:
>
> Troy made a comment to me when we were in Philadelphia for the last 
> OSIS conference about Sword (the library) nearing a feature-complete 
> state, where we've pretty much got the capability to do all the basic 
> stuff that anyone else is doing. Going forward, most of the work in 
> Sword (ignoring new module acquisitions/licensing and frontend work) 
> is going to be in the area of doing NEW things like this with our 
> existing data.
>
I think that there is more that can be done in the API.

One of the things I am planning to work on in JSword is the ability to 
work with OSIS directly. From my study of the various Sword modules, each 
consists of a representation of the text plus various indexes into that 
representation, kept for the sake of performance. (Yes, a gross 
simplification!)

The indexes are a must. Performance would be horrible otherwise.

I think they could be created quite quickly using various XML parsing 
techniques, e.g. an XML pull parser. Rather than creating a custom index, 
I'm thinking of creating a Lucene index keyed on osisID that stores, for 
each entry, the start and length of the text in the original document. 
I would also like to figure out how to represent additional information 
when such a fragment is not well formed (for example, when a verse starts 
in one paragraph and ends in another).
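
A minimal sketch of the offset index described above, under some loud 
assumptions: a plain Python dict stands in for the Lucene index, a regular 
expression scan stands in for the pull parser, and offsets are character 
offsets into the decoded string rather than byte offsets in a file. The 
function name build_offset_index and the sample document are hypothetical.

```python
import re

# Matches any milestone tag that carries an sID (start) or eID (end).
# The \b before sID/eID keeps the pattern from matching inside "osisID".
MILESTONE = re.compile(r'<\w+\b[^>]*?\b(sID|eID)="([^"]+)"[^>]*>')


def build_offset_index(doc):
    """Map each milestone id to (start, length) of the text it encloses.

    'start' is the offset just after the sID tag; 'length' runs up to the
    beginning of the matching eID tag. A dict stands in for Lucene here.
    """
    index = {}
    open_starts = {}  # ids whose start milestone has been seen but not closed
    for match in MILESTONE.finditer(doc):
        kind, ident = match.group(1), match.group(2)
        if kind == "sID":
            open_starts[ident] = match.end()
        elif ident in open_starts:
            start = open_starts.pop(ident)
            index[ident] = (start, match.start() - start)
    return index


# Hypothetical sample: the verse starts in one paragraph and ends in another.
doc = (
    '<p><verse osisID="Gen.1.1" sID="Gen.1.1"/>In the beginning</p>'
    '<p> God created.<verse eID="Gen.1.1"/></p>'
)
index = build_offset_index(doc)
start, length = index["Gen.1.1"]
fragment = doc[start:start + length]
```

Note that the fragment recovered for Gen.1.1 contains a stray </p> and <p>, 
which is exactly the not-well-formed case that still needs some extra 
representation.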

Another advantage of such a scheme is that it goes a long way toward 
supporting alternate versification. That is, a user's input can be 
converted fairly easily to osisIDs, and these can be used for lookup.
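
As a rough illustration of turning user input into an osisID for lookup, 
here is a small sketch. The alias table, the accepted input forms and the 
name to_osis_id are all assumptions made for the example; it handles only 
single "book chapter:verse" references and does no mapping between 
versification schemes.

```python
import re

# Hypothetical alias table; a real one would cover every book and language.
BOOK_ALIASES = {
    "gen": "Gen", "genesis": "Gen",
    "ex": "Exod", "exod": "Exod", "exodus": "Exod",
    "ps": "Ps", "psalm": "Ps", "psalms": "Ps",
    "1john": "1John", "1jn": "1John",
}

# Accepts forms such as "Gen 1:1", "Gen. 1:1", "psalm 23.1", "1 John 4:8".
REFERENCE = re.compile(r'^\s*([1-3]?\s*[A-Za-z]+)\.?\s+(\d+)[:.](\d+)\s*$')


def to_osis_id(user_input):
    """Convert a simple 'book chapter:verse' string to an osisID."""
    match = REFERENCE.match(user_input)
    if match is None:
        raise ValueError("unrecognised reference: %r" % user_input)
    # Normalise the book name: drop inner whitespace, ignore case.
    book_key = re.sub(r"\s+", "", match.group(1)).lower()
    if book_key not in BOOK_ALIASES:
        raise ValueError("unknown book: %r" % match.group(1))
    chapter, verse = int(match.group(2)), int(match.group(3))
    return "%s.%d.%d" % (BOOK_ALIASES[book_key], chapter, verse)
```

The resulting string can then be used directly as the key into an index 
keyed on osisID.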

If I understand correctly, the osisIDs are to form a nesting hierarchy. 
If it weren't for the fact that an element with an osisID can start in 
one document element and finish outside of it, I think elements with 
osisIDs could be represented with begin and end tags and not milestoned.
That is,
<tag osisID="y" sID="y">...<tag eID="y">...<tag osisID="w" sID="w">...<tag eID="w">
and never
<tag osisID="y" sID="y">...<tag osisID="w" sID="w">...<tag eID="y">...<tag eID="w">

If the osisIDs truly nest, then it may be fairly straightforward to 
make sense of the structure of a non-Bible work.
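
Whether the milestones in a given document really do nest can be checked 
with a simple stack. The sketch below is only an illustration: the function 
name milestones_nest is hypothetical, and it scans with a regular 
expression instead of a real XML parser.

```python
import re

# Matches any tag carrying an sID (start) or eID (end) attribute.
# The \b before sID/eID keeps the pattern from matching inside "osisID".
MILESTONE = re.compile(r'<\w+\b[^>]*?\b(sID|eID)="([^"]+)"[^>]*>')


def milestones_nest(doc):
    """Return True if every eID closes the most recently opened sID."""
    stack = []
    for match in MILESTONE.finditer(doc):
        kind, ident = match.group(1), match.group(2)
        if kind == "sID":
            stack.append(ident)
        elif not stack or stack.pop() != ident:
            # An end milestone that does not match the innermost open one
            # means two milestoned spans overlap.
            return False
    # Anything left on the stack was opened but never closed.
    return not stack


sequential = ('<tag osisID="y" sID="y">...<tag eID="y">...'
              '<tag osisID="w" sID="w">...<tag eID="w">')
overlapping = ('<tag osisID="y" sID="y">...<tag osisID="w" sID="w">...'
               '<tag eID="y">...<tag eID="w">')
```

With the two patterns from above, milestones_nest(sequential) is True and 
milestones_nest(overlapping) is False.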

(I am sure that I will find out more as I go along)


