[sword-devel] Osis2mod transformations was Re: Alternate Versification

DM Smith dmsmith at crosswire.org
Mon Mar 16 10:50:02 MST 2009


Greg Hellings wrote:
> DM,
>
> On Mon, Mar 16, 2009 at 12:03 PM, DM Smith <dmsmith at crosswire.org> wrote:
>   
>> I've been looking at the code regarding Alternate Versification (aka av11n
>> and v11n; I've seen these abbreviations by Troy, Chris and others).
>>
>> It looks solid. The purpose of this note is to give it a big thumbs up.
>>
>> Basically here is what I see: (Chris, Troy, correct me where I am off base!
>> Please!)
>> Today (1.5.11 and earlier) speed is a major consideration and canon.h
>> provides for that. The core functionality of looking up a verse or intro is
>> to convert a verse key into an offset in the module's index. Without going
>> into it in great detail, the module, testament, book and chapter
>> introductions are addressable in the index, as well as each verse.
>>
>> In 1.5.12, canon.h no longer includes a fast lookup for this. Instead it
>> includes the KJV versification: books by name, number of chapters and number
>> of verses per chapter. The new VerseMgr takes this and dynamically builds
>> the old lookup table, hiding it behind its API. The performance hit is
>> taken once each time the program is run for each versification scheme that
>> is requested.
>>
>> Chris has taken the CCEL versifications and written a Perl program that uses
>> them as input to generate the same structure for each versification.
>>
>> Currently, the VerseMgr does not know about the different V11Ns. It looks
>> like that is all that is left for it.
>>
>> If I am understanding this correctly, this leads me to believe that GenBooks
>> are not going to be used, but rather regular Bible modules. If this is true,
>> it is a boon to commentaries as well, as commentaries are structured
>> internally as Bibles. And it gives us compressed modules. And it gives us
>> the speed of the Bible module (GenBook is very slow in comparison.)
>>
>> I had been concerned about GenBooks being used, because osis2mod does
>> transformations and the GenBook importer does not.
>>     
>
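The table-building step described in the quoted note can be sketched
roughly as follows. This is only an illustration in Python, not the
VerseMgr code: the names V11N, build_chapter_offsets and verse_slot are
made up, the data is truncated to a few chapters, and the slot layout
(one slot each for the testament, book and chapter intros, followed by
the verses) is an assumption made for the example.

```python
# Hypothetical, heavily truncated versification data:
# (book name, [number of verses in each chapter]).
V11N = [
    ("Gen", [31, 25, 24]),
    ("Exod", [22, 25]),
]


def build_chapter_offsets(v11n):
    """Map (book, chapter) to the index slot of that chapter's intro.

    Assumed layout: slot 0 is the testament intro, then for each book
    one book-intro slot, and for each chapter one chapter-intro slot
    followed by one slot per verse.
    """
    offsets = {}
    slot = 1  # slot 0 is reserved for the testament intro
    for book, chapters in v11n:
        slot += 1  # skip over the book-intro slot
        for chapter, verse_count in enumerate(chapters, start=1):
            offsets[(book, chapter)] = slot  # chapter-intro slot
            slot += 1 + verse_count  # chapter intro plus its verses
    return offsets


def verse_slot(offsets, book, chapter, verse):
    """Verse 0 addresses the chapter intro; verse n is n slots later."""
    return offsets[(book, chapter)] + verse


# Built once per versification, then every lookup is a dict access
# plus an addition.
OFFSETS = build_chapter_offsets(V11N)
```

The walk over the chapter counts happens once per versification; after
that, resolving a verse key costs one dictionary lookup and one
addition, which is the trade-off the note describes.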
> Have there been any moves to reduce the number of transformations
> osis2mod performs, so that the stored format is even closer to the
> import format (preferably lossless)?
The answer is yes and no.

The goal of the transformation is:
    To allow for any valid, well written OSIS input.

There are a couple of purposes to the transformations:
1) To position interverse material (currently headings) as an 
introduction to a book or chapter, as an appendix to the prior verse, 
or as a "pre-verse" heading for the following verse.
This is lossy. We have discussed, and partially agreed upon, a 
lossless transformation. We have yet to implement it. (I think this 
should be part of the 1.5.12 release.)

2) To transform Book/Chapter/Section/Paragraph OSIS (which is best for 
OSIS authors) into BCV OSIS, which is best for applications.
    While this is lossless, it is not reversible.

3) To handle the Words of Christ in a way that works for a verse in 
isolation and also for the OSIS writer.
    (Verses are isolated in search results, in table cells used for 
parallel views, etc.)
    This is lossless, but not easily reversible.
>   What prevents all Bible modules
> from being stored in an inherently OSIS format internally with the
> indexes created by the engine simply leveraging certain points in an
> OSIS file, rather than in a separate binary format?  It seems that
> would be the most lossless format available, but I'm curious as to
> what technical issues might prevent that from being the most desirable
> method?
There are a variety of reasons that it would not work. Here are some.
1) The SWORD and JSword engines cannot handle all possible OSIS inputs 
without major changes.
2) The markup for a given verse or passage might not be a well-formed 
XML fragment. Without a well-formed fragment, a compliant XML parser 
cannot be used. A tolerant parser has to make guesses, which might be 
wrong. We would need to expand the reference until the fragment is 
well-formed, which might be computationally expensive. Pre-computing 
the well-formed context of each verse is a possibility.
3) Well-formed fragments might not have sufficient context to display 
properly. For example, Matt 6 is the middle of the Sermon on the Mount 
but just reading the OSIS markup for that chapter might not make it obvious.
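To illustrate point 2, here is a small Python sketch. The markup is a
simplified, made-up OSIS-like snippet (not taken from any module), and
verse_slice and is_well_formed are hypothetical helpers. Verses are
marked with sID/eID milestones, and the second verse crosses a
paragraph boundary, so the raw text between its milestones cannot be
handed to a strict XML parser.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up OSIS-like markup: Gen.1.2 starts in one <p>
# and ends in the next.
DOC = (
    '<div type="book" osisID="Gen">'
    '<chapter osisID="Gen.1">'
    '<p><verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning...'
    '<verse eID="Gen.1.1"/>'
    '<verse sID="Gen.1.2" osisID="Gen.1.2"/>And the earth...</p>'
    '<p>continued text<verse eID="Gen.1.2"/></p>'
    '</chapter></div>'
)


def verse_slice(doc, osis_id):
    """Return the raw text from a verse's start milestone to its end."""
    start = doc.index('<verse sID="%s"' % osis_id)
    end_tag = '<verse eID="%s"/>' % osis_id
    end = doc.index(end_tag) + len(end_tag)
    return doc[start:end]


def is_well_formed(fragment):
    """Check the fragment with a strict parser, wrapped in a dummy
    root element so that mixed text and elements are allowed."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


print(is_well_formed(verse_slice(DOC, "Gen.1.1")))  # True
print(is_well_formed(verse_slice(DOC, "Gen.1.2")))  # False
```

The slice for Gen.1.2 contains a stray </p> and an unclosed <p>, so it
only parses once it is expanded to include both whole paragraphs, which
is the kind of context expansion mentioned above.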


