[sword-devel] osis2mod

Wed Apr 25 19:14:50 MST 2007

For the record, here is what I think:

It is the responsibility of the module developer to ensure that the  
input to osis2mod is valid. Since there have been several versions of  
the OSIS spec (currently at 2.1.1), it is reasonable to ask which
minimum version we would accept. I'd go with 2.0 or later. As long as
Chris is the "pumpkin holder" of module creation,
it is not a big deal. But without validation being done by osis2mod,  
there is no way to ensure this.

Even with XML schema validation, it is quite possible for a document
to pass and still not be valid OSIS. Part of this is due to the
milestoneability of some elements, but more fundamentally, no schema
imparts semantics. So while schema validation is important, it is not
sufficient. Osis2mod needs to ensure that the OSIS is sufficiently
valid for the current front-ends.

Osis2mod modifies the input into a form that is acceptable to a Sword
module. Thus a round trip, from input through osis2mod and out again,
will not match the original. For example, a module is verse based, so
inter-verse material needs to be prepended or appended to a verse.

Currently osis2mod attempts to check two things:
1) that the document is well formed (this is far from a validity
check). A failure here is an error and causes the program to exit.
2) that each verse is well formed. A failure here is only a warning.

However, my suggestion that osis2mod use a real parser was not
predicated on the need for validation, but rather on the need to
support all well-formed inputs.

Perhaps I am biased by Java, but I think it can be done without
impacting program size significantly. In Java, the XML parser is an
implementation of an interface, and at runtime it is possible to
specify any available implementation. I think that if we were to do
something similar in C++, perhaps choosing a SAX interface, we could
wrap XMLTag in it. Then one could link in either Xerces, Sword's own
parser, or some other implementation, and the size/performance cost
would be appropriate for the use.
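
A rough sketch of that idea follows. All of the names in it
(OsisHandler, OsisParser, ToyParser, EventLog, importOsis) are
hypothetical and do not exist in Sword. ToyParser is only a stand-in
for a real backend such as one wrapping XMLTag or Xerces: it
understands plain start, end and empty-element tags, and it ignores
attributes, entities, comments and everything else.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical SAX-style callback interface (not part of the Sword API).
class OsisHandler {
public:
    virtual ~OsisHandler() = default;
    virtual void startElement(const std::string& name,
                              const std::map<std::string, std::string>& attrs) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Hypothetical parser interface. A backend built on XMLTag, Xerces or
// libxml would each implement this; the importer never sees which one.
class OsisParser {
public:
    virtual ~OsisParser() = default;
    // Returns false if the backend finds the input is not well formed.
    virtual bool parse(const std::string& xml, OsisHandler& handler) = 0;
};

// Toy backend for illustration only. Attributes are skipped, so the
// attribute map passed to startElement is always empty.
class ToyParser : public OsisParser {
public:
    bool parse(const std::string& xml, OsisHandler& handler) override {
        std::vector<std::string> open;  // stack of currently open elements
        std::size_t pos = 0;
        while (pos < xml.size()) {
            if (xml[pos] != '<') {  // character data up to the next tag
                std::size_t next = xml.find('<', pos);
                if (next == std::string::npos) next = xml.size();
                handler.characters(xml.substr(pos, next - pos));
                pos = next;
                continue;
            }
            std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) return false;  // unterminated tag
            std::string body = xml.substr(pos + 1, close - pos - 1);
            pos = close + 1;
            if (body.empty()) return false;
            if (body[0] == '/') {  // end tag must match the innermost open one
                std::string name = body.substr(1);
                if (open.empty() || open.back() != name) return false;
                open.pop_back();
                handler.endElement(name);
            } else {
                bool selfClosing = body.back() == '/';
                if (selfClosing) body.pop_back();
                std::string name = body.substr(0, body.find(' '));
                if (name.empty()) return false;
                handler.startElement(name, {});
                if (selfClosing) handler.endElement(name);
                else open.push_back(name);
            }
        }
        return open.empty();  // every start tag needs its end tag
    }
};

// Example handler that just records the events it receives.
class EventLog : public OsisHandler {
public:
    std::vector<std::string> events;
    void startElement(const std::string& name,
                      const std::map<std::string, std::string>&) override {
        events.push_back("start:" + name);
    }
    void endElement(const std::string& name) override {
        events.push_back("end:" + name);
    }
    void characters(const std::string& text) override {
        events.push_back("text:" + text);
    }
};

// The importer depends only on the two interfaces, so the backend can
// be chosen when the tool is built or linked.
bool importOsis(OsisParser& parser, const std::string& xml, OsisHandler& handler) {
    return parser.parse(xml, handler);
}
```

With something like this, a small build could link only the
lightweight backend, while a build for module developers could link
a full parser behind the same OsisParser interface.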

As for validation, one could have an external validator called by  
fork/exec on the input file. This would not increase the program size  
significantly.
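
As a minimal sketch of the fork/exec approach on a POSIX system: the
function names runExternal and validateOsis are hypothetical, and
which validator to run is left to the caller. One possible choice
(an assumption, not something osis2mod does today) would be
"xmllint --noout --schema <schema file> <input file>".

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs args[0] with the remaining arguments and waits for it.
// Returns the child's exit status, or -1 if it could not be forked
// or did not exit normally. 127 means the exec itself failed.
int runExternal(const std::vector<std::string>& args) {
    assert(!args.empty());

    // execvp wants a null-terminated array of char*.
    std::vector<char*> argv;
    for (const std::string& a : args)
        argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;            // fork failed
    if (pid == 0) {
        execvp(argv[0], argv.data());  // only returns on failure
        _exit(127);
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1; // retry only if interrupted
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Hypothetical wrapper: treats exit status 0 as "valid", which is the
// usual convention for command-line validators but is an assumption
// about whichever validator is configured.
bool validateOsis(const std::string& validator,
                  const std::vector<std::string>& options,
                  const std::string& path) {
    std::vector<std::string> args{validator};
    args.insert(args.end(), options.begin(), options.end());
    args.push_back(path);
    return runExternal(args) == 0;
}
```

Osis2mod would call validateOsis once before it starts importing, so
the validator stays a separate program and adds nothing to the size
of osis2mod beyond these few lines.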

In Him,
	DM

On Apr 25, 2007, at 6:47 PM, Chris Little wrote:

> DM and I have been chatting a bit off-list about the future/function
> of osis2mod and I thought maybe we should open up the discussion a bit.
>
> Right now osis2mod (the tool for converting OSIS Bibles to Sword Bible
> modules) does some mediocre validity checking as it builds its Sword
> database. We'll never really get it perfect this way since we aren't
> doing real schema validation.
>
> DM has suggested adding a real validating parser to osis2mod (by
> embedding something like xerces or libxml), so it could spit out an
> error message if you try to import invalid OSIS.
>
> I'm not totally convinced we should do that. When I prepare modules
> from OSIS docs, I always perform validation in an external validator.
> (Personally I use Oxygen, but there are also XML Spy, MSV, topologi,
> Xerces, etc.)
>
> Do people feel that incorporating a real validator would make osis2mod
> easier to use?
>
> It could potentially cause the filesize to jump dramatically, so would
> that be acceptable?
>
> If we incorporate osis2mod into either front-ends or installmgr so that
> users could import OSIS documents directly into Sword, would that
> support or detract from the case for embedding a full validator?
>
> --Chris