dmsmith555 at yahoo.com
Wed Apr 25 19:14:50 MST 2007
For the record, I think:
It is the responsibility of the module developer to ensure that the
input to osis2mod is valid. Since there have been several versions of
the OSIS spec (currently at 2.1.1), it is reasonable to ask which
minimum version we should accept. I'd go with 2.0 or later. As long
as Chris is the "pumpkin holder" of module creation, this is not a
big deal. But without osis2mod doing the validation, there is no way
to ensure it.
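To make the minimum-version idea concrete, here is a minimal C++ sketch of the comparison itself. It assumes the declared version string has already been extracted from the document (for example, from a schema name like osisCore.2.1.1.xsd in xsi:schemaLocation); the function names are hypothetical and not part of Sword or osis2mod.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a dotted version such as "2.1.1" into {2, 1, 1}.
// Assumption: the string was already pulled out of the OSIS document elsewhere.
std::vector<int> parseVersion(const std::string &text) {
    std::vector<int> parts;
    std::istringstream in(text);
    std::string piece;
    while (std::getline(in, piece, '.')) {
        parts.push_back(std::stoi(piece));  // throws on non-numeric pieces
    }
    return parts;
}

// Hypothetical check: true if `declared` is at least `minimum`.
// Missing components count as 0, so "2", "2.0" and "2.0.0" compare equal.
bool meetsMinimumVersion(const std::string &declared,
                         const std::string &minimum = "2.0") {
    std::vector<int> a = parseVersion(declared);
    std::vector<int> b = parseVersion(minimum);
    std::size_t n = a.size() > b.size() ? a.size() : b.size();
    a.resize(n, 0);
    b.resize(n, 0);
    return a >= b;  // std::vector compares lexicographically
}
```

With the default minimum of 2.0, "2.1.1" and "2.0" pass while "1.5" is rejected.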
Even a document that passes XML schema validation may not be valid
OSIS. Part of this is due to the milestoneability of some elements,
and in general no schema imparts semantics. So while schema
validation is important, it is not sufficient. Osis2mod needs to
ensure that the OSIS is sufficiently valid for the current front-ends.
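As an example of a rule that schema validation cannot express, here is a hypothetical C++ check that every milestoned start (sID) has exactly one matching end (eID). It assumes the milestone events have already been pulled out of the document in order; the Milestone type and the function name are illustrative only.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, pre-extracted milestone event, e.g. <verse sID="..."/>
// or <verse eID="..."/>, in document order.
struct Milestone {
    bool isStart;    // true for an sID, false for an eID
    std::string id;  // the sID/eID value
};

// Returns a list of problems; an empty list means every sID is closed by
// exactly one matching eID and no eID appears without its sID.
std::vector<std::string> checkMilestonePairs(const std::vector<Milestone> &events) {
    std::vector<std::string> problems;
    std::set<std::string> open;  // sIDs seen but not yet closed
    for (const Milestone &m : events) {
        if (m.isStart) {
            if (!open.insert(m.id).second)
                problems.push_back("duplicate sID: " + m.id);
        } else if (open.erase(m.id) == 0) {
            problems.push_back("eID without matching sID: " + m.id);
        }
    }
    for (const std::string &id : open)
        problems.push_back("sID never closed: " + id);
    return problems;
}
```

A schema can say that sID and eID are allowed attributes, but not that they must pair up like this.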
Osis2mod modifies the input into a form that is acceptable to a Sword
module. Thus a round trip, from the input through osis2mod and back
out again, will not reproduce the original. For example, a module is
verse-based, so material between verses needs to be prepended or
appended to a verse.
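A rough C++ sketch of that folding step. The Chunk model (verse text tagged with its osisID, or untagged material such as a title that sits outside any verse) and the function name are assumptions for illustration, not osis2mod's actual data structures or policy.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical input model: a flat run of chunks, each either verse text
// (with its osisID) or material outside any verse (osisID left empty).
struct Chunk {
    std::string osisID;  // empty for material between verses
    std::string text;
};

// Build a verse-keyed store as a verse-based module needs it: material
// outside a verse is prepended to the next verse, and anything after the
// last verse is appended to that last verse. (In this sketch, material in
// a document with no verses at all is simply dropped.)
std::map<std::string, std::string> foldIntoVerses(const std::vector<Chunk> &chunks) {
    std::map<std::string, std::string> verses;
    std::string pending;    // material waiting for the next verse
    std::string lastVerse;  // key of the most recent verse
    for (const Chunk &c : chunks) {
        if (c.osisID.empty()) {
            pending += c.text;
        } else {
            verses[c.osisID] += pending + c.text;
            pending.clear();
            lastVerse = c.osisID;
        }
    }
    if (!pending.empty() && !lastVerse.empty())
        verses[lastVerse] += pending;
    return verses;
}
```

Because the title ends up inside a verse entry, exporting the module again cannot reproduce the original document structure exactly.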
Currently osis2mod attempts to check two things:
1) that the document is well formed (this is far from a validity
check). A failure here is an error that causes the program to exit.
2) that each verse is well formed. A failure here is a warning.
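For comparison, a naive version of the two checks might look like the C++ below. This is a hypothetical tag-balance scan, not osis2mod's actual code; like any hand-rolled scanner it does not handle CDATA, comments containing '>', or '>' inside attribute values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Naive tag-balance check (hypothetical, not osis2mod's implementation).
// Limitations: no CDATA, no comments containing '>', no '>' in attribute values.
bool tagsBalanced(const std::string &xml) {
    std::vector<std::string> stack;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) return false;     // unterminated tag
        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty()) return false;
        if (tag[0] == '?' || tag[0] == '!') continue;   // PI, comment, DOCTYPE
        if (tag.back() == '/') continue;                 // empty element, e.g. <verse sID="a"/>
        bool closing = (tag[0] == '/');
        std::string name = tag.substr(closing ? 1 : 0);
        name = name.substr(0, name.find_first_of(" \t\r\n"));  // drop attributes
        if (!closing) {
            stack.push_back(name);
        } else if (stack.empty() || stack.back() != name) {
            return false;                                // mismatched close tag
        } else {
            stack.pop_back();
        }
    }
    return stack.empty();
}

enum class Severity { Ok, Warning, Error };

// Whole document: failure is fatal, mirroring check 1.
Severity checkDocument(const std::string &doc) {
    return tagsBalanced(doc) ? Severity::Ok : Severity::Error;
}

// Single verse: failure only warns, mirroring check 2.
Severity checkVerse(const std::string &verse) {
    return tagsBalanced(verse) ? Severity::Ok : Severity::Warning;
}
```

A verse such as `<q who="Jesus">Hello` whose quote closes in a later verse is legitimate OSIS but fails the per-verse check, which is one reason that check can only warn.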
However, my suggestion that osis2mod use a real parser was not
predicated on the need for validation, but rather on the need to
support all well-formed inputs.
Perhaps I am biased by Java, but I think it can be done without
significantly impacting program size. In Java, the XML parser is an
implementation of an interface, and at runtime it is possible to
specify which available implementation to use. If we were to do
something similar in C++, perhaps choosing a SAX interface, we could
put XMLTag behind it as one implementation. Then one could link in
Xerces, Sword's own parser, or some other implementation, and the
size/performance cost would be appropriate for the use.
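A minimal C++ sketch of the pluggable-parser idea, assuming a SAX-like callback interface. All class and function names here are hypothetical; the toy back end stands in for an XMLTag- or Xerces-based implementation, neither of which is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical SAX-style callbacks that osis2mod's conversion logic would use.
class SaxHandler {
public:
    virtual ~SaxHandler() = default;
    virtual void startElement(const std::string &name) = 0;
    virtual void endElement(const std::string &name) = 0;
    virtual void characters(const std::string &text) = 0;
};

// Hypothetical parser interface; XMLTag-, Xerces- or libxml-based back ends
// would each implement this.
class SaxParser {
public:
    virtual ~SaxParser() = default;
    virtual bool parse(const std::string &xml, SaxHandler &handler) = 0;
};

// Toy back end for illustration only: handles bare <name>, </name> and text,
// with no attributes, comments or entities.
class ToyParser : public SaxParser {
public:
    bool parse(const std::string &xml, SaxHandler &handler) override {
        std::size_t pos = 0;
        while (pos < xml.size()) {
            if (xml[pos] != '<') {  // text run up to the next tag
                std::size_t next = xml.find('<', pos);
                if (next == std::string::npos) next = xml.size();
                handler.characters(xml.substr(pos, next - pos));
                pos = next;
                continue;
            }
            std::size_t end = xml.find('>', pos);
            if (end == std::string::npos) return false;
            std::string tag = xml.substr(pos + 1, end - pos - 1);
            if (!tag.empty() && tag[0] == '/')
                handler.endElement(tag.substr(1));
            else
                handler.startElement(tag);
            pos = end + 1;
        }
        return true;
    }
};

// Registry of available back ends, selectable by name at runtime.
// A real build might add entries only for the libraries actually linked in.
using ParserMaker = std::function<std::unique_ptr<SaxParser>()>;

std::map<std::string, ParserMaker> &parserRegistry() {
    static std::map<std::string, ParserMaker> registry = {
        {"toy", [] { return std::unique_ptr<SaxParser>(new ToyParser()); }},
    };
    return registry;
}

// Returns nullptr if no back end with that name is available.
std::unique_ptr<SaxParser> makeParser(const std::string &name) {
    std::map<std::string, ParserMaker> &registry = parserRegistry();
    auto it = registry.find(name);
    if (it == registry.end()) return nullptr;
    return it->second();
}

// Example handler: counts elements.
struct ElementCounter : SaxHandler {
    int elements = 0;
    void startElement(const std::string &) override { ++elements; }
    void endElement(const std::string &) override {}
    void characters(const std::string &) override {}
};
```

The conversion code only ever sees SaxHandler and SaxParser, so swapping in a heavier validating back end would not touch it.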
As for validation, osis2mod could call an external validator on the
input file via fork/exec. This would not increase the program size.
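A POSIX sketch of the fork/exec approach in C++. The validator command is left to the caller; the function name is hypothetical, and the choice of tool and schema path in the usage note below are assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run an external validator as a child process (hypothetical helper) and
// report whether it exited normally with status 0. args[0] is looked up on
// PATH by execvp; the validator's own messages go to the inherited stderr.
bool runExternalValidator(const std::vector<std::string> &args) {
    if (args.empty()) return false;
    std::vector<char *> argv;
    for (const std::string &a : args)
        argv.push_back(const_cast<char *>(a.c_str()));  // execvp does not modify these
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return false;   // fork failed
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127);              // only reached if exec failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

For instance, with libxml2's xmllint installed, runExternalValidator({"xmllint", "--noout", "--schema", "osisCore.2.1.1.xsd", "input.xml"}) would return true only if xmllint reports the file valid against that schema.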
On Apr 25, 2007, at 6:47 PM, Chris Little wrote:
> DM and I have been chatting a bit off-list about the future/function of
> osis2mod and I thought maybe we should open up the discussion a bit.
> Right now osis2mod (the tool for converting OSIS Bibles to Sword Bible
> modules) does some mediocre validity checking as it builds its Sword
> database. We'll never really get it perfect this way since we aren't
> doing real schema validation.
> DM has suggested adding a real validating parser to osis2mod (by
> embedding something like xerces or libxml), so it could spit out an
> error message if you try to import invalid OSIS.
> I'm not totally convinced we should do that. When I prepare OSIS docs
> for modules, I always perform validation in an external validator.
> (Personally I use Oxygen, but there are also XML Spy, MSV, topologi,
> Xerces, etc.)
> Do people feel that incorporating a real validator would make osis2mod
> easier to use?
> It could potentially cause the filesize to jump dramatically, so would
> that be acceptable?
> If we incorporate osis2mod into either front-ends or installmgr so
> users could import OSIS documents directly into Sword, would that
> support or detract from the case for embedding a full validator?