[sword-devel] Validating ThML and OSIS modules

DM Smith dmsmith555 at yahoo.com
Tue Jan 6 07:18:31 MST 2009

On Jan 6, 2009, at 3:14 AM, Jonathan Morgan wrote:

> I feel that, rather than useless debates about whether certain
> people's modules are valid ThML or whether valid OSIS has been
> submitted to CrossWire or not, we should augment osis2mod, imp2mod and
> friends to use a validating XML parser.  That way we could be quite
> sure that anything that has been created is valid.
> I'm sure this has been discussed and rejected in past conversations.
> From memory, the arguments were along the lines of: Everyone should be
> validating before creating a module, so it will take extra time
> validating when creating a module.
> To be blunt, this argument doesn't make a lot of sense to me.  If you
> really care about validity in your modules, the only logical thing to
> do about it is to prevent invalid modules from being created (or at
> least make it a bit harder - I can create modules directly from C++ or
> Python, and they would not be so validated).  It also saves the extra
> validation step if it is done as part of the process (which also saves
> finding an extra validating tool, ...).
> It can be clearly seen that invalid modules do get created by certain
> parties, and that invalid OSIS documents do get sent to CrossWire.
> This being acknowledged, surely preventing that is a good thing?  (I
> do assume that anyone sending an OSIS document will have run osis2mod
> on it and checked the module - if they haven't, then they are
> obviously in need of correction).
> There has also been discussion of a magic OSIS uploader tool that will
> do all the work for you, including validation.  This does not exist,
> and still will not.  Making osis2mod and friends do the validation
> themselves will probably be easier to change, and it will ensure that
> all modules are properly validated, not just CrossWire modules.

The job of osis2mod, at this point in time, is to take valid OSIS  
modules as input and to normalize that into OSIS that the SWORD engine  
supports. This normalized OSIS is then chunked into introductions and  
verses and stored in a module.

The other job of osis2mod is to do semantic validation of OSIS
according to the expectations of the SWORD engine. Schema-valid OSIS
can still be bad OSIS.
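
As a rough illustration of that distinction (not what osis2mod actually
checks), here is a small Python sketch. The fragment is well-formed XML
that a parser accepts, yet its verses are out of order, which is a
problem of meaning rather than syntax. The sample fragment, the function
name and the ordering rule are all assumptions made up for the example,
and the OSIS namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: parses cleanly, but verse 3 comes before verse 2.
# Real OSIS documents are namespaced; that is omitted here for brevity.
SAMPLE = """<div type="book" osisID="Gen">
  <chapter osisID="Gen.1">
    <verse osisID="Gen.1.1">first verse text</verse>
    <verse osisID="Gen.1.3">third verse text</verse>
    <verse osisID="Gen.1.2">second verse text</verse>
  </chapter>
</div>"""


def out_of_order_verses(xml_text):
    """Return osisIDs of verses whose number does not exceed every
    earlier verse number in the same chapter (an illustrative rule)."""
    root = ET.fromstring(xml_text)
    problems = []
    for chapter in root.iter("chapter"):
        highest = 0
        for verse in chapter.iter("verse"):
            osis_id = verse.get("osisID", "")
            # Take the last dotted component, e.g. "Gen.1.3" -> 3.
            number = int(osis_id.rsplit(".", 1)[-1])
            if number <= highest:
                problems.append(osis_id)
            highest = max(highest, number)
    return problems


print(out_of_order_verses(SAMPLE))
```

A check like this only makes sense after the input has already parsed,
which is why it sits alongside, rather than replaces, schema validation.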

I had suggested that it would be easy to replace the homegrown parser
with an industry-standard parser that would do validation as it went.
This was rejected for a variety of reasons. I don't recall them all,
but I seem to remember:
1) What we have works and is simple.
2) What we have is homegrown and we can relicense it as we see fit
without encumbrances. Added third-party software needs to allow for
this relicensing. That is, it can't be GPL or GPL-like.
3) The addition of third-party software needs to be pluggable (or not
be required for the basic function of the software), not create a
dependency on the rest of the SWORD library, be light-weight in size
and performance, work on all platforms where osis2mod is used, and so
on.

I think that Xerces-C is a good fit. With a pluggable model, a user  
could supply one of their choice.
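
To make "pluggable" concrete, here is a minimal Python sketch of the
shape I have in mind. It is only an illustration: the names Validator,
well_formed_validator and import_module are hypothetical and are not
part of osis2mod or the SWORD API. The stand-in plug-in checks
well-formedness only; a real one would wrap a schema-validating parser
of the user's choice.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

# Hypothetical plug-in interface: a validator is any callable that takes
# the document text and returns a list of error messages (empty = OK).
Validator = Callable[[str], List[str]]


def well_formed_validator(text: str) -> List[str]:
    """Stand-in plug-in: checks well-formedness only, not schema validity."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    return []


def import_module(text: str, validator: Optional[Validator] = None) -> bool:
    """Hypothetical importer entry point.

    With no validator supplied, behaviour is unchanged, so the validating
    parser never becomes a hard dependency of the basic tool.
    """
    if validator is not None:
        errors = validator(text)
        if errors:
            for message in errors:
                print("error:", message)
            return False
    # ... the existing parsing and module-writing steps would run here ...
    return True


# Usage: the same input is rejected only when a validator is plugged in.
print(import_module("<osis><verse>", well_formed_validator))
print(import_module("<osis><verse>"))
```

Keeping the validator optional is the point of the design: the tool's
basic function does not change, and anyone who wants stricter checking
supplies their own.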

I'd do it, but I have too many other things on my plate. One of which  
is a major change to osis2mod's handling of inter-verse material.

In Him,
