[sword-devel] Release-critical TODO items (updated mod2osis patch)

Greg Hellings greg.hellings at gmail.com
Mon Apr 27 17:09:02 MST 2009


On Mon, Apr 27, 2009 at 6:48 PM, Jonathan Marsden <jmarsden at fastmail.fm> wrote:
> DM Smith wrote:
>
>> I am. You can get the input text from www.crosswire.org/~dmsmith/kjv2006.
>
> Aha!  Thanks, I'll try it tonight.  BTW, wouldn't putting this URL somewhere
> in the kjv.conf file be both useful and appropriate?
>
>> Note, some of the transformations by osis2mod create a module that does
>> not contain valid OSIS. It is OSIS that SWORD requires. It pertains to the
>> preverse title markup.
>>
>> mod2osis has to undo those transformations at least in part.
>
> Ah, and if it doesn't undo the "stuff colophons into the previous verse"
> thing, then of course the resulting output is technically not valid OSIS.
>  Which I suspect is exactly the issue I am seeing.

You are correct.  It took me some time to figure that out about the
mod2osis tool myself.

>
>> That's why Greg has the comparison being a result of:
>> Run mod2osis to get base text
>> Run osis2mod to get a module with the base text
>> Run mod2osis on that module to see it creates the same base text.
>
> While that is an entirely reasonable and useful test, it is testing for
> "round-trip" capability, but it is *not* testing whether the output of
> mod2osis is actually OSIS.  Greg's test and mine are therefore complementary.

And one I wouldn't have thought of. ;) My intention had been to update
mod2osis so it worked for the round-trip.  My thinking was that if a
user (of mod2osis, so really a module developer) sees a module with a
feature he/she wants, they could use mod2osis to pop out OSIS-like
output that SWORD could handle, so they could replicate that feature.
It was not necessarily meant as a method of creating valid OSIS.
Silly me, I should have thought to actually validate it.
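
To make the difference between the two tests concrete, here is a
minimal sketch in Python (not the language of the real tools).  The
names round_trip_stable, is_well_formed, export and build are
hypothetical: export stands in for running mod2osis on a module, and
build for running osis2mod on the resulting text.  Note that the second
check only tests XML well-formedness; real OSIS validation would need
the schema and an external validator.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def round_trip_stable(module: str,
                      export: Callable[[str], str],
                      build: Callable[[str], str]) -> bool:
    """Return True if exporting, rebuilding and re-exporting gives the same text.

    export(module) -> OSIS-like text (stand-in for mod2osis).
    build(text)    -> name of a rebuilt module (stand-in for osis2mod).
    """
    first = export(module)      # base text from the original module
    rebuilt = build(first)      # module built from that base text
    second = export(rebuilt)    # text exported from the rebuilt module
    return first == second


def is_well_formed(osis_text: str) -> bool:
    """Check XML well-formedness only; this is NOT schema validation."""
    try:
        ET.fromstring(osis_text)
        return True
    except ET.ParseError:
        return False
```

A module can pass the first check and still fail real validation, which
is exactly the situation with the colophons.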

>
> Overall, I'm not sure it is appropriate to be strongly pushing for the use
> of an officially defined standard (such as OSIS), and simultaneously to
> release software tools that generate "something rather like OSIS" which do
> not say anything about this "something rather like" issue in their
> documentation :)  My recommendation would be that CrossWire should either
> support the OSIS standard, or else clearly document their deviation from it.
>
> Longer term, this need for strange transformations looks to me like a
> problem that stems from an inadequate or incomplete underlying book
> representation in SWORD itself?  That may be something for SWORD 2.x, not
> 1.6 :)

I'm an advocate of this - but there is strong feeling among some
developers that we never want to break backwards compatibility with
installed modules.  Thus, the push to allow for, e.g., interverse
content in the actual module (and also OSIS header information, etc)
may never be realized.

>
>>> Is this just an outdated wiki page leading me astray, and if so, where
>>> can I find osisCore.2.5.xsd ?
>
>> The latest is 2.1.1. There is no such thing as 2.5.
>
> Hmmm, then we need to find out why mod2osis says that is what its output
> validates against.  I think it is just hardcoded into the mod2osis.cpp
> source.  If there is no 2.5, then that is a (trivial to fix) mod2osis bug.
>
>> A colophon is something that comes at the end of a book.
>
> Indeed :)
>
>> The input has:
>> ...
>> <verse>verse text</verse>
>> <div type="colophon">colophon text</div>
>> </chapter>
>
>> The module only stores verses so the colophon is appended to the last
>> verse.
>
> So this non-standard-ness of the colophon being inside the last verse is
> SWORD-created -- osis2mod-created, to be specific!  OK.  Light dawns...
> thanks.
>
> In that case, mod2osis needs to know about that SWORD-specific
> transformation done by osis2mod (and any others!), and "undo" it (or them),
> I would think.  Based on my very simple (and perhaps simplistic?) test, at
> the moment mod2osis does not seem to be doing that reverse transformation of
> colophons successfully.  And (IMO) that is a bug, because it means mod2osis
> does not generate OSIS standard output, just "something rather like" OSIS.

In the case of the <div type="colophon">...</div>, it really shouldn't
be terribly hard to figure out.  The only reason I didn't fix it was
the following train of thought: (1) I didn't know whether a <div
type="colophon"> could contain another <div>.  (2) If it can, then
writing a regex to pull out the colophon div would be non-trivial.
(3) If it cannot, then writing the regex would be trivial, but do we
know whether the colophon comes only at the end of the verse, or might
it also appear at the beginning, between chapters or even between books?
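
For the trivial case, a rough sketch of such a regex in Python (for
illustration only; mod2osis itself is C++).  It assumes two things that
are not yet confirmed: the colophon div contains no nested <div>, and
it sits immediately before the closing </verse> tag, using the
container-style verse markup from DM's example.  The names
COLOPHON_AT_END and hoist_colophon are made up for this sketch.

```python
import re

# A colophon div with no nested <div>, directly followed by </verse>.
COLOPHON_AT_END = re.compile(
    r'(<div type="colophon">(?:(?!<div\b).)*?</div>)\s*(</verse>)',
    re.DOTALL,
)


def hoist_colophon(osis_text: str) -> str:
    """Move a trailing colophon div from inside a verse to just after it."""
    # Swap the two groups: close the verse first, then emit the colophon.
    return COLOPHON_AT_END.sub(r'\2\n\1', osis_text)
```

A colophon that does contain another <div> is deliberately left
untouched by this pattern, since that is the non-trivial case.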

In short, I didn't know enough about the <div type="colophon"> object
to make a judgment call at that time.  I intended to research the OSIS
standard and SWORD's best-practice feelings on the matter and make a
further adjustment at a later time, if one could be made.  I just
haven't revisited mod2osis in a few months.  The round-trip, though
not valid OSIS, works.  osis2mod handles the misplaced colophon fine,
it just warns about its improper location.  I figured this was "good
enough" for now.

>
> I don't yet know how easy this one will be to fix.  Probably just another
> state variable set when a colophon is encountered.  Then mod2osis outputs
> </verse>\n and then the colophon, and then the code that outputs the end of
> verse trailer tests the colophon state variable, and does not emit </verse>,
> but instead resets the "colophon" state, if the "colophon" state is true.
>  Mildly ugly, but probably not hard to code and test?  I'll give it a try if
> I have a chance tonight.
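
The state-variable idea described above could look roughly like the
following Python sketch.  It is only an illustration of the logic, not
the mod2osis.cpp code, and it assumes the colophon always starts inside
a verse and runs to the end of that verse.  The names TAG and
move_colophons are hypothetical.

```python
import re

# Split the text into tags and the text between them, keeping the tags.
TAG = re.compile(r'(<[^>]+>)')


def move_colophons(osis_text: str) -> str:
    """Close the verse before a colophon and drop the later </verse>."""
    out = []
    in_colophon = False  # the "colophon" state variable
    for token in TAG.split(osis_text):
        if token == '<div type="colophon">':
            # End the verse early, then emit the colophon start tag.
            out.append('</verse>\n')
            in_colophon = True
            out.append(token)
        elif token == '</verse>' and in_colophon:
            # The verse was already closed; just reset the state.
            in_colophon = False
        else:
            out.append(token)
    return ''.join(out)
```

Unlike a single regex, this does not care whether the colophon contains
other divs, but it would emit a stray </verse> for a colophon found
outside any verse.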

I'd love to collaborate with you on getting mod2osis to work.  After
fixing its OSIS validation issues, I'd like to expand the *OSIS
filters in the engine so it works with modules whose original
documents were not OSIS.  My thought there was that if SWORD really is
pushing toward all-OSIS modules, then having mod2osis produce valid,
round-trip-equivalent output for automated updating of older non-OSIS
modules would be a step toward that goal.

My second goal is to expand it to properly handle commentaries.  I
figure their internal construction is close enough to that of Bibles
that the jump should not be difficult, and it would be a natural
stepping stone toward my ultimate goal:

Third and finally, I want mod2osis to produce well-behaved OSIS output
for GenBook modules.  The reasons are the same as above: such output
would be invaluable to a module creator without CrossWire having to set
up a hosting site for all sorts of sample OSIS files.  It would help to
convert old modules into OSIS as well, if that's an objective, and it
would just be good for our tools to work as advertised.

I still have not heard one word, though, from the keepers of the code
on whether this is worth pursuing and whether they want it admitted to
the repository or not.  That's more or less why I got
bogged down after producing round-trip OSIS for Bibles.  I heard
nothing on the list from either those who were asking for OSIS
examples, those who were always pointing out that we only have one
example available, or those who are in charge of keeping track of the
code.  Since then I've had one or two people contact me privately to
ask for the patches so they could learn from the output, but that's
it.

So - Troy, Chris, DM - you three are the main banner holders, from
what I've seen, for the code, OSIS support and the inverse of this
tool (osis2mod).  Is this something you guys want to see and are
willing to accept patches for, or is it something you don't want to be
bothered about?

--Greg

>
> Jonathan
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
>


