[sword-devel] usfm2osis.pl

Peter von Kaehne refdoc at gmx.net
Mon Jul 9 03:49:24 MST 2012

> From: David Haslam <dfhmch at googlemail.com>

> Hi Greg,
> One could just run a search and replace script or macro to convert the
> double angles in a copy of the SFM files to proper Unicode characters:
> « and ».

Yes, one can do that, and Greg knows that. The question was not how to fix it, but whether to fix it in usfm2osis.pl or outside that script.
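
For illustration only, the kind of external fix David describes could look like the minimal Python sketch below. It assumes the "double angles" appear in the SFM text as the ASCII pairs << and >>. The function names are hypothetical and nothing here is part of usfm2osis.pl.

```python
from pathlib import Path


def replace_double_angles(text: str) -> str:
    """Replace ASCII << and >> with the Unicode characters « and ».

    Assumption: the source marks double angles with exactly these ASCII pairs.
    """
    return text.replace("<<", "\u00ab").replace(">>", "\u00bb")


def convert_copy(src: str, dst: str) -> None:
    """Write a converted copy of an SFM file, leaving the original untouched.

    Assumption: the file is UTF-8 encoded.
    """
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(replace_double_angles(text), encoding="utf-8")
```

Working on a copy, as David suggests, keeps the original SFM available so that the question "why is this there?" can still be asked of the source text.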

Chris's view was not to fix it in usfm2osis.pl, as that would cut out a valuable angle for catching errors in the USFM/SFM. Also, specifically, correcting certain errors solely for the sake of XML compatibility, without asking "why is this there?", is no good.

I would agree with that. Both the decision and the reasons.

When I created title_cleanup.pl and xreffix.pl, I did think briefly about adding these fixes to usfm2osis.pl.

In the end I decided not to, for the following reasons:

1) usfm2osis.pl creates a working (not necessarily valid!) OSIS file even from a relatively dirty USFM file, which is important to casual module makers building modules for their own use, e.g. translators who want to play with a phone/PC module while working on the translation. We know from contact with bibledit and Paratext users that this is an important use case.

For all its shortcomings, osis2mod accepts remarkably bad input, so for creating a casual module, XML validation is not a necessity. Further, XML validation has arcane error messages (as Greg will confirm from a very recent private email to me and David), so if simply having a module roughly working is the target, we should not set up obstacles.

2) usfm2osis.pl has no real dependencies apart from Perl. No compiling of libsword bindings, no CPAN search, no repo search for the Ubuntu names of CPAN modules. Again, this is important to module makers of a less stringent bent than us.
