[sword-devel] usfm2osis.pl

Greg Hellings greg.hellings at gmail.com
Sun Jul 8 22:43:59 MST 2012


Guys,

Was just running usfm2osis.pl across some files that my aunt and uncle
have given me to convert for the language they're working with through
Wycliffe. It ran fine and reported no problems. When I tried to run
title_cleanup.pl across the output, though, it revealed a minor issue:
the language appears to use the "French style" of quotation mark, but
these are marked up in the SFM text as "<<" and ">>", i.e. pairs of
ASCII angle brackets. usfm2osis.pl copies them into the output
unescaped, which causes title_cleanup.pl, which is expecting
well-formed XML, to puke on parsing the file. Of course, it would
also cause osis2mod to puke when I get to that stage.

Obviously this is an encoding issue in the source file, but I thought
I should mention that it is also a bug/shortcoming in usfm2osis.pl.
If it is supposed to output well-formed XML, then it should escape
such characters in the plain text with their XML entity
representations ("&lt;", "&gt;", and "&amp;"). Is there anyone who
wants to look into that, or do I need to roll up my Perl sleeves and
get dirty?
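
To illustrate the kind of escaping I mean, here is a minimal sketch. It
is in Python rather than Perl and is not taken from usfm2osis.pl; the
helper name verse_to_osis and the sample text are made up for the
example. It only assumes that verse text ends up inside an element such
as <verse osisID="...">.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def verse_to_osis(osis_id, text):
    """Wrap plain verse text in a <verse> element (hypothetical helper).

    escape() replaces &, < and > with &amp;, &lt; and &gt;.
    quoteattr() escapes the attribute value and adds the quotes.
    """
    return "<verse osisID={}>{}</verse>".format(quoteattr(osis_id), escape(text))


# Sample text using ASCII angle brackets as quotation marks.
raw = "<<Bonjour>> & au revoir"
element = verse_to_osis("Gen.1.1", raw)

# The escaped element is well-formed and round-trips to the original text.
parsed = ET.fromstring(element)
assert parsed.text == raw
```

Without the escape() call, the same input fails to parse, which is the
failure title_cleanup.pl runs into. Whether "<<" and ">>" should instead
be converted to real guillemet characters is a separate question for
whoever owns the source text.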

--Greg


