[sword-devel] usfm2osis.py

Chris Little chrislit at crosswire.org
Sun Sep 30 15:18:37 MST 2012


On 09/26/2012 03:15 PM, Greg Hellings wrote:
> Chris,
>
> I just tried to switch over to using usfm2osis.py and there are two
> minor issues:
>
> 1) The script is giving me an output language on the container tag of
> xml:lang="und". This should read xml:lang="tke" but I don't know if
> it's possible to determine that. I'd like to be able to set that as a
> command-line option if possible.

I've added a feature request: 
http://www.crosswire.org/tracker/browse/MODTOOLS-36

Language detection is probably impossible with USFM since there is no 
standard place to encode the language and no standard set of language 
tags in use for USFM. Adding the ability to set the code from a command 
line argument will be easy enough, but until then 'und' seemed more 
appropriate than the old practice of just using 'en' for every module.
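
As a rough sketch of what such an option could look like (the flag names 
and the helper function here are hypothetical, not the script's actual 
interface, and the container tag is simplified), with 'und' kept as the 
fallback when nothing is given:

```python
import argparse


def build_parser():
    # Hypothetical option names; not the real usfm2osis.py interface.
    parser = argparse.ArgumentParser(description="language option sketch")
    parser.add_argument(
        "-l", "--lang",
        default="und",  # 'und' = undetermined, used when no code is supplied
        help="language code to write into xml:lang on the container tag",
    )
    return parser


def osis_text_open_tag(work_id, lang):
    # Simplified container tag; the real one carries more attributes.
    return '<osisText osisIDWork="%s" xml:lang="%s">' % (work_id, lang)


# Simulate a command line of: --lang tke
args = build_parser().parse_args(["--lang", "tke"])
print(osis_text_open_tag("Example", args.lang))
```

Leaving the option off would keep the current behaviour of writing 'und'.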

> 2) I'm getting a whole bunch of XML validation problems triggered by
> <item type="x-indent-^A" subType="x-introduction">...
>
> The ^A character (Displayed on the command line as [0d/01] missing
> character box) is apparently invalid XML in that spot and causes the
> file to be all garbled from later XML parsing documents. I'm not sure
> how it's getting in there other than from seeing the instances of
> x-indent in the script. I think you still have access to the files I'm
> using if you want to see for yourself where it's coming from.

This should be fixed now. I haven't tested it, but it was fairly clear 
that the cause was a missing r prefix: the strings used in a regex 
replace needed to be raw strings.
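
To illustrate the kind of bug involved (the pattern and input line below 
are made up for the example, not the actual lines from usfm2osis.py): in 
a non-raw Python string, '\1' is an octal escape for the character with 
code 1, which is exactly the ^A control character, whereas in a raw 
string it stays a backreference to the first group.

```python
import re

# Hypothetical pattern and input, for illustration only.
pattern = r'\\io(\d)'        # matches a literal backslash, "io", and a digit
line = r'\io2 Outline'       # raw string: backslash, i, o, 2, ...

# Without the r prefix, '\1' is the single control character chr(1) (^A),
# so re.sub inserts that character instead of the captured digit.
broken = re.sub(pattern, '<item type="x-indent-\1">', line)

# With the r prefix, \1 reaches re.sub intact and is treated as a
# backreference to the first capture group.
fixed = re.sub(pattern, r'<item type="x-indent-\1">', line)

print(repr(broken))
print(repr(fixed))
```

The first result contains a control character that XML 1.0 does not 
allow, which would explain the validation errors.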

--Chris



