[sword-devel] USFM character encodings

Greg Hellings greg.hellings at gmail.com
Thu Jul 26 06:45:20 MST 2012


On Thu, Jul 26, 2012 at 6:49 AM, Chris Little <chrislit at crosswire.org> wrote:
> Has anyone ever used the -e switch of usfm2osis.pl to do character encoding
> conversion on USFM docs as they're being converted to OSIS?
>
> I'm doing the Python rewrite of usfm2osis and wondered whether I can safely
> dump this functionality. It shouldn't be difficult to implement, but the
> usage statement would be much cleaner without it. Personally, I would likely
> use uconv to change encoding as a preprocessing step, but if anyone actually
> desires to keep this in the markup conversion script, I'll include it.

In Python it should be as trivial as a single line of code - or
possibly two, no?

utf_input = input.decode(src_encoding)

Possibly followed by

enc_output = utf_input.encode(destination_encoding)
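
As a slightly fuller sketch of those two steps, assuming Python 3 (where
file contents read in binary mode are bytes): the helper name transcode
and the sample USFM fragment are made up for illustration and are not
taken from usfm2osis.

```python
def transcode(raw: bytes, src_encoding: str, dest_encoding: str = "utf-8") -> bytes:
    """Decode bytes from src_encoding, then re-encode them as dest_encoding."""
    # bytes -> str; raises UnicodeDecodeError if the bytes are not valid
    # in src_encoding, and LookupError if Python has no such codec.
    text = raw.decode(src_encoding)
    # str -> bytes in the target encoding.
    return text.encode(dest_encoding)


# Illustrative input: a Latin-1 encoded USFM fragment containing e-acute.
latin1_usfm = "\\v 1 caf\u00e9".encode("latin-1")
utf8_usfm = transcode(latin1_usfm, "latin-1")
```

The decode step is where an unknown or wrong source encoding shows up, so
that is probably the place for the script to report an error to the user.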

Provided the source encoding is known to Python, the conversion ought
to be straightforward. If the source encoding is unknown to Python, the
script could also let the user supply a manual encoding conversion
routine. I've had to do this before when working with files that use
very strange or custom encodings.

Probably more work than is helpful if no one is using it, but it's
worth keeping in consideration.

--Greg

>
> --Chris


