[sword-devel] Converting RTF \'XX to UTF-8

Chris Little chrislit at crosswire.org
Sun Jun 22 12:31:24 MST 2008


Aha! An example goes a long way, so now I understand the real problem.

You just need to change the codepage. cp1252 is the Windows counterpart 
of ISO 8859-1, so decoding \'c3 with it gives Ã (U+00C3) rather than Γ. 
Since you want Greek, you need the counterpart of ISO 8859-7, which is 
cp1253, thus:

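perl -CO -pe 'use Encode; s/\\\x27([0-9a-fA-F]{2})/decode("cp1253", chr(hex($1)))/eg'

(The quote is written as \x27 because the shell won't let you 
backslash-escape a ' inside a single-quoted string.)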
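To see why the code page is the whole problem, the substitution from the one-liner above can be wrapped in a small Perl function and run on the \'XX bytes from Karl's sample under both cp1252 and cp1253. This is only a sketch: rtf_hex_to_text is a hypothetical name, and the \u769? escape is left out of the sample because it needs its own handling.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: replace each RTF \'XX escape with the character
# that byte 0xXX stands for in the given Windows code page.
sub rtf_hex_to_text {
    my ($rtf, $codepage) = @_;
    $rtf =~ s/\\'([0-9a-fA-F]{2})/decode($codepage, chr(hex($1)))/eg;
    return $rtf;
}

binmode STDOUT, ':encoding(UTF-8)';

# The \'XX escapes from Karl's sample (the \u769? escape is left out).
my $sample = q{\'c3\'e5\'ed\'e5\'f3\'e9\'f2};

for my $codepage (qw(cp1252 cp1253)) {
    print "$codepage: ", rtf_hex_to_text($sample, $codepage), "\n";
}
```

Run as is, it should print Ãåíåóéò for cp1252 (the same bytes read as Western European letters) and Γενεσις for cp1253.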

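If you don't have cp1253 as an available encoding in perl, just skip the 
decode part: turn each \'XX back into the raw byte it stands for and let 
iconv do the cp1253 to UTF-8 conversion. Leave off -CO here, so perl 
writes those bytes as-is instead of re-encoding them as UTF-8:

perl -pe 's/\\\x27([0-9a-fA-F]{2})/chr(hex($1))/eg' | iconv -f CP1253 -t UTF-8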

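And to manage the code points not in cp1253, you can do a separate pass 
over the UTF-8 result. RTF writes those as \uN, where N is a signed 
16-bit decimal (values above 32767 come out negative), followed by one 
fallback character like the ? in your sample, which the . discards:

perl -CSD -pe 's/\\u(-?\d{1,5}) ?./chr($1 < 0 ? $1 + 65536 : $1)/eg'

(-CSD instead of -CO, because this pass reads UTF-8 as well as writing it.)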

(I haven't tested that, so it might be a little off, but it should point 
you in the right direction.)
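If you would rather not chain separate commands, both kinds of escape can also be handled in a single Perl pass. This is a sketch under stated assumptions, not a tested converter: rtf_escapes_to_text and the script name rtf2utf8.pl are hypothetical, the code page is hard-coded to cp1253 for Karl's file, and it assumes the default \uc1 (exactly one fallback character after each \uN). It does not parse RTF groups, so an escaped backslash followed by 'XX would be misread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper that handles both escapes from this thread in one pass:
#   \'XX  one byte in the document's ANSI code page ($codepage)
#   \uN   a Unicode code point as a signed 16-bit decimal, followed by one
#         fallback character (assumes the default \uc1); the fallback is dropped
sub rtf_escapes_to_text {
    my ($rtf, $codepage) = @_;
    $rtf =~ s{
        \\u(-?\d{1,5})\ ?            # \uN plus an optional space delimiter
        (?: \\'[0-9a-fA-F]{2} | . )  # its fallback: a \'XX escape or one literal char
      | \\'([0-9a-fA-F]{2})          # or a plain \'XX escape
    }{
        defined $1
            ? chr($1 < 0 ? $1 + 65536 : $1)
            : decode($codepage, chr(hex($2)))
    }egx;
    return $rtf;
}

# Usage (hypothetical script name): perl rtf2utf8.pl input.rtf > output.txt
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    while (my $line = <>) {
        # cp1253 is an assumption that matches Karl's Greek file
        print rtf_escapes_to_text($line, 'cp1253');
    }
}
```

Because both substitutions live in one regex, a fallback that is itself written as a \'XX escape is consumed together with its \uN instead of being decoded as a stray extra character.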

--Chris


Karl Kleinpaste wrote:
> I've got an RTF document which contains this kind of encoding:
> 
>  \cf2 \'c3\'e5\u769?\'ed\'e5\'f3\'e9\'f2\cf0
> 
> That renders the word "Genesis" in Greek, i.e. \'c3 is the
> capital gamma.  As seen in another app which uses this RTF natively:
> 
> [screenshot not preserved]
> 
> I need to find a scriptable way to convert this kind of encoding to
> UTF-8.  I've tried a few things (and Chris has offered a couple more
> variants) of this general flavor:
> 
> perl -CO -pe 'use Encode; s/\\\'([0-9a-fA-F]{2})/decode("cp1252", chr(hex($1)))/eg'
> 
> But at best I seem to get what amounts to a format-shifted identity
> function (\'ab becomes an actual 0xAB byte) which does me no good.
> 
> Any ideas?
> 
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page


