[sword-devel] usfm2osis.py

Sun Aug 5 19:40:16 MST 2012

On 06/08/12 14:20, Chris Little wrote:
> Linux packagers apparently go the UCS-4 route, so I didn't notice any 
> issue with using the Language Tags. But trying the above on Windows 
> shows that the cygwin build and the builds from python.org (2.7 & 3.2) 
> all use UCS-2. So my script won't work correctly on Windows.
>
> Not to worry, though. I'll just replace the Language Tags with 
> Noncharacters in the range u+FDD0-u+FDEF. They're UCS-2-safe since 
> they're BMP codepoints and they're specifically designated as 
> "intended for process-internal uses, but are not permitted for 
> interchange." So in the unlikely event that they appear in input, it's 
> the fault of the USFM-encoder if anything goes awry.
>
> We'll have to watch for input outside of the BMP on UCS-2 Python, 
> though, as that could cause problems.
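
For reference, here is a minimal sketch of the noncharacter-marker idea described above. It is written for Python 3 and is not taken from usfm2osis.py: the marker assignments and the helper names (`find_reserved`, `protect`, `strip_markers`) are hypothetical, chosen only for illustration.

```python
import re

# U+FDD0..U+FDEF are BMP noncharacters reserved for process-internal use.
NONCHAR_RE = re.compile('[\uFDD0-\uFDEF]')

# Hypothetical marker assignments, for illustration only.
MARK_OPEN = '\uFDD0'
MARK_CLOSE = '\uFDD1'


def find_reserved(text):
    """Return the set of reserved noncharacters already present in text."""
    return set(NONCHAR_RE.findall(text))


def protect(text, literal):
    """Wrap each occurrence of literal in the internal markers.

    Refuses input that already contains a reserved noncharacter, since
    the markers could not be told apart from it afterwards.
    """
    if find_reserved(text):
        raise ValueError('input already contains U+FDD0..U+FDEF noncharacters')
    return text.replace(literal, MARK_OPEN + literal + MARK_CLOSE)


def strip_markers(text):
    """Remove every reserved noncharacter before the text leaves the process."""
    return NONCHAR_RE.sub('', text)
```

Checking the input up front turns the "unlikely event" of noncharacters in the source into an explicit error rather than silent corruption.
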
I guess I'm quite surprised that you wrote a new Python program in 
Python 2 when its development is basically coming to an end (and the next 
Ubuntu will no longer have it installed by default). I also wonder if 
Python 3 would handle Unicode better.
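
Either way, the build width can be checked at run time. The following is a rough sketch, assuming only the standard `sys.maxunicode` attribute; the function names `is_narrow_build` and `has_non_bmp` are made up for this example and are not part of usfm2osis.py.

```python
import sys


def is_narrow_build():
    """True on a UCS-2 ("narrow") interpreter, where sys.maxunicode is 0xFFFF."""
    return sys.maxunicode == 0xFFFF


def has_non_bmp(text):
    """Report whether text contains anything outside the BMP.

    On a wide build such a character is one code point above U+FFFF.
    On a narrow build it is stored as a surrogate pair, so a high
    surrogate (U+D800..U+DBFF) is treated as a sign of one too.
    """
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDBFF:
            return True
    return False
```

A script could call `has_non_bmp()` on its input and warn when `is_narrow_build()` is also true, which is the case the quoted message says needs watching. The Language Tag characters mentioned there (U+E0000 to U+E007F) are outside the BMP, so they would be flagged by this check.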

(I've been writing all new code in Python 3 for the last couple of years 
now.)

Robert.