hunt.robertj at gmail.com
Sun Aug 5 19:40:16 MST 2012
On 06/08/12 14:20, Chris Little wrote:
> Linux packagers apparently go the UCS-4 route, so I didn't notice any
> issue with using the Language Tags. But trying the above on Windows
> shows that the Cygwin build and the builds from python.org (2.7 & 3.2)
> all use UCS-2. So my script won't work correctly on Windows.
> Not to worry, though. I'll just replace the Language Tags with
> Noncharacters in the range U+FDD0-U+FDEF. They're UCS-2-safe since
> they're BMP codepoints and they're specifically designated as
> "intended for process-internal uses, but are not permitted for
> interchange." So in the unlikely event that they appear in input, it's
> the fault of the USFM-encoder if anything goes awry.
> We'll have to watch for input outside of the BMP on UCS-2 Python,
> though, as that could cause problems.
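For what it's worth, here is a rough sketch of the kind of check that paragraph implies. The function names are made up for illustration and are not taken from your script; it only assumes that a narrow (UCS-2) build reports sys.maxunicode as 0xFFFF and exposes non-BMP characters as surrogate code units.

```python
import sys

# Noncharacters U+FDD0..U+FDEF: BMP code points reserved for internal use.
NONCHAR_FIRST, NONCHAR_LAST = 0xFDD0, 0xFDEF


def is_narrow_build():
    """Return True on a UCS-2 ("narrow") build of Python."""
    return sys.maxunicode == 0xFFFF


def find_problem_chars(text):
    """Return (index, code point) pairs for input that could cause trouble.

    Flags noncharacters that would clash with internal markers, plus
    anything outside the BMP (seen as surrogates on a narrow build).
    """
    problems = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if NONCHAR_FIRST <= cp <= NONCHAR_LAST:
            problems.append((i, cp))      # clashes with internal markers
        elif cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            problems.append((i, cp))      # non-BMP, or half of a surrogate pair
    return problems
```

Running find_problem_chars() over the input before inserting any markers would at least let the script warn about such characters instead of silently mangling them.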
I guess I'm quite surprised that you wrote a new Python program using
Python 2 when its development is basically coming to an end (and the next
Ubuntu will no longer have it installed by default). I also wonder if
Python 3 would handle Unicode better.
(I've been writing all new code in Python 3 for the last couple of years