[sword-devel] language/locale codes

DM Smith dmsmith at crosswire.org
Wed Nov 11 07:46:38 MST 2009


On Nov 11, 2009, at 8:25 AM, Karl Kleinpaste wrote:

> Jonathan Morgan <jonmmorgan at gmail.com> writes:
>> We had a similar problem in BPBible (and I think we would have been
>> using the same files).
> 
> XEmacs identifies it in our ui/languages file as U+00E5, which matches
> what's found in the GNOME character map application when searching it
> for "ring".  I still don't grasp how it could be a problem.

There is a difference between code points and their encoding.

U+00E5 is the Unicode code point, not the encoding. In hex, the UTF-8 encoding would be C3 A5. In ISO-8859-1, it would be E5.

It doesn't apply in your case, but the decomposed form in UTF-8 would be 61 CC 8A, which is U+0061 U+030A, where U+030A is a "Combining Ring Above".

So I'd suggest looking at a hex dump to see what the encoding is.
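
If a hex dump tool isn't handy, the byte sequences above can be checked with a few lines of Python. This is only a sketch to illustrate the point; the helper name hex_bytes is made up for this example and is not part of any library.

```python
import unicodedata


def hex_bytes(data: bytes) -> str:
    """Render bytes as space-separated uppercase hex pairs (hypothetical helper)."""
    return " ".join(f"{b:02X}" for b in data)


# Precomposed form: the single code point U+00E5.
composed = "\u00e5"

# Decomposed form: U+0061 followed by U+030A (Combining Ring Above).
decomposed = unicodedata.normalize("NFD", composed)

print("code points (composed):  ", [f"U+{ord(c):04X}" for c in composed])
print("code points (decomposed):", [f"U+{ord(c):04X}" for c in decomposed])

# Same character, three different byte sequences depending on form and encoding.
print("UTF-8, composed:  ", hex_bytes(composed.encode("utf-8")))
print("ISO-8859-1:       ", hex_bytes(composed.encode("iso-8859-1")))
print("UTF-8, decomposed:", hex_bytes(decomposed.encode("utf-8")))
```

Comparing that output against a hex dump of the file shows which form and which encoding the file actually contains.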

Regarding the handling of \u00e5 in VC++, it might be taking that and interpreting it as a cp1252-encoded character. Maybe one has to tell VC++ that the string is to be understood as a UTF-8 string. I've run into something similar in Perl and in native2ascii.
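
I can't reproduce what VC++ does here, so the following Python sketch only shows what a UTF-8/cp1252 mix-up looks like in general, in both directions. It is an illustration under that assumption, not a diagnosis of the compiler.

```python
text = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE

# Direction 1: UTF-8 bytes (C3 A5) read as if they were cp1252.
# Each byte becomes its own character, so one letter turns into two.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("cp1252")
print("UTF-8 bytes read as cp1252:", [f"U+{ord(c):04X}" for c in misread])

# Direction 2: the cp1252 byte (E5) read as if it were UTF-8.
# E5 starts a three-byte UTF-8 sequence, so on its own it is invalid.
cp1252_bytes = text.encode("cp1252")
try:
    cp1252_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print("cp1252 byte valid as UTF-8:", strict_ok)
```

Two characters (U+00C3 U+00A5) where one was expected points to the first kind of mix-up; a decode error or replacement character points to the second.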

In Him,
	DM



