[sword-devel] language/locale codes
Troy A. Griffitts
scribe at crosswire.org
Wed Nov 11 11:55:53 MST 2009
Just as a side note to this discussion, we've recently added in utilstr.h:
SWBuf assureValidUTF8(const char *buf);
It would be interesting to pass "Bokmål" to this method and see whether it
returns the same data. There is a test program you can try in the source
tree, located at:
DM Smith wrote:
> On Nov 11, 2009, at 9:59 AM, Karl Kleinpaste wrote:
>> DM Smith <dmsmith at crosswire.org> writes:
>>> U+00E5 is the unicode code point, not the encoding. In hex the utf-8
>>> encoding would be C3 A5. In ISO-8859-1, it would be E5.
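The arithmetic behind those bytes can be checked with a short sketch. The helper name encodeUtf8() is made up for illustration; it is not part of SWORD or GLib, and it only shows how a code point is packed into UTF-8 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not a SWORD or GLib function): encode a single
// Unicode code point as UTF-8. Surrogates and values above U+10FFFF
// produce an empty string.
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// U+00E5 becomes the two bytes C3 A5, which od -c prints as 303 245.
void checkARing() {
    assert(encodeUtf8(0xE5) == "\xC3\xA5");
    assert(encodeUtf8(0xE5) == "\303\245");
}
```

In ISO-8859-1 the same character is the single byte E5, which is not a complete UTF-8 sequence on its own.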
>> XEmacs tells me that the buffer is UTF-8. Manually re-asserting it...
>> M-x set-buffer-file-coding-system RET utf-8 RET
>> ...and re-saving the file makes no change to the content, yet that's
>> exactly the mechanism I've used in the past to convert ISO-8859 to UTF-8.
>>> So I'd suggest looking at a hex dump to see what the encoding is.
>> BTDT. "od -c" of this...
>> # correct: Norwegian Bokmål
>> #nb Norsk Bokmål
>> # a hack while g_utf8_validate() dislikes 'å': Norwegian Bokmaal
>> nb Norsk Bokmaal
>> ...produces this...
>> 0007300   o   e   r   o  \n   #       c   o   r   r   e   c   t   :
>> 0007320   N   o   r   w   e   g   i   a   n       B   o   k   m 303 245
>> 0007340   l  \n   #   n   b  \t   N   o   r   s   k       B   o   k   m
>> 0007360 303 245   l  \n   #       a       h   a   c   k       w   h   i
>> 0007400   l   e       g   _   u   t   f   8   _   v   a   l   i   d   a
>> 0007420   t   e   (   )       d   i   s   l   i   k   e   s       ' 303
>> 0007440 245   '   :       N   o   r   w   e   g   i   a   n       B   o
>> 0007460   k   m   a   a   l  \n   n   b  \t   N   o   r   s   k       B
>> For a-ring, the character map application observes...
>> C octal escaped UTF-8: \303\245
>> ...so I'm pretty well convinced that the content is right.
> You've convinced me. I'm curious as to whether this is a reported GTK bug?
> I'm also curious as to whether it handles the decomposed form. The following is \141\314\212:
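Both forms are structurally well-formed UTF-8: \303\245 is U+00E5, and \141\314\212 is U+0061 followed by the combining ring U+030A. A minimal stand-alone check is sketched below. The function isValidUtf8() is a hypothetical name for illustration only. It is neither SWORD's assureValidUTF8() nor GLib's g_utf8_validate(), so it says nothing about how either of those behaves; it only shows what a byte-level validity check would be expected to accept and reject.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-alone validator (an assumption for illustration,
// not the SWORD or GLib implementation). Rejects stray continuation
// bytes, truncated sequences, overlong forms, surrogates, and code
// points above U+10FFFF.
bool isValidUtf8(const std::string &s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        unsigned long min;  // smallest code point allowed for this length
        if (b < 0x80)                { len = 1; cp = b;        min = 0; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
        else return false;              // stray continuation or invalid lead byte
        if (len > n - i) return false;  // sequence runs past the end of the buffer
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF) return false;      // overlong or out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        i += len;
    }
    return true;
}
```

With this sketch, "Bokm\303\245l" and "Bokm\141\314\212l" both pass, while the ISO-8859-1 bytes "Bokm\345l" fail because E5 is a three-byte lead with only one byte after it.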
> In Him,