[sword-devel] StripText() result not converted to UTF-8?
Troy A. Griffitts
scribe at crosswire.org
Sun Feb 18 12:20:37 MST 2007
I believe the filter is wrong. It should return the UTF-8 value. This
is a bug. Anyone want to look through the Unicode code charts and recode
all these values?
Sorry for the bug, Joachim.
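For anyone picking this up, here is a rough sketch of the kind of conversion the fix needs. It is not the actual ThMLPlain code: codePointToUtf8 is a hypothetical helper name, not part of the SWORD API. It assumes the filter would emit the entity's Unicode code point rather than a single byte. For values in the Latin-1 range 0xA0-0xFF the Unicode code point equals the byte value, so those entries could be recoded mechanically.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (an assumption for illustration, not SWORD API):
// encode one Unicode code point as a UTF-8 byte sequence.
std::string codePointToUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this, a single-byte value such as 0xC9 would be appended to the output as the two bytes 0xC3 0x89 instead of one raw byte, so the stripped text stays valid UTF-8.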
Joachim Ansorg wrote:
> replying to myself.
> I've been wrong in some of my assumptions.
> JFB is ThML. It contains the entity Æ
> StripText() calls the filter ThMLPlain which converts the Æ into 0xC9,
> which is the corresponding cp1252 character code.
> I thought that StripText() would remove all markup and return text in the
> encoding given to EncodingFilterMgr.
> My question:
> Is that right or wrong?
> Some help would be wonderful,
>> I'm just debugging a bug in BibleTime.
>> Our SWMgr is created to output utf8.
>> The module JFB contains the entity Æ.
>> When I call StripText() the entity is converted to the corresponding
>> character in the cp1252 charset, i.e. char with the value 0xC9.
>> I thought that the latin2utf8 filter would convert this plain text to utf8
>> because I told SWMgr to do this for me.
>> Is there a way to set the output encoding for StripText() to be different
>> than the module's encoding?
>> Thanks a lot,