[sword-devel] StripText() result not converted to UTF-8?
nospam+sword-devel at joachim-ansorg.de
Sun Feb 18 12:26:19 MST 2007
Thanks for the reply.
The latin2utf8 filter already does the conversion.
Maybe it's possible to let the thml2plain output cp1252 by default and somehow
plug in the latin2utf8 filter if the utf8 charset has been chosen with the
I guess cp1252 should remain the default output of the thml2plain
filter, shouldn't it?
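
A rough sketch of the idea, only to illustrate the chaining. This is not the
SWORD API: OutputEncoding, latin1ToUtf8() and finishPlainText() are made-up
names, and the conversion treats the input as ISO-8859-1, which matches cp1252
only outside the range 0x80-0x9F. The plain filter's output is passed through
unchanged unless UTF-8 was requested, in which case each high byte is expanded
to its two-byte UTF-8 sequence.

```cpp
#include <string>

// Hypothetical names throughout; none of this is the real SWORD interface.
enum class OutputEncoding { Latin1, Utf8 };

// Convert ISO-8859-1 bytes to UTF-8.
// Assumption: bytes 0x80-0x9F are treated as Latin-1 code points; genuine
// cp1252 text would need a lookup table for that range.
std::string latin1ToUtf8(const std::string &in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII is identical in both encodings.
            out += static_cast<char>(c);
        } else {
            // Code points U+0080..U+00FF become two bytes: 110xxxxx 10xxxxxx.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Stand-in for the step after the plain-text filter has run: keep the
// single-byte output by default, convert only if UTF-8 was chosen.
std::string finishPlainText(const std::string &plain, OutputEncoding enc) {
    return enc == OutputEncoding::Utf8 ? latin1ToUtf8(plain) : plain;
}
```

With this, the byte 0xC9 mentioned below would stay 0xC9 for the default
encoding and become the two bytes 0xC3 0x89 when UTF-8 is selected.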
> I believe the filter is wrong. It should return the UTF-8 value. This
> is a bug. Anyone want to look through the unicode code chart and recode
> all these values?
> Sorry for the bug Joachim.
> Joachim Ansorg wrote:
> > Hi,
> > replying to myself.
> > I've been wrong in some of my assumptions.
> > JFB is ThML. It contains the entity Æ
> > StripText() calls the filter ThMLPlain which converts the Æ into
> > 0xC9, which is the corresponding cp1252 character code.
> > I thought that StripText() would remove all markup and return text in the
> > encoding given to EncodingFilterMgr.
> > My question:
> > Is that right or wrong?
> > Some help would be wonderful,
> > Joachim
> >> Hi,
> >> I'm just debugging a bug in BibleTime.
> >> Our SWMgr is created to output utf8.
> >> The module JFB contains the entity Æ.
> >> When I call StripText() the entity is converted to the corresponding
> >> character in the cp1252 charset, i.e. char with the value 0xC9.
> >> I thought that the latin2utf8 filter would convert this plain text to
> >> utf8 because I told SWMgr to do this for me.
> >> Is there a way to set the output encoding for StripText() to be
> >> different than the module's encoding?
> >> Thanks a lot,
> >> Joachim