[sword-devel] Fix for &amp; (Re: Updating Clarke commentary to become readable)

DM Smith dmsmith555 at yahoo.com
Wed Sep 27 19:19:21 MST 2006

On Sep 27, 2006, at 12:24 PM, Karl Kleinpaste wrote:

> That is, the reason &amp; wasn't being properly handled is because [a]
> all those EscapeSequences in thmlhtml.cpp being commented out lead to
> handleEscapeString() returning false -- no substitutions exist -- and
> so [b] because passThruUnknownEsc is false (see ctor), all &symbols;
> are dropped.  The code was actually willfully eliminating every
> possible such &symbol;.  Turning on passThruUnknownEsc lets them go by
> unmolested.

I think the intention of the code was to let known entities pass through.

> What I don't know is if this should be considered a correct fix,
> rather than just one that makes it work for me, a GS user.  That is,
> why would it ever be desirable _not_ to pass a &symbol; just because
> it's not known to the particular substitution set coded?  Hence, I
> *think* it's correct just to turn on passThruUnknownEsc globally, but
> I'm not positive.

I think that you found that the code expected to strip out unknown  
entities. Entities that are not handled via html should not be passed  
through. So, if there were an entity &disclaimer; for example, it  
should be stripped. When the block of addEscapeStringSubstitute was  
commented out, it changed the behavior.

I think the correct fix is to have something *like* the following:

bool ThMLHTML::substituteEscapeString(SWBuf &buf, const char *escString) {
	DualStringMap::iterator it;

	if (!escStringCaseSensitive) {
		char *tmp = 0;
		stdstr(&tmp, escString);
		it = p->escSubMap.find(tmp);
		delete [] tmp;
	} else {
		it = p->escSubMap.find(escString);
	}

	if (it != p->escSubMap.end()) {
		// This is the one line that changes
		// It probably should get the declared escapeStart and
		//
		// The known entity is re-emitted as-is. The pieces are appended
		// one at a time because '&' + escString would be pointer
		// arithmetic on a const char *, not string concatenation.
		buf += '&';
		buf += escString;
		buf += ';';
		return true;
	}
	return false;
}

And then uncomment the section in the ThMLHTML ctor that declares the
entity replacements. Note that the famous 4 entities (&amp;, &quot;, &gt;
and &lt;) should not be replaced, but should be passed through. IE does
not handle &apos; in xhtml (don't know about html), so it should be
replaced. I don't know that all browsers (used by Sword applications)
can handle Latin-1 entities, though recent ones do. The lowest common
denominator would be to replace them with the corresponding Latin-1 or
UTF-8 characters, depending on the encoding.
