[sword-devel] Fix for &amp;

DM Smith dmsmith555 at yahoo.com
Thu Sep 28 05:27:40 MST 2006

On Sep 28, 2006, at 1:27 AM, Chris Little wrote:

> What is more, there is absolutely no necessity to use any entity other
> than &amp; and &lt; in Sword. Entities other than the XML set (&amp;,
> &quot;, &apos;, &lt;, &gt;) are not supported at all in Sword and should
> not be used. There is no good reason to do so.

Just a minor quibble:
&quot; is necessary within an attribute value, unless the attribute
is quoted with '. Because SWORD programmatically generates XML and
always uses " to quote attributes, it is necessary.
&gt; is necessary in a few instances, e.g. <![CDATA[....]]> and <?....?>.
But I don't think they will occur in a SWORD module.
&apos; is not defined in HTML 4.01 in the Voyager DTD. This implies
that attribute values are quoted with " when containing '.
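
A minimal sketch of the attribute case in C++, assuming the generator always delimits attribute values with " as described above. The name escapeAttribute is hypothetical and is not a SWORD function:

```cpp
#include <string>

// Hypothetical helper, not part of the SWORD API: escapes a string for use
// inside an attribute value that is always delimited by double quotes.
std::string escapeAttribute(const std::string &value) {
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        switch (c) {
        case '&': out += "&amp;";  break;
        case '<': out += "&lt;";   break;
        case '"': out += "&quot;"; break;  // would otherwise end the value
        default:  out += c;        break;  // ' is safe inside "..."
        }
    }
    return out;
}
```

Because the delimiter is always ", a ' in the value passes through untouched and &apos; is never emitted.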

Also, thmlrtf.cpp, thmlhtml.cpp, thmlplain.cpp and a few others have
explicit support for Latin-1 entities and the 5 predefined ones. So SWORD
supports them in ThML. Given SWORD's history of backward
compatibility, I don't see this going away.

> Any other character should be encoded as UTF-8, not named entities.

I think this is best placed in the module creation code, either as
a hard stop or as a conversion. The problem with automatically
converting them to Unicode is that the module might be Latin-1
otherwise and that would be bad. I also note that SWORD does not
support entities of the form &#xxx; which are allowed in a Latin-1  
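
A rough C++ sketch of the "hard stop" option, assuming a module creation tool that scans its input before building. The name findUnsupportedEntities is hypothetical and is not an existing SWORD function. It reports every entity reference other than the five XML predefined ones, including numeric references:

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical check for a module creation tool; not an existing SWORD call.
// Returns each entity reference in `text` that is not one of the five XML
// predefined entities. Numeric references (&#...;) are reported as well.
std::vector<std::string> findUnsupportedEntities(const std::string &text) {
    static const std::set<std::string> allowed = {"amp", "lt", "gt", "quot", "apos"};
    std::vector<std::string> found;
    std::string::size_type pos = 0;
    while ((pos = text.find('&', pos)) != std::string::npos) {
        std::string::size_type end = pos + 1;
        if (end < text.size() && text[end] == '#') ++end;  // numeric form
        while (end < text.size() &&
               std::isalnum(static_cast<unsigned char>(text[end]))) ++end;
        // A reference needs a non-empty name and a terminating ';'.
        if (end < text.size() && text[end] == ';' && end > pos + 1) {
            std::string name = text.substr(pos + 1, end - pos - 1);
            if (allowed.find(name) == allowed.end())
                found.push_back("&" + name + ";");
        }
        ++pos;
    }
    return found;
}
```

A tool could refuse to build the module when the returned list is non-empty, or hand the list to a converter that knows the module's target encoding.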

In His Service,
