[sword-devel] Fix for &

DM Smith dmsmith555 at yahoo.com
Thu Sep 28 05:06:18 MST 2006

On Sep 28, 2006, at 12:32 AM, Karl Kleinpaste wrote:

> DM Smith <dmsmith555 at yahoo.com> writes:
>> Entities that are not handled via html should not be passed
>> through. So, if there were an entity &disclaimer; for example, it
>> should be stripped.
> I believe I disagree.  If some &symbol; is unknown to Sword (say,
> because some new HTML standard has come along, already implemented in
> GtkHTML [which GS uses], so that a Sword module is produced which
> contains it, yet Sword itself has not been updated to recognize it),
> why shouldn't Sword simply pass it through?  The fact that Sword
> doesn't know about &disclaimer; is no guarantee that both the module
> author and the end-line HTML renderer can't be perfectly happy with
> it -- Sword may quite possibly be behind the curve.

The behavior of entities is well defined in XML. If an entity does
not have a definition in the DTD, it is an error. More interestingly
(at least to me), schemas, with which OSIS is defined, do not support
the definition of entities. The famous 4 are predefined.

One of the fundamental uses of entities in writing a DTD is that of a
non-parameterized, conditional macro. When an entity is expanded, it
is recursively processed for entities. There are two forms of
entities: &entity; and %entity;. One of the common scenarios is to use
% entities (parameter entities) to modularize a DTD into separate
files that are included.
There is also a mechanism to allow for a document to override any  

Given this, without processing a DTD for a document for all entities  
via a robust entity resolver, it is impossible to know what  
&disclaimer; resolves to.
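
To make that concrete, here is a small Python sketch using the standard
library's xml.etree.ElementTree. The document, the entity text, and the
helper name resolve_text are all made up for illustration. It only covers
entities declared in a document's internal subset, so it is far from the
robust entity resolver described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset is the only place
# that says what &disclaimer; means. &owner; is expanded recursively
# while &disclaimer; is being expanded.
DOC = """<!DOCTYPE note [
  <!ENTITY owner "the module author">
  <!ENTITY disclaimer "Provided as-is by &owner;.">
]>
<note>&disclaimer;</note>"""


def resolve_text(xml_source):
    """Parse the document and return the root element's expanded text."""
    return ET.fromstring(xml_source).text


print(resolve_text(DOC))
```

Remove the DOCTYPE block and the same &disclaimer; reference has no
meaning at all, which is the point: the name alone tells you nothing.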

ThML, as a Voyager superset, supports 3 sets of entities: Latin-1,
symbol, and special. The famous 4 are in special, and &apos; is not
defined anywhere. With the inertia of Microsoft's Internet Explorer,
I don't expect any changes in this arena.

For details see: http://www.w3.org/TR/1998/WD-html-in-xml-19981205/ 

Of these, Sword's ThML filters handle/support Latin-1 and the famous
4. (You did find a bug here.)
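
As a rough sketch of the "strip what is not handled" behavior I am
arguing for, here is some Python. This is not Sword's filter code; the
names SUPPORTED and strip_unknown_entities are invented for the example,
and I am assuming "Latin-1" means the named entities for code points
160-255 as listed in Python's html.entities table.

```python
import re
from html.entities import name2codepoint

# Assumption for illustration: "supported" means the four entities
# amp, lt, gt, quot plus the Latin-1 set (code points 160-255).
FAMOUS_FOUR = {"amp", "lt", "gt", "quot"}
LATIN_1 = {name for name, cp in name2codepoint.items() if 160 <= cp <= 255}
SUPPORTED = FAMOUS_FOUR | LATIN_1

# Matches named entity references only, e.g. &eacute; or &disclaimer;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def strip_unknown_entities(text):
    """Leave supported named entities untouched and drop all others."""
    def keep_or_drop(match):
        return match.group(0) if match.group(1) in SUPPORTED else ""
    return ENTITY_RE.sub(keep_or_drop, text)


print(strip_unknown_entities("caf&eacute; &amp; tea &disclaimer;"))
```

Numeric references such as &#233; are deliberately left alone here, since
they need no definition to be resolved.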

> And in fact, it surely is, in a few small areas.  For example,
> WinSword/BibleCS doesn't implement <u> or <font color=...>, though it
> implements <b> and <i>.  Conversely, GtkHTML implements <u> and <font
> color=...> but does not have support for <sup>.  So pass the source
> material and let the renderer take its best shot.

I think it is important that we have some guarantee of well-defined
XML in SWORD. XML states that an undefined entity is an error that
produces a hard stop. A system that uses the SWORD engine and uses an
XML parser should have a reasonable expectation that the text it is
given will not cause it to abend.
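
The hard stop is easy to see with any conforming parser. Below is a
minimal Python illustration using the standard library's
xml.etree.ElementTree; the helper name parses_cleanly and the sample
strings are just for the example.

```python
import xml.etree.ElementTree as ET


def parses_cleanly(xml_source):
    """Return True if the parser accepts the text, False on a hard stop."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        # Raised for any well-formedness error, including a reference
        # to an entity that has no declaration.
        return False


print(parses_cleanly("<p>Fish &amp; chips</p>"))  # predefined entity
print(parses_cleanly("<p>&disclaimer;</p>"))      # undeclared entity
```

A frontend that feeds engine output to a parser like this gets nothing
back for the second string, not even the text around the entity.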
