[sword-devel] Valid vs Best Practice XML

Chris Little chrislit at crosswire.org
Sat Sep 15 15:24:36 MST 2012


On 09/15/2012 09:56 AM, Greg Hellings wrote:
> To emphasize that we have an issue here, in the SWORD filters, here is
> the output from diatheke with HTML, HTMLHREF and XHTML (which support
> I just hacked in now in order to test).
>
> greg@Gateway08:~/Source/sword/build (master)$ !diath
> diatheke -b TKE -o h -f HTMLHREF -k Gen 1:2
> Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
> ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
> waviravira vadhulu va mahinje, osasanyedhelaga.  <!/P><br />
> (TKE)
> greg@Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f HTML -k Gen 1:2
> <meta http-equiv="content-type" content="text/html;
> charset=UTF-8">Genesis 1:2: Elaboya kayawomele naari kayanna dhego.
> Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa
> Mulugu waviravira vadhulu va mahinje, osasanyedhelaga.  <div
> eID="gen11" type="paragraph"/><br />
> (TKE)
> greg@Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f XHTML -k Gen 1:2
> Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
> ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
> waviravira vadhulu va mahinje, osasanyedhelaga.  <div eID="gen11"
> type="paragraph"/>
> (TKE)
>
> All three are outputting the same verse from the same module. HTML and
> XHTML are outputting <div eID="gen11" type="paragraph"/> which is what
> the module has in its rawest form. HTMLHREF outputs <!/P> which is not
> valid anything. There are other, odd, differences between the three
> but none of those are germane to this discussion, it would seem to me.

The HTML and XHTML output is obviously problematic because eID isn't an
(X)HTML attribute, but it's arguable that there is no problem with the
HTMLHREF output. HTMLHREF is a proprietary format that was developed for
GnomeSword, so it carries extra markup meant for subsequent processing
within that application. It's HTML-ish, but not standard HTML, and the
degree to which it violates the HTML specs is really a matter for the
Xiphos developers and anyone else using this format to decide.
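
To make the eID point concrete, here is a minimal sketch that lists the
attributes on a <div> that fall outside a known set. The names
KNOWN_DIV_ATTRS, AttrAudit and unknown_div_attrs are made up for this
illustration, and the attribute set is a deliberately abbreviated
assumption, not the full list from the HTML specification.

```python
from html.parser import HTMLParser

# Abbreviated, illustrative subset of HTML global attributes (an assumption,
# not the complete list from the HTML specification).
KNOWN_DIV_ATTRS = {"id", "class", "style", "title", "lang", "dir"}


class AttrAudit(HTMLParser):
    """Collects attribute names on <div> elements that are not in the known set."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names, so "eID" arrives as "eid".
        # Self-closing tags such as <div .../> are routed here as well.
        if tag != "div":
            return
        for name, _value in attrs:
            if name not in KNOWN_DIV_ATTRS:
                self.unknown.append(name)


def unknown_div_attrs(fragment):
    """Return the non-standard attribute names found on <div> tags in fragment."""
    audit = AttrAudit()
    audit.feed(fragment)
    audit.close()
    return audit.unknown
```

Fed the fragment from the quoted output, <div eID="gen11"
type="paragraph"/>, this reports both the eID attribute (lowercased by
the parser) and type, since neither is in the abbreviated set.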

All of the extra markup in the HTML output (the <meta> tag and the
<br />) is inserted by Diatheke after the render filters have been
applied, to indicate the character encoding and add line breaks. No one
has bothered to add similar markup for the other HTML filters, and I
would not necessarily argue that anything should be added.
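
The ordering described above (filter first, decorate afterwards) can be
sketched roughly as follows. This is not Diatheke's actual code: the
function names render_filter, decorate_html and output_entry are
hypothetical, and the filter here is a stand-in that passes markup
through unchanged.

```python
# Hypothetical names throughout; this only illustrates the order of the steps.
META = '<meta http-equiv="content-type" content="text/html; charset=UTF-8">'


def render_filter(raw):
    # Stand-in for a render filter; a real filter would transform the markup.
    return raw


def decorate_html(rendered):
    # Applied after filtering: declare the encoding up front,
    # then close the entry with a line break.
    return META + rendered + "<br />"


def output_entry(raw):
    # The filter runs first; the front end adds its own markup afterwards.
    return decorate_html(render_filter(raw))
```

The point of the split is that the <meta> tag and the <br /> come from
the front end, not from the filter, so a filter's output can be judged
separately from what the front end wraps around it.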

--Chris



