[sword-devel] Valid vs Best Practice XML

Greg Hellings greg.hellings at gmail.com
Fri Sep 14 13:02:38 MST 2012


So I've been debugging a module display problem in BibleTime. I
mentioned it to Troy on IRC the other day, but we weren't able to
connect at the same time to discuss it further. The issue has to do
with paragraph tags: osis2mod converts these tags from <p> to
<div sID="someid" type="paragraph" />.
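For readers unfamiliar with OSIS milestones, the kind of conversion described above can be sketched roughly as follows. This is a hypothetical Python illustration, not osis2mod's actual code. It handles only bare <p> and </p> tags (no attributes), and the "p1", "p2", ... sID/eID values are an assumed naming scheme. The point is that each container-style paragraph becomes a pair of empty <div> milestones sharing one ID, which is where all the self-closing <div ... /> tags come from.

```python
import re


def milestone_paragraphs(text):
    """Hypothetical sketch: replace <p>...</p> containers with
    <div type="paragraph" sID/eID="..."/> milestone pairs."""
    out = []
    open_ids = []  # stack of paragraph IDs that have been opened but not closed
    counter = 0
    # The capturing group makes re.split keep the <p> and </p> tags as tokens.
    for token in re.split(r"(</?p>)", text):
        if token == "<p>":
            counter += 1
            pid = f"p{counter}"  # assumed ID scheme, for illustration only
            open_ids.append(pid)
            out.append(f'<div type="paragraph" sID="{pid}"/>')
        elif token == "</p>" and open_ids:
            pid = open_ids.pop()  # the end milestone reuses the matching start ID
            out.append(f'<div type="paragraph" eID="{pid}"/>')
        else:
            out.append(token)  # ordinary text (or an unmatched </p>) passes through
    return "".join(out)


if __name__ == "__main__":
    print(milestone_paragraphs("<p>In the beginning</p>"))
```

For example, milestone_paragraphs("<p>In the beginning</p>") yields <div type="paragraph" sID="p1"/>In the beginning<div type="paragraph" eID="p1"/>: two empty, self-closing divs with the verse text between them.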

These tags are passing through to BibleTime and are messing up the
rendering of the module. In the case of this particular module, the
<p> tags lie outside of the verses, so </p> is converted to <div
type="paragraph" eID="something" /> at the end of a verse, and <p> is
added as a preverse header, <div type="paragraph" sID="something" />.
Now, <div type="paragraph" sID="something" /> is technically valid XML,
since the self-closing form may be used for any element that has no
content. However, <div> elements in OSIS are not defined as
necessarily empty: they are able to hold content, and these particular
ones simply don't. As such, the XML spec says that they _should_ not be
written as self-closing tags (see http://www.w3.org/TR/xml/#d0e2480,
the relevant text of which reads "Empty-element tags may be used for
any element which has no content, whether or not it is declared using
the keyword EMPTY. For interoperability, the empty-element tag should
be used, and should only be used, for elements which are declared
EMPTY.").

Furthermore, we leave these <div> tags alone in the default HTML and
XHTML rendering filters. Troy claimed that BibleTime does not use
SWORD's filters, which is incorrect: our OsisToHtml filter is an
extension of sword::OSISHTMLHREF with heavily customized output.
Both BibleTime's and SWORD's filters (at least the HTML filters) leave
div tags in place.

I'm not sure what our target HTML version is, but if we're targeting
HTML4, then the self-closing tag is strongly advised against ("SGML
systems conforming to [ISO8879] are expected to recognize a number of
features that aren't widely supported by HTML user agents. We
recommend that authors avoid using all of these features."). If we
are targeting HTML5, then the spec provides for an optional '/' in
void elements (area, base, br, col, command, embed, hr, img, input,
keygen, link, meta, param, source, track, wbr), where the character is
purely decorative. It is not valid on any other native element; all
of those must be closed with a distinct end tag. See
http://dev.w3.org/html5/spec-author-view/syntax.html#syntax-start-tag
for the relevant text. In XHTML it is also not permitted, as stated
at http://www.w3.org/TR/xhtml-media-types/#C_2.
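To make the HTML5 rule concrete, here is a rough Python sketch of the kind of clean-up an HTML output filter could apply. A self-closing tag is kept only if the element is one of the void elements listed above; anything else, such as <div type="paragraph" sID="x" />, is expanded into an explicit start/end pair. The names HTML5_VOID and expand_non_void are hypothetical; this is not SWORD's OSISHTMLHREF API. The regex is used only for brevity, and a real filter would more likely act at the point where it emits each tag rather than rewriting the finished string.

```python
import re

# Void elements as listed in the HTML5 syntax section cited above
# (the 2012 draft list, which still included command and keygen).
HTML5_VOID = {
    "area", "base", "br", "col", "command", "embed", "hr", "img",
    "input", "keygen", "link", "meta", "param", "source", "track", "wbr",
}

# Matches a self-closing tag such as <div type="paragraph" sID="x" />.
# Group 1 is the element name; group 2 is the attribute text (may be empty).
SELF_CLOSING = re.compile(r"<([A-Za-z][\w:-]*)((?:\s[^<>]*?)?)\s*/>")


def expand_non_void(html):
    """Hypothetical helper: rewrite <name .../> as <name ...></name>
    unless name is an HTML5 void element."""
    def fix(m):
        name, attrs = m.group(1), m.group(2).rstrip()
        if name.lower() in HTML5_VOID:
            return m.group(0)  # the trailing '/' is harmless on void elements
        return f"<{name}{attrs}></{name}>"
    return SELF_CLOSING.sub(fix, html)


if __name__ == "__main__":
    print(expand_non_void('<div type="paragraph" sID="x" />verse text<br />'))
```

Expanding a milestone into an empty <div ...></div> only makes the markup legal HTML. Whether a filter should instead drop paragraph milestones or translate them into something else is a separate design decision.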

Thus, osis2mod goes against the XML spec's recommended practice by
writing a non-EMPTY element as a self-closing tag, although this seems
to be fairly common in the OSIS world. Furthermore, our filters produce
HTML that is invalid (or very strongly discouraged) under every version
of the spec still in use (HTML4, XHTML, HTML5). As such, I'm of the
opinion that this represents a bug in SWORD, at the very least in the
filters, which let empty, self-closing div tags slip through into what
is supposed to be HTML output. Do others agree or disagree?

--Greg


