[sword-devel] Valid vs Best Practice XML

Greg Hellings greg.hellings at gmail.com
Fri Sep 14 13:21:33 MST 2012

On Fri, Sep 14, 2012 at 3:15 PM, DM Smith <dmsmith at crosswire.org> wrote:
> Just to focus the post in what I see at play.
> A few issues here:
> 1) Does OSIS's milestone form of container elements violate XML best practices and does it matter? The sID/eID is a common OSIS construct and, IIRC, in the TEI world.

Milestones could still be built just by changing to <div
sID="id"></div>, which would still honor both the milestone concept and
the XML spec's recommendation.

> 2) What should SWORD filters do when outputting vertical whitespace. I've noted on other threads that there are other problems? This mostly deals with nesting.
> 3) How should SWORD filters handle container elements that can cross other container elements, especially when verses are shown in isolation or in table cells? E.g. A paragraph or div that starts in one verse and ends within another.
> 4) Should osis2mod use this form for a milestoned paragraph, which OSIS does not have.

osis2mod could use the above format for milestoned elements and
shouldn't lose anything in the process. At least, in terms of the XML
structure it is producing. SWORD's isEmpty() method on the XmlTag
class would not behave the same on these tags, though, since it looks
for self-closing. But all that is just "should" in the XML spec, not a
requirement the way it is in XHTML and HTML5.
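
To illustrate the point about self-closing detection, here is a minimal
sketch in Python. It is not SWORD's XmlTag code; parsed_view() and
looks_self_closing() are hypothetical helpers made up for this example.
It only shows that an XML parser sees the two spellings as the same
empty element, while a check that looks at the tag text for "/>" tells
them apart.

```python
import xml.etree.ElementTree as ET

# The two spellings of an empty milestone div discussed in this thread.
SELF_CLOSED = '<div sID="gen1" type="paragraph"/>'
EXPLICIT = '<div sID="gen1" type="paragraph"></div>'


def parsed_view(markup):
    """Return (tag, attributes, has_content) as an XML parser sees it."""
    element = ET.fromstring(markup)
    has_content = bool(element.text) or len(element) > 0
    return element.tag, dict(element.attrib), has_content


def looks_self_closing(markup):
    """Hypothetical text-level check: does the tag text end in '/>'?"""
    return markup.rstrip().endswith("/>")


# Same element as far as the parser is concerned.
print(parsed_view(SELF_CLOSED) == parsed_view(EXPLICIT))
# Different answers from a check that inspects the raw tag text.
print(looks_self_closing(SELF_CLOSED), looks_self_closing(EXPLICIT))
```

So any code that relies on the raw "/>" form to recognize a milestone
would need to look at the sID/eID attributes instead if the explicit
open/close spelling were adopted.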

As such, it seems osis2mod could be left alone if the filters are
altered and both sets of specs would be minimally fulfilled. Or both
could change and fulfill even the recommendations of the specs.
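
As a rough sketch of what "altering the filters" could mean, the Python
below expands self-closing tags into explicit open/close pairs unless
the element is one of the HTML5 void elements listed in the quoted
message. This is not the SWORD or BibleTime filter code;
expand_self_closing() is a hypothetical name, and the regular
expression is an assumption that ignores edge cases such as "/>"
inside an attribute value.

```python
import re

# Void elements as listed in the quoted message; only these may keep '/'.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "command", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
}

# Matches a self-closing tag such as <div type="paragraph" sID="x" />.
SELF_CLOSING = re.compile(r"<([A-Za-z][\w:.-]*)((?:\s[^<>]*?)?)\s*/>")


def expand_self_closing(html):
    """Rewrite <name attrs/> as <name attrs></name> for non-void elements."""
    def replace(match):
        name, attrs = match.group(1), match.group(2)
        if name.lower() in VOID_ELEMENTS:
            return match.group(0)  # leave <br/>, <hr/>, ... untouched
        return f"<{name}{attrs}></{name}>"

    return SELF_CLOSING.sub(replace, html)


print(expand_self_closing('<div type="paragraph" sID="x" />In the beginning<br/>'))
```

A real filter would more likely act on the parsed tag than on the
output string, but the rule is the same: only void elements keep the
trailing slash.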


> I think that if the filter output a <br/> for these it would do better.
> On Sep 14, 2012, at 4:02 PM, Greg Hellings <greg.hellings at gmail.com> wrote:
>> So I've been debugging a module display problem in BibleTime. I
>> mentioned it on IRC with Troy the other day but we weren't able to
>> connect at the same time to discuss further. The issue has to do with
>> paragraph tags - in osis2mod these tags are being converted from <p>
>> to <div sID="someid" type="paragraph" />.
>> These tags are passing through to BibleTime and are messing with the
>> rendering of the module. In the case of this particular module, the
>> <p> tags lie outside of the verses so </p> is being converted to <div
>> type="paragraph" eID="something" /> on the end of a verse and the <p>
>> is being added as a preverse header <div type="paragraph"
>> sID="something" />. Now <div type="paragraph" sID="something" /> is
>> technically valid XML because the tag is self-closing. However, the
>> <div> tags in OSIS are not defined as necessarily empty tags - that
>> is, they are able to hold content; these ones simply are not doing so.
>> As such, the XML spec says that they _should_ not be created as self
>> closing (see http://www.w3.org/TR/xml/#d0e2480, the relevant text of
>> which reads "Empty-element tags may be used for any element which has
>> no content, whether or not it is declared using the keyword EMPTY. For
>> interoperability, the empty-element tag should be used, and should
>> only be used, for elements which are declared EMPTY.").
>> Furthermore, we leave these <div> tags alone in the default HTML and
>> XHTML rendering filters. Troy claimed that BibleTime does not use
>> SWORD's filters, which is incorrect - our OsisToHtml filter is an
>> extension of sword::OSISHTMLHREF with heavily customized output.
>> Both BibleTime and SWORD's filters - at least the HTML filters - leave
>> div tags in place. I'm not sure what our target HTML version is, but
>> if we're targeting HTML4 then the self-closing tag is strongly advised
>> against ("SGML systems conforming to [ISO8879] are expected to
>> recognize a number of features that aren't widely supported by HTML
>> user agents. We recommend that authors avoid using all of these
>> features."). If we are targeting HTML5, then the spec provides for
>> optional '/' in void elements (area, base, br, col, command, embed,
>> hr, img, input, keygen, link, meta, param, source, track, wbr) where
>> the character is purely decoration. It is not valid in any other
>> native elements. All of them must close with a distinct close tag. See
>> http://dev.w3.org/html5/spec-author-view/syntax.html#syntax-start-tag
>> for the appropriate text. In XHTML it is also not permitted as stated
>> in these two answers here http://www.w3.org/TR/xhtml-media-types/#C_2
>> Thus, osis2mod is in violation of the suggested XML best practice by
>> creating a non-EMPTY tag as self-closing but this is seemingly pretty
>> common in the OSIS world. Furthermore our filters are producing
>> invalid (or very strongly discouraged) HTML as per every still-in-use
>> version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
>> opinion that this represents a bug in SWORD - at the very least in the
>> filters that permit empty, self-closing div tags to slip through what
>> are supposedly HTML outputs. Do others agree or disagree on this?
>> --Greg
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page