[sword-devel] Valid vs Best Practice XML
chrislit at crosswire.org
Fri Sep 14 17:15:28 MST 2012
On 09/14/2012 01:02 PM, Greg Hellings wrote:
> So I've been debugging a module display problem in BibleTime. I
> mentioned it on IRC with Troy the other day but we weren't able to
> connect at the same time to discuss further. The issue has to do with
> paragraph tags - in osis2mod these tags are being converted from <p>
> to <div sID="someid" type="paragraph" />.
This is extraordinarily bad. This is a change in semantics, because <p>
and <div type="paragraph"> are not semantically equivalent.
<p> marks the type of paragraph we all probably think of first:
generally, a chunk of text with newlines before and after.
<div type="paragraph"> marks a formal division within a text that
happens to be identified as a 'paragraph' and may consist of multiple
<p>-type paragraphs. Examples of these divisions are found in many laws
and the Catechism of the Catholic Church (which does exist in OSIS
form). Here's part 1, section 1, chapter 1, article 1, paragraph 1 of
the CCC: http://www.vatican.va/archive/ENG0015/__P16.HTM. As you can
see, it consists of many <p>-type paragraphs but is a single <div
type="paragraph">.
Abhorrent though I consider milestoned <p/>, I think I would much prefer
to see us map <p>...</p> to <p sID=""/>...<p eID=""/> than see us
clobber the semantics of a defined <div> type.
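
To make the mapping I'm suggesting concrete, here is a rough Python sketch.
It is not what osis2mod does (osis2mod is C++ and I haven't touched it
here); the function name, the ID prefix, and the regex approach are all
made up for illustration, and it assumes attribute-less <p> tags.

```python
import re
from itertools import count


def milestone_paragraphs(osis: str, prefix: str = "gen") -> str:
    """Rewrite <p>...</p> containers as <p sID="..."/>...<p eID="..."/>.

    Hypothetical helper: handles only bare <p> and </p> tags and assumes
    the input is well-formed (every </p> has a preceding <p>).
    """
    ids = count(1)
    open_ids = []  # stack of sIDs still waiting for their eID

    def repl(match: re.Match) -> str:
        if match.group(1):  # "/" captured, so this is a closing </p>
            return f'<p eID="{open_ids.pop()}"/>'
        pid = f"{prefix}{next(ids)}"
        open_ids.append(pid)
        return f'<p sID="{pid}"/>'

    return re.sub(r"<(/?)p\s*>", repl, osis)
```

The point is only that the element name survives the conversion, so a
real <div type="paragraph"> elsewhere in the text keeps its own meaning.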
> Thus, osis2mod is in violation of the suggested XML best practice by
> creating a non-EMPTY tag as self-closing but this is seemingly pretty
> common in the OSIS world. Furthermore our filters are producing
> invalid (or very strongly discouraged) HTML as per every still-in-use
> version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
> opinion that this represents a bug in SWORD - at the very least in the
> filters that permit empty, self-closing div tags to slip through what
> are supposedly HTML outputs. Do others agree or disagree on this?
I'm of the opinion that our OSIS is generally fine, meaning we should go
ahead and keep allowing self-closing OSIS tags if possible (as input to
and output from osis2mod, and as content of modules not produced by
osis2mod). The best practice Greg cites is only a recommendation, and
specifically one intended to aid processing with legacy SGML tools,
which I can't see us doing and don't personally care about. (The
semantic violation noted above is a bug in my mind, but that issue is
I would agree that the filter output is buggy if we're generating
disallowed tag forms. OSIS <div> and <p> would need to be translated to
their correct, non-self-closing HTML forms. Beyond those two, I can't
think of any tags that have the same form & general semantics in both
OSIS & HTML.
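
For what the filter side might look like, here is a small Python sketch of
turning milestoned OSIS <div>/<p> into non-self-closing HTML. This is not
the SWORD filter code; the function name and regex are assumptions for
illustration, it drops all OSIS attributes, and it does nothing about
milestones that overlap other elements, which would still yield badly
nested HTML.

```python
import re

# Matches self-closing <div .../> and <p .../> only (not <pre>, <para>, ...).
_SELF_CLOSING = re.compile(r"<(div|p)\b([^>]*?)/>")


def milestones_to_html(osis: str) -> str:
    """Replace self-closing OSIS <div>/<p> with explicit HTML open/close tags.

    Hypothetical helper: sID milestones become opening tags, eID milestones
    become closing tags, and any other empty element becomes <tag></tag>.
    OSIS attributes are discarded rather than mapped to HTML attributes.
    """

    def repl(match: re.Match) -> str:
        tag, attrs = match.group(1), match.group(2)
        if re.search(r"\bsID\s*=", attrs):
            return f"<{tag}>"
        if re.search(r"\beID\s*=", attrs):
            return f"</{tag}>"
        return f"<{tag}></{tag}>"

    return _SELF_CLOSING.sub(repl, osis)
```

Either way, nothing of the form <div ... /> or <p ... /> reaches the HTML
output.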