[sword-devel] Valid vs Best Practice XML
dmsmith at crosswire.org
Sat Sep 15 10:26:45 MST 2012
On Sep 14, 2012, at 8:15 PM, Chris Little <chrislit at crosswire.org> wrote:
> On 09/14/2012 01:02 PM, Greg Hellings wrote:
> > So I've been debugging a module display problem in BibleTime. I
> > mentioned it on IRC with Troy the other day but we weren't able to
> > connect at the same time to discuss further. The issue has to do with
> > paragraph tags - in osis2mod these tags are being converted from <p>
> > to <div sID="someid" type="paragraph" />.
> This is extraordinarily bad. This is a change in semantics, because <p> and <div type="paragraph"> are not semantically equivalent.
> <p> marks the type of paragraph we all probably think of first: generally, a chunk of text with newlines before and after.
> <div type="paragraph"> marks a formal division within a text that happens to be identified as a 'paragraph' and may consist of multiple <p>-type paragraphs. Examples of these divisions are found in many laws and the Catechism of the Catholic Church (which does exist in OSIS form). Here's part 1, section 1, chapter 1, article 1, paragraph 1 of the CCC: http://www.vatican.va/archive/ENG0015/__P16.HTM. As you can see, it consists of many <p>-type paragraphs but is a single <div type="paragraph">-type paragraph.
Nowhere does the OSIS manual give any indication of a semantic difference.
> Abhorrent though I consider milestoned <p/>, I think I would much prefer to see us map <p>...</p> to <p sID=""/>...<p eID=""/> than see us clobber the semantics of a defined <div> type.
It may be abhorrent from a module authoring perspective, but from a software perspective it is needed. I think it is better than <div type="x-p" ...>.
In OSIS the only container element that is not milestoneable is <p>. The goal of osis2mod is to create BCV where verse is the container.
All SWORD/JSword software requires that a verse in isolation can be meaningfully rendered. (for hit lists, verse lists, parallel view, cross-reference popups, ...)
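As a rough illustration of the <p sID=""/>...<p eID=""/> mapping, here is a minimal Python sketch. It is hypothetical: milestone_paragraphs and the generated IDs (p1, p2, ...) are made up for illustration, and it rewrites the raw string with a regex rather than a real XML parser. Since <p> is not milestoneable in current OSIS, its output would not validate without the schema changes discussed here.

```python
import re

# Matches <p ...>, </p>, or an already self-closing <p .../>.
# \b after "p" keeps other elements such as <pb/> from matching.
P_TAG = re.compile(r"<(/?)p\b([^>]*?)(/?)>")


def milestone_paragraphs(osis: str) -> str:
    """Hypothetical helper: turn container <p>...</p> into
    milestoned <p sID="pN"/> ... <p eID="pN"/> pairs."""
    stack = []    # IDs of currently open paragraphs, innermost last
    counter = 0   # source of generated IDs p1, p2, ...

    def repl(m):
        nonlocal counter
        closing, attrs, selfclose = m.groups()
        if selfclose:
            # Already a milestone (or an empty <p/>): leave it alone.
            return m.group(0)
        if closing:
            # End tag: emit the eID matching the most recent open paragraph.
            return f'<p eID="{stack.pop()}"/>'
        # Start tag: keep its attributes and add a fresh sID.
        counter += 1
        pid = f"p{counter}"
        stack.append(pid)
        return f'<p{attrs} sID="{pid}"/>'

    return P_TAG.sub(repl, osis)


if __name__ == "__main__":
    # A paragraph spanning two verses: once milestoned, cutting the text at
    # the second verse no longer leaves an unclosed <p> container behind.
    chapter = ('<p><verse osisID="Gen.1.1"/>In the beginning ...'
               '<verse osisID="Gen.1.2"/>And the earth ...</p>')
    print(milestone_paragraphs(chapter))
```

The milestones carry no content of their own, so each verse chunk can be handed to a renderer on its own, which is the requirement stated above.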
If we had a mode flag for SWORD and JSword that indicated the rendering scope (chapter or verse), then the render filter could do BSP (book/section/paragraph) for chapter scope and BCV (book/chapter/verse) for verse scope.
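One way such a scope flag could drive a render filter, sketched in Python under stated assumptions. Scope and render_paragraph_milestones are hypothetical names, not SWORD or JSword API, and the input is assumed to use milestoned <p sID=".."/> / <p eID=".."/> pairs. In chapter scope the milestones become ordinary, non-self-closing HTML <p>...</p>. In verse scope they are dropped, because a single verse may hold only one half of a pair.

```python
import re
from enum import Enum


class Scope(Enum):
    """Hypothetical rendering-scope flag."""
    CHAPTER = "chapter"  # whole chapter rendered: paragraph structure can be kept
    VERSE = "verse"      # one verse rendered alone: a pair may be split


# Matches a self-closing <p .../> carrying an sID or eID attribute.
MILESTONE_P = re.compile(r'<p\b[^>]*?\b(sID|eID)="[^"]*"[^>]*/>')


def render_paragraph_milestones(osis: str, scope: Scope) -> str:
    def repl(m):
        if scope is Scope.CHAPTER:
            # Chapter scope: each sID has its eID in the same text,
            # so emit real, non-self-closing HTML tags.
            return "<p>" if m.group(1) == "sID" else "</p>"
        # Verse scope: drop the milestone so no unbalanced tag reaches the HTML.
        return ""

    return MILESTONE_P.sub(repl, osis)


if __name__ == "__main__":
    text = '<p sID="p1"/>In the beginning<p eID="p1"/>'
    print(render_paragraph_milestones(text, Scope.CHAPTER))
    print(render_paragraph_milestones(text, Scope.VERSE))
```

Dropping the milestones in verse scope is only one choice; a filter could instead emit a line break or a paragraph marker there.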
I would rather see milestoned <p> too. However, it seems that the spec is not being maintained/updated. We have a page in the wiki with our recommendations for changes to the OSIS spec. How can we move them forward?
I'd suggest that we maintain our own OSIS schema with the changes and fixes mentioned there and use that in our module validation.
> > Thus, osis2mod is in violation of the suggested XML best practice by
> > creating a non-EMPTY tag as self-closing but this is seemingly pretty
> > common in the OSIS world. Furthermore our filters are producing
> > invalid (or very strongly discouraged) HTML as per every still-in-use
> > version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
> > opinion that this represents a bug in SWORD - at the very least in the
> > filters that permit empty, self-closing div tags to slip through what
> > are supposedly HTML outputs. Do others agree or disagree on this?
> I'm of the opinion that our OSIS is generally fine, meaning we should go ahead and keep allowing self-closing OSIS tags if possible (as input and output from osis2mod and as content of modules not produced by osis2mod). This is just a recommendation and specifically a recommendation for the purpose of aiding processing with legacy SGML tools, which I can't see us doing and don't personally care about. (The semantic violation noted above is a bug in my mind, but that issue is orthogonal.)
You've opened a Jira issue on it, which I'll be glad to work on once we have an acceptable mechanism to milestone paragraphs.
> I would agree that the filter output is buggy if we're generating disallowed tag forms. OSIS <div> and <p> would need to be translated to their correct, non-self-closing HTML forms. Beyond those two, I can't think of any tags that have the same form & general semantics in both OSIS & HTML.
Table cells and list items are similar between OSIS and HTML: container elements that generally imply vertical whitespace.
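To make the filter point concrete, here is a Python sketch that translates OSIS containers into HTML that is never self-closing. It covers <div>, <p>, and the list and table containers mentioned above. The mapping table and osis_containers_to_html are hypothetical. The OSIS element names (list, item, table, row, cell) are assumed from the OSIS schema. Attributes are simply dropped, and milestoned forms are passed through unchanged because they need the scope-aware handling described earlier in the thread.

```python
import re

# Hypothetical OSIS -> HTML container mapping (OSIS names assumed from the schema).
OSIS_TO_HTML = {
    "div": "div",
    "p": "p",
    "list": "ul",
    "item": "li",
    "table": "table",
    "row": "tr",
    "cell": "td",
}

TAG = re.compile(r"<(/?)([A-Za-z]+)\b([^>]*?)(/?)>")
# \b keeps attributes such as osisID= from counting as sID=.
MILESTONE_ATTR = re.compile(r"\b[se]ID=")


def osis_containers_to_html(osis: str) -> str:
    def repl(m):
        closing, name, attrs, selfclose = m.groups()
        html = OSIS_TO_HTML.get(name)
        if html is None:
            return m.group(0)  # not a mapped container: leave for other filters
        if selfclose:
            if MILESTONE_ATTR.search(attrs):
                return m.group(0)  # milestone: needs scope-aware handling
            return f"<{html}></{html}>"  # empty container: explicit open/close
        return f"</{html}>" if closing else f"<{html}>"

    return TAG.sub(repl, osis)


if __name__ == "__main__":
    print(osis_containers_to_html("<list><item>One</item><item/></list>"))
```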