[sword-devel] Pre-verse hack Was: Re: Small problem with section headers in an OSIS module

DM Smith dmsmith555 at yahoo.com
Wed Dec 3 16:00:18 MST 2008

Troy A. Griffitts wrote:
> OK, I can't remember all my arguments for keeping the infamous 
> "Pre-verse hack", but generally, the logic goes something like this:
> module content is displayed in many different contexts: the obvious 
> contextual display, search results, popup verse lists for x-ref, 
> references, etc. parallel displays, and such.
> A goal for the engine is to make easy things easy and more complicated 
> things possible.
> So,
> SWMgr library;
> SWModule &kjv = library.getModule("KJV");
> kjv.setKey("jn.3.16");
> cout << kjv;
> should be understandable and straightforward by most any beginning C++ 
> programmer.
> If the programmer were expected to skip content until they found the 
> <verse> tag, then it would become much more complicated.
Agreed. I presume that the programmer would set some criteria on how 
they would like the content to be returned. E.g. headings, format 
(plain, html, rtf,...) notes, ....

> The response would be... well, let the filters strip everything before 
> the verse tag and place it in the pre-verse entryAttributes slot.
> My response is: osis2mod should do this keeping the logic out of 
> multiple filters and running in realtime, but instead only in one 
> codebase and only once in the importer/import activity.
I don't understand this comment. osis2mod marks the title as x-preverse 
and moves some tags that would otherwise goof up the placement of the 
verse number.

But the SWORD engine has to figure out where the verse number should go. 
And it needs to know how to turn on/off titles. If you are saying that 
this should be in one piece of code, I'd agree with that.

> I'm not necessarily against retaining the n attribute of <verse>, or 
> even the <verse> tag-- unless the <verse> tag becomes the place marker 
> for start of canonical text, as essentially has been suggested.

In OSIS canonical text is indicated by the canonical attribute. 
According to the OSIS manual:
The canonical attribute identifies the element bearing it as
containing actual text of the work being encoded, as
opposed to annotations, commentary, inserted headings,
header metadata, notes, and other (non-canonical)
information. Its value inherits in the same way as xml:lang.
That is, the value applies to all descendant elements except
where overridden.
(Note: for a Bible, this is a distinction of extra-biblical material. 
For a commentary or any other work, most elements should be 
canonical="true" as we are representing the text as given.)

It defaults to true for verse and osisText and is set to false for
corpusHeader, header, div, note, reference, title and titlePage. For all
other elements it is optional and inherited. (What in the world does
"inherited" mean when milestoned elements are used???!!!!)

Note: OSIS requires that div be a child of osisText as the container for 
the document text. It is meaningless that the default on osisText is 
canonical="true", as all of the allowable children have defaults of
canonical="false".

Given a single SWORD verse (i.e. what SWORD returns when a verse is 
requested), the implication is that any element outside of a verse 
element is by default not canonical. This assumes that all elements 
outside the context of a verse do not specify canonical="true".
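
The defaults and the inheritance rule described above can be sketched in
a few lines of C++. This is only an illustration, not code from SWORD,
JSword or osis2mod: the names Canonical, Element, defaultCanonical and
effectiveCanonical are hypothetical, and the sketch covers container
elements only. It does not attempt to answer what inheritance means for
milestoned elements.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Explicit canonical attribute on an element, if any.
enum class Canonical { Unspecified, True, False };

// Hypothetical model of one element on the path from the root down to
// the element of interest.
struct Element {
    std::string name;  // e.g. "osisText", "div", "verse"
    Canonical attr;    // canonical="..." as written in the document
};

// Defaults as listed above; Unspecified means "inherit from the parent".
Canonical defaultCanonical(const std::string &name) {
    if (name == "verse" || name == "osisText") return Canonical::True;
    const char *falseByDefault[] = {"corpusHeader", "header", "div", "note",
                                    "reference", "title", "titlePage"};
    for (const char *n : falseByDefault)
        if (name == n) return Canonical::False;
    return Canonical::Unspecified;
}

// Walk from the outermost ancestor inward; the innermost explicit
// attribute or schema default wins, otherwise the parent's value is kept.
bool effectiveCanonical(const std::vector<Element> &path) {
    assert(!path.empty());  // the path must at least contain the element itself
    bool value = true;      // assumed starting value if nothing on the path sets one
    for (const Element &e : path) {
        Canonical c = (e.attr != Canonical::Unspecified)
                          ? e.attr
                          : defaultCanonical(e.name);
        if (c != Canonical::Unspecified) value = (c == Canonical::True);
    }
    return value;
}
```

Under these rules a <p> directly inside a <div> resolves to
non-canonical, while an element inside a container <verse> resolves to
canonical unless something closer to it says otherwise.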

I'm not inclined to change osis2mod to ensure that canonical="true"
when inherited.
>   I 
> wouldn't mind having a new entryAttributes()["verse"]["n"]["body"] = 
> "3-12";  And we could pull this from a <verse n="3-12"> if left in the 
> content, 
I don't know the SWORD engine well enough to understand this.

> but THIS <verse> tag SHOULDN'T represent the MARKER WHERE 
Agreed. Today, we don't do this. We shouldn't in the future either. To 
be faithful to OSIS, it should be driven by the canonical attribute. We 
do this today with regard to the title elements.
>   In fact, if we included <verse> in the content, 
> I'm quite sure we would digress to this usage, which makes things much 
> more complicated per above argument.
I don't see how this would happen in the SWORD or JSword engine. You and 
I have a bit of control there.

> I'm very much in favour of extending pre-verse to include all pre-verse 
> content-- not just title.

There are several separate things to address. (Numbered but in no 
particular order and there might be others).
1) Inter-verse content needs to be split between the prior and the 
following verse. Today, we re-arrange the inter-verse material, 
appending whitespace producing elements to the prior verse.
2) Allowing content within a title. E.g. Strong's numbers, footnotes, 
divine name, ... especially true for Psalm titles.
3) Allowing content other than just title. E.g. Section introductions. 
These may have footnotes and cross-references, too. I've seen this in 
devotional Bibles.
4) Not re-arranging elements. That is, allowing whitespace producing 
markup to be in its original location.
5) Having the <verse> element be retained. While this can indicate the 
end and start of interverse material, we can have preverse/interverse 
6) Allowing titles within the middle of a verse. (I might be wrong, but 
I think the SWORD engine pulls them out of line and puts them before the 
7) Not hiding the structural markup (e.g. div, p) when headers are not 

To me, solving these results in solving the pre-verse hack, even if we 
still mark the pre-verse material.
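
As a rough illustration of items 1 and 4, here is one possible way to
split inter-verse markup between the prior and the following verse
without re-arranging it. It is a sketch under stated assumptions, not
what osis2mod does today: the names InterVerseSplit, closesSomething and
splitInterVerse are hypothetical, the input is assumed to be already
tokenized into whole tags (with a title kept as a single token), and
the rule "leading closers stay with the prior verse" is just one
candidate policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Result of splitting the markup found between two verses.
struct InterVerseSplit {
    std::vector<std::string> appendToPrior;  // stays with the verse before
    std::vector<std::string> prependToNext;  // becomes pre-verse material
};

// True for tokens that end something: "</div>" or a milestone end such
// as <chapter eID="Matt.5"/>.
bool closesSomething(const std::string &tag) {
    return tag.compare(0, 2, "</") == 0 ||
           tag.find(" eID=") != std::string::npos;
}

// Leading closers go to the prior verse; everything from the first
// opener onward goes to the following verse. Order is never changed.
InterVerseSplit splitInterVerse(const std::vector<std::string> &tokens) {
    std::size_t cut = 0;
    while (cut < tokens.size() && closesSomething(tokens[cut])) ++cut;
    assert(cut <= tokens.size());
    InterVerseSplit result;
    result.appendToPrior.assign(tokens.begin(), tokens.begin() + cut);
    result.prependToNext.assign(tokens.begin() + cut, tokens.end());
    return result;
}
```

Because the two halves concatenate back to the original token sequence,
nothing is lost and nothing moves; only the verse boundary is chosen.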

Here is a reasonable representation of what people are showing me (Go to 
the end for more comments):
<div type="book" osisID="Matt">
<chapter eID="Matt.4"/>
<chapter osisID="Matt.5" sID="Matt.5"/>
<div type="section">
<title>Sermon on the Mount</title>
<div type="devotional">
.... devotional material...
<note type="crossReference">
<reference osisID="Matt.x.y">x:y</reference>
<note type="footnote">
</div> <!-- End of devotional description of the Sermon on the Mount -->
<div type="subSection">
<title>The Beatitudes</title>
<verse sID="Matt.5.1" osisID="Matt.5.1"/>
<verse eID="Matt.5.1"/>
<verse sID="Matt.5.2" osisID="Matt.5.2"/>
<verse eID="Matt.5.2"/>
<verse sID="Matt.5.3" osisID="Matt.5.3"/>
<q sID="Matt.5.3" who="Jesus" marker="“"/>...
<verse eID="Matt.5.3"/>
<verse sID="Matt.5.4" osisID="Matt.5.4"/>
<verse eID="Matt.5.4"/>
<verse sID="Matt.5.5" osisID="Matt.5.5"/>
<verse eID="Matt.5.5"/>
<div type="subSection">
<title>Disciples and the World</title>
<verse sID="Matt.5.13" osisID="Matt.5.13"/>
<verse eID="Matt.5.13"/>
<verse sID="Matt.5.14" osisID="Matt.5.14"/>
<verse eID="Matt.5.14"/>
<verse sID="Matt.5.15" osisID="Matt.5.15"/>
<verse eID="Matt.5.15"/>
<verse sID="Matt.5.16" osisID="Matt.5.16"/>
<verse eID="Matt.5.16"/>
<div type="section">
<chapter eID="Matt.5"/>
<chapter osisID="Matt.6" sID="Matt.6"/>
<chapter eID="Matt.6"/>
<chapter osisID="Matt.7" sID="Matt.7"/>
<div type="section">
<title>The Two Foundations</title>
<verse sID="Matt.7.27" osisID="Matt.7.27"/>
<q eID="Matt.5.3" who="Jesus" marker="”"/>
<verse eID="Matt.7.27"/>
<verse sID="Matt.7.28" osisID="Matt.7.28-Matt.7.29"/>
<verse eID="Matt.7.28"/>
</div> <!-- end of The Two Foundations -->
<chapter eID="Matt.7"/>
</div> <!-- End of Sermon on the Mount -->
</div> <!-- End of Matt -->

In this example, much of the interverse tags are necessary for the 
proper layout of the passage.
Today the only allowable pre-verse material is a title. The headings 
on/off strips out all pre-verse material. Currently, this filter does 
not strip out introductions. My thinking is that the heading filter 
should not strip out whitespace producing structural elements (e.g. 
<div>, <lg>, <l>, <p>, ...) nor non-canonical inter-verse narrative. 
Maybe there should be a filter for showing verse text only, i.e. a 
canonical/non-canonical filter. But that should not strip out structural 

In Christ,
> Thoughts?
> 	-Troy.
> DM Smith wrote:
>> On Nov 30, 2008, at 12:00 AM, Chris Little wrote:
>>> Tom Cornell wrote:
>> <snip/>
>>>> My markup looks like this, basically:
>>>> ...
>>>> </div>
>>>> <div type="section">
>>>> <title>The Section Title</title>
>>>> <verse sID="..." .../>...<verse eID="..."/>
>>>> ...
>>>> </div>
>>> This markup is definitely correct.
>>> More long-term, DM and I are in agreement that we need to change the
>>> way we handle storage of OSIS documents within modules. We feel we
>>> need to get away from the pre-verse hacks that you'll notice in the
>>> output from mod2imp. And we feel we need to do a better job of
>>> preserving all of the data in a document (including the <verse> tags
>>> themselves).
>>> When I committed a new version of osis2mod 3-4 years ago that did all
>>> of this (in a way that neither harmed existing nor future data) it was
>>> roundly rejected and reverted. I'm still convinced that preservation,
>>> including storing <verse>, is the only solution to certain of our
>>> problems. And I'm hoping that DM and I can convince the naysayers of
>>> the merits of that position.
>> The pre-verse hack is that the <title>...</title> is yanked out of  
>> line and prepended to the following verse using the following construct:
>> <title type="section" subType="x-preverse">...</title>
>> (That is, we add type="section" and subType="x-preverse" to the title
>> element. This may be lossy.)
>> The other part of the preverse hack is that SWORD modules do not  
>> contain the <verse> tag.
>> The SWORD engine uses this to do two things:
>> 1) Handle headings. Currently only <title> is allowed in this x- 
>> preverse div.
>> 2) Know where to place the verse number.
>> Any changes to the SWORD engine will still need to handle existing  
>> modules.
>> Chris, Troy and I are in agreement that this needs to change. There  
>> are two proposed solutions:
>> 1) Change osis2mod to output the tags in the order that they occur,  
>> either appending them to the prior verse or prepending them to the  
>> following verse and also output the verse start and end tags.
>> 2) Extend the pre-verse hack to include more than just title. All  
>> inter-verse tags are output in the order they appear, either appended  
>> to the prior verse or placed into a preverse div and the preverse div  
>> is prepended as before.
>> <div sID="xxxx" type="section" subType="x-preverse"/>
>>      ... inter-verse stuff before the title that belongs with this  
>> verse...
>>      <title>...</title>
>>      ... inter-verse stuff after the title ...
>> <div eID="xxxx"/>
>> (Note: osis2mod performs some transformations. For example, it  
>> transforms all container elements into their milestoned form. OSIS  
>> does not allow <p> to be milestoned, so <lb type="x-begin-paragraph"/>  
>> and <lb type="x-end-paragraph"/> are used as rough equivalents.)
>> A related aspect of osis2mod is the identification of introductory  
>> material. I'll write about this separately.
>> Chris and I would like to see that osis2mod is a lossless  
>> transformation into a SWORD module, at least for the text of the Bible  
>> or Commentary. Today, the <verse> element and its attributes are not
>> included in the module. There are a few advantages of having it in the  
>> module.
>> 1) It provides the exact placement of the verse number and renders the  
>> pre-verse hack entirely unnecessary. Yes the SWORD engine will still  
>> need to support the hack.
>> 2) The verse tag conveys information beyond the placement of the verse  
>> number. Of the attributes, the n attribute is perhaps the most  
>> significant. The n attribute holds the verse number. While this  
>> typically is just a number, for some Bibles, it can give a range, e.g.  
>> 4-7. This could be useful.
>> 3) Whitespace does not need to be added.
>> 4) osis2mod is greatly simplified.
>> 5) osis2mod would be lossless.
>> Today, you can experiment with the <verse> tag being included in a  
>> module by uncommenting
>> //#define INCLUDE_TAGS
>> in osis2mod.
>> Having the verse tag present requires no changes to the SWORD engine.
>> In His Service,
>> 	DM
