Are individual verses in a Sword Module well formed? was Re: [sword-devel] Verses not in sequential order - front-end problem

Chris Little chrislit at
Wed Apr 20 04:58:19 MST 2005

DM Smith wrote:
> I looked again at the OSIS website and could not find that verse with 
> milestones is the best practice. I think I was able to figure out why it 
> would be a necessary practice. It is mentioned that if any OSIS 
> container element is used in the milestone form then that element must 
> always use the milestone element in the entire work.

I don't find anything either, but trust me that this was the effect of 
our decisions. Book/section/paragraph (BSP) is primary. That is the best 
practice. Book/chapter/verse (BCV) is secondary and overlays BSP. BCV 
doesn't identify linguistically significant or linguistically motivated 
segmentation. It is of essentially historical importance and is used 
because it is a widely accepted system today, in spite of many known 
flaws. BSP is based on linguistically motivated segmentation. It's also 
the system that most of the user base from Bible societies & publishing 
use. So... that's a little of the reasoning behind why BSP was chosen 
over BCV.

You should really avoid milestoning elements in the BSP hierarchy (in 
other words, <div> and <p>, though the latter isn't milestoneable). 
However, elements that sometimes cross these boundaries include things 
like <chapter> and <verse>. So, in effect, you have to use milestones 
for <verse> (which crosses <p> boundaries quite frequently). You can 
probably get away with using a container <chapter> in many Bibles since 
translators/publishers go out of their way to avoid things like 
paragraphs that cross chapter boundaries. (However, you might need to 
use milestoned <chapter> if you use container <q>.)
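
To make the container-versus-milestone point concrete, here is a minimal Python sketch. The osisID "X.1.1" and the text are made up for illustration, and the fragments leave out the OSIS namespace and everything else a real document needs. It only shows that a container <verse> crossing a <p> boundary cannot be well-formed XML, while the milestone form (the sID/eID attribute pair) can.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <verse> used as a container that opens in one <p>
# and closes in the next. The tags overlap, so this is not well-formed XML.
CONTAINER_FORM = (
    '<div>'
    '<p><verse osisID="X.1.1">First part of the verse,</p>'
    '<p>second part.</verse></p>'
    '</div>'
)

# The same text with <verse> milestoned: two empty elements tied together
# by matching sID/eID values, so the <p> containers stay intact.
MILESTONE_FORM = (
    '<div>'
    '<p><verse osisID="X.1.1" sID="X.1.1"/>First part of the verse,</p>'
    '<p>second part.<verse eID="X.1.1"/></p>'
    '</div>'
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML element."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


container_ok = is_well_formed(CONTAINER_FORM)
milestone_ok = is_well_formed(MILESTONE_FORM)
print(container_ok, milestone_ok)
```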

> Help me if I am missing something here:
> If a Bible has rich markup, then there will be a need for milestones. 
> Let's take <q> and <verse> overlapping as in <q>...<verse>... 
> </q>...</verse>
> 1) Milestones are used for <verse> and not for <q>.
> 2) Milestones are used for <q> and not for <verse>.
> 3) Milestones are used for <q> and <verse>.

Actually, you've got me confused below, unless you mixed up 1 and 2. My 
confusion is with the above for 2 saying <verse> is not milestoned, but 
2 below says it would have to be.

> If 1 is chosen then it will have the most likely side effect of 
> requiring most, if not all other containers to be milestoned. This 
> means: abbr, closer, div, foreign, l, lg, q, salute, seg, signed, and 
> speech. It will be easier to use milestones for all of them unless one 
> is certain that verses will never be split by one.

I don't think <q> would ever cross the boundaries of abbr, closer, 
foreign, salute, or signed.

> If 2 is chosen then it is likely that only verse and possibly chapter 
> will need to be milestoned. So I can see why this may be the best 
> practice. Also, the OSIS manual notes that pretty much the only 
> practical consequence of a verse element is the rendering of a verse 
> number. And of course Sword will use it to mark the start and the length 
> of the verse in the module.
> 3 is the easiest way to adhere to the OSIS rule of consistency in 
> milestoning an element in a work.

When I encode, I use milestones for <verse> and <q>. I use them for 
<verse> because some other people decided it would be the best practice 
and because it simplifies things tremendously to make this 
non-linguistic unit cross linguistic unit boundaries. And I use them for 
<q> because the primary use of <q> is for rendering quotation marks and 
because I consider elements like <l> more important to maintain as 
containers. But it is really the encoder's choice.
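
As a rough illustration of that encoding choice, the Python sketch below keeps <l> as a container and milestones both <verse> and <q>. The sample markup, the IDs ("X.1.1", "q1"), and the render() helper are all invented for this example. It is not how Sword or any particular front-end renders OSIS; it only shows that a milestoned <q> still carries enough information to place the quotation marks.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a quotation and a verse that both span two lines of
# poetry. <l> and <lg> stay containers; <verse> and <q> are milestones.
SAMPLE = (
    '<lg>'
    '<l><verse osisID="X.1.1" sID="X.1.1"/>He said, <q sID="q1"/>Go,</l>'
    '<l>and return.<q eID="q1"/><verse eID="X.1.1"/></l>'
    '</lg>'
)


def render(elem):
    """Flatten an element to plain text (illustrative helper only)."""
    out = []
    if elem.tag == 'q':
        # A start milestone opens the quotation, an end milestone closes it.
        if 'sID' in elem.attrib:
            out.append('\u201c')
        elif 'eID' in elem.attrib:
            out.append('\u201d')
    elif elem.tag == 'verse' and 'sID' in elem.attrib:
        # Only the start milestone produces a visible verse label.
        out.append('[' + elem.attrib['sID'] + '] ')
    if elem.text:
        out.append(elem.text)
    for child in elem:
        out.append(render(child))
        if child.tail:
            out.append(child.tail)
    if elem.tag == 'l':
        out.append('\n')  # each poetic line ends with a line break
    return ''.join(out)


rendered = render(ET.fromstring(SAMPLE))
print(rendered)
```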

> Of the elements that can contain a verse, at least one, <p>, is not 
> milestoneable. So, if a verse ever crosses one of these then using 
> milestones for verses is a must. What is not clear from the schema is 
> which container elements that can contain verses can hold part of a 
> verse. For example, I don't imagine that <cell> or <item> should. <p> is 
> specifically mentioned in the OSIS manual as allowing verses to be split.

In theory, there is no reason why a verse boundary could not occur 
within a <cell> or <item> element. In practice, I can't think of a time 
when it does. Most instances of <cell> and <item> that I have seen in 
Bibles occurred in a way that contained the element entirely within a 

> With regard to the Sword API, it is possible to get a single verse. If 
> the verse has an element end tag and not its begin or a begin element 
> and not its end, i.e. it is not well formed, then an XML parse of that 
> verse will fail. OSIS does not require that a verse be well-formed. Does 
> Sword in making a module from OSIS ensure that each verse is well formed?
> If not, then how should it be handled?

No. There is no guarantee that a verse will contain an end tag matching 
every start tag it contains or a start tag matching every end tag it 
contains. The importers give you almost exactly what the document contains.

Troy has some practical ideas for how to deal with this.

