[sword-devel] Titles and other Inter-verse material

DM Smith dmsmith at crosswire.org
Mon Jul 23 08:52:57 MST 2012


Let me clear up some misconceptions:
Osis2mod does not re-order anything.
It retains nearly everything. Always, in the original order.
osis2mod's transformations are few, well-defined and documented: http://www.crosswire.org/wiki/Osis2mod#Transformations

With a revision prior to 2358, what you say is true:
	osis2mod reorders everything it thinks is a title.
	Anything between verses other than titles may orphan the verse number.
	The transformations were not well defined.

On Jul 23, 2012, at 2:57 AM, Peter von Kaehne wrote:

> Thanks DM,
> 
> Forgive me for top posting.
> 
> Put simply, as a module maker I would prefer if osis2mod would simply
> stop interfering. 

I understand the sentiment. This has been the goal. But stated a different way, osis2mod should take any good OSIS and make a module from it. The SWORD filters should handle it in a manner that makes sense.


> 
> Osis2mod should identify chunks of text and put them into the right
> places and then be done - as little transformation as possible.

The vast majority of what osis2mod does is determine the boundaries between chunks and put them into the right places.

Regarding transformations, they are necessary. SWORD requires BCV (Book/Chapter/Verse). Module makers want BSP (Book/Section/Paragraph). Module makers should not need to care what the transformations are. If they do need to, then there is more work to be done in the SWORD engine and perhaps in osis2mod.

There are two types of transformations:
1) Converting structural container elements (e.g. div, chapter, lg, ...) into milestoned versions.
2) Marking up the Words of Christ per verse. (Allows the quote to start in one chapter and end in another)

The goal of both is that a verse in isolation (e.g. in a hit list, or in a table cell of a parallel view) will be well-formed XML.
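
To illustrate the first kind of transformation, here is a minimal sketch in Python. It is not osis2mod's code (osis2mod is C++ and handles far more cases). The list of container names, the "gen%d" id scheme and the "|" character standing in for a verse boundary are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical list of containers to milestone; the real list may differ.
CONTAINERS = ("div", "chapter", "lg", "p")

def milestone(osis_fragment):
    """Rewrite <p>...</p> style containers as <p sID="genN"/>...<p eID="genN"/>."""
    counter = 0
    open_ids = []  # ids of the containers that are currently open

    def repl(match):
        nonlocal counter
        closing, name, attrs = match.group(1), match.group(2), match.group(3)
        if closing:
            # An end tag becomes an end milestone with the matching id.
            return '<%s eID="%s"/>' % (name, open_ids.pop())
        counter += 1
        ident = "gen%d" % counter  # assumed id scheme, for illustration only
        open_ids.append(ident)
        return '<%s%s sID="%s"/>' % (name, attrs, ident)

    # Matches start and end tags of the containers, but not empty elements.
    pattern = r"<(/?)(%s)\b([^>/]*)>" % "|".join(CONTAINERS)
    return re.sub(pattern, repl, osis_fragment)

# A paragraph that spans two verses; "|" marks the verse boundary.
source = "<p>In the beginning ...|And the earth ...</p>"
chunks = milestone(source).split("|")
for chunk in chunks:
    # Each verse, taken alone, now parses as XML inside a wrapper element.
    ET.fromstring("<verse>%s</verse>" % chunk)
```

Without the milestoning step the first chunk would be "<p>In the beginning ..." with no closing tag, and a parser would reject it.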


> 
> At that point it would allow us to do following things
> 
> 1) Define a standard markup which will work/should work and not working
> would become a filter/frontend bug

See the wiki for how to write a BSP OSIS document. It should work. If not it is a filter bug. The whole point of this thread is to identify what the filter should handle and does not.

Currently the SWORD engine lifts a "title" out of the text and allows it to be accessed in the same way as a note, a Strong's number or a morph. This allows for the following construct in front-end code:

For each verse to display
do
	Output the "title" (aka preverse material)
	Output a verse number
	Output the verse text
done

The problem is that, in effect, it is more often like this:
For each verse to display
do
	if (headings should be shown)
		Output the "title" (aka preverse material)
	Output a verse number
	Output the verse text
done

There is far more in the pre-verse block than just a title. It holds all the elements between verses that belong with the following verse.
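
As a rough sketch of what the loop would need to do instead, the Python below hides only titles and always emits the rest of the pre-verse block. The dictionaries and the "title"/"structure" labels are hypothetical and are not the SWORD API. Canonical titles, which should always be shown, are left out to keep it short.

```python
def render(verses, show_headings):
    """Render verses; the heading toggle hides titles only."""
    out = []
    for verse in verses:
        for kind, text in verse["preverse"]:
            # Only titles are subject to the Show/Hide Headings toggle.
            if kind == "title" and not show_headings:
                continue
            # Everything else before the verse is always kept.
            out.append(text)
        out.append("%d %s" % (verse["number"], verse["text"]))
    return out

# Hypothetical data: verse 1 carries a title and a paragraph start milestone.
verses = [
    {
        "number": 1,
        "preverse": [("title", "The Creation"), ("structure", '<p sID="gen1"/>')],
        "text": "In the beginning ...",
    },
    {"number": 2, "preverse": [], "text": "And the earth ..."},
]

with_headings = render(verses, show_headings=True)
without_headings = render(verses, show_headings=False)
```

With headings hidden, the paragraph start still comes out ahead of verse 1. Dropping the whole pre-verse block would lose it.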

> 
> 2) Create clean module making tools which create a visible and well
> defined output.

Just create a raw module using the -d 2 flag to put milestones where the verse elements would be and open it in a text editor. It is well-defined.

> 
> The black box of osis2mod which creates a format which can only be
> interrogated with tools with a large number of further layers (filters
> etc) with any number of intervening bugs makes the whole thing so
> frustrating.

This is not the case. Just make a raw module and open it in any text editor. Use the -d 2 flag to put in milestones where the verse elements would be.

> 
> If we had a clear input format - this is how a CrossWire OSIS text
> should look like - we could create the tools which create this very OSIS
> text, subject it to XML validation, mess around with it and tell any
> frontend author "This is what you do wrong!" - with exact and pinpoint
> accuracy.

See the wiki for how to write a BSP OSIS document. It should work. If not it is a filter bug. (There are a couple of bugs in osis2mod that cause it to quit prematurely.)

> 
> The complexity of the purposes of osis2mod is what makes it so
> vulnerable.
The purposes of osis2mod are not many and are not complex:
1) Accept any and all OSIS documents that are Bible-like (e.g. commentaries)
2) Keep everything other than OSIS header, verse elements and stuff that is commented out.
3) Re-order nothing. (I repeat re-order nothing!!!!!!)
4) Find the boundaries between "verses" (quoted, because this includes Introductions to Testament, Books and Chapters -- AKA verse 0)
5) Transform from BSP to BCV.
6) Mark Words of Christ on a per-verse basis.
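
Points 2 to 4 can be sketched roughly as follows, in Python. This is an illustration under assumptions, not osis2mod's implementation: it expects verses already marked with sID/eID milestones, ignores introductions (verse 0), and appends anything after the last verse to that verse, which is a guess. The x-preverse wrapper is the construct osis2mod currently writes.

```python
import re

# Matches a milestoned verse tag and captures which end it is and its id.
VERSE_TAG = re.compile(r'<verse\b[^>]*?\b(sID|eID)="([^"]*)"[^>]*/>')

PV_START = '<div type="x-milestone" subType="x-preverse" sID="pv%d"/>'
PV_END = '<div type="x-milestone" subType="x-preverse" eID="pv%d"/>'

def split_verses(osis_body):
    """Return [(verse id, raw text)], keeping all content in input order."""
    entries = []
    pending = ""  # interverse material waiting for the next verse
    count = 0     # counter for the pv%d ids
    pos = 0
    for m in VERSE_TAG.finditer(osis_body):
        chunk = osis_body[pos:m.start()]
        pos = m.end()
        if m.group(1) == "sID":
            # Material between verses belongs with the verse now starting.
            pending = chunk
        else:
            text = chunk
            if pending.strip():
                count += 1
                text = PV_START % count + pending + PV_END % count + chunk
            entries.append((m.group(2), text))
            pending = ""
    tail = osis_body[pos:]
    if tail and entries:
        # Assumption: trailing material stays with the last verse.
        last_id, last_text = entries[-1]
        entries[-1] = (last_id, last_text + tail)
    return entries

body = (
    '<title>The Creation</title>'
    '<verse osisID="Gen.1.1" sID="Gen.1.1"/>In the beginning ...<verse eID="Gen.1.1"/>'
    '<verse osisID="Gen.1.2" sID="Gen.1.2"/>And the earth ...<verse eID="Gen.1.2"/>'
)
entries = split_verses(body)
```

The verse elements themselves are dropped. Everything else is kept in its original order, with the title sitting inside the pre-verse block of Gen.1.1.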

> 
> A tool should do one thing and do it well - not a multiplicity of things
> at the same time in a half-arsed way.

What is half-arsed?


> 
> So, if we want to automatise BSP->BSP conversion and preverse reordering
> then the output should not be a module but another OSIS text.

I presume you meant BSP -> BCV.

It is a two-line change to retain the verse elements. If that were done, the raw output would be in that form.

It could then be fed back in to create a module, this time without the -d 2 flag.
Actually, vpl2mod might work.

> 
> The final, ideal OSIS2mod should take a defined format and create a
> module, but baulk at nonstandard formats with a clear error message.
> 
> Re Troy's suggestion re improved tests - it stunned me as a suggestion
> hence I had no response. Sure, we need this, but right now the situation
> is so complicated with multiple interfering layers, that we do not even
> know what we test. And osis2mod in its current form is the one
> interfering layer too much.

The problem is not osis2mod, but the filter. That is the purpose of this email, to define the scenarios that currently fail in the filter.

> 
> So, my proposal - cut osis2mod into half. Let one part handle reordering
> but then spit the result out as a new osis file. Which can be tested,
> which can be worked upon. 
> 
> And let the other part do the defined format to module transform,
> without any further transformations. And whine if something is not to
> exact liking.
> 
> Peter 
> 
> On Sun, 2012-07-22 at 17:38 -0400, DM Smith wrote:
>> I had mentioned earlier that I'd send something on this. These thoughts
>> are from working on a few modules and on osis2mod.
>> 
>> There are several things that play into this: 1) Titles: These use the
>> <title> element for their content. This has been the focus of much of
>> the discussion. The Show/Hide Heading filter was designed with this in
>> mind. Later, the ability to always show canonical titles (e.g. Psalm
>> titles) was added.
>> 
>> 2) Rich content in titles. Canonical titles are the premier example of
>> this, having Strong's Numbers and Morphology info; Markup for Divine
>> name; notes, ....
>> 
>> 3) Sections. The OSIS spec suggests that a title should be within and
>> at the top of a <div type="section"> element. They typically surround
>> verses. That is the <div> and </div> should be between verses.
>> 
>> 4) Paragraphing. The <p> element typically surrounds verses. Often
>> they are in sections. Likewise the <p> and </p> should be between
>> verses. (Note: <p/> (empty paragraphs) is just plain bad form.)
>> 
>> 5) Split verses. A verse may be split by titles, sections and
>> paragraphs. I don't particularly like it, but I've seen it. I could
>> very well be wrong, but I think it is an artifact of a translation
>> using a KJV versification but disagreeing where the verses really start
>> and end.
>> 
>> 6) Poetry. This uses three elements <lg>, <l> and <lb> (from memory) to
>> create a group of stanzas where each might be split over several lines.
>> Poetry often starts in the middle of a verse. And may end within a
>> verse. But it is not uncommon for it to surround verses. That is to say
>> we can expect these elements between verses too.
>> 
>> 7) Arbitrary interverse content. Introductory material can be pretty
>> much anything. Typically we expect this at the beginning of Bible books
>> and even chapters. It is not unreasonable for it to occur between
>> verses within a chapter, as in a study Bible.
>> 
>> 8) Block element handling. HTML agents have special handling of nested
>> block elements. Simplistically, a block element start that follows one
>> or more block starts is treated specially, often coalescing vertical
>> whitespace. If the block element has particular visual styling
>> (margins, padding, indentation, ...), it is applied. I mention this
>> because there have been numerous comments about too much vertical
>> whitespace. In handling vertical whitespace, I think a distinction
>> needs to be made between structural markup that needs to be retained
>> even if titles, headings, introductions are hidden.
>> 
>> 9) osis2mod transforms from BSP (Book/Section/Paragraph) into BCV
>> (Book/Chapter/Verse). This allows for a verse in isolation to be valid
>> XML. As a result, <div> (and other block elements) no longer behave
>> like HTML containers.
>> 
>> 10) x-preverse markup. Currently osis2mod is using (where %d is a
>> matched pair): <div type="x-milestone" subType="x-preverse"
>> sID="pv%d"/>...pre-verse content...<div type="x-milestone"
>> subType="x-preverse" eID="pv%d"/> Note: These are merely milestones
>> and should never produce whitespace of any kind. The only purpose of
>> the construct is to know what is before the verse. A problem is that
>> the Show/Hide Headings filter treats this as something that can be
>> toggled. It may contain much that needs to be retained. (see 8)
>> 
>> 11) Retention of all markup (except the <verse> element) in the order
>> that it appears in the input. Module authors are going far beyond a
>> simple markup of just the basic verse content. We've published in the
>> wiki best practices in marking up various things. If followed it should
>> have a reasonable rendition in a module. (Please, let's not diverge on
>> to the verse element discussion. It doesn't change the problem at
>> hand.)
>> 
>> Troy suggested bolstering the test case. I'm not at all sure how to go
>> about doing that. Especially the expected output.
>> 
>> Hope this helps.
>> 
>> In His Service,
>> 	DM
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page