[sword-devel] the future of OSIS support (importer/filters)

Tue Apr 26 21:08:33 MST 2005

DM Smith wrote:
> I agree that support should be limited to 2.0. Or perhaps 2.1, if it is 
> pretty near completion. At the OSIS website, you cannot find 
> documentation for prior versions. This makes it difficult to manage an 
> earlier version of OSIS. Also, 2.0 is such a significant improvement 
> that it should be enough motivation to cut.

I think 2.1 is pretty stable and it may be a while before any of this 
particular suggestion really gets implemented, so my meaning is really 
that we should adopt whatever is current at the time. In any case, for 
our purposes 2.0 and 2.1 are virtually identical.

> With regard to proprietary extensions, I understand they are necessary, 
> but I think their use should be very limited and well-documented. Only 
> when that happens can proper filters be written.

Proprietary extensions aren't entirely evil. :) Some are forced, by 
design, by OSIS (at least in the past), as x-Strongs was in pre-2.0 
(or was it pre-1.5?). OSIS tackles a limited set of features for each 
release. If we want to do detailed linguistic or manuscript markup, we 
couldn't do that with 2.x. Actually, we couldn't do that with any 
version of OSIS Core. That said, I do think we /should/ document 
proprietary extensions for internal use and also to share with the OSIS 
TC for the purpose of improving future versions of OSIS. (That excludes 
things like the pre-verse title type which are essentially intended as 
aids to rendering within Sword.)

>> Verse numbers are not necessarily a single digit and do not 
>> necessarily flow in numerical order. Encoding <verse> elements (along 
>> with their n attributes, when present) permits us to render lettered 
>> verses and range verses easily. It affords us the possibility of 
>> rendering out-of-order verses (though this will require some 
>> additional thinking/work). And until multiple versifications are 
>> actually supported, it allows us to fake them.
> 
> I am not sure what you are thinking, but I don't think it will work. The 
> verse (start/length) index will point to the verse as it is in its 
> order, not by its number. Or it will be massaged to refer to the verse 
> by its number and not its order. Unless more information is added to the 
> index (i.e. what the verse actually is, which at this time is implicit 
> by its offset into the index), this will lead to inconsistencies. We 
> have discussed these at great length here so I won't repeat them again.

The verse element has an n attribute, which is supposed to be used for 
verse number rendering. If you have an element like <verse 
osisID="Matt.1.1 Matt.1.2" n="1-2">, Sword frontends will currently 
render a "1" for the verse number and make no reference to verse "2". 
Yet if you look up either Matt.1.1 or Matt.1.2, you will get that verse. 
What should be rendered is "1-2". If we have this element in the data, 
we can render the verse number correctly.

Some Bibles mark sub-verses using elements like <verse 
osisID="Matt.1.1!a" n="1a">. As it is, we don't represent sub-verse 
numbers, but we could render "1a" if we had verse tags included. The 
same goes for verses that use non-numeric labels (or non-Latin numerals) for 
numbering. We could correctly number Hebrew manuscript verses with 
Hebrew letters; Greek manuscripts could be numbered with Greek letters; 
Arabic Bibles could be numbered with Arabic numbers; etc.--if we had the 
verse element.
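
A rough sketch of the rendering rule described here, in Python purely for 
illustration: prefer the n attribute and fall back to the osisID only when 
n is absent. The function name and the fallback behaviour (including 
dropping the "!" from a sub-verse ID) are assumptions, not anything in the 
Sword API.

```python
import xml.etree.ElementTree as ET


def verse_label(verse_xml):
    """Return the label to display for a single <verse> element."""
    verse = ET.fromstring(verse_xml)

    # The n attribute carries the rendered form ("1-2", "1a", ...).
    label = verse.get("n")
    if label:
        return label

    # Assumed fallback: the last component of the first osisID,
    # with the sub-verse separator removed ("1!a" becomes "1a").
    ids = verse.get("osisID", "").split()
    if not ids:
        return ""
    return ids[0].split(".")[-1].replace("!", "")


print(verse_label('<verse osisID="Matt.1.1 Matt.1.2" n="1-2"/>'))
print(verse_label('<verse osisID="Matt.1.1!a" n="1a"/>'))
print(verse_label('<verse osisID="Matt.1.3"/>'))
```

With the n attribute preserved in the module, the first call yields the 
range label rather than just the first verse number.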

As I said, handling out-of-order issues would take a little more work so 
it might better be postponed until v11n is handled better, as you suggest.

Until then, however, we store non-canonical verses in the previous 
canonical verse. If we had verse elements (and chapter too, in the case 
of Ps.151), we could at least render these more attractively. As it is, 
they just look like a single (big) verse, without verse numbers. Like I 
said, it's basically faked, since you can't actually reference the 
individual non-canonical verses (that's part of the v11n work). But 
rendering a readable Bible is an improvement over the current situation.

> So, where do you break a verse? Is everything between verses included by 
> the following verse? What about material before the first verse in a 
> chapter/book or work? (i.e. do we actually support introductory material 
> and if so, how is it delineated?)

Yes, material preceding a verse goes in the verse that follows the 
material. The exception is the first verse of a chapter. Material 
preceding the first verse of a chapter goes in the chapter intro. 
Material preceding a chapter element goes in the book intro.

At the moment, material preceding the book's div element goes in the 
book intro also, unless it precedes Gen or Matt (in which case it goes 
in the testament intro). An intro to the prophets, for example, would go 
in the intro to the first book of the prophets (Isa in the current 
static v11n). This is kind of a hack, but it's the best we can do with 
the current v11n.
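
The placement rules above can be sketched as a small routine over a 
flattened event stream. This is a literal reading of those rules, not the 
importer's actual code: the event names, the "intro:..." keys, and the 
function name are all hypothetical, and well-formed input (book before 
chapter before verse) is assumed.

```python
# Books whose preceding material goes to a testament intro instead.
TESTAMENT_OPENERS = {"Gen": "OT", "Matt": "NT"}


def place_material(events):
    """Map each key (verse or hypothetical intro key) to its text."""
    placed = {}
    pending = []  # material seen outside any verse, not yet assigned
    book = chapter = verse = None
    first_verse_in_chapter = False

    def add(key, text):
        placed[key] = placed.get(key, "") + text

    def flush(key):
        if pending:
            add(key, "".join(pending))
            pending.clear()

    for kind, value in events:
        if kind == "text":
            if verse is not None:
                add(verse, value)       # ordinary verse content
            else:
                pending.append(value)   # decide where it goes later
        elif kind == "book":
            # Material before a book div: book intro, or testament
            # intro when the book is Gen or Matt.
            if value in TESTAMENT_OPENERS:
                flush("intro:" + TESTAMENT_OPENERS[value])
            else:
                flush("intro:" + value)
            book, chapter = value, None
        elif kind == "chapter":
            # Material before a chapter element: book intro.
            flush("intro:" + book)
            chapter, first_verse_in_chapter = value, True
        elif kind == "verse_start":
            # Material before the first verse: chapter intro.
            # Otherwise it is prepended to the verse that follows.
            if first_verse_in_chapter:
                flush("intro:" + chapter)
            else:
                flush(value)
            first_verse_in_chapter = False
            verse = value
        elif kind == "verse_end":
            verse = None
    # Material after the final verse is left unassigned in this sketch.
    return placed


events = [
    ("text", "Testament preface. "),
    ("book", "Gen"),
    ("text", "Book preface. "),
    ("chapter", "Gen.1"),
    ("text", "Chapter heading. "),
    ("verse_start", "Gen.1.1"), ("text", "v1 text"), ("verse_end", None),
    ("text", "Section title. "),
    ("verse_start", "Gen.1.2"), ("text", "v2 text"), ("verse_end", None),
]
for key, text in place_material(events).items():
    print(key, "=>", text)
```

An intro to the prophets placed before the Isa div would land under 
"intro:Isa" here, matching the hack described above.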

All this is already supported by the API. Introductions have always been 
part of Sword modules. How frontends support it is not my business, but 
it would be best if they rendered it properly. :D

>> We also have the option of normalizing OSIS to a form of our choosing. 
>> Towards that end, we CAN require that all book/chapter/verse tags be 
>> milestones.
> 
> You have already noted that some OSIS container elements are not 
> milestoneable. For any OSIS work with significant structural markup, 
> these will result in milestones being used for verses, likely for 
> chapters and possibly for book (though I am not aware of any instance of 
> structure crossing a book boundary.)

I don't think anything crosses book boundaries, either, so we /could/ 
permit container book divs. Likewise, we could probably force chapters 
to be well-formed XML. There's really only one place (Rev.12-Rev.13) 
where paragraphs ever cross a chapter division. Arguably, q does at some 
points (but q will often be milestoned). So we could normalize 
containers that cross chapters as milestones, if that helps anyone and 
provided there are no negative consequences anyone can think of.
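
For the simplest case, normalizing a container verse into milestone form 
(an sID start tag and an eID end tag) might look roughly like this. It is 
only a sketch: it uses a regular expression rather than a real XML parser, 
assumes osisID is the first attribute and that verses are not nested, and 
reuses the osisID as the sID/eID value, which is a convention chosen here 
for illustration.

```python
import re

# Matches <verse osisID="..." ...>body</verse> (container form only).
VERSE_CONTAINER = re.compile(
    r'<verse\s+osisID="([^"]+)"([^>/]*)>(.*?)</verse>', re.DOTALL
)


def to_milestones(osis_fragment):
    """Rewrite container <verse> elements as sID/eID milestones."""

    def repl(match):
        osis_id, other_attrs, body = match.groups()
        start = f'<verse sID="{osis_id}" osisID="{osis_id}"{other_attrs}/>'
        end = f'<verse eID="{osis_id}"/>'
        return start + body + end

    return VERSE_CONTAINER.sub(repl, osis_fragment)


print(to_milestones('<verse osisID="Gen.1.1">In the beginning</verse>'))
```

Once verses are milestones, a paragraph or q container can span them 
without breaking well-formedness.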

> From earlier threads on quotes, there are several quote markers that 
> need to be handled.
> Block vs inline quotes. (The <q> tag is used for both, but it is not 
> clear when to render one or the other. These are structural elements, 
> not simply rendering issues. Does OSIS define a mechanism for this?)

Block quotes need to have type="block" set.

> Beginning quote mark, continuing quote mark, end quote mark, nested 
> begin/continue and end quote marks, and nested within nested quote 
> marks. (I consider this to be a structural issue. Notice, there is no 
> mention of the actual marks that are used.)

Nesting can be specified by the level attribute. Which mark is used is 
supposed to be a style-sheet issue, hence my suggestion that we handle 
it in .confs. However, there is also the n attribute, where you can put 
the rendered form of the quotation mark, I believe. (I forget, but we 
might have also talked about adding a rend attribute to serve this 
purpose instead.)

> From a JSword perspective, we work on only the verses that the user 
> wishes to see. In the context of a fragment of a larger, complicated 
> quote, there will not be enough information carried in the conf to 
> determine where we are in the structure of the complex quote to render 
> it the same as when the entire context is shown.

> Can we include information on the <q> element concerning the kind of 
> quote mark that is used? (I don't mean the actual mark)

I presume we would define something like level 1, 2, ... n marks that 
begin & end a quotation and that mark both sides of a break in quotation 
(according to what a language requires). English, for example, would 
need levels 1 & 2, each with beginning, continuation-beginning, and end 
marks--6 marks total (deeper levels alternate between the two, i.e. 
level modulo 2). So if you hit a tag that reads <q eID="..." 
level="2"/>, you know to render a single "9" quotation mark (the closing 
single quote).

We could do this on a per-translation basis or a per-language basis and 
we could allow switching based on locale or user preference.
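
As a sketch of what such a table could look like, here is a lookup keyed 
by language, level, and position. The table layout, the position names, 
and the function are hypothetical; only the English entries are filled in, 
and nothing here reflects an existing .conf format.

```python
# Hypothetical per-language table: (level, position) -> mark.
QUOTE_MARKS = {
    "en": {
        (1, "begin"): "\u201c",     # double opening mark
        (1, "continue"): "\u201c",  # repeated at a break in the quotation
        (1, "end"): "\u201d",       # double closing mark
        (2, "begin"): "\u2018",     # single opening mark
        (2, "continue"): "\u2018",
        (2, "end"): "\u2019",       # single closing ("9"-shaped) mark
    },
}


def quote_mark(level, position, lang="en"):
    """Pick the mark for a <q> milestone from its level and position."""
    marks = QUOTE_MARKS[lang]
    depth = max(lvl for lvl, _ in marks)  # 2 for English
    wrapped = (level - 1) % depth + 1     # levels 3, 4, ... wrap to 1, 2, ...
    return marks[(wrapped, position)]


# <q eID="..." level="2"/> is the end of a level-2 quotation.
print(quote_mark(2, "end"))
```

Because the mark depends only on attributes carried by the milestone 
itself, a frontend showing a single verse would not need the rest of the 
quotation to choose it.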

> While this has been limited to OSIS bibles, I would like to entertain a 
> discussion on other works wrt OSIS, for the express purpose of ensuring 
> that we don't make decisions that need to be revisited.
> 
> Specifically, I am thinking about Nave's and Strongs, both of which have 
> (at least) two interesting characteristics in common:
> 1) They have two keys. In the case of Strongs, they have a Strong's 
> number and they have the word to which that number refers. Nave's is 
> similar in that it has both a code and a word for that code. The basic 
> difference between them is that Strong's uses the number for the key and 
> displays the word along with the definition and Nave's uses the word for 
> the key and does not display the code. Nave's code is in the 
> source as a means of cross-referencing words.

Those codes are not from Nave. They are OLB's indexing mechanism. They 
should be replaced by <reference> elements that point to the entry they 
represent.

> 2) Both have references to other entries. In the case of Strongs, it 
> will refer from Strongs Greek to Strongs Hebrew as well as internally.
> When I tackle Naves, I want to be able to create an internal cross 
> referencing as well as a referencing to verses.

We should probably make the Greek & Hebrew versions a single module. The 
current modules are based on databases intended for OLB, so they just 
have numbers for keys (four digit numbers plus a leading 0 in the source 
for Hebrew words). A better way to do this is with a leading G or H in 
the key (osisID). That's how Strong's numbers are referenced in OSIS 
modules, for example.
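
A minimal sketch of turning the numeric source keys into prefixed keys 
for a combined module. The function name is hypothetical, and stripping 
leading zeros is an assumption made here; whether keys should be 
zero-padded is a separate decision.

```python
PREFIXES = {"greek": "G", "hebrew": "H"}


def strongs_key(number, language):
    """Build a G/H-prefixed key from a numeric source key."""
    digits = number.strip()
    if not digits.isdigit():
        raise ValueError(f"not a Strong's number: {number!r}")
    # Assumption: drop leading zeros so both languages share one form.
    return PREFIXES[language] + str(int(digits))


print(strongs_key("00430", "hebrew"))
print(strongs_key("2316", "greek"))
```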

Anyway, your question is really about cross-referencing. The correct way 
to do that is with the reference element. Internal cross-referencing we 
can probably handle pretty easily. <reference 
osisRef="Moses">Moses</reference> would be used to create a reference to 
the Moses entry in the same document (technically, whatever element has 
osisID="Moses"). Frontends don't support this (to my knowledge), but 
that's how it's supposed to be encoded.
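
To illustrate how a frontend might resolve such an internal reference, 
here is a sketch that indexes elements by osisID and looks up the 
osisRef. The sample markup and function names are made up for the 
example, and the OSIS namespace is omitted to keep it short.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<div type="glossary">
  <div type="entry" osisID="Aaron">See also <reference osisRef="Moses">Moses</reference>.</div>
  <div type="entry" osisID="Moses">Entry text for Moses.</div>
</div>"""


def build_index(root):
    """Map every osisID in the document to its element."""
    return {el.get("osisID"): el for el in root.iter() if el.get("osisID")}


def resolve(root, reference):
    """Return the element a <reference> points at, or None."""
    return build_index(root).get(reference.get("osisRef"))


root = ET.fromstring(SAMPLE)
ref = root.find(".//reference")
target = resolve(root, ref)
print(target.get("osisID"), "=>", target.text)
```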

References to OTHER works (modules) are going to be a headache that I 
recommend we put off until Sword 3.0. :) It would require matching 
osisRefs' workIDs with actual modules that use the same reference 
system. It's trivial if workIDs were required to actually match the 
module ID. We could also somehow track OSIS workID/module name 
correspondences through a registry. Sounds like a good project for us 
all to assign to Troy next time he tries to say Sword has basically all 
the features it needs. :D
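
The registry idea could be as simple as a lookup from workID to module 
name, applied to the work prefix of an osisRef. Everything in this sketch 
is hypothetical: the registry contents, the function names, and the 
simplification of splitting at the first colon.

```python
# Hypothetical registry: OSIS workID -> installed module name.
MODULE_REGISTRY = {
    "Bible.KJV": "KJV",
    "Dict.Nave": "Nave",
}


def split_osis_ref(osis_ref):
    """Split 'workID:ref' into (workID, ref); workID is None if absent."""
    work, sep, ref = osis_ref.partition(":")
    if not sep:
        return None, osis_ref
    return work, ref


def module_for(osis_ref, registry=MODULE_REGISTRY):
    """Return (module name or None, reference within that work)."""
    work, ref = split_osis_ref(osis_ref)
    return registry.get(work), ref


print(module_for("Bible.KJV:Matt.1.1"))
print(module_for("Moses"))
```

A result with no module name means either an internal reference or a 
work that no installed module claims, and the frontend would have to 
tell those apart.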

--Chris