[sword-devel] WEB update request; OSIS encoding problems

Kahunapule Michael P. Johnson Kahunapule at mpj.cx
Wed Aug 11 05:56:40 MST 2004

At 20:04 11-08-04, Chris Little wrote:
>Kahunapule Michael P. Johnson wrote:
>Basically, the reason I want to change quotation marks to <q> elements 
>in the Sword module is high standards.

Why is <q> a higher standard than a quotation mark? What if both were used?

>  We could just run the file 
>through osis2mod and post the result, but I would rather post what I 
>consider a higher quality text, even if it requires some work.  Besides, 
>the quotation marks are not the only issue--every instance of an x- 
>value in the WEB has a defined correct encoding in OSIS.

<milestone type="x-noteStartAnchor" /> doesn't really have an exact equivalent, although there is a computationally more difficult way to specify the same kind of thing.

If there is a "standard" encoding for <w type="x-plural">, I missed it. What is it?

What are the "standard" ways of encoding different levels of poetry lines? The World English Bible uses 2 levels (type="x-primary" and type="x-secondary"), but some translations use 3 levels, usually rendered as three levels of hanging indent. These levels cannot be reliably inferred by counting from the beginning of the line group.
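
As a rough illustration of why the level has to be stated on each line, here is a minimal Python sketch that renders a line group from an explicit type attribute. The x-primary and x-secondary values are the ones the WEB uses; x-tertiary, the function name, and the indent width are assumptions made up for this example, not anything OSIS defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the type attribute to an indent depth.
# x-primary and x-secondary come from the WEB; x-tertiary is assumed here
# for translations that use a third level.
INDENT_LEVEL = {"x-primary": 0, "x-secondary": 1, "x-tertiary": 2}


def render_line_group(lg_xml, unit="  "):
    """Render each <l> in a line group, indented by its declared level."""
    line_group = ET.fromstring(lg_xml)
    rendered = []
    for line in line_group.iter("l"):
        # Lines with a missing or unknown type fall back to level 0.
        level = INDENT_LEVEL.get(line.get("type"), 0)
        rendered.append(unit * level + "".join(line.itertext()).strip())
    return "\n".join(rendered)


sample = (
    "<lg>"
    '<l type="x-primary">First line</l>'
    '<l type="x-secondary">second line</l>'
    '<l type="x-secondary">third line</l>'
    '<l type="x-primary">Fourth line</l>'
    "</lg>"
)
print(render_line_group(sample))
```

Two secondary lines in a row followed by a primary line is the kind of pattern that counting from the start of the line group cannot recover, which is why the sketch reads the level from each line instead.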

Of course, the type="x-doNotGeneratePunctuation" elements could all go away if the n=" " that precedes each of them did what I wanted it to do.

>> In the meantime, perhaps you can defend OSIS' quote handling in a
>> world with over 6,000 languages, and explain to me how OSIS can be
>> used to properly encode red letter edition-capable electronic texts
>> of the following translations: KJV, NASB, NIV in a way that fully
>> conforms to OSIS 2.0, and in a way that all OSIS-conforming readers
>> will always properly render them like the printed editions. Please
>> offer me a serious answer to this request.
>The "6,000 languages" number is misleading.  Languages do not correspond 
>1:1 with quotation styles.  The majority of minority languages, I would 
>guess, make use of either no quotation marks or quotation mark styles 
>borrowed from the language of whoever first reduced their language to 
>writing.  The real number of different styles is difficult to estimate, 
>but I would guess it is only a few hundred.

Granted, but the currently documented form of OSIS still can't handle several cases that I'm familiar with.

>But a great portion of that count is made up of multiple styles used by 
>  individual languages.  I believe it's important to maintain a record 
>of the underlying typographic representation of quotation marks, through 
>an attribute like the n attribute on <q>, but I don't believe this 
>should necessarily be used as the basis for rendering.  The reason for 
>this is that quotation styles can vary across different time periods, 
>different subgroups of a linguistic community, and different rendering 
>styles (e.g. paragraph breaking after verses vs. normal paragraphing). 
>Marking <q> elements rather than typographic quotation marks allows this 
>kind of flexibility.  I do believe we need some manner of identifying 
>quotation mark styles within the document, either on each element or in 
>the document header.

OK, so it may be desirable to render the punctuation of the same text differently than the original, essentially creating a derivative work. Copyright issues and such aside, wouldn't it still be best to be able to specify the original rendering within the OSIS document? Then, if you want to transform it to another format, say switching between paragraph-oriented (NIV) and verse-oriented (NASB) open quote reminders and text layout, you could do that, but it would be a conscious decision to change the translation punctuation, and not an accidental artifact of the encoding.

>With OSIS 2.0.1, the assumption is that you will mark <q> elements and 
>you will write stylesheets to render them correctly.

I have a problem with this assumption. The problem is this: I believe that style sheets should specify the style of layout, such as fonts used, page or screen sizes, number of columns, etc., but the markup should contain all of the information about what the text itself (including punctuation) is.

>  If you want to do 
>paragraph breaking after verses and modern English quotation marks, 
>that's one stylesheet.  If you want to do normal paragraphing and modern 
>Spanish quotation marks, that's another stylesheet.  It puts a burden on 
>the stylesheet writer that could be decreased with some of the things 
>I've mentioned, and I would guess that the OSIS working groups will 
>address them at some point.  But I doubt <q> will go away in favor of 
>allowing ".

These mythical style sheets are not defined. Even if they were, they would have to accompany each Bible translation's OSIS markup for the encoding to be complete. Granted, there would be fewer quotation style sheets than Bible translations, for the reasons you gave, but the markup would still be incomplete without its style sheet. To me, that means the style sheet, or at least the part of it that renders punctuation, is really part of the markup and should be defined as such.

I haven't asked, but I suspect that the NIV committee on Bible translation would not like people to willy-nilly reformat their poetry and prose as an NASB-style list of verses, changing the punctuation to match. They probably aren't a big concern to you or me, since they probably want more money than either of us can pay to include their translation as a free download for Sword or for posting on my web site, but still, they probably aren't the only ones thinking that way.

<q> doesn't have to go away to make me happy. It just has to be tweaked so that I can specify EXACTLY whether punctuation is generated in response to it and, if so, which marks.

Offer me a solution, please, that allows me to encode a reasonable selection of Bible translation language and punctuation styles and have them rendered, by default at least, with the same punctuation that I started with. I'd like to do that with OSIS. If I can't, then OSIS becomes a standard to compete with instead of to use and promote.
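
To make the request concrete, here is a minimal Python sketch of the default behavior I have in mind. It assumes, purely for illustration, that each <q> milestone carries the literal mark to print in its n attribute and that an empty value means "generate nothing". That is not how the documented OSIS 2.0 behaves, and the function name is made up.

```python
import xml.etree.ElementTree as ET


def render_text(element):
    """Flatten an element to text, printing each <q>'s n value verbatim."""
    parts = []
    if element.tag == "q":
        # Assumption: n holds the exact punctuation; "" prints nothing.
        parts.append(element.get("n", ""))
    parts.append(element.text or "")
    for child in element:
        parts.append(render_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


# A complete quotation: both marks are reproduced as encoded.
closed = ET.fromstring(
    '<p>He said, <q sID="q1" n="\u201c"/>Come and see.<q eID="q1" n="\u201d"/></p>'
)
print(render_text(closed))

# A quotation that continues into the next paragraph: the end milestone
# has an empty n, so no closing mark is generated here.
continued = ET.fromstring(
    '<p><q sID="q2" n="\u201c"/>A quote that runs on<q eID="q2" n=""/></p>'
)
print(render_text(continued))
```

A style sheet could still override these marks for a different layout, but that would be a deliberate choice, and the default output would match the punctuation that was encoded.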

>> OSIS may well be the best XML Scripture interchange format definition
>> that is open and published, right now. What will it take to make it
>> good enough to actually be used? "Passion of the Christ XSLTs"? Or
>> maybe some responsiveness to concerns of those who might use and
>> promote it?
>Actually, only 12% of respondents believe it's the "Passion of the 
>Christ XSLTs".  And I'm not even sure what "Indexer" means. :)

Maybe those who chose those options were just curious to see what would be built if they asked for something bizarre?

My need is neither frivolous nor hypothetical. I am right now developing Scripture typesetting software that SIL IPUB in Dallas wants to eventually interface with OSIS as well as USFM. The part of the USFM interface that I have written is working pretty well, so far. OSIS, however, is not something I enjoy working with in its current form. I do have other options. I just thought I'd see if OSIS can be salvaged before I delve into some of the alternatives. OSIS is salvageable if the keepers of the standard aren't too proud to bend just a little so that OSIS works as a reversible encoding with respect to punctuation marks for all of the cases that I'm interested in, and if they do so in a timely fashion. If not, there are some good alternatives.
