[sword-devel] [osis-editors] Re: The death of OSIS?

Kahunapule Michael P. Johnson Kahunapule at mpj.cx
Thu Aug 12 18:47:14 MST 2004

Thank you, Steven DeRose, for answering my query and clearing up some things for me. I now understand that you are so convinced that you are right and I am wrong that you are unlikely to make the very minor changes I have asked for in the OSIS specification quickly, if ever... but then you do seem to offer a little hope in your comments on my constructive suggestions. Overall, I'm kind of sad about your response, because I had hoped that OSIS would become a unifying, widely used standard for encoding Bible translations. I had also hoped that I could benefit from using this standard directly in a few applications. It is now clear that I cannot, and that I can't be certain you will ever make it acceptable for wider use.

I'm very passionate about preserving every jot and tittle of a Bible translation text, including quotation punctuation. If you are willing to accommodate my desires in OSIS, then I would like to support OSIS, in spite of its many other shortcomings (most of which are very minor), because it is nice to have one widely supported XML Bible encoding standard. If not, that is your choice.

Your strong opinion that quotation punctuation is not part of the actual Bible text notwithstanding, I have no use for a standard that does not allow me to treat quotation punctuation as part of the actual Bible translation text. I don't think that any of the organizations I work with do, either, except in some less common cases. As far as I can tell, you lose no desirable capability by allowing OSIS to be used this way. However, if you are right and I am wrong, then you should see your support gain momentum even as you alienate at least one of your former supporters. Your standard won't rise or fall on my support alone, anyway, regardless of all the things I am doing in this area. Please don't blame me if OSIS doesn't catch on the way you want it to.

Specific answers to your questions are below:

At 02:45 13-08-04, Steven J. DeRose wrote:
>It may well be that we all made mistakes in the design of quotation 
>handling in OSIS, but I assure you we considered a much wider range 
>of cases than the English NIV or English. Some of us are of US 
>origin, but even so I don't think we have any monolinguals among us.

I believe you. I also understand your position very well, I think. I just disagree that you should disallow handling quotation punctuation as part of the Bible translation text for anyone who wants to use OSIS.

>There is a real tradeoff here -- are quotation marks conventional 
>ways of marking a discourse phenomenon (let's call it "quotation" to 
>keep things simple), or are they part of "the text"? That is not so 
>straightforward as it seems to me you are suggesting. There were no 
>quotation marks in the original texts of the Bible, so all the 
>quotation marks are products of someone's interpretation.

I'll grant you that. However, the quotation marks ARE part of Bible translations into languages that use quotation marks. You believe you have the right to interpret, move, replace, and edit them as you see fit, without the Bible translators' consent. While those particular marks were not part of the original languages, the meaning of where they go was, and many target languages need them to avoid losing meaning that was in the original languages.

>Nevertheless, we all agree that OSIS markup has to provide enough 
>information to get the formatted result that one wants.

I don't agree with that statement. It is obviously false, if I am included in "we", when it comes to quotation mark handling and the formatted results that I want.

>Actually, let me clarify that a little: widow and orphan management 
>is an important part of high-quality formatting: certainly part of 
>"the formatted result that one wants." But surely it shouldn't be 
>part of what OSIS encodes. This may seem obvious or trivial, but I 
>have heard people criticize OSIS for just this: they look at a 
>printed Bible someone produced from OSIS source using some formatting 
>tool that doesn't do widowing well, and say "OSIS can't produce a 
>good Bible" -- we must always keep in mind that there are at least 
>two separate parts involved here: the markup and the engine that 
>processes it.

Widow and orphan management is a totally separate, unrelated issue. It is a layout and formatting issue, totally different from the usual Bible markup issues. Punctuation (except for some hyphens) is part of the text.

>This is the key point, isn't it? "will the application reading the 
>OSIS file add quotation marks?" is not a question that can be 
>answered. Which application?

Any application that properly implements your standard.

> Reasonable software for formatting XML 
>should do what your style sheets say it should do. Perhaps not all 
>software is reasonable, but even most CSS implementations give you 
>that much control.

Your answer is totally inadequate and unacceptable. OSIS is dead, as far as I'm concerned, if I have to use a style sheet to preserve punctuation that is part of the Bible translation text.

>Clearly the KJV and the NIV have different styles for quotations. The 
>style sheets you would use to generate printed versions of them 
>therefore would differ. They might be completely separate, or just 
>differ in a few things, or a very clever stylesheet might even check 
>what version it's formatting (by looking at the header) and do the 
>appropriate thing for any version it knows about, and a default thing 

You are getting abstract. I am dealing with the concrete. I want to write code today to do Scripture encoding. If it isn't real today, then OSIS isn't usable today. Show me the style sheets, now, complete with their documentation and schemas if you want me to pay any attention to this kind of argument. Of course, you can't...

>By not enshrining punctuation in  the text itself, a wider range of 
>options are available to the translators, publishers, and other 
>concerned parties.

This is where we disagree: you think this is an advantage, a feature. I think it is a fatal flaw, a defect that should be exterminated as quickly as possible, because it leaves out the only option I want. The option I want is to preserve quotation punctuation as part of the Bible translation text, exactly as the translators originally published it. That is the only thing I want that I don't see in OSIS. I don't want to change the style and rules used for quotation punctuation, or regenerate the punctuation for different language rules. If you do, fine, but please make that a separate process that doesn't deny me the right to preserve every jot and tittle.

> For example, if I were printing an NIV in France 
>for some reason, I might want to use the French chevron-like 
>quotation marks (sorry, I forget the name for them just now).

Why, pray tell? Even if that were a reasonable thing to do, and even if you had IBS's and Zondervan's permission to modify their copyrighted work that way (which I doubt they would grant), why should this edit be a function of the original markup? Isn't it more appropriate to leave that to a simple search-and-replace operation? Why burden me with requirements based on unlikely hypothetical situations? Wouldn't a more likely adaptation of an American English Bible translation for publication in France be to convert it to British spelling? Should the markup do that? Should every instance of "neighbor" be tagged with an alternate British spelling (neighbour)? I don't think so, but that would make more sense to me than changing the quotation punctuation. No, IBS considers that transformation to be a translation issue, and they want to issue their own British edition, thank you very much.
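The search-and-replace alternative suggested above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it converts only typographic double quotes (U+201C/U+201D) to French guillemets with a no-break space inside each mark (actual French typographic rules vary), and it deliberately leaves single quotes alone, because U+2019 also serves as the apostrophe and cannot be replaced blindly. The name `to_guillemets` is hypothetical.

```python
# Hypothetical helper: convert English double quotes to French guillemets.
# Assumption: a no-break space (U+00A0) goes inside each guillemet.
GUILLEMETS = str.maketrans({
    "\u201c": "\u00ab\u00a0",  # left double quote -> guillemet + no-break space
    "\u201d": "\u00a0\u00bb",  # right double quote -> no-break space + guillemet
})


def to_guillemets(text: str) -> str:
    # Single quotes are left untouched: U+2019 doubles as the apostrophe.
    return text.translate(GUILLEMETS)


print(to_guillemets("He said, \u201cDon\u2019t go.\u201d"))
```

The apostrophe in "Don't" survives unchanged, which is why the sketch restricts itself to the unambiguous double-quote characters.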

> No 
>problem: tweak the stylesheet. You don't even have to touch 
>the text itself -- thus the risk of accidentally messing it up is 

I disagree with this statement, because I regard quotation punctuation as part of the text of the Bible translation. No matter how many times you say otherwise, I'll still think that.

>Also, these source files will be processed by many things other than 
>formatters. Consider blind users with voice-generation interfaces: 
>they won't get quotation marks at all -- but if the system knows 
>there is a quote starting, it should be able to signal that to them. 
>One system might just say "quote" in whatever the user's language is; 
>a better system might generate voice inflections or suprasegmentals 
>of some sort to communicate the same thing. Second, consider a search 
>engine: it shouldn't have to search for a different pattern of 
>specific characters to locate quotes in every language it encounters 
>(especially when some patterns are ambiguous).

A voice reader for the blind could just as easily notice the quotation punctuation and do something special with it. Alternatively, you could leave the punctuation in place in the text where it belongs, and tag all quotes with an included "who" attribute, and generate a different voice for each. None of that requires denying the capability to preserve quotation punctuation to the user of OSIS.
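The voice-reader idea in the previous paragraph can be sketched as follows. This is a minimal Python sketch assuming the text uses typographic double quotes (U+201C/U+201D); the cue strings and the function name `speech_segments` are hypothetical. Single quotes are skipped because U+2019 is ambiguous with the apostrophe, the kind of ambiguous pattern raised in the quoted message.

```python
import re

# Hypothetical spoken cues for typographic double quotes.
CUES = {"\u201c": "[quote]", "\u201d": "[end quote]"}
SPLIT = re.compile("([\u201c\u201d])")  # capture group keeps the marks


def speech_segments(text: str) -> list[str]:
    """Split text into spoken pieces, replacing quote marks with cues."""
    segments = []
    for piece in SPLIT.split(text):
        if piece in CUES:
            segments.append(CUES[piece])
        elif piece:  # skip empty strings produced at the edges
            segments.append(piece)
    return segments


print(speech_segments("He said, \u201cGo.\u201d"))
```

A speech engine could map each cue to a spoken word or a change in inflection; with `who` markup as well, it could also pick a voice per speaker.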

>So, it seems to me we definitely need to have markup in there for 
>quotes -- the question then is whether OSIS quote markup provides 
>sufficient information to drive a formatter, and if not, what to do 
>about it.

I agree that OSIS should have markup for quotes. I STRONGLY disagree that the quote markup should REPLACE quotes. If a Bible translation publisher wants to have quotes always generated automatically, let him. However, if a Bible translator wants the quotation punctuation treated as a part of the text, why force him not to use OSIS? Do you want OSIS to be accepted and used or not?

>>The other problem with controlling quotation punctuation with OSIS 
>>and always using markup (i. e. q or speech elements) is that there 
>>are not just start and end locations. There are also open quote 
>>reminder locations. This gets confusing. Can I specify that a 
>>quotation starts at a given location with one character, continues 
>>at a paragraph boundary with a different character, then ends with 
>>still another character? Would it be OK to use a duplicated sID in a 
>>q milestone element to indicate that this is a part of the same 
>>quotation, but more punctuation is needed here?
>Absolutely agreed. We discussed this at length (Patrick, can we add a 
>section with some examples for this in the doc, if we haven't yet?). 
>Typically, the placement of quotation reminders is determined by some 
>fairly simple rule, that may differ by language, writing system, 
>culture, and genre (and probably other factors too). Your example of 
>a paragraph boundary is a very common case. In such a case, the 
>stylesheet rule for paragraph simply checks whether a quotation is 
>open, and if so, issues the appropriate punctuation.

Unfortunately, with any natural language, there are exceptions to these simple rules and more complex cases. There are also cases where the rules are simply not applicable, and it makes sense to the readers of the language, but defies program logic.
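For concreteness, the "fairly simple rule" for paragraph reminders described in the quoted message might look like the following Python sketch. It assumes a simplified, namespace-free OSIS-like fragment in which a quotation crossing paragraphs is marked with `<q sID="..."/>` and `<q eID="..."/>` milestones, and it follows English conventions: double and single marks alternate by nesting level, no closing mark is emitted at a paragraph break, and every open quotation is reopened at the next paragraph. The function name `render_quotes` is hypothetical, and the exceptions mentioned above are not handled.

```python
import xml.etree.ElementTree as ET

OPEN = ("\u201c", "\u2018")   # left double, left single: alternate by level
CLOSE = ("\u201d", "\u2019")  # right double, right single


def render_quotes(xml_text: str) -> str:
    """Generate quote marks from <q> markup, adding paragraph reminders."""
    root = ET.fromstring(xml_text)
    out: list[str] = []
    depth = 0  # number of quotations currently open

    def open_quote() -> None:
        nonlocal depth
        out.append(OPEN[depth % 2])
        depth += 1

    def close_quote() -> None:
        nonlocal depth
        depth -= 1
        out.append(CLOSE[depth % 2])

    def walk(el: ET.Element) -> None:
        if el.tag == "p":
            if out:
                out.append("\n")  # paragraph break
            # Reminder rule: reopen every quotation that is still open.
            out.extend(OPEN[i % 2] for i in range(depth))
        elif el.tag == "q":
            if "eID" in el.attrib:
                close_quote()  # end milestone
            else:
                open_quote()  # start milestone or container <q>
        if el.text:
            out.append(el.text)
        for child in el:
            walk(child)
            if child.tail:
                out.append(child.tail)
        if el.tag == "q" and not ("sID" in el.attrib or "eID" in el.attrib):
            close_quote()  # container <q> closes where the element ends

    walk(root)
    return "".join(out)


sample = ('<div><p>He said, <q sID="q1"/>First.</p>'
          '<p>Second.<q eID="q1"/></p></div>')
print(render_quotes(sample))
```

Any case where translators place a reminder somewhere this rule does not predict would require either extra markup or literal punctuation in the text.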

>This is a valuable approach, because there might well be two 
>different groups that share a translation, but live in different 
>areas and have become accustomed to different quotation style rules. 
>For example, a language group from a war-torn country where many have 
>emigrated, and ended up in different countries. If you put the 
>literal quote characters in the text for one group, you have to go 
>and fix it all manually for the other group. If instead you mark the 
>quotes via markup and have a stylesheet generate the correct 
>characters for display, then you just change that stylesheet, getting 
>a uniform change with much less effort.

This is possible, but I have never encountered this situation as an actual need. It is a conjecture. In case it comes up, you could use OPTIONAL quotation markup to deal with it.

>Does any of us know of a situation where the placement of "reminder" 
>punctuation is discretionary?

Yes. You will find such cases in complex nested quotations where shifts between prose and poetry occur within the quotation in Bible text. You can also find really strange cases in theological texts where various Bibles are quoted.

>In my opinion (and that of my OSIS validation code), it would be 
>incorrect to use a duplicate sID for this case as the OSIS schema 
>stands right now. It could be that there is need to explicitly mark 
>paragraph boundaries inside quotes, rather than letting the style 
>sheet do the right thing. If you believe so, can you explain it to me 
>in more detail? I'm not quite understanding your point here, and I 
>very much want to.

Without giving you specific examples, let me just say that I have seen cases that are unusual enough that you can't authoritatively decide what is proper in the language with any reasonable programmatic logic. Let the translators, not the programmers (who probably don't know the languages in question), decide the placement. Who is really responsible for the text of the translation, anyway?

>*If* there turns out to be such need, then I see a few simple solutions:

There is such a need. If you don't want to meet this need, then so be it. Someone else will.

>a) Allow additional milestones with the same sID (or possibly eID, 
>but I like your sID notion better)
>b) Create a new empty element for the purpose, say <q-continued> or similar
>c) Reserve a 'type' attribute value somewhere to distinguish this case.
>If there really is need, you can simulate solution b or c right now 
>in OSIS by using a regular milestone and assigning it a special type 
>for this purpose. People (namely, the people writing stylesheets for 
>you or doing typesetting) might complain unless you could show why it 
>is in fact needed -- but if it really is, then it is.

That solution is ugly, but workable. A more elegant solution is simply to put the quotation punctuation in the text AND add markup indicating the extent of each quotation: markup that does not tell the software reading it to work out where the punctuation should go and then corrupt the text by adding punctuation that isn't needed. Ugly is OK. It doesn't help acceptance of a new standard, but it won't necessarily kill it, either.

>>In short, I consider the placement of quotation punctuation and the 
>>selection of characters to be used for quotation punctuation to be a 
>>part of the Bible translation text itself, and if any encoding, like 
>>OSIS, cannot guarantee that these characters are maintained in their 
>>original locations, then that encoding is defective.
>Wow. That's interesting. Let me see if I understand it right: So if I 
>published an NIV in France (or better, a Francophone country with an 
>English-speaking minority population that wants the NIV), and if I 
>used chevrons for quotation marks, you would say it's a different 
>*translation*, not just a different printing or edition or layout? I 
>must admit I have a hard time accepting that.

Yes, I am saying that. If I'm not mistaken, the NIV Committee on Bible Translation would agree with me, too. At the very least, they would think they should have a say in whether that is done. Have you asked them?

>As for guaranteeing, no encoding can guarantee the result of applying 
>software to it. For all the encoding knows, the formatter you're 
>using simply throws out all punctuation marks, or even all the text. 
>It seems to me that that doesn't make all encodings defective. There 
>must be some more limited claim you're trying to get at here, but I 
>don't see clearly what it is. Help, please?

No encoding that fails to provide enough information to guarantee that the encoded information can be preserved is adequate. You can guarantee lossless encoding if the encoding and its documentation, when properly followed, result in proper decoding.

I'm not talking about guaranteeing against bad software implementations. I'm talking about a standard that always works when properly implemented.
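One way to make "lossless" testable is a round-trip check: strip the markup and compare the remaining characters with the published text. This is a minimal Python sketch using the standard library's `ElementTree`, assuming namespace-free fragments. `plain_text` and `is_lossless` are hypothetical names, and the `n=" "` flag follows the proposal quoted later in this message, not the OSIS schema.

```python
import xml.etree.ElementTree as ET


def plain_text(fragment: str) -> str:
    """Concatenate all character data in the fragment, ignoring markup."""
    return "".join(ET.fromstring(fragment).itertext())


def is_lossless(published: str, fragment: str) -> bool:
    """True if the markup-free text matches the published text exactly."""
    return plain_text(fragment) == published


published = "Jesus said, \u201cFollow me.\u201d"
literal = '<p>Jesus said, <q who="Jesus" n=" ">\u201cFollow me.\u201d</q></p>'
markup_only = '<p>Jesus said, <q who="Jesus">Follow me.</q></p>'
print(is_lossless(published, literal), is_lossless(published, markup_only))
```

The second fragment fails the check because its quotation marks exist only as markup; recovering them depends on a formatter applying the right rules.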

>It seems to me that the *fact* of something being a quotation is 
>clearly part of the translation text, but that the punctuation marks 
>(or whatever) used to communicate that are part of the formatting, 
>just like the choice of font.

I strongly disagree with your view of the world, here. If you can't at least accept that your view is not the only valid view, then I can't reasonably accept that you will ever make OSIS acceptable to me.

>Can you explain this further for me if it's central to your point? 
>But it seems to me this is not central -- you just want the quotes 
>right, right?

I want a lot of things, but the single issue that makes OSIS in its current form totally unacceptable is that you don't regard quotation punctuation as part of the Bible translation text, and you have documented the standard to preclude the view that it is. Period.

There are other smaller problems with OSIS, but I can live with them. I have no reason to accept a lossy encoding of punctuation, however.

>>Do you see the problem?
>I don't think so. Please explain further.

If you don't see it by now, then you probably don't want to see it. I think that you see what I think is a problem, but disagree that it is. Further explanations are probably futile.

My question now is whether the OSIS committee will allow quotation punctuation to be treated as part of the Bible text. Technically, it is a very easy change that accommodates my point of view while still allowing all the benefits of the quotation markup. You don't have to do what I ask. Of course, I don't have to use "standard" OSIS, either. :-)

>>Now, let me suggest at least two possible solutions that are easy to 
>>incorporate into the OSIS standard. First, let me explicitly state 
>>what I'm trying to accomplish:
>>1. Preserve the current OPTION in OSIS to generate quotation 
>>punctuation with markup.
>>2. Preserve the OPTION in OSIS to mark quotations by speaker for 
>>specialized searches or, in the case of Jesus' direct quotes, to 
>>color or present them in some different way.
>>3. Add the OPTION to control quotation punctuation precisely for 
>>languages and styles that differ from the "usual" in the type and 
>>placement locations of quotation punctuation.
>>Suggested solution number 1 (recommended):
>>Document that any <q> or <speech> element marked with an attribute 
>>of n=" " (a blank space) should not be taken as an instruction to 
>>insert any quotation mark. Rather, in this case, it should be 
>>assumed that the correct punctuation is already in the text as a 
>>Unicode character (just like other kinds of punctuation). <q> or 
>><speech> elements not so marked would be taken as an instruction to 
>>insert quotation punctuation in the manner that the NIV English 
>>Bible does, including open quote reminders, and alternating double 
>>and single typographic quotes for nested quotes.
>I rather like the idea I perceive here -- some signal that the 
>punctuation is already in the text. The stylesheet could use this in 
>a nicely general way. I don't think it belongs on the 'n' attribute, 
>but that's a minor detail.

There is hope!
I don't care whether it is on the 'n' attribute, just that it exists.
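A formatter could honor the proposed signal along these lines. This is a Python sketch assuming container (non-milestone), namespace-free `<q>` elements and English double/single alternation by nesting depth. The `n=" "` test implements the signal exactly as proposed in this thread; it is not part of the OSIS schema, and another attribute may fit better. Quotations flagged this way are copied through untouched, while all others get generated marks. `render` and `punctuation_in_text` are hypothetical names.

```python
import xml.etree.ElementTree as ET

OPEN = ("\u201c", "\u2018")   # left double, left single: alternate by level
CLOSE = ("\u201d", "\u2019")  # right double, right single


def punctuation_in_text(q: ET.Element) -> bool:
    # Convention proposed in this thread, not part of the OSIS schema:
    # n=" " says the translators' quote marks are already literal text.
    return q.get("n") == " "


def render(el: ET.Element, depth: int = 0) -> str:
    is_q = el.tag == "q"
    generate = is_q and not punctuation_in_text(el)
    inner = depth + 1 if is_q else depth  # nesting counts either way
    parts = [OPEN[depth % 2]] if generate else []
    parts.append(el.text or "")
    for child in el:
        parts.append(render(child, inner))
        parts.append(child.tail or "")
    if generate:
        parts.append(CLOSE[depth % 2])
    return "".join(parts)


generated = render(ET.fromstring("<p>He said, <q>Go.</q></p>"))
preserved = render(ET.fromstring('<p>He said, <q n=" ">\u201cGo.\u201d</q></p>'))
print(generated == preserved)
```

Both inputs render identically, but in the second the marks are the translators' own characters, so they survive even a processor that ignores quotation markup entirely.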

>Is there a case, though, where a stylesheet couldn't be reasonably 
>expected to generate all the right quotation marks?

Yes, there are many cases. Indeed, you can't even do it right with the World English Bible, I suspect.

> If a language 
>required a different quotation mark depending on the voicing of the 
>following consonant, or (worse) the gender of the next noun, that 
>would be beyond typical stylesheet mechanisms to do. I don't know of 
>any languages where punctuation choice depends on linguistic 
>phenomena that aren't already represented by other markup or layout 
>(like paragraph breaks). If there are, then we have a clear problem 
>to deal with. But given the historical development of writing 
>systems, that seems to me really unlikely. Anybody know an exception?

More to the point, who wants to volunteer to maintain the resulting style sheets? Not I!

I look forward to your response.

Please do what you honestly believe is the best and most pleasing to God.
