[sword-devel] OSIS 2.0.1 modules available

Michael Paul Johnson sword-devel@crosswire.org
Thu, 05 Feb 2004 17:02:44 +1000


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 00:22 05-02-04, Chris Little wrote:
>Michael Paul Johnson wrote:
>> American Standard Version http://eBible.org/asv/asvosis.zip
>> God's Living Word http://eBible.org/glw/glwosis.zip
>> Hebrew Names Version http://eBible.org/hnv/hnvosis.zip
>> King James Version http://eBible.org/kjv/kjvosis.zip
>> Melanesian Pidgin http://eBible.org/pdg/TokPisinOSIS.zip
>> World English Bible http://eBible.org/web/webosis.zip
>
>Looks good.  I saw just a few issues that need some correction.  The 
>most important is that <verse> eID's need a value matching the 
>preceding sID on another <verse> element.  I think this is the only 
>issue that actually violates the spec.

Oops! Sorry about that. I have corrected the error in my source code 
that did that, and will be uploading updates when I can. (I'm trying 
not to be envious of broadband Internet connections available all over 
the USA & other more developed nations.)

Of course, this does bring up a question. Should overlapping verses 
ever be allowed? I would hope not, but the syntax would seem to allow 
it. Perhaps something should be said in the documentation about that. 
Actually, the contents of sID and eID markers on verse elements are 
entirely redundant (assuming you don't overlap verses), but someone 
might actually look at them, so I would rather have them be useful. My 
intention was to make them the same as the osisID of the first verse 
of the verse bridge set (which is the only verse in the case of most 
normal verses), as you suggested.
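
A minimal sketch of a check for the rule discussed above, assuming milestone-style <verse/> tags with double-quoted attributes. The name check_verse_milestones is hypothetical and is not part of any OSIS tool or of the GBF -> OSIS converter. Treating overlapping verses as an error is also an assumption, since the spec does not settle that question.

```python
import re

# Hypothetical helper: scans milestone-style <verse .../> tags and reports
# eIDs that do not match the currently open sID, plus overlapping verses.
VERSE_TAG = re.compile(r'<verse\b([^>]*)/>')
ATTR = re.compile(r'(\w+)="([^"]*)"')

def check_verse_milestones(osis_text):
    problems = []
    open_sid = None  # sID of the verse that is currently open, if any
    for match in VERSE_TAG.finditer(osis_text):
        attrs = dict(ATTR.findall(match.group(1)))
        if "sID" in attrs:
            if open_sid is not None:
                # Assumption: a new verse starting inside an open one is an error.
                problems.append(
                    f"verse {attrs['sID']} starts before {open_sid} ends (overlap)")
            open_sid = attrs["sID"]
        elif "eID" in attrs:
            if open_sid is None:
                problems.append(f"eID {attrs['eID']!r} has no preceding sID")
            elif attrs["eID"] != open_sid:
                problems.append(
                    f"eID {attrs['eID']!r} does not match sID {open_sid!r}")
            open_sid = None
    if open_sid is not None:
        problems.append(f"verse {open_sid} is never closed")
    return problems
```

Running it over a whole book and printing the returned list would show every verse whose end marker needs fixing; an empty list means the markers pair up.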

>Aside from that:
>The book <div> elements should have an osisID attribute where you 
>used scope.

I'll add an osisID attribute to those and leave the scope. Redundancy 
is obviously not a problem in OSIS. I rather think it is regarded as a 
virtue. <grin>

>The code for English is "en".  You can use "ENG" in the <language 
>type="SIL"> element, however.  (This isn't yet clear from the manual, 
>of course, but I expect the final version of the manual will cover 
>this area adequately.)

I did use "en" for English texts in <osisText osisIDWork="WEB" 
osisRefWork="Bible" xml:lang="en">,  but since I am most interested in 
minority languages without two-letter codes, I'd prefer to stick with 
the SIL Ethnologue codes wherever practical. For now, "ENG" is good in 
the language element. The type is supplied, so it is not ambiguous. If 
I nudge people towards supporting Ethnologue language codes, that 
would be a good thing.

>Various other issues, like the format of the <identifier 
>type="OSIS">, are in flux, and will probably be defined in OSIS 2.1 
>or the final manual.  (My current best guess at the value is 
>"Bible.en.Rainbow_Ministries.WEB.2004-01-22".)

Actually, that should be "Rainbow_Missions" instead of 
"Rainbow_Ministries" for the publisher name. That is easy to adjust, 
as it is just a constant in the GBF -> OSIS converter code.

>> If you care to alter the <q> marker and quote marks to strictly 
>> comply with the OSIS 2 documentation, then you face the following 
>> difficulties:
>> 
>> 1. You MUST provide additional information outside of the OSIS 
>> standard to the users of OSIS text that allows the punctuation to 
>> be EXACTLY recreated as in the original text. The rules of this 
>> recreation and the exact markers used are different for different 
>> languages, different dialects, and even for different translations 
>> within the same dialect. They aren't even the same for all of the 
>> texts above. If you use the <q> marks in the KJV to generate red 
>> text, that is OK, but if you generate quotation marks, you are 
>> changing the text. The KJV has no quotation marks, nor does the ASV.
>
>I was sympathetic with this position, since it really does make 
>conversion from other formats easier, but using <q> is undeniably 
>better.

I still deny that it is better. I remain unconvinced that use of <q> 
to generate punctuation should be mandatory. Maybe I just don't like 
computer geeks telling linguists & Bible translators what to do. Maybe 
I have some valid reasons that you should consider.

I do concede that it is good to allow <q> to be used to generate 
quotation marks where it makes sense -- and in some places it makes 
lots of sense. I still disagree that it should be mandatory. I might 
want to use this feature if I were drafting an entirely new 
translation in OSIS (or something that converted more directly to 
OSIS, which is more likely), and if I had software in hand to insert 
the quotation marks the way they should go for this language and 
style. I still think that once that insertion was done, I would prefer 
to distribute the resulting text with quotation marks already 
generated, and <q> tags, if present, serving only to indicate who the 
speaker was. That way OSIS readers don't have to know all language & 
style rules pertaining to punctuation for every language (not likely 
to happen, really), and OSIS doesn't have to be extended to specify 
all of these rules.
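
A minimal sketch of the distinction argued for above: <q> carries speaker information, and generating quotation marks is a separate, optional step. It assumes container-style <q who="..."> elements (not the milestone form) in a well-formed fragment. The function name render_q, the add_marks switch, and the HTML class and data-who attribute are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

def render_q(fragment, add_marks=False, open_mark="\u201c", close_mark="\u201d"):
    """Render an OSIS-like fragment to HTML.

    Each <q> becomes a <span> tagged with its speaker. Quotation marks are
    generated only when add_marks is True; by default the text is left
    exactly as the publisher supplied it. Other elements are dropped and
    only their text is kept.
    """
    root = ET.fromstring(f"<root>{fragment}</root>")

    def walk(node):
        out = escape(node.text or "")
        for child in node:
            inner = walk(child)
            if child.tag == "q":
                if add_marks:
                    inner = open_mark + inner + close_mark
                who = escape(child.get("who", ""), quote=True)
                inner = f'<span class="q" data-who="{who}">{inner}</span>'
            out += inner + escape(child.tail or "")
        return out

    return walk(root)
```

With add_marks left at False, a text that already contains its own punctuation passes through unchanged, and a stylesheet can still color the words of a given speaker.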

>It is true that different languages, dialects, and translations use 
>different standards of placing quotation marks.  However, there are 
>also plenty of instances when the SAME translation demonstrates 
>different standards of placing quotation marks, depending on locale, 
>paragraphing, and contemporary standards.  This is part of why OSIS 
>requires marking with <q> rather than typographic quotation marks.

This is NOT a benefit. Rather it is a serious defect in OSIS. The 
reason it is a defect is that there is no way to unambiguously specify 
how quotation marks must be generated for each language, and if 
variants are allowed, then how. As a Bible publisher, I don't like 
this. As a Bible translator, this bothers me. It takes control of the 
punctuation away from the translators and publishers. It provides more 
opportunities to make mistakes. Making reliable software is hard 
enough.

As a computer geek, I think it is cool that I could change the way 
quotation marks are rendered. I could render the NIV in verse list 
format with quotation mark reminders starting at every verse like the 
NASB, or render the NASB like the NIV. I could force English 
punctuation rules on Spanish or Italian, or vice versa. When 
extracting a scripture passage from the middle of a quotation, 
punctuation could be adjusted to fit the quotation (i. e. putting 
quotation marks around one of the Beatitudes when that is all you 
quote). The first option is great if you ARE the publisher. If you 
aren't, then I have a gut feeling that it is a good way to further 
alienate IBS & Zondervan from us. (They consider the poetry & prose 
formatting a translational issue not to be mucked with by the computer 
geeks.) The second option is totally without practical merit, and is 
really a disadvantage. The third option may be useful, but it would be 
a problem if you extracted adjacent sections of Scripture, then 
concatenated them.

The bottom line is that until I am convinced that proper punctuation 
will ALWAYS be reconstituted by OSIS-compliant software, and that OSIS 
itself provides enough information to do that for EVERY language, 
dialect, and style variant, I will not support this feature of OSIS as 
a mandatory item, nor will I recommend that anyone else do so. If 
you want to make it optional, and if you allow me to tag who is making 
a quotation without generating punctuation, then I would be happy with 
that.

> (Another benefit is 
>the potential for more richly tagged text, with speaker information.)

This can be a benefit, when it is done. It can also be a royal pain to 
provide, and it isn't worth the effort of doing so for every 
translation. I suppose that for translations that are close enough to 
each other (i. e. based on the same source text and not too loosely 
paraphrased), you could use a clever program to transfer the speaker 
tags from one translation to another automatically. Better yet, maybe 
you could just do that as a separate database, and merge the 
information on demand in the display engine (i. e. in Sword). That 
would be better, and wouldn't require everyone to tag their Bibles 
that way. 
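
A minimal sketch of the separate-database idea, assuming speaker information is keyed by osisID and merged at display time. The table contents, the names SPEAKERS and annotate, and the verse-level granularity are illustrative assumptions; real quotations often start or end in the middle of a verse, which this does not handle.

```python
# Hypothetical stand-off table: osisID -> speaker. Entries are examples only.
SPEAKERS = {"Matt.5.3": "Jesus", "Matt.5.4": "Jesus"}

def annotate(verses, speakers):
    """Attach a speaker (or None) to each (osisID, text) pair.

    The Bible text itself is never modified; the display engine can decide
    what to do with the speaker, for example red-letter rendering.
    """
    return [(osis_id, text, speakers.get(osis_id)) for osis_id, text in verses]
```

Because the table lives outside the text, one table could in principle serve several translations that share the same versification.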

>> 2. If you scan a new Bible text that has correct quotation marks, 
>> you probably won't be able to fully automate conversion from those 
>> marks to <q> markup.
>> 
>> 3. If you fail in doing 1 or 2, above, you may be in violation of 
>> copyright, trademark, and/or common law. Worse yet, you shift 
>> responsibility before God from the translators to yourself for the 
>> accurate transmission of His Word.
>
>Copyright, trademark, and common law aren't involved, though contract 
>law might be (depending on your contract).

I beg to differ. Copyright and contract law are combined with the GLW 
text, in that the text is copyrighted, but you have permission to do 
pretty much anything reasonable with it for free PROVIDED THAT you 
don't alter the text. Period. If you change the punctuation, you have 
altered the text, and therefore have no permission to make copies 
(beyond whatever "fair use" rights you might have, which are pretty 
limited these days).

With the WEB & HNV, the text is in the Public Domain, but if you use 
the trademarked names, then you are bound to not alter the text as a 
condition of using the trademarked names. Otherwise, you have to call 
it something different. Again, this is a combination of trademark & 
contract law.

In reality, I'm not very likely to sue anyone for screwing up the 
quotation marks in the GLW text, but I do have the legal right to do 
so.

>  Suggesting that you will somehow 
>have "responsibility before God" (unless you're intentionally 
>rendering incorrectly) would be pretty ridiculous and implies that 
>every typesetter or translator who ever made a mistake while working 
>on a Bible (probably all of them) will be held responsible for those 
>acts.

It would be foolish to not be careful in dealing with God's Word, 
don't you think? No, I don't think God will strike everyone dead who 
makes an honest mistake, but I don't want to be one who intentionally 
mis-handles God's Word or takes it lightly. On the other hand, the 
original Greek and Hebrew manuscripts had no quotation marks. We only 
put them in translations because the target languages require them. 
They are derived entirely from the context. In a few cases (especially 
in the Prophets), it is a judgement call as to where exactly the 
quotation marks should go. Therefore, I'm not going to make a holy war 
of this issue. Let your Holy-Spirit-sanctified conscience be your 
guide.

>> The OSIS spec should be changed to allow separation of quotation 
>> mark generation markers from words of Jesus markers.
>
>We probably won't ever see that, precisely because there already 
>exists a way to express this.

Sure-- the way I did it. Just change the documentation to say that is 
OK. Alternatively, you could redefine <q> to always generate 
punctuation and <speech> to never generate punctuation, but allow 
either to specify who is speaking or writing. Both are milestoneable 
markers used for approximately the same thing, right now.

>There also probably won't ever be anything akin to a note start 
>anchor, since it can already be expressed.  The first verse of the 
>WEB reads:
>
><verse sID="Gen.1.1" osisID="Gen.1.1" />In the beginning <milestone 
>type="x-noteStartAnchor" />God<note type="translation">After “God,” 
>the Hebrew has the two letters “Aleph Tav” (the first and last 
>letters of the Hebrew alphabet) as a grammatical marker.</note> 
>created the heavens and the earth.<verse eID="Gen.1.1" />
>
>and could instead be encoded with a <catchWord> to indicate the 
>annotant of the <note>: ...
>or with an osisRef with a grain, to explicitly define the range of 
>the annotant: ...

Those approaches could work. They are quite contorted to my way of 
thinking, but you could spend many man-months making it work in OSIS 
generation, conversion, and display for HTML. Even for print, some 
printed Bibles use footnote start & end markers. A start marker would 
be MUCH easier to convert to HTML hyperlinks, don't you think? I'll 
probably never support those methods you suggest. I even cut corners 
in that I made no distinction between kinds of notes, because I don't 
distinguish between them in the source text format (GBF). Maybe if I 
ever used OSIS for a native Bible text format to start editing in, and 
if good quality conversions to HTML and other formats already existed, 
I might. Of course, only a hard-core computer geek would manually edit 
OSIS Scripture texts (i. e. for a new translation) with nothing but a 
text editor, so I'll wait to see if anyone generates a Scripture 
editor that generates OSIS text that is easier to use than the current 
alternatives. 
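
A minimal sketch of why a start marker converts easily to HTML hyperlinks, assuming the x-noteStartAnchor milestone form shown in the quoted verse, with plain text between the anchor and the note and no nested notes. The function name notes_to_html and the "#fnN" link targets are hypothetical; HTML escaping is omitted for brevity.

```python
import re

# Matches: the start-anchor milestone, the annotated words, then the note body.
ANCHORED_NOTE = re.compile(
    r'<milestone type="x-noteStartAnchor"\s*/>(.*?)<note\b[^>]*>(.*?)</note>',
    re.DOTALL,
)

def notes_to_html(osis_text):
    """Return (body, footnotes).

    In body, the words between the start anchor and the note become a link
    to a numbered footnote; footnotes holds the note texts in order.
    """
    footnotes = []

    def link(match):
        footnotes.append(match.group(2))
        number = len(footnotes)
        return f'<a href="#fn{number}">{match.group(1)}</a>'

    body = ANCHORED_NOTE.sub(link, osis_text)
    return body, footnotes
```

The single regular expression is enough here only because the start of the annotated span is marked explicitly; a <catchWord> or grain-based osisRef would require searching the preceding text for the annotated words.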

Don't get me wrong. I almost like OSIS. <grin> I love the idea of a 
good Scripture interchange format standard. OSIS seems to have more 
support than XSEM, and it is XML, unlike USFM, GBF, or the old STEP 
format. If I were starting from scratch, I would do some things 
differently, but at this point, I'd rather ride on your octagonal 
wheel than reinvent a round one. <grin> If I seem to whine a bit about 
it, I'm just trying to get you to round off some of the corners so 
that my passengers and I can have a smoother ride.

Take it for what it is worth...

... I'll let you know when I have the (almost) OSIS texts updated & 
posted.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (MingW32)
Comment: http://eBible.org/mpj/gpg.htm

iD8DBQFAIep0RI/gxxfXR7sRAjhKAKDz8OSB3LtSn85dup7i7L3ye7g45ACfZvfO
QHkWqWHkSpSYZsb43+Gf3iA=
=KhyB
-----END PGP SIGNATURE-----