<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=windows-1252"

 http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

DM Smith wrote:

<blockquote cite="mid:463622D0.5010905@yahoo.com" type="cite">

  <pre wrap="">Kahunapule Michael Johnson wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">How does the Sword project handle display of OSIS text quotations when:

1. the &lt;q&gt; or &lt;speech&gt; element is used without a marker attribute,

    </pre>

  </blockquote>

  <pre wrap=""><!---->The speech element is not handled, except to process its content. It is 

as if the element were not in the text at all. I think the speech 

element is to indicate the speaker, not that what's said is a quote. I 

won't mention the element &lt;speech&gt; below.

  </pre>

</blockquote>

OK. I have no need to generate the &lt;speech&gt; element, as there is

no USFM equivalent, so I'll ignore it, too. :-)<br>

<blockquote cite="mid:463622D0.5010905@yahoo.com" type="cite">

  <pre wrap="">Assuming that the module's conf does not have osisQToTick=false (i.e. it 

defaults to true when not present), then the level attribute determines 

the quotation mark that will be used, alternating double quote and then 

single quote. If no level attribute is present, then it uses a double quote.

It will use the same mark when it gets to &lt;/q&gt;.

  </pre>

</blockquote>

In that case, would open quote reminders be inserted at paragraph and

stanza beginnings automatically, or would that require a cQuote

milestone to make happen? (I'm just curious. Normally, I'm interested

in just making sure this doesn't happen, since the quotation

punctuation is already fully specified, and it may not conform to

current English usage. However, in the hypothetical case where someone

wanted this to happen, I'm curious how it would be done.)<br>

<blockquote cite="mid:463622D0.5010905@yahoo.com" type="cite">

  <pre wrap="">The same holds true when milestoned versions of &lt;q&gt; are used, except 

that &lt;q eID="xxx"/&gt; elements will not cause the code to look at the 

opening &lt;q sID="xxx"/&gt; for a marker attribute. Instead, it will use the 

own marker attribute, or the lack of one, to determine what to output.

  </pre>

</blockquote>

So in the milestone elements, markers may vary. That is actually good,

since sometimes quotes are introduced with an em dash and close with a

newline, or some other asymmetrical case.<br>

<blockquote cite="mid:463622D0.5010905@yahoo.com" type="cite">

  <pre wrap="">

However, if osisQToTick=false, no quotation mark is used.

  </pre>

</blockquote>

So osisQToTick=false is essentially equivalent to putting a marker=""

attribute on all &lt;q&gt; elements?<br>

<blockquote cite="mid:463622D0.5010905@yahoo.com" type="cite">

  <blockquote type="cite">

    <pre wrap="">2. the &lt;q&gt; or &lt;speech&gt; element is used with a marker attribute,

    </pre>

  </blockquote>

  <pre wrap=""><!---->

When the marker attribute is present, it is used.

  </pre>

</blockquote>

Good. :-)<br>
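Putting your answers to cases 1 and 2 together, here is a minimal Python sketch of the mark selection as I understand it. This is not Sword's code: the function name is hypothetical, I am reading "alternating" as odd levels getting a double quote and even levels a single quote, and I am assuming that an explicit marker is still used when osisQToTick=false. Please correct me where I have guessed wrong.<br>

<pre wrap="">
```python
from typing import Optional


def quote_mark(level: Optional[int] = None,
               marker: Optional[str] = None,
               osis_q_to_tick: bool = True) -> str:
    """Return the mark to render for a q start or end tag (hypothetical helper)."""
    # Case 2: a marker attribute, even an empty one, is used as given.
    # Assumption: this holds even when osisQToTick=false.
    if marker is not None:
        return marker
    # osisQToTick=false: no quotation mark is generated.
    if not osis_q_to_tick:
        return ""
    # No level attribute: double quote.
    if level is None:
        return '"'
    # Assumption: odd levels use a double quote, even levels a single quote.
    return '"' if level % 2 == 1 else "'"
```
</pre>

Since the same lookup would apply to the start and the end tag separately, a milestone pair can produce different marks just by carrying different marker attributes.<br>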

<blockquote cite="mid:463622D0.5010905@yahoo.com" type="cite">

  <blockquote type="cite">

    <pre wrap="">3. no &lt;q&gt; or &lt;speech&gt; elements appear, or

    </pre>

  </blockquote>

<pre wrap="">Then as far as sword is concerned then it is not in a quote.

  </pre>

</blockquote>

OK... what, exactly, does that mean? Does that make a difference for

anything besides the option of rendering Words of Jesus in red (or some

other alternate color) for display? Normally, the point of knowing if

something is in a quote or not is to display the quotation marks

correctly, but if there are no quotation marks to display (or they are

already in the text in whatever way is appropriate for that language),

then Sword doesn't actually need to "know" when something is a quote

or not, does it? Or is there some search feature or function that I'm

not aware of that would use such knowledge?<br>

<blockquote cite="mid:463622D0.5010905@yahoo.com" type="cite">

  <blockquote type="cite">

    <pre wrap="">4. quotation punctuation (У, С, Т, Ф, л, ╗, Ч, newline, etc.) appears

outside of &lt;q&gt; or &lt;speech&gt; elements (i. e., not in a marker attribute)?

    </pre>

  </blockquote>

<pre wrap="">Any punctuation in the text is produced as is.

  </pre>

</blockquote>

This is good. Very good. :-)<br>

<blockquote cite="mid:463622D0.5010905@yahoo.com" type="cite">

  <pre wrap="">Another feature of OSIS is &lt;milestone type="cQuote" marker="xxxx"/&gt;

This is used for a continuation quote. (substitute xxxx with the 

appropriate quote mark)

  </pre>

</blockquote>

This is good to know. I regard this (or something like it) as an

essential feature if all quotation marks are going to be put in markup.<br>

<blockquote cite="mid:463622D0.5010905@yahoo.com" type="cite">

  <pre wrap="">Words of Christ (WoC) can be indicated by adding who="Jesus" to the &lt;q&gt; 

container element or to both the milestone elements. In the KJV, ESV and upcoming NASB modules, the WoC are marked on a per 

verse basis, using the container form of &lt;q&gt;, with marker="".

  </pre>

</blockquote>

This is an interesting concept, and one that is helpful to me. You

see, I thought that marking WoC per verse was bad OSIS the way I read

the documentation, but it sure makes conversion from USFM (which

actually demands that sort of markup) easier (because I don't have to

discard adjacent end + start pairs with no actual text in between, just

a verse marker), and it also makes display on a verse-by-verse

basis (like Sword does) easier if you are working from raw OSIS. The

same technique would be useful for translating the USFM \qt ...\qt*

markup (which is marked verse-by-verse to indicate OT quotes in the NT)

to &lt;q marker="" who="OT" sID="somethingunique"&gt;...&lt;q marker=""

who="OT" eID="somethingunique"&gt;. If you regard this as acceptable,

then I'll just embrace it quickly before anyone objects. :-)<br>
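For the \wj case, the per-verse conversion I have in mind is roughly the following Python sketch. It assumes the text has already been split into verses and that a \wj span contains no nested USFM markers; the function name wj_to_q is just for illustration, and the container form with marker="" follows what you describe for the KJV module.<br>

<pre wrap="">
```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# USFM character style for words of Jesus: \wj ...\wj*
WJ = re.compile(r"\\wj\s(.*?)\\wj\*", re.DOTALL)


def wj_to_q(verse_text):
    """Convert one verse of USFM text, wrapping each wj span in a q container."""
    out, pos = [], 0
    for m in WJ.finditer(verse_text):
        # Text before the span is copied through, XML-escaped.
        out.append(escape(verse_text[pos:m.start()]))
        # marker="" so that no quotation mark is generated for the WoC span.
        q = ET.Element("q", {"who": "Jesus", "marker": ""})
        q.text = m.group(1)
        out.append(ET.tostring(q, encoding="unicode"))
        pos = m.end()
    out.append(escape(verse_text[pos:]))
    return "".join(out)
```
</pre>

Because each verse is converted on its own, nothing has to be merged across verse boundaries, and any quotation punctuation already in the text passes through untouched.<br>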

<br>

OSIS is very flexible, and there seem to be many reasonable ways to

interpret how Scriptures should be encoded. At this point, there are so

many ideas out there, I would like to just start with one goal:

encoding OSIS texts from USFM in such a way that Sword displays them

properly. If that works, then there is a good chance the resulting OSIS

will be of use to others, as well.<br>

<br>

Would it be too weird to separate q elements intended for replacing

punctuation (with marker specified) from those used for what is

essentially a character style (i.e., WoC)? Like &lt;q marker="&ldquo;"

sID="aoeu"/&gt;&lt;q marker="" sID="qjkx" who="Jesus"/&gt; (actual

quotation) &lt;q marker="" eID="qjkx" who="Jesus"/&gt;&lt;q marker="&rdquo;"

eID="aoeu"/&gt;, where the actual quotation may span several verses,

and the inside set of markers may be ended and restarted with each

verse?<br>

<blockquote cite="mid:463622D0.5010905@yahoo.com" type="cite">

  <blockquote type="cite">

    <pre wrap="">I want to (1) ensure that Bible texts are displayed correctly, and (2)

minimize the amount of manual labor necessary to make #1 happen.

It should not be necessary to do any manual editing of Bible source

texts in well-formed Unicode USFM to create a valid Sword module. (USFM

or something close to it is the format in which a very large number of

minority-language Bibles exist.) In USFM, quotation punctuation, if any,

is in the text of the document, with no special markup. In an informal

extension to USFM, sometimes &lt;&lt; is used for &ldquo;, &lt; for &lsquo;, etc. (A space is

required to disambiguate &ldquo;&lsquo; and &lsquo;&ldquo;.) Speaking of ambiguity, apostrophe,

closing single quote, and (in some languages) glottal stop all use the

same character. This ambiguity, coupled with language and style

considerations, seems to be a serious problem in automatically

converting from either GBF or USFM to OSIS, in general.

    </pre>

  </blockquote>

  <pre wrap=""><!---->I have recently written a quote recognizer in C++. I did find that an 

apostrophe is potentially ambiguous, but in the source I was working with, it 

was not an issue.

Fortunately, my input used ` for a single quote start and ' for an end 

quote. This made disambiguation significantly easier.

If you wish, I can send you the routine.

  </pre>

</blockquote>

I already have some LGPL C# code that does a reasonably accurate job of

recognizing quotation marks in English text that I use for checking

quotation-mark balancing. It doesn't work very well for other

languages, because it uses some English-specific rules to disambiguate

apostrophes and closing single quotes, and doesn't even handle the case

where the same marker is used for glottal stop. (The latter is bad

practice in Unicode, but some people do it anyway.) Does your quote

recognizer work for non-English Bibles with different writing systems

and different punctuation rules?<br>

<blockquote cite="mid:463622D0.5010905@yahoo.com" type="cite">

  <blockquote type="cite">

    <pre wrap="">I'm wondering if I should target OSIS or GBF as a target format for a

converter I'm writing, and also working on updating the dialect of OSIS

that the World English Bible and HNV are distributed in. While I'm not

in favor of dropping support for GBF, yet, I'm not very thrilled about

the idea of putting any new work into supporting it, either. However, if

I can't make an OSIS module without a lot of manual labor, any

reasonable alternative is worth considering.

    </pre>

  </blockquote>

  <pre wrap=""><!---->Remembering your earlier posts about OSIS's lack of quotation support, I 

think I can now say that it provides you the level of control that you 

wish. Having done three modules myself, I think that OSIS 2.1.1 is 

sufficient for Bible texts.

So, I'd suggest OSIS.

  </pre>

</blockquote>

Indeed, it looks like I have at least two ways to get the level of

quotation support I want: (1) always put quotation punctuation in

marker attributes of q elements or cQuote milestone elements and

specify empty marker attributes when using q just for WoC, or (2) [pause

to don body armor and start running] always put quotation punctuation

in the text and use q elements with empty marker attributes just for

translating USFM \wj ...\wj* and \qt ...\qt* markup on a per-verse

basis. Option #1 has the major disadvantage of requiring finding all of

the quotation punctuation in text I may not be able to read, let alone

understand the grammar of, for conversion purposes. Option #2 has the

disadvantage of potentially offending certain people who have, at least

so far, held the deep religious conviction that all quotation

punctuation should live in markup, not the text of the Bible, but it

has the major advantage of the simplest, fastest conversion possible

from USFM to OSIS, with no manual labor required for each translation

(other than making sure the source text is really in Unicode USFM).

Although option #2 seems like it would work just fine, at least

functionally if not idealistically, I'm concerned that someone might

think such texts weren't pure enough OSIS, and not use them. If that is

the case, then perhaps I really would be better off going back to

GBF... or just punting on this whole converter and moving on to improving

my converters to other formats for other Bible study software.<br>

<br>

In the case where the translators have made use of the &lt;&lt;, &lt;,

&gt;, &gt;&gt; quotation markup option in their SFM, which is actually

a fair number of them, I would like to convert those to the appropriate

q elements with markup specifying the normal equivalent of those

markings. I'm loath to mess with apostrophe/ending single quote

disambiguation for non-English texts, though. I don't see any benefit

to doing so, really, but maybe I'm missing something?<br>
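For those digraphs, the conversion could look something like this Python sketch. It assumes the digraphs are properly nested within the chunk of text being converted, invents sID/eID values of the form q1, q2, ... purely for illustration, and leaves an unmatched closing digraph in the text as the plain typographic mark. Apostrophes are not touched at all.<br>

<pre wrap="">
```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Informal SFM digraphs and the typographic marks they stand for.
MARKS = {"<<": "\u201c", "<": "\u2018", ">": "\u2019", ">>": "\u201d"}
TOKEN = re.compile("<<|>>|<|>")  # two-character forms are tried first


def milestone(attrs):
    """Serialize one empty q milestone element with the given attributes."""
    return ET.tostring(ET.Element("q", attrs), encoding="unicode")


def sfm_quotes_to_osis(text):
    """Replace quotation digraphs with q milestones that carry marker attributes."""
    out, open_ids, count, pos = [], [], 0, 0
    for m in TOKEN.finditer(text):
        out.append(escape(text[pos:m.start()]))
        pos = m.end()
        tok = m.group()
        if tok in ("<<", "<"):
            # Opening digraph: start a new quote and remember its ID.
            count += 1
            open_ids.append(f"q{count}")
            out.append(milestone({"marker": MARKS[tok], "sID": open_ids[-1]}))
        elif open_ids:
            # Closing digraph: end the most recently opened quote.
            out.append(milestone({"marker": MARKS[tok], "eID": open_ids.pop()}))
        else:
            # Unmatched close: keep the mark as ordinary text.
            out.append(MARKS[tok])
    out.append(escape(text[pos:]))
    return "".join(out)
```
</pre>

A quotation that stays open across chunk boundaries would need the stack of open IDs carried over between calls, which this sketch does not do.<br>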

<br>

What do you think?<br>

<br>

Michael<br>

<br>

<br>

</body>

</html>