<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=UTF-8" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

DM Smith wrote:

<blockquote cite="mid:46373BFF.7080203@yahoo.com" type="cite">

  <pre wrap="">Kahunapule Michael Johnson wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">So... it sounds like I could simply convert USFM to OSIS with the

obvious conversions (like \p ... -&gt; &lt;p&gt;...&lt;/p&gt;) plus

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Remember to have the &lt;p&gt; surround the entire paragraph.

  </pre>

</blockquote>

Of course.<br>

<blockquote cite="mid:46373BFF.7080203@yahoo.com" type="cite">

  <pre wrap="">In various source I have seen the equivalent of \p be nothing more than 

a paragraph separator, with the ambiguity that the first verse of 

chapters does not have a \p. There may be paragraphs that don't begin or 

end on a chapter boundary.

  </pre>

</blockquote>

In USFM \p marks only the beginning of a paragraph. The end of a

paragraph is implicitly marked by the beginning of any other

paragraph-class marker (such as a title, subtitle, another paragraph,

poetry lines, a blank line, etc.) or the end of a chapter. To handle a

paragraph that does not actually end at a chapter boundary, the USFM

designers added the \pc marker to indicate that a paragraph continues here.

So, the processing is a little more complex than just putting a

&lt;p&gt; at the beginning, a &lt;/p&gt; at the end, and replacing all

\p markers with &lt;/p&gt;&lt;p&gt;.<br>
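For illustration, here is a minimal Python sketch of that grouping. It is not the converter's actual code. It assumes a simplified input with exactly one marker per line, treats only a small hypothetical subset of markers (\p, \s1, \q1, \q2, \b) as paragraph-class, builds OSIS-like element names (chapter, p, title, l, verse) rather than full OSIS with osisID attributes, and does not handle \pc. Because each paragraph is a real element in an ElementTree, it is closed implicitly whenever the next paragraph-class marker arrives or the chapter ends.
<pre wrap="">
```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately incomplete subset of USFM paragraph-class
# markers. Any of them implicitly ends the paragraph that is open.
PARAGRAPH_CLASS = {"p", "s1", "q1", "q2", "b"}


def convert_chapter(lines):
    """Group simplified USFM lines for one chapter into paragraphs.

    Assumes each line starts with exactly one marker, e.g. "\\p" or
    "\\v 3 text". A p element opened by \\p receives the verses that
    follow until the next paragraph-class marker or the end of the
    chapter, which is where USFM implies the paragraph ends.
    """
    chapter = ET.Element("chapter")
    para = None  # the paragraph currently open, if any
    for line in lines:
        marker, _, rest = line.lstrip("\\").partition(" ")
        if marker in PARAGRAPH_CLASS:
            para = None  # implicit end of the open paragraph
            if marker == "p":
                para = ET.SubElement(chapter, "p")
            elif marker == "s1":
                ET.SubElement(chapter, "title").text = rest
            elif marker in ("q1", "q2"):
                # poetry line; the digit becomes the level attribute
                ET.SubElement(chapter, "l", level=marker[1]).text = rest
            # "b" (blank line) only closes the paragraph
        elif marker == "v":
            number, _, text = rest.partition(" ")
            # verses before the first \p go directly into the chapter
            parent = chapter if para is None else para
            ET.SubElement(parent, "verse", n=number).text = text
        # any other marker is ignored in this sketch
    # Nothing to close explicitly: the tree nests every element, so the
    # end of the chapter ends any open paragraph implicitly.
    return chapter
```
</pre>
Serializing the result with ET.tostring(..., encoding="unicode") produces properly nested &lt;p&gt; elements even though the USFM source never marks where a paragraph ends.<br>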

<blockquote cite="mid:46373BFF.7080203@yahoo.com" type="cite">

  <pre wrap="">

osis2mod will convert the open and close tags to &lt;lb 

type="x-begin-paragraph"/&gt; and &lt;lb type="x-end-paragraph"/&gt;, 

respectively. These x- types are non-standard, but they allow a lossless 

reconstruction of the original.

  </pre>

  <blockquote type="cite">

    <pre wrap="">\qt -&gt; &lt;seg type="otPassage" sID="someid"/&gt;

\qt* -&gt; &lt;seg type="otPassage" eID="someid"/&gt;

    </pre>

  </blockquote>

  <pre wrap=""><!---->

When you use sID/eID, OSIS "requires" that they be paired and that each pair 

have a unique value. Sword does not currently care about this.

For each distinct milestone usage (e.g. &lt;q&gt;, &lt;seg&gt;, &lt;div&gt;), I found it 

constructive to have a stack and a counter. When an open element is found, 

its counter value is pushed onto the stack and the counter is incremented. 

When a close element is found, the value is popped off the stack. For 

quotes, I find the depth of the stack useful for populating the level 

attribute. If the stack is non-empty when the document is finished, then 

I have a bug somewhere.

  </pre>

</blockquote>

The values of sID and eID are entirely redundant in most cases, but

for the sake of the specification, I currently generate them from the

OSIS ID of the verse containing the beginning marker, concatenated

with the next value from a global counter. There may be better or worse

ways to generate the sID/eID pairs, but the choice seems largely

irrelevant to me, since the only potential constructive use for them is

for an OSIS reader to spit out an error message if an eID doesn't match

its corresponding sID. I also use stacks to hold each ID until the

matching end marker needs it for its eID.<br>
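A minimal Python sketch combining the two approaches described above: one stack per element name, a single global counter, and IDs built from the verse OSIS ID plus the counter value, with the stack depth used for the level attribute of q. The "." separator, the class name MilestoneIds, and the use of ElementTree are illustrative assumptions, not what any existing converter does.
<pre wrap="">
```python
import itertools
import xml.etree.ElementTree as ET


class MilestoneIds:
    """Generate paired sID/eID values for milestoned OSIS elements.

    One stack per element name (q, seg, div, ...) holds the IDs of
    milestones that are open; one global counter keeps every ID unique.
    """

    def __init__(self):
        self._counter = itertools.count(1)
        self._stacks = {}

    def start(self, name, verse_osis_id, **attrs):
        """Return the start milestone; its ID is saved for the end marker."""
        stack = self._stacks.setdefault(name, [])
        # verse OSIS ID + "." + next counter value (separator is assumed)
        ident = "{}.{}".format(verse_osis_id, next(self._counter))
        stack.append(ident)
        if name == "q":
            attrs["level"] = str(len(stack))  # depth of nested open quotes
        return ET.Element(name, sID=ident, **attrs)

    def end(self, name, **attrs):
        """Return the end milestone carrying the eID of the latest start."""
        stack = self._stacks.get(name)
        if not stack:
            raise ValueError("end of {} without a matching start".format(name))
        return ET.Element(name, eID=stack.pop(), **attrs)

    def check_finished(self):
        """Raise if any milestone is still open at the end of the document."""
        unclosed = {name: ids for name, ids in self._stacks.items() if ids}
        if unclosed:
            raise ValueError("unclosed milestones: {}".format(unclosed))
```
</pre>
An unmatched end marker raises immediately, and check_finished() plays the role of the "non-empty stack means a bug" test at the end of the document.<br>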

<blockquote cite="mid:46373BFF.7080203@yahoo.com" type="cite">

  <blockquote type="cite">

    <pre wrap="">\wj -&gt; &lt;q who="Jesus" marker="" sID="someid2"/&gt;

\wj* -&gt; &lt;q who="Jesus" marker="" eID="someid2"/&gt;

    </pre>

  </blockquote>

  <pre wrap=""><!---->

With the WoC, I would ask, selfishly, that you use the container form of 

&lt;q&gt;, that is

&lt;q who="Jesus"&gt;...&lt;/q&gt;

so

\wj -&gt; &lt;q who="Jesus"&gt;

\wj* -&gt; &lt;/q&gt;

JSword cannot handle the milestoned form at this time.

  </pre>

</blockquote>

OK. This isn't too burdensome since we aren't allowing the WoC markup

to cross verse boundaries. I just won't let it cross paragraph

boundaries, either. This additional restriction pretty much locks us

into per-verse markup of WoC instead of marking, for example, the whole

Sermon on the Mount with one pair of milestones.<br>
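A minimal Python sketch of that restriction, assuming a hypothetical pre-tokenized stream of (kind, value) pairs: any open WoC span is closed before each verse or paragraph boundary and reopened after it, so every container &lt;q who="Jesus"&gt; stays inside one verse and one paragraph. The token and event names are invented for illustration.
<pre wrap="">
```python
def split_woc(tokens):
    """Split WoC spans so no container q element crosses a verse or
    paragraph boundary.

    tokens: hypothetical stream of ("wj", True) for \\wj, ("wj", False)
    for \\wj*, ("verse", n), ("para", None) and ("text", s). Returns the
    stream with ("q_open", "Jesus") / ("q_close", None) events added.
    """
    out = []
    in_woc = False  # inside \wj ... \wj* in the source
    q_open = False  # a q element is currently open in the output
    for kind, value in tokens:
        if kind == "wj":
            in_woc = value
            if not value and q_open:
                out.append(("q_close", None))
                q_open = False
        elif kind in ("verse", "para"):
            # close before the boundary; reopen lazily on the next WoC text
            if q_open:
                out.append(("q_close", None))
                q_open = False
            out.append((kind, value))
        elif kind == "text":
            if in_woc and not q_open:
                out.append(("q_open", "Jesus"))
                q_open = True
            out.append((kind, value))
    if q_open:
        out.append(("q_close", None))
    return out
```
</pre>
Opening the q lazily, only when WoC text actually follows, avoids emitting an empty &lt;q&gt; element when a boundary falls immediately after \wj.<br>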

<blockquote cite="mid:46373BFF.7080203@yahoo.com" type="cite">...

  <pre wrap="">

For a general-purpose converter that deals with apostrophes whose 

meanings differ according to the language and the text, it is 

reasonable not to disambiguate them.

  </pre>

</blockquote>

Agreed.<br>

<blockquote cite="mid:46373BFF.7080203@yahoo.com" type="cite">

  <blockquote type="cite">

    <pre wrap="">Since the q elements generated from WoC (\wj) markup will never

span verses in this implementation, but the actual quote often does, it

is probably better to not combine the two resulting q elements at the

beginning and end of the quotation into one q element, because then the

start/end points wouldn't line up properly for one or the other of the

meanings of the element (quotation start/end marking vs. text coloration).

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Right. The two cannot be combined, precisely because they carry two 

different semantics, differing in markup and in meaning.

There are some instances where an "island" of text (a gloss or a 

parenthetical statement by the book's author) sits inside the WoC. It 

does not force the quote to end before it and begin again after it, but 

it needs to be distinguished from the WoC.

  </pre>

</blockquote>

Good point.<br>

<br>

</body>

</html>