[sword-devel] osis2mod import issue

DM Smith dmsmith at crosswire.org
Fri Jun 5 05:16:49 MST 2009


On Jun 5, 2009, at 5:14 AM, Mattias Põldaru wrote:

> On one fine day, Thu, 2009-06-04 at 12:53, DM Smith wrote:
>> Mattias Põldaru wrote:
>>> On one fine day, Wed, 2009-06-03 at 19:25, DM Smith wrote:
>>>
>>>> On Jun 3, 2009, at 1:36 PM, Mattias Põldaru wrote:
>>>>
>>>>
>>>>> Hi everybody.
>>>>>
>>>>> It is nice to see that you (DM, I suppose) got osis2mod working in no
>>>>> time at all. There is one more issue with preverse content: some
>>>>> whitespace gets counted as preverse in my file, and I think this is
>>>>> wrong, although it would not be complicated to remove the whitespace
>>>>> from my source document. I paste an example here.
>>>>>
>>>>>
>>>>> Here is the input OSIS file. Please correct me if I have something
>>>>> wrong here.
>>>>> <!-- start of example clip -->
>>>>> <div type="bookGroup">
>>>>>       <title>Vana Testament</title>
>>>>>       <div type="book" osisID="Gen" canonical="true">
>>>>>               <title type="main">1. Moosese</title>
>>>>>                       <div type="section" scope="Gen.1.1-Gen.2.3">
>>>>>                       <title>Maailma ja inimese loomine</title>
>>>>>                       <chapter sID="Gen.1" osisID="Gen.1" />
>>>>>                           <title type="chapter">1. peatükk</title>
>>>>>                           <p>
>>>>>                               <verse sID="Gen.1.1" osisID="Gen.1.1" />
>>>>> Alguses lõi Jumal taevad ja maa.
>>>>>                               <verse eID="Gen.1.1" />
>>>>>                           </p>
>>>>>                           <p>
>>>>>                               <verse sID="Gen.1.2" osisID="Gen.1.2" />
>>>>> Ja maa oli tühi ja paljas ja pimedus oli sügavuse peal ja Jumala Vaim
>>>>> hõljus vete kohal.
>>>>>                               <verse eID="Gen.1.2" />
>>>>>                           </p>
>>>>> <!-- end of example clip -->
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> And here is the corresponding module output. Please notice the
>>>>> preverse that contains only a single space.
>>>>> <!-- start of example clip -->
>>>>> <div sID="gen1" type="bookGroup"/> <title>Vana Testament</title>
>>>>> <div canonical="true" osisID="Gen" sID="gen2" type="book"/>
>>>>> <title type="main">1. Moosese</title>
>>>>> <div sID="gen3" scope="Gen.1.1-Gen.2.3" type="section"/>
>>>>> <title>Maailma ja inimese loomine</title>
>>>>> <chapter osisID="Gen.1" sID="Gen.1"/>
>>>>> <title type="chapter">1. peatükk</title>
>>>>> <div sID="gen4" type="paragraph"/>
>>>>> Alguses lõi Jumal taevad ja maa.  <div eID="gen4" type="paragraph"/>
>>>>> <div type="x-milestone" subType="x-preverse" sID="pv1"/><div sID="gen5" type="paragraph"/>
>>>>> <div type="x-milestone" subType="x-preverse" eID="pv1"/>
>>>>> Ja maa oli tühi ja paljas ja pimedus oli sügavuse peal ja Jumala Vaim hõljus vete kohal.  <div eID="gen5" type="paragraph"/>
>>>>> <!-- end of example clip -->
>>>>>
>>>> The pre-verse contains "<p> " (the paragraph start and the space).
>>>>
>>>> Handling of whitespace is a bit problematic. What osis2mod does is
>>>> replace sequences of whitespace (newlines, spaces and tabs) with a
>>>> single space. If a verse contains leading or trailing space, it is
>>>> trimmed. (I don't think it should do this trimming.)
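The whitespace handling described above can be sketched roughly as follows. This is only an illustration in Python of the behaviour as described in this message, not the actual osis2mod code; the function name `normalize_verse_text` is made up for the example.

```python
import re


def normalize_verse_text(text: str) -> str:
    """Collapse runs of newlines, spaces and tabs to one space, then trim.

    Hypothetical helper: mirrors the behaviour described in the message,
    not the real osis2mod implementation.
    """
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    # Trimming is the step the message says is questionable.
    return collapsed.strip(" ")


if __name__ == "__main__":
    raw = "\n    Alguses lõi Jumal taevad ja maa.\n        "
    print(repr(normalize_verse_text(raw)))
```

Note that the collapse step alone still leaves one space wherever the source had indentation between tags, which is how a space-only preverse can arise.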
>>>>
>>>> What osis2mod does not have is knowledge of the containment model of
>>>> the OSIS schema. If it did, it could remove whitespace between
>>>> element tags that don't allow for text.
>>>>
>>>> In this case, the OSIS schema allows for whitespace after the opening
>>>> paragraph tag and before the verse tag. One could have:
>>>> <p>yada yada yada <verse>verse text</verse> yada yada yada</p>
>>>> Here it would be inappropriate to trim the whitespace off of
>>>> the text that precedes the verse.
>>>>
>>>> If we can come up with a good heuristic, I'd be glad to implement it.
>>>>
>>>>
>>> For the case I have, it would be sufficient to check whether the
>>> preverse has any printing characters and not to add an empty preverse.
>>>
>>
>> The preverse is not empty; it contains
>> <div type="paragraph" sID="gen5">
>> which is the transformation of <p> into a milestoned representation.
>>
>> It also has a single space following that element.
>>
>> Where should the paragraph be put? It either is appended to the prior
>> verse or it is pre-verse.
>>
>> The one solution I thought of is that any whitespace immediately
>> following a block element start (<div>, <lg>, <p>, ...) can be deleted.
>> Likewise for any whitespace immediately before the end element.
>>
>> Would this work?
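A rough sketch of that heuristic, applied to the container form of the source markup. It is written in Python with regular expressions purely for illustration; osis2mod itself uses a parser. The list of block elements and the function name `strip_block_edge_whitespace` are assumptions, and attribute values containing `>` are not handled.

```python
import re

# Assumed set of block elements; the message only lists <div>, <lg>, <p>, ...
BLOCK_ELEMENTS = ("div", "lg", "p")
_NAMES = "|".join(BLOCK_ELEMENTS)

# A block start tag that is not self-closing, followed by whitespace.
_AFTER_START = re.compile(rf"(<(?:{_NAMES})(?:\s[^<>]*)?(?<!/)>)\s+")
# Whitespace followed by a block end tag.
_BEFORE_END = re.compile(rf"\s+(</(?:{_NAMES})\s*>)")


def strip_block_edge_whitespace(osis: str) -> str:
    """Delete whitespace right after a block start and right before a block end."""
    osis = _AFTER_START.sub(r"\1", osis)
    return _BEFORE_END.sub(r"\1", osis)


if __name__ == "__main__":
    clip = (
        '<p>\n    <verse sID="Gen.1.2" osisID="Gen.1.2"/>\n'
        'Ja maa oli tühi.\n    <verse eID="Gen.1.2"/>\n</p>'
    )
    print(strip_block_edge_whitespace(clip))
```

With the earlier counter-example, `<p>yada yada yada <verse>...`, nothing is removed, because the text directly follows the `<p>` and the space before `<verse>` is not adjacent to a block tag.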
>>
>> In Him,
>>    DM
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>
> I reported this against Xiphos. It may be a bug in Xiphos itself. You will
> find the screenshot in the report.
> https://sourceforge.net/tracker/?func=detail&aid=2801620&group_id=5528&atid=105528

That certainly looks bad in Xiphos. I think it is a rendering bug. We
are getting more and more modules with structural markup (<div
type="section">, <p>, <lg>, ...), and osis2mod now retains all of it in
its original position. (In the past some divs were dropped and
inter-verse elements were rearranged.)

It appears that both are set to show verses starting on new lines.

What does it look like if it is not set that way?

I'm curious what it looks like when headings are turned off and it is  
in verse-per-line mode.

I know BibleDesktop has a similar problem: when showing verse-per-line,
it does not properly account for newlines introduced by markup. It
merely adds a new line before each verse number.

Maybe SWORD needs a verse-per-line filter that strips out structural  
markup? It looks like BibleTime has that.
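To make the idea concrete, here is a minimal sketch of what such a filter might do to one verse entry. It is Python for illustration only and is not based on the SWORD filter API or on BibleTime's code; the function name `strip_structural_markup` and the set of elements treated as structural are assumptions.

```python
import re

# Assumed set of structural elements to drop in verse-per-line display.
STRUCTURAL = ("div", "p", "lg")

# Matches start, end and milestoned (self-closing) forms of those elements.
_STRUCTURAL_TAG = re.compile(r"</?(?:%s)(?:\s[^<>]*)?/?>" % "|".join(STRUCTURAL))


def strip_structural_markup(verse_osis: str) -> str:
    """Remove structural tags and tidy the whitespace they leave behind."""
    without_tags = _STRUCTURAL_TAG.sub(" ", verse_osis)
    return re.sub(r"\s+", " ", without_tags).strip()


if __name__ == "__main__":
    entry = (
        '<div type="x-milestone" subType="x-preverse" sID="pv1"/>'
        '<div sID="gen5" type="paragraph"/> '
        '<div type="x-milestone" subType="x-preverse" eID="pv1"/> '
        'Ja maa oli tühi.  <div eID="gen5" type="paragraph"/>'
    )
    print(strip_structural_markup(entry))
```

Non-structural elements such as <title> are left alone here, so whether headings show would remain a separate display option.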

In Him,
	DM


