[sword-devel] osis2mod import issue

DM Smith dmsmith at crosswire.org
Wed Jun 3 16:25:31 MST 2009

On Jun 3, 2009, at 1:36 PM, Mattias Põldaru wrote:

> Hi everybody.
> It is nice to see you (DM, I suppose) got the osis2mod working in no
> time at all. There is one more issue with preverse stuff. Some
> whitespace gets counted as preverse on my file and I think this is
> wrong, although it isn't that complicated at all to remove whitespace
> from my source document. I paste an example here.
> Here is the input osis file. Please correct me, if I have something
> wrong here.
> <!-- start of example clip -->
> <div type="bookGroup">
>        <title>Vana Testament</title>
>        <div type="book" osisID="Gen" canonical="true">
>                <title type="main">1. Moosese</title>
>                        <div type="section" scope="Gen.1.1-Gen.2.3" >
>                        <title>Maailma ja inimese loomine</title>
>                        <chapter sID="Gen.1" osisID="Gen.1" />
>                            <title type="chapter">1. peatükk</title>
>                            <p>
>                                <verse sID="Gen.1.1" osisID="Gen.1.1" />
> Alguses lõi Jumal taevad ja maa.
>                                <verse eID="Gen.1.1" />
>                            </p>
>                            <p>
>                                <verse sID="Gen.1.2" osisID="Gen.1.2" />
> Ja maa oli tühi ja paljas ja pimedus oli sügavuse peal ja Jumala Vaim
> hõljus vete kohal.
>                                <verse eID="Gen.1.2" />
>                            </p>
> <!-- end of example clip -->
> And here is the corresponding module output. Please notice the
> preverse that contains only one space.
> <!-- start of example clip -->
> <div sID="gen1" type="bookGroup"/> <title>Vana Testament</title> <div
> canonical="true" osisID="Gen" sID="gen2" type="book"/> <title
> type="main">1. Moosese</title> <div sID="gen3" scope="Gen.1.1-Gen.2.3"
> type="section"/> <title>Maailma ja inimese loomine</title>
> <chapter osisID="Gen.1" sID="Gen.1"/> <title type="chapter">1.
> peatükk</title> <div sID="gen4" type="paragraph"/>
> Alguses lõi Jumal taevad ja maa.  <div eID="gen4" type="paragraph"/>
> <div type="x-milestone" subType="x-preverse" sID="pv1"/><div sID="gen5"
> type="paragraph"/> <div type="x-milestone" subType="x-preverse"
> eID="pv1"/> Ja maa oli tühi ja paljas ja pimedus oli sügavuse peal ja
> Jumala Vaim hõljus vete kohal.  <div eID="gen5" type="paragraph"/>
> <!-- end of example clip -->

The pre-verse contains "<p> " (the paragraph start tag and the space
that follows it).

Handling of whitespace is a bit problematic. What osis2mod does is  
replace sequences of whitespace (newlines, spaces and tabs) with a  
single space. If a verse contains leading or trailing space, it is  
trimmed. (I don't think it should do this trimming.)
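
A minimal sketch of that normalization in Python, for illustration only.
osis2mod itself is C++ and this is not its actual code; the function name
and the `trim` flag are made up here so the two steps can be seen apart.

```python
import re

def normalize_ws(text: str, trim: bool = True) -> str:
    """Collapse runs of spaces, tabs and newlines into one space.

    With trim=True, leading and trailing spaces are also removed,
    which mirrors the trimming step described above.
    """
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    return collapsed.strip(" ") if trim else collapsed

print(repr(normalize_ws("\n   Alguses  l\u00f5i\tJumal \n")))
print(repr(normalize_ws("\n   Alguses  l\u00f5i\tJumal \n", trim=False)))
```

With trim=False the leading and trailing whitespace survives as a single
space on each side, which is the kind of text that ends up in the
pre-verse in the example above.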

What osis2mod does not have is knowledge of the containment model of
the OSIS schema. If it did, it could remove whitespace between element
tags that don't allow for text.
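
To make the idea concrete, here is a rough Python sketch of what
containment-aware stripping could look like. It is not osis2mod code.
The ELEMENT_ONLY set is a placeholder assumption, not taken from the
OSIS schema (the real list would have to be derived from the schema),
and the regex tokenizer is a simplification that ignores CDATA and
similar cases.

```python
import re

# Assumption: placeholder names for elements whose content model
# allows child elements but no text. Not read from the OSIS schema.
ELEMENT_ONLY = {"list", "table", "row"}

# Splits markup into tags and the text runs between them.
TOKEN = re.compile(r"<[^>]+>|[^<]+")

def strip_ignorable_ws(xml: str) -> str:
    """Drop whitespace-only text whose parent element cannot hold text."""
    out = []
    stack = []  # names of the currently open elements
    for tok in TOKEN.findall(xml):
        if tok.startswith("</"):
            if stack:
                stack.pop()
            out.append(tok)
        elif tok.startswith("<"):
            out.append(tok)
            is_container = not tok.endswith("/>") and not tok.startswith(("<!", "<?"))
            if is_container:
                stack.append(re.match(r"<\s*([^\s/>]+)", tok).group(1))
        else:
            if tok.strip() == "" and stack and stack[-1] in ELEMENT_ONLY:
                continue  # whitespace where no text is allowed
            out.append(tok)
    return "".join(out)

print(strip_ignorable_ws("<list> <item>a</item> </list>"))
print(strip_ignorable_ws("<p> <verse/>x </p>"))
```

Whitespace directly inside <p> is left alone, because a paragraph can
contain text.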

In this case, the OSIS schema allows for whitespace after the opening  
paragraph tag and before the verse tag. One could have:
<p>yada yada yada <verse>verse text</verse> yada yada yada</p>
In this case, it would be inappropriate to trim the whitespace off of  
the text that precedes the verse.
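
A small Python illustration of the difference (the helper name is made
up for this example, and it only models the string handling, not
anything osis2mod actually does):

```python
def join_before_verse(preceding: str, verse: str, trim: bool) -> str:
    """Concatenate the text before a verse with the verse text."""
    if trim:
        preceding = preceding.strip()
    return preceding + verse

# Whitespace-only text after <p> is harmless to drop.
print(repr(join_before_verse(" ", "verse text", trim=True)))

# Trimming real text glues the last word onto the verse.
print(repr(join_before_verse("yada yada yada ", "verse text", trim=True)))

# Leaving the text untouched keeps the word boundary.
print(repr(join_before_verse("yada yada yada ", "verse text", trim=False)))
```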

If we can come up with a good heuristic I'd be glad to implement it.

> Thanks for your effort and good work.
> Regards
> Mattias
