[sword-devel] French Darby translation is OSIS (beta version)

DM Smith dmsmith555 at yahoo.com
Sat May 12 19:02:55 MST 2007


On May 12, 2007, at 6:05 PM, Chris Little wrote:

>
>
> DM Smith wrote:
>>
>> On May 12, 2007, at 4:15 PM, Chris Little wrote:
>>
>>>
>>>> -Line-feed and tabulations are not considered as space: if you look at
>>>> Genesis 1:2, it should be "Et l'Esprit de Dieu" and it is displayed as
>>>> "Etl'Esprit de Dieu" (a space is missing).
>>>>
>>>
>>> This looks like a problem with osis2mod, but the OSIS file itself could
>>> use some whitespace cleanup.  There is a lot of stray whitespace, for
>>> example at ends of lines, before </p>. The problem in Genesis 1:2 could
>>> be handled by changing the linefeed + tab to a single space.
>>>
>>
>> I think this is rather a "feature". osis2mod is trimming "extraneous"
>> whitespace. I think this was to handle input that is pretty. I'm in
>> favor of retaining all whitespace. My opinion is that an osis document
>> should be what is actually wanted. I've got some changes I need to make
>> because of the NASB (osis2mod is not handling stuff between verses
>> well). I can change this too if it is what people want.
>
> It should trim whitespace in favor of smaller, simpler files. But here,
> it sounds like \n and \t are being deleted rather than something like
> s/[\s]+/ /.
>
> I'm surprised we're doing this, but I'm just judging by the reported
> symptoms, rather than looking at the osis2mod code itself.

And I was going by memory. So shame on me. I just went and looked at  
the code.

osis2mod does not itself get rid of any "extraneous" whitespace, but it
calls FileMgr::getLine, which trims whitespace from the beginning and
the end of each line. I also think there is a bug in its handling of
line endings: some places check only for 13 (CR), others only for 10
(LF), and still others look for both.
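
To illustrate how per-line trimming could produce the reported
"Etl'Esprit" symptom, here is a minimal sketch. It is not the SWORD
code: trimLine, joinTrimmed and collapseWhitespace are hypothetical
names, and the assumption is simply that each line is trimmed at both
ends and the results are then concatenated with nothing in between. The
last function shows the s/\s+/ / style of normalization suggested above.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Strip leading and trailing whitespace from one line. This mimics the
// trimming described for FileMgr::getLine; it is NOT the SWORD code.
std::string trimLine(const std::string &line) {
    const char *ws = " \t\r\n";
    const std::string::size_type first = line.find_first_not_of(ws);
    if (first == std::string::npos) return "";   // line was all whitespace
    const std::string::size_type last = line.find_last_not_of(ws);
    return line.substr(first, last - first + 1);
}

// Hypothetical reader: trim every line, then concatenate with no
// separator. The linefeed + tab between two words disappears entirely.
std::string joinTrimmed(const std::vector<std::string> &lines) {
    std::string out;
    for (const std::string &l : lines) out += trimLine(l);
    return out;
}

// Alternative: collapse each run of whitespace into a single space,
// which keeps the word boundary while still shrinking the text.
std::string collapseWhitespace(const std::string &text) {
    std::string out;
    bool inRun = false;
    for (unsigned char c : text) {
        if (std::isspace(c)) {
            if (!inRun) out += ' ';
            inRun = true;
        } else {
            out += static_cast<char>(c);
            inRun = false;
        }
    }
    return out;
}
```

Under these assumptions, the two lines "Et" and "<tab>l'Esprit de Dieu"
join to "Etl'Esprit de Dieu", while collapsing the same text with its
linefeed and tab intact gives "Et l'Esprit de Dieu".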

From what I can determine, FileMgr::getLine is called by swconfig,
osis2mod and imp2gbs.

I think this should be replaced with a call to std::getline, which is
already used by imp2ld, imp2vs, and xml2gbs.
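
As a rough sketch of what reading with std::getline might look like,
the following keeps leading and trailing spaces and tabs and treats a
trailing CR as part of a CRLF line ending. readLines and
readLinesFromString are hypothetical helpers, not existing SWORD
functions, and bare-CR line endings are not handled here.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read every line from the stream without trimming spaces or tabs.
// std::getline consumes the '\n' delimiter; a '\r' left over from a
// CRLF line ending is removed explicitly.
std::vector<std::string> readLines(std::istream &in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
    }
    return lines;
}

// Convenience wrapper for trying the reader on an in-memory string.
std::vector<std::string> readLinesFromString(const std::string &text) {
    std::istringstream in(text);
    return readLines(in);
}
```

Because nothing is trimmed, the caller still has to decide what the line
break itself becomes (for example a single space) when lines are joined.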

(for completeness, it should be noted that vpl2mod defines its own  
readline, which reads one character at a time into a buffer.)




