[sword-devel] usfm2osis.pl

Daniel Owens dhowens at pmbx.net
Wed Nov 12 22:10:33 MST 2008


Peter,

I did some work on usfm2osis.pl to make it work with several USFM source texts, but I don't know whether my revision ever made its way onto the server (I think I sent version 1.4 to Chris). I can send it to you if you like. It fixes a few problems and slightly extends the support for USFM tags. Perhaps my biggest contribution was to list all of the USFM tags (using a UBS handbook) and add comments to the .pl file identifying which tags usfm2osis.pl supports in some fashion (keep in mind that the script doesn't always support every tag well...) and which ones it doesn't. There are some obscure and some not-so-obscure tags that it doesn't do anything with. It also fixes, though imperfectly, some weird tagging that results from the way verse eIDs are produced. I don't believe anyone but me has tested the changes, so any suggestions you have would be welcome.


One thing I learned in working with it is that it doesn't handle multiple notes in a verse well unless they are on separate lines in the source file. The result is often that a note is only partially transformed into OSIS, with backslashes from the USFM left behind. If I were you, I would open all the source files in jEdit and run a search and replace across all buffers on the opening note tags (\f and \x, if I remember correctly), adding a line break before each note while making sure you don't split a note up, so that the script catches all the notes.
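
If you would rather script that search and replace than do it in an editor, something along these lines might do. It is only a rough sketch in Python, not part of usfm2osis.pl; the function name and the sample verse are made up for illustration, and it assumes that note openers always appear as \f or \x followed by whitespace.

```python
import re

# Match an opening note marker: "\f" or "\x" followed by whitespace.
# Closers ("\f*", "\x*") and inner markers ("\fr", "\ft", "\xo", "\xt")
# are not followed by whitespace right after the f/x, so they are skipped.
# The lookbehind skips openers that already start a line.
NOTE_OPENER = re.compile(r"(?<=[^\n])\\([fx])(?=\s)")


def split_notes_onto_lines(usfm_text: str) -> str:
    """Insert a line break before each note opener that is not at line start."""
    # A function is used as the replacement so the backslash in the matched
    # marker is copied literally instead of being read as an escape.
    return NOTE_OPENER.sub(lambda m: "\n" + m.group(0), usfm_text)


if __name__ == "__main__":
    # Hypothetical sample verse with one footnote and one cross-reference.
    sample = (
        "\\v 1 In the beginning\\f + \\fr 1.1 \\ft Note one\\f* "
        "God created\\x - \\xo 1.1 \\xt Jn 1.1\\x* the heavens."
    )
    print(split_notes_onto_lines(sample))
```

Because the break is only ever inserted in front of an opener, a note is never split in the middle. Run it on copies of the source files and compare the results before feeding them to the script.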

I should add, though, that I have only worked with ltr texts up to this point...

Daniel 


-----Original Message-----
From: "Chris Little" <chrislit at crosswire.org>
To: "SWORD Developers' Collaboration Forum" <sword-devel at crosswire.org>
Sent: 11/13/08 9:33 AM
Subject: Re: [sword-devel] usfm2osis.pl

Not that I very much desire to open usfm2osis.pl again, but could you 
post an example? I'm having trouble guessing what kind of input is 
resulting in what kind of output.

And what's the encoding of the text? UTF-8, an 8-bit encoding, or 
something else?

--Chris


Peter von Kaehne wrote:
> Thanks to Chris, who rewrote usfm2osis a while back, it works a lot
> better with utf8 texts.
> 
> A permanent problem I have, though, with rtl texts is the treatment of
> footnotes:
> 
> As a result of producing essentially a bidi text with ltr tags and rtl
> content, inline ltr tags often get messed up. This mostly affects the
> <note> and </note> tags, with the order of "note", slash and brackets
> mixed up. As a result, these often require difficult fixing by hand.
> 
> Is there a way to improve usfm2osis.pl in this matter?
> 
> Thanks!
> 
> Peter
> 
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
