[sword-devel] OSIS Commentaries
Peter von Kaehne
refdoc at gmx.net
Tue Jan 20 06:42:48 MST 2009
Chris Little wrote:
> Free floating, in both cases. Never make an explicit chapter/verse 0 in
> The importer /should/ correctly interpret material
Thanks, that seems to work.
>> 2) references..
> .. should be marked with <reference>, of course, but that's
> all. The cross-ref notes are intended primarily for hiding via filters,
> and you don't want to hide verse references that are part of the text.
>> 3) has anyone experience with creating sed/perl/whatever scripts which
>> read RtoL text? specifically references are the remaining big problem.
> I can't think of any that I've done with references, but I can't see
> what the problem would be. If the text is Unicode (as opposed to some
> 8-bit encoding where text is being encoded in visual order rather than
> logical), then there shouldn't be any difference from the text
> processing side between left to right and bidi text.
And that is the problem. It is Unicode now, but it was originally written in
Zarnegar, an ancient DOS-based word processor specific to the Iranian
area. Its format is highly proprietary and the importers all struggle with
it. I got the best conversion I could, but it is bad. Of particular
concern are the references, which are jumbled. I am starting to identify
patterns, but frankly it is a mess.
The other difficulty for me is that I do not understand how to
encode letter ranges in Farsi for sed regexes (or Perl, for that matter).
The letters seem to be scattered across the whole Unicode range, so
simple ranges like [a-z] do not seem to exist, or at least I have
found nothing on them.
Also, numbers sometimes appear as Farsi digits and sometimes as Western ones.
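For what it is worth, the Persian letters do sit in one Unicode block (Arabic, U+0600-U+06FF), but that block also holds digits, punctuation and combining marks, so a usable letter class needs a few sub-ranges. The following is only a sketch in Python, not something from this thread; the names FARSI_LETTER, farsi_words and western_digits are made up for illustration, and the sub-ranges are my reading of the Unicode code chart.

```python
import re

# Letters of the Arabic script as used for Persian. The sub-ranges skip the
# combining marks (U+064B-U+065F), the digits and the punctuation that share
# the U+0600-U+06FF block with the letters.
FARSI_LETTER = "[\u0621-\u064A\u066E\u066F\u0671-\u06D3\u06D5]"
FARSI_WORD = re.compile(FARSI_LETTER + "+")

# Map Arabic-Indic (U+0660-U+0669) and Persian (U+06F0-U+06F9) digits to 0-9.
DIGIT_TABLE = {0x0660 + i: str(i) for i in range(10)}
DIGIT_TABLE.update({0x06F0 + i: str(i) for i in range(10)})


def western_digits(text):
    """Return text with Arabic-Indic and Persian digits replaced by 0-9."""
    return text.translate(DIGIT_TABLE)


def farsi_words(text):
    """Return every run of Arabic-script letters found in text."""
    return FARSI_WORD.findall(text)
```

Normalising the digits first means later patterns only have to deal with [0-9]. The same escaped ranges should carry over to Perl as \x{0621}-\x{064A} and so on; whether a given sed build accepts them depends on its locale support.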
> Perhaps we need an example here.
لاويان 5:18؛ روميان 5:10
This is Lavian 5:18 and Rumian 5:10 (Lev 5:18 and Rom 5:10).
I simply do not know what this should be. Maybe
Lavian 16:6, Lavian 16:10-16.
I am trying to obtain a printed source to disentangle this, but I guess
the difficulties are obvious.
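Where a reference has survived in logical order, as in the Lev/Rom example, wrapping it in <reference> can probably be scripted. What follows is a rough Python sketch under several assumptions: the BOOKS table is hypothetical and holds only the two names quoted here, mark_references is an invented name, and only the simple shapes "Book chapter:verse" and "Book chapter:verse-verse" are handled. The jumbled references would still need the printed source.

```python
import re

# Hypothetical lookup table: only the two book names quoted in this message,
# mapped to their OSIS book abbreviations.
BOOKS = {"لاويان": "Lev", "روميان": "Rom"}

# A run of letters, optional space, then chapter:verse and an optional
# verse range. In Python 3, \d also matches Persian and Arabic-Indic digits.
REF = re.compile(r"([^\W\d_]+)\s*(\d+):(\d+)(?:-(\d+))?")


def mark_references(text):
    """Wrap each recognised 'Book chapter:verse[-verse]' in <reference>."""
    def repl(match):
        book = BOOKS.get(match.group(1))
        if book is None:
            return match.group(0)  # unknown book name: leave untouched
        # int() accepts any Unicode decimal digits, so the osisRef always
        # comes out with Western numerals.
        chapter, verse = int(match.group(2)), int(match.group(3))
        osis = f"{book}.{chapter}.{verse}"
        if match.group(4):
            osis += f"-{book}.{chapter}.{int(match.group(4))}"
        return f'<reference osisRef="{osis}">{match.group(0)}</reference>'

    return REF.sub(repl, text)
```

The visible text inside the element is left exactly as found; only the osisRef attribute is normalised. Anything whose book name is not in the table passes through unchanged, which makes it easy to grep for what is left over.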
>> 4) are there any prettifiers for XML - as soon as RtoL text is involved
>> it becomes convoluted anyway, but something which would get rid of
>> white space and enforce a clean indentation system would still be nice.
> I hope others can give better advice here. I generally avoid linebreaks
> except on </div> and </p>, and I avoid all indentation of XML. But most
> XML editors will do some level of pretty-printing. Possibly the free
> editor Wolfgang mentioned would work.
> From the command line, tidy should work, but would need some
> configuration to work with XML.
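As an alternative to an editor or tidy, a few lines of Python can do a basic re-indent. This is a minimal sketch, assuming Python 3.9 or later (for ElementTree.indent); the function name pretty is invented here. It only rewrites whitespace-only text between elements, so running text is left alone, but whitespace between adjacent inline elements can turn into a line break.

```python
import xml.etree.ElementTree as ET


def pretty(xml_text):
    """Parse xml_text and return it re-indented with two spaces per level."""
    root = ET.fromstring(xml_text)
    # Replaces whitespace-only text and tails with newline plus indentation;
    # text that contains real characters is not touched. Python 3.9+.
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")
```

For example, pretty("<div>\n\n   <p>text</p><p>more</p>   </div>") puts each <p> on its own line, indented by two spaces. For a namespaced OSIS file, ElementTree will invent prefixes such as ns0 unless the namespace is registered first with ET.register_namespace.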
>> 5) should verse and chapter divisions be milestoned or containers -
>> structurally the commentary is chapter by chapter, verse by verse, so
>> container appears sensible, but before I commit much energy...
> If you check the wiki, http://www.crosswire.org/wiki/OSIS_Commentaries,
> you'll note that commentaries are /supposed/ to have a quite different
> structure from Bibles. DM has added support for such correctly-encoded
> commentaries to osis2mod.
> Nevertheless, commentaries encoded as Bibles should work fine.
Sorry, yes. I had read it but had not fully figured it out. I understand it now.