[sword-devel] Why is OSIS preferred? Was Re: usfm2osis.pl

Chris Little chrislit at crosswire.org
Tue Jul 1 06:54:47 MST 2008

Karl Kleinpaste wrote:
> "Jonathan Morgan" <jonmmorgan at gmail.com> writes:
>> ThML is also still (I think) used by the greatest percentage of our
>> modules (though that may be changed in the future).
> ...
>> Will GBF continue to be supported?  I seem to remember that Chris
>> reported lack of GBF support as a missing feature in BPBible, despite
>> the fact that I'm sure that I have heard statements suggesting GBF is
>> very strongly deprecated.  How many modules are still GBF?
> A couple shell commands will give useful summaries.  Refresh main and
> beta repos in your mod.mgr, then peek in ~/.sword/InstallMgr/*/mods.d.
> for i in plain gbf thml osis ; do
>     echo $i `grep -i ^sourcetype=$i * | wc -l`
> done
> Main:                   Beta:
> plain   2               plain   1
> gbf     49              gbf     0
> thml    163             thml    6
> osis    23              osis    93

This is a little misleading because plain is usually unmarked. (It's the 
default value of SourceType.)
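
A count that folds the unmarked modules into the plain total might look something like the following. This is only a rough sketch: it assumes one module per .conf file and that a missing SourceType line means plaintext, and the function name count_sourcetypes is made up for illustration.

```shell
# Count modules per SourceType in one mods.d directory.
# Assumption: a .conf file with no SourceType line is plain (the default).
count_sourcetypes() {
    dir=$1
    for conf in "$dir"/*.conf; do
        [ -f "$conf" ] || continue
        # First SourceType= line, matched case-insensitively, value lower-cased.
        st=$(awk -F= 'tolower($1) == "sourcetype" { print tolower($2); exit }' "$conf" | tr -d ' \r')
        # Fall back to "plain" when the key is absent.
        echo "${st:-plain}"
    done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Run it once per repository, e.g. with each ~/.sword/InstallMgr/*/mods.d directory as the argument; it prints one "type count" pair per line.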

The history of the numbers is basically that when I came to CrossWire, 
there was support for plaintext, GBF, and a specialized filter for just 
the RWP module. Eventually I outgrew GBF's capabilities, so I submitted 
the ThML filters and started using ThML wherever it appeared that GBF 
would be incapable of handling the data. Then I got this grand idea that 
we should use a single format for everything so that we wouldn't have to 
keep supporting n input formats times m render formats every time we 
needed to add features and so that we could have a more consistent look 
& feel across modules. At the time, ThML was the best we had, so lots of 
things got encoded as ThML, regardless of whether they could have been 
encoded as GBF. Then we got involved in OSIS, so we wrote OSIS filters 
and have been, fairly consistently, releasing only OSIS (or plaintext) 

As content gets upgraded, it will generally be upgraded to OSIS or TEI. 
Likewise, new content will generally be OSIS or TEI. And everything that 
gets posted in these formats will have passed schema validation.

> The reason for the new increase in beta OSIS modules is due to the
> arrival of 41 new WBT texts 2 days ago -- almost half the beta repo in
> one shot.
> Significantly, a couple of really important modules (LXX, for one) are
> still distributed as GBF.
> (Aside: All these new WBT texts appear in GS as "unknown" language.  Is
> there a mapping somewhere handy, from "ngu", "tzz", et al to something
> readable by mere mortals?  I'm happy to update GS to accommodate more
> language definitions but I need a source for them.)

The current ISO 639-3 table is at 

You can usually get an English-language name of the language by 
extracting the LCSH value, too (after removing Bible. and possibly O.T. 
or N.T.). I haven't added this info to the WBTI Bibles yet, though.

However, some of the language codes are incorrect and need to be fixed. 
(The ones I know of ATM are sco, which should be cso, and xmt, which 
should be mxt.)
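
With correct codes in hand, a lookup against a local copy of the ISO 639-3 table could be sketched roughly as below. It assumes a tab-delimited file with the three-letter code in column 1 and the English reference name in column 7; that layout is an assumption about the downloaded file, and lang_name is a hypothetical helper name.

```shell
# Print the reference name for an ISO 639-3 code from a local table file.
# Assumption: tab-delimited, code in column 1, reference name in column 7.
# Adjust the column numbers if the downloaded file is laid out differently.
lang_name() {
    code=$1
    table=$2
    awk -F '\t' -v code="$code" '
        $1 == code { print $7; found = 1; exit }
        END { if (!found) print "unknown" }
    ' "$table"
}
```

Called as lang_name CODE TABLEFILE, it prints "unknown" for any code the table does not list, which is roughly what a frontend would want to fall back on.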

