[sword-devel] How does one uncompress the Crosswire nt.bzz and ot.bzz files?

DM Smith dmsmith at crosswire.org
Thu Mar 1 11:15:46 MST 2012


On 03/01/2012 12:02 PM, David Instone-Brewer wrote:
> I need to get hold of the tagged Chinese Bible texts in a readable form
> because I'm trying to get some Chinese readers to check some issues 
> with the tagging.
>
> Does anyone know how to uncompress the Crosswire nt.bzz and ot.bzz files?

Use mod2imp.
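
As a rough illustration of working with the export: mod2imp takes a module name and writes the text to standard output in "imp" format, where each entry starts with a line beginning `$$$` followed by the key. The sketch below assumes you have already redirected that output to a file or string. The function name `parse_imp` and the sample text are made up for illustration and are not part of SWORD.

```python
def parse_imp(text):
    """Split imp-format text into (key, body) pairs, in file order.

    Assumes each entry starts with a line beginning '$$$' followed by
    the key (e.g. a verse reference), with the entry text on the lines
    after it. Lines before the first key are ignored.
    """
    entries = []
    key, body = None, []
    for line in text.splitlines():
        if line.startswith("$$$"):
            # A new key closes the previous entry, if there was one.
            if key is not None:
                entries.append((key, "\n".join(body).strip()))
            key, body = line[3:].strip(), []
        elif key is not None:
            body.append(line)
    if key is not None:
        entries.append((key, "\n".join(body).strip()))
    return entries


# Hypothetical sample, standing in for real mod2imp output.
sample = (
    "$$$Genesis 1:1\n"
    "In the beginning...\n"
    "$$$Genesis 1:2\n"
    "And the earth...\n"
)

for ref, verse in parse_imp(sample):
    print(ref, "|", verse)
```

From there the entries can be written out in whatever form the readers find easiest to check.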

> I tried renaming them as ZIP and GZIP etc but didn't get anywhere.
> Is it a proprietary compression routine, or have I missed something 
> obvious?

David said it is proprietary. It is, but it is not secret. The poorly 
commented code is readily available for personal study.

We use regular zip (or possibly lzss) on parts of the file and 
concatenate the parts into the whole. Even if you figured out how to 
split it into parts and uncompress it, the parts have no implicit order 
and the verses in the parts also have no implicit order. Also, if the 
module was fixed by appending corrected verses, the incorrect verses 
were not removed. You'd find both the old and the new in there. And you'd 
not find any verse markers to help you tell one verse from another.
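
To make the point concrete, here is a toy model of a data blob with a separate index. It is only a sketch of the general idea described above and is NOT the real SWORD on-disk format; the names `store` and `fetch` and the sample verses are invented for illustration.

```python
# Toy model only: this is not the actual SWORD file layout.
data = bytearray()   # stands in for the data file; text is only ever appended
index = {}           # verse key -> (offset, length) into data


def store(key, text):
    """Append text to the data blob and point the index entry at it."""
    raw = text.encode("utf-8")
    index[key] = (len(data), len(raw))
    data.extend(raw)


def fetch(key):
    """Look the verse up through the index, never by position in the data."""
    offset, length = index[key]
    return bytes(data[offset:offset + length]).decode("utf-8")


store("John 1:2", "second verse")   # stored first, though it is not verse 1
store("John 1:1", "frist verse")    # contains a typo
store("John 1:1", "first verse")    # correction appended; old bytes remain

print(fetch("John 1:1"))
print(bytes(data))
```

Reading the blob front to back gives the verses out of order, includes the stale copy, and shows no markers between verses. Only the index says which bytes belong to which verse.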

Even if you had an uncompressed module, whose dat file is readable, the 
order of the data is no indicator of the order of the text. And you'd 
not find any verse markers.

The only way to work with the text is either to get the original from 
the source (highly recommended) or use one of our export utilities. By 
using the source you can work with the "owner" to feed back corrections, 
which would ultimately get back to us.

Each module's conf gives information regarding the source of the text.
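
A minimal sketch of pulling those fields out of a conf, assuming the usual `Key=Value` lines under a `[ModuleName]` header. The key names shown (`TextSource`, `DistributionLicense`) and all of the sample values are assumptions for illustration; check the actual conf for the module you are using.

```python
def read_conf(text):
    """Collect Key=Value lines from conf text into a dict of lists.

    Values are kept in lists because a key may appear more than once.
    Section headers, comments, and lines without '=' are skipped.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or line.startswith("#"):
            continue
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


# Hypothetical conf content, not copied from any real module.
sample_conf = """\
[ExampleMod]
DataPath=./modules/texts/ztext/examplemod/
TextSource=http://example.org/source-text
DistributionLicense=Public Domain
"""

conf = read_conf(sample_conf)
for name in ("TextSource", "DistributionLicense"):
    print(name, "=", conf.get(name, ["(not given)"])[0])
```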

In Him,
     DM


