[sword-devel] How does one uncompress the Crosswire nt.bzz and ot.bzz files?
dmsmith at crosswire.org
Thu Mar 1 11:15:46 MST 2012
On 03/01/2012 12:02 PM, David Instone-Brewer wrote:
> I need to get hold of the tagged Chinese Bible texts in a readable form
> because I'm trying to get some Chinese readers to check some issues
> with the tagging.
> Does anyone know how to uncompress the Crosswire nt.bzz and ot.bzz files?
> I tried renaming them as ZIP and GZIP etc but didn't get anywhere.
> Is it a proprietary compression routine, or have I missed something
David said it is proprietary. It is, but it is not secret. The poorly
commented code is readily available for personal study.
We use regular zip (or possibly lzss) on parts of the file and
concatenate the parts into the whole. Even if you figured out how to
split it into parts and uncompress them, the parts have no implicit order,
and the verses within each part have no implicit order either. Also, if a
module was fixed by appending corrected verses, the incorrect verses are
not removed: you'd find both the old and the new in there. And you'd
not find any verse markers to help you tell one verse from another.
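To illustrate the point, here is a rough Python sketch of what "splitting into parts" might look like. It assumes each part is a plain zlib stream and that the parts sit back-to-back in the file. That is an assumption for illustration only, not a description of the actual module format, and split_zlib_streams and the sample data are made up. Even when the split works, you only get anonymous chunks of text back.

```python
import zlib


def split_zlib_streams(blob):
    """Decompress every zlib stream found back-to-back in blob.

    Assumption: the parts are plain zlib streams with nothing between
    them. A real module may differ (other compressors, index files,
    enciphering), so treat this as an illustration only.
    """
    parts = []
    while blob:
        d = zlib.decompressobj()
        data = d.decompress(blob) + d.flush()
        if not d.eof:
            raise ValueError("truncated or non-zlib data")
        parts.append(data)
        # Whatever follows the end of this stream belongs to the next part.
        blob = d.unused_data
    return parts


# Synthetic stand-in for a compressed data file: the parts are stored in
# arbitrary order, and a corrected verse is appended without removing the
# old one.
blocks = [
    b"text of a verse from John",
    b"text of a verse from Genesis",
    b"text of a verse from John (corrected)",
]
blob = b"".join(zlib.compress(b) for b in blocks)

recovered = split_zlib_streams(blob)
for i, part in enumerate(recovered):
    # Nothing in the recovered bytes says which verse this is.
    print(i, part.decode("utf-8"))
```

Note that the output is just numbered chunks in storage order: the stale and corrected text both appear, and nothing in the data itself identifies book, chapter or verse.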
Even if you had an uncompressed module, whose dat file is readable, the
order of the data is no indicator of the order of the text. And you'd
not find any verse markers.
The only way to work with the text is either to get the original from
the source (highly recommended) or use one of our export utilities. By
using the source you can work with the "owner" to feed back corrections,
which would ultimately get back to us.
Each module's conf gives information regarding the source of the text.
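As a small sketch of how one might look up that information: a conf is a plain-text file of key=value lines under a [ModuleName] header. The snippet below assumes the source is recorded under a TextSource key, which is my understanding of the usual convention. The sample conf contents and the read_conf helper are invented for illustration.

```python
def read_conf(text):
    """Collect key=value lines from a module conf into a dict of lists.

    Values are kept in lists because a key may appear more than once.
    The [ModuleName] header and any malformed lines are skipped.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


# Hypothetical conf contents, not taken from any real module.
sample = """[ExampleMod]
DataPath=./modules/texts/ztext/examplemod/
ModDrv=zText
TextSource=http://example.org/source
"""

conf = read_conf(sample)
print(conf.get("TextSource", ["(no source listed)"]))
```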