[sword-devel] HowTo: create ztext module?

DM Smith dmsmith555 at yahoo.com
Tue May 9 05:07:36 MST 2006


On May 8, 2006, at 8:36 PM, Greg Hellings wrote:

>
> This brings up another interesting question (in my opinion).  Why are
> there several standard modules which are distributed without
> compression?  Things like the ASV, the Vulgate and the WEB are all
> distributed in uncompressed format.  Might it be beneficial for us to
> zip those up (especially the ASV and WEB, which I would imagine are
> both popular modules?) and distribute them in a ztext format?  Are
> there any advantages to having them in rawtext rather than ztext,
> except for minor performance advantages?  Just curious!

In my opinion ztext serves two purposes:
1) Since it is not a simple compression of the files (i.e. you can't
run unzip on them) but an internal compression of parts of the file,
it raises the importance of the SWORD API in accessing them,
providing a bit of information hiding.
2) The SWORD API downloads the files individually, and having
compressed files improves download performance.

One may argue that it also reduces the disk footprint, which it does.  
But in today's world of large disks (even my old win 95 laptop has a  
20G drive), I don't think that is much of an issue.

The drawback is that the client application needs to uncompress the
data on the fly, and it does so into memory. Most modules use book
compression, so an entire book must be unzipped to read any verse in
it. By the principle of locality, caching the most recently
uncompressed book gives reasonable performance, since most requests
for a verse fall within the same book as the previous request. The
anomalous case is returning ranked verses for a search on a common
word, since the hits jump from book to book. (Ranked search is the
normal behavior of Lucene.)
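
To make the caching point concrete, here is a minimal sketch in Python
of a one-book cache. It is only an illustration, not how SWORD or
JSword store or cache modules: the BookCache class, the one-verse-per-
line layout and the use of zlib are all assumptions for the example.

```python
import zlib


class BookCache:
    """Keeps only the most recently uncompressed book in memory."""

    def __init__(self, compressed_books):
        # Hypothetical layout: book name -> zlib-compressed bytes,
        # where the uncompressed payload holds one verse per line.
        self._compressed = compressed_books
        self._book = None
        self._verses = None
        self.decompressions = 0  # number of cache misses so far

    def verse(self, book, number):
        """Return verse `number` (1-based) of `book`."""
        if book != self._book:
            # Cache miss: the whole book has to be uncompressed.
            raw = zlib.decompress(self._compressed[book])
            self._verses = raw.decode("utf-8").split("\n")
            self._book = book
            self.decompressions += 1
        return self._verses[number - 1]


def build_demo_cache():
    """Build a cache over two tiny made-up books."""
    books = {
        "Gen": "In the beginning\nAnd the earth was",
        "John": "In the beginning was the Word\nThe same was",
    }
    return BookCache(
        {name: zlib.compress(text.encode("utf-8")) for name, text in books.items()}
    )
```

Reading consecutive verses of one book costs a single decompression,
while a result list that alternates between books pays for a full
book decompression on every switch.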

I can appreciate a goal of encouraging the use of the SWORD API for
access of a module's content, whether it was a deliberate or an  
accidental goal, but there are other, simple ways to achieve  
information hiding that don't have the performance penalty.

While not simpler, there are ways to compress the files that don't  
use stream compression such that each verse can be handled  
independently.
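
One way to picture per-verse compression is the sketch below. It is
not an existing SWORD format: the function names and the
(offset, length) index are invented for illustration, and zlib is
used only because it is handy in Python.

```python
import zlib


def pack_verses(verses):
    """Compress each verse separately.

    Returns (blob, index), where index[n] is the (offset, length)
    of verse n's compressed bytes inside blob.
    """
    blob = bytearray()
    index = []
    for text in verses:
        chunk = zlib.compress(text.encode("utf-8"))
        index.append((len(blob), len(chunk)))
        blob.extend(chunk)
    return bytes(blob), index


def read_verse(blob, index, n):
    """Uncompress only verse n (0-based), leaving its neighbours alone."""
    offset, length = index[n]
    return zlib.decompress(blob[offset:offset + length]).decode("utf-8")
```

The price is a worse compression ratio, since a single short verse
gives the compressor little repetition to work with, but a lookup
never has to unzip more than the verse that was asked for.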

With regard to compression of the download file, I think that there  
are better ways to handle it as well. Rather than downloading the  
parts individually, we could change the installer to download a zip  
file of the entire contents.
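
As a rough sketch of the installer side, assuming the whole module has
already been fetched as one zip held in memory, unpacking is a few
lines of Python. The install_module_zip name is hypothetical and the
transfer itself is left out.

```python
import io
import zipfile
from pathlib import Path


def install_module_zip(zip_bytes, target_dir):
    """Unpack a whole-module archive into target_dir.

    Returns the sorted list of entry names found in the archive.
    """
    target = Path(target_dir)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())
```

With a single archive there is one transfer to retry and one
directory tree to remove if the install fails part way.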

As a side note, JSword was changed to download the raw zips rather
than the parts because FTP within Java stopped working after a
security patch on Win XP, and it did not work behind a proxy on a Mac.
Now we use HTTP tunneling. The side benefit was that the downloads
were faster and more reliable, and when they failed, the cleanup was
simpler. The code was a bit simpler as well.


