[sword-devel] STEP modules

Fri, 3 Nov 2000 05:15:27 -0800

Troy,
	I'm sorry that I haven't made more progress on the STEP modules.
I've been planning on "getting around" to starting up again, but things are
pretty busy at work.  I upgraded my Debian Linux box to 2.2, and finally got
sound working so I can listen to music while I write.

	The next step is probably related to your proposal.  I need to
figure out what to do with the decompressed RTF text to make the STEP
modules compatible with the regular modules.  Obviously, if RTF output is
needed, that's no problem.  However, I suspect that the output needs to be
something else, but I don't know what that is.  I'm somewhat daunted by the
task at the moment, because I've never programmed on Linux, Gnome/GTK, or
KDE/Qt, and I don't have a good understanding of how SWORD works, either.
However, I would like to very much.  I am somewhat more familiar with Visual
C++, but as far as I can tell, none of the SWORD modules are written using
that.

	So... I would like to remain involved, and any help and direction
you (and the other SWORD developers) are able to provide will be greatly
appreciated.

Steve

P.S. Thanks for the CDs.  I'm going to see if anyone at church is interested
in SWORD.

-----Original Message-----
From: Troy A. Griffitts [mailto:scribe@crosswire.org]
Sent: Thursday, November 02, 2000 11:26 PM
To: sword-devel@crosswire.org; steve.trandahl@cotelligent.com
Subject: [sword-devel] STEP modules

Steve,
	We've had some more interest in the STEP area, and I was wondering if
you might be able to give us a rundown of what you were able to
accomplish?  If you actually understand the STEP spec well enough to
decompress a module, I'd love for us to work together to add a low level
driver to the framework for this.  I realize the RTF+STEPtoHTML filter
will be a much larger effort, but let's at least capture what you got to
work in code-- if you're willing.

	Thanks,
		-Troy.

"Trandahl, Steve" wrote:
> 
> I don't know too much about searching and compression algorithms, but here's
> a suggestion...  SWORD already has a compression/decompression module that
> was added to support STEP format Bibles.  As I understand it, the algorithm
> is public domain.  I haven't tested the compression portion of the code, but
> decompression works.
> 
> Steve
> 
> -----Original Message-----
> From: Drew Haninger [mailto:drew@OliveTree.com]
> Sent: Monday, September 18, 2000 10:18 AM
> To: sword-devel@crosswire.org
> Subject: Re: [sword-devel] Some other fast search index options
> 
> Now that Palm Pilots have more memory available, it would seem not worth the
> effort, but with cell phones on the horizon as a new platform, the effort
> may be worth it.  We never finished doing a better compression for the
> smaller Palm Pilots.
> 
> Drew Haninger
> 
> ----- Original Message -----
> From: "Nathan" <mail@nathan.co.za>
> To: <sword-devel@crosswire.org>
> Sent: Sunday, September 17, 2000 6:05 AM
> Subject: RE: [sword-devel] Some other fast search index options
> 
> > Good day Jerry Hastings and others
> >
> > On 3rd September, Jerry Hastings wrote:
> > > Here are three more methods to try.  Consider the bit map
> > > divided into records, of say 32, 16, or 8 bits each.
> >
> > I did the tests as you recommended. The size of the index file
> > comes down from 894 KB to about 814 KB.
> > I also tried using Huffman compression, LZW, LZSS, plain RLE,
> > or a combination of all the above. I got it to about 600 KB.
> >
> > The problem is that we are still compressing the BITMAP files.
> > The uncompressed bitmap files are 48.8 MB. Compressing this
> > down to 600 KB is impressive, but we can do better by
> > compressing the verse-lists instead (the list of verses in which
> > each word occurs). The uncompressed size of these lists is 1.23 MB.
> > (Option 1)
> >
> > The latest research papers on the subject all seem to
> > indicate that the best way to compress the index file
> > is to use these verse-lists (option 1, also called
> > 'inverted lists') and compress them.
> >
> > The steps are:
> > 1. Get the list of verses where the word occurs.
> >    This can be an array of 16-bit words (for the Bible)
> > 2. Get the differences between the verses (deltas), e.g.
> >    if the verses are 3, 10, 15, 21, ...
> >    the delta list would be 3, 7, 5, 6, ...
> >    These deltas tend to be much the same throughout the list.
> > 3. Compress this delta-list with something like
> >    arithmetic coding (seems to work best).
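
The three steps above can be sketched in Python. This is only a minimal
illustration, not code from SWORD. The function names are hypothetical, and
a simple variable-byte packing is swapped in for step 3 because arithmetic
coding is too long to show here; it would replace encode_varbyte in a real
index.

```python
def to_deltas(verses):
    """Step 2: turn a sorted verse list into gaps between neighbours."""
    deltas = []
    previous = 0
    for verse in verses:
        deltas.append(verse - previous)
        previous = verse
    return deltas


def encode_varbyte(deltas):
    """Stand-in for step 3 (NOT arithmetic coding).

    Each delta is split into 7-bit groups, low group first.
    The high bit is set on the last byte of each delta.
    """
    out = bytearray()
    for delta in deltas:
        while delta >= 128:
            out.append(delta & 0x7F)
            delta >>= 7
        out.append(delta | 0x80)
    return bytes(out)


verses = [3, 10, 15, 21]          # step 1: verses where the word occurs
deltas = to_deltas(verses)
packed = encode_varbyte(deltas)
print(deltas)                     # [3, 7, 5, 6]
print(len(packed))                # 4 bytes, versus 8 for four 16-bit words
```

Because most gaps fit in one byte, even this crude packing halves the size
of a list stored as 16-bit words; an entropy coder would go further.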
> >
> > To use this index, the following will have to be done:
> > 1. Evaluate the search query
> > 2. Get the words in the word-list.
> > 3. Get their corresponding compressed delta-lists.
> > 4. Uncompress.
> > 5. Convert delta-list to verse-list
> > 6. Convert to bitmap if you need to do NOT, OR, AND, or
> >    wildcards
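
Steps 5 and 6 of that lookup can be sketched in Python as follows. This is a
rough illustration under stated assumptions: the delta-lists are taken as
already fetched and uncompressed (steps 1 to 4), the two-word index is made
up for the example, and a Python integer stands in for the bitmap, with bit
n set when verse n contains the word.

```python
from itertools import accumulate


def deltas_to_verses(deltas):
    """Step 5: running sum of the gaps gives back the verse numbers."""
    return list(accumulate(deltas))


def verses_to_bitmap(verses):
    """Step 6: set bit n for every verse n in the list."""
    bitmap = 0
    for verse in verses:
        bitmap |= 1 << verse
    return bitmap


def bitmap_to_verses(bitmap):
    """Read the set bits back out as a sorted verse list."""
    verses = []
    verse = 0
    while bitmap:
        if bitmap & 1:
            verses.append(verse)
        bitmap >>= 1
        verse += 1
    return verses


# Hypothetical, already-uncompressed delta-lists for two words.
index = {"faith": [3, 7, 5, 6], "hope": [10, 11, 9]}

faith = verses_to_bitmap(deltas_to_verses(index["faith"]))
hope = verses_to_bitmap(deltas_to_verses(index["hope"]))

print(bitmap_to_verses(faith & hope))    # AND: [10, 21]
print(bitmap_to_verses(faith | hope))    # OR:  [3, 10, 15, 21, 30]
print(bitmap_to_verses(faith & ~hope))   # AND NOT: [3, 15]
```

A bare NOT would also need a mask covering every verse in the text, since
the complement of a bitmap is otherwise unbounded.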
> >
> > Question: Is it really worth all that effort to save 400 KB
> > on the size of the index? A few years ago I would have done
> > it, but with the smallest hard disks for PCs being 8 GB
> > nowadays, is it really a big deal to have an index of 900 KB?
> >
> > If you really want to compress, then why keep the text of the
> > Bible uncompressed? You can save much more by compressing that! :)
> >
> > God bless,
> > nathan
> > http://www.nathan.co.za
> >
> >
> >