[sword-devel] Another Important Issue

Joe Walker sword-devel@crosswire.org
Tue, 29 Aug 2000 22:49:56 +0100


Nathan wrote:
> In option 3, would the bitmap not be about 3.8K? (31102 verses / 8)
> Else it is a bytemap, not a bitmap :)

To put the size in perspective: if you take every word in the KJV,
search for it, store each result in whichever of the 3 approaches
mentioned is smallest, and put the lot in one big RandomAccessFile,
then the total size is 4.5MB.
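
As a rough illustration of "whichever is smallest", here is a minimal
sketch. It assumes the three approaches are a bitmap, a plain list of
verse ordinals, and a ranged list, and it assumes 2 bytes per ordinal
and 4 bytes per range. The class and method names are made up for this
example and are not taken from the actual code.

```java
import java.util.BitSet;

// Hypothetical sketch: names and byte costs are assumptions for illustration.
final class SmallestEncoding {
    static final int VERSES = 31102; // verse count quoted above

    // Bitmap: one bit per verse, rounded up to whole bytes.
    static int bitmapBytes() {
        return (VERSES + 7) / 8;
    }

    // List: assumes each verse ordinal fits in 2 bytes (31102 < 65536).
    static int listBytes(BitSet hits) {
        return 2 * hits.cardinality();
    }

    // Ranged list: assumes each run of consecutive verses costs
    // a 2-byte start plus a 2-byte length.
    static int rangedBytes(BitSet hits) {
        int runs = 0;
        for (int i = hits.nextSetBit(0); i >= 0;
                 i = hits.nextSetBit(hits.nextClearBit(i))) {
            runs++;
        }
        return 4 * runs;
    }

    // Size in bytes of the cheapest of the three encodings for this result.
    static int smallestBytes(BitSet hits) {
        return Math.min(bitmapBytes(),
                        Math.min(listBytes(hits), rangedBytes(hits)));
    }
}
```

Under these assumptions the bitmap always costs 3888 bytes, so it only
wins for words whose hits are both numerous and scattered. Rare words
are cheapest as a list, and words that occur in long runs of verses are
cheapest as ranges.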

This is, in my opinion, a little on the large side if you want to
download a new version. However, since the index only duplicates data
already in the text, there is nothing to stop a clever installation
script creating it, or even a very clever caching search that creates
it on the fly.
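
A caching search of that kind could look roughly like the sketch
below. This is only an assumption about how it might be done: the
class name, the in-memory list of verse text and the `scans` counter
are all hypothetical, and a real version would presumably persist the
cache rather than keep it in a HashMap.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: builds each word's bitmap the first time it is asked for.
final class CachingSearch {
    private final List<String> verses; // verse text, indexed by ordinal
    private final Map<String, BitSet> cache = new HashMap<>();
    int scans = 0; // counts full-text scans, only to show the cache working

    CachingSearch(List<String> verses) {
        this.verses = verses;
    }

    // Returns the verses containing the word, scanning the text only on a cache miss.
    BitSet search(String word) {
        String key = word.toLowerCase();
        BitSet hits = cache.get(key);
        if (hits == null) {
            hits = scan(key);
            cache.put(key, hits);
        }
        return (BitSet) hits.clone(); // callers may modify their copy freely
    }

    // Slow path: walk every verse and test each word in it.
    private BitSet scan(String word) {
        scans++;
        BitSet hits = new BitSet(verses.size());
        for (int i = 0; i < verses.size(); i++) {
            for (String token : verses.get(i).toLowerCase().split("[^a-z]+")) {
                if (token.equals(word)) {
                    hits.set(i);
                    break;
                }
            }
        }
        return hits;
    }
}
```

The first search for a word pays for a full scan; every later search
for the same word is just a map lookup.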

> You are right that it is very fast. I use the same method.
> For wildcards it is also really fast (just OR a few bitmaps).
> The way to work around the huge size of the "bitmap index" is to
> store it in another format (like a list or Ranged list) and
> convert when needed.
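
To make the two ideas above concrete, here is a minimal sketch of a
prefix wildcard done by OR-ing bitmaps, plus conversion between a
bitmap and a ranged list. It is an illustration under my own
assumptions (a sorted in-memory map, inclusive [start, end] ranges),
with hypothetical names, and not a description of either program.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a word index supporting "prefix*" searches.
final class WildcardIndex {
    // word -> bitmap of verse ordinals; sorted, so words sharing a prefix are adjacent
    private final TreeMap<String, BitSet> index = new TreeMap<>();

    void add(String word, int verse) {
        index.computeIfAbsent(word, w -> new BitSet()).set(verse);
    }

    // "prefix*": OR together the bitmaps of every word starting with the prefix.
    BitSet prefixSearch(String prefix) {
        BitSet result = new BitSet();
        for (Map.Entry<String, BitSet> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // past the last word with this prefix
            }
            result.or(e.getValue());
        }
        return result;
    }

    // Compact form for storage: inclusive [start, end] runs of set bits.
    static List<int[]> toRanges(BitSet bits) {
        List<int[]> ranges = new ArrayList<>();
        int start = bits.nextSetBit(0);
        while (start >= 0) {
            int end = bits.nextClearBit(start); // first verse after the run
            ranges.add(new int[] {start, end - 1});
            start = bits.nextSetBit(end);
        }
        return ranges;
    }

    // Expand the stored ranges back into a bitmap when a search needs one.
    static BitSet fromRanges(List<int[]> ranges) {
        BitSet bits = new BitSet();
        for (int[] r : ranges) {
            bits.set(r[0], r[1] + 1);
        }
        return bits;
    }
}
```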

I have a working scheme whereby you can do a best match. You type
in your phrase; it first looks up every word you typed in a
thesaurus and then searches for every match, returning the verse with,
hopefully, the most similar meaning.
I find it very useful, but for it to work you do need a blindingly fast
search mechanism.
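
In outline, the best match could be sketched like this. The scoring
rule here (one point per typed word that a verse matches, directly or
through a synonym) is an assumption chosen to keep the example short,
and the class, the maps and the method names are all hypothetical.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a thesaurus-expanded best-match search.
final class BestMatch {
    private final Map<String, BitSet> index;           // word -> verses containing it
    private final Map<String, List<String>> thesaurus; // word -> synonyms
    private final int verseCount;

    BestMatch(Map<String, BitSet> index,
              Map<String, List<String>> thesaurus,
              int verseCount) {
        this.index = index;
        this.thesaurus = thesaurus;
        this.verseCount = verseCount;
    }

    // Returns the ordinal of the highest-scoring verse, or -1 if nothing matches.
    // Ties go to the earliest verse.
    int bestVerse(List<String> phrase) {
        int[] score = new int[verseCount];
        for (String word : phrase) {
            // Verses matching this word or any of its synonyms.
            BitSet hits = new BitSet(verseCount);
            orInto(hits, word);
            for (String syn : thesaurus.getOrDefault(word, Collections.<String>emptyList())) {
                orInto(hits, syn);
            }
            for (int v = hits.nextSetBit(0); v >= 0 && v < verseCount;
                     v = hits.nextSetBit(v + 1)) {
                score[v]++;
            }
        }
        int best = -1;
        for (int v = 0; v < verseCount; v++) {
            if (score[v] > 0 && (best < 0 || score[v] > score[best])) {
                best = v;
            }
        }
        return best;
    }

    private void orInto(BitSet target, String word) {
        BitSet b = index.get(word);
        if (b != null) {
            target.or(b);
        }
    }
}
```

Every typed word fans out into several index lookups, which is why the
underlying search has to be so quick.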

> I like your idea about the RangedPassage as well. It really makes
> the list of verses for certain "common" words much smaller.
> 
> Where is your program located Joe?

There was a servlet version on the web, but I think it is broken right
now. I've been working on a project for my brother (blood and in Christ)
that needed to be done before his wedding, so I've not done much on it
in the past few months.

If you want to look at code, then I can send you whatever you want
very quickly. If you want a working product, then I'll need a few more
weeks.

I've tarred up the code in question, and I'll place it at:
  http://www.eireneh.com/passage.tar.gz
It is all Java, and will only be of use for case A above.

Joe.