[sword-devel] Fast Search Framework [was: Another Important Issue]
Troy A. Griffitts
Tue, 29 Aug 2000 16:40:49 -0700
OK, I've heavily documented the code for the search framework.
Please get the latest rawtext.cpp from cvs and you should have
some help to understand it now.
"Troy A. Griffitts" wrote:
> Joe and others that asked,
> The code for our first attempt at word indices and fast searches is in:
> RawText::createSearchFramework // creates the framework (done once)
> RawText::Search // uses the framework
> Anyone want to reimplement these?
> I know! Let's have a contest! :) Smallest indices with the fastest
> _accurate_ response time wins.
> Joe Walker wrote:
> > Nathan wrote:
> > > In option 3, would the bitmap not be about 3.8K? (31102 verses / 8)
> > > Else it is a bytemap, not a bitmap :)
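For readers following the size arithmetic: one bit per verse over 31102 verses rounds up to 3888 bytes per word, whereas one byte per verse would cost 31102 bytes. The sketch below is only an illustration of that bitmap layout. The class name VerseBitmap and its methods are hypothetical and are not taken from rawtext.cpp or from Joe's Java code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Verse count as quoted in the thread (31102 verses).
constexpr std::size_t kVerseCount = 31102;

// Hypothetical one-bit-per-verse index for a single word.
class VerseBitmap {
public:
    explicit VerseBitmap(std::size_t verses = kVerseCount)
        : verses_(verses), bytes_((verses + 7) / 8, 0) {}  // round up to whole bytes

    // Mark a verse (0-based index) as containing the word.
    void set(std::size_t verse) {
        bytes_[verse / 8] |= static_cast<std::uint8_t>(1u << (verse % 8));
    }

    // True if the verse was marked.
    bool test(std::size_t verse) const {
        return ((bytes_[verse / 8] >> (verse % 8)) & 1u) != 0;
    }

    std::size_t sizeInBytes() const { return bytes_.size(); }
    std::size_t verseCount() const { return verses_; }

private:
    std::size_t verses_;
    std::vector<std::uint8_t> bytes_;
};
```

With the default verse count, sizeInBytes() returns 3888, an eighth of what a bytemap would need.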
> > To put the size in perspective: if you take every word in the KJV,
> > search for it, store the results in whichever of the 3 approaches
> > mentioned is smallest, and put the lot in one big RandomAccessFile,
> > then the total size is 4.5MB.
> > This is, in my opinion, a little on the large side if you want to do
> > a d/l of a new version. However since the data is duplicated there is
> > nothing to stop a clever installation script creating it, or even
> > a very clever caching search that creates it on the fly.
> > > You are right that it is very fast. I use the same method.
> > > For wildcards it is also really fast (just OR a few bitmaps).
> > > The way to work around the huge size of the "bitmap index" is to
> > > store it in another format (like a list or Ranged list) and
> > > convert when needed.
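The "just OR a few bitmaps" idea for wildcards might look roughly like the sketch below. It assumes a sorted in-memory map from word to verse bitmap and handles only trailing wildcards such as "lov*". The names WordIndex and searchPrefix are made up for illustration and do not correspond to anything in the SWORD source or in Nathan's program.

```cpp
#include <bitset>
#include <cstddef>
#include <map>
#include <string>

constexpr std::size_t kVerseCount = 31102;  // figure quoted in the thread
using VerseSet = std::bitset<kVerseCount>;  // one bit per verse

// Hypothetical index: word -> verses containing it. std::map keeps its keys
// sorted, so all words sharing a prefix sit next to each other.
using WordIndex = std::map<std::string, VerseSet>;

// Resolve a trailing-wildcard query by OR-ing together the bitmap of every
// indexed word that starts with the given prefix.
VerseSet searchPrefix(const WordIndex& index, const std::string& prefix) {
    VerseSet hits;
    for (auto it = index.lower_bound(prefix);
         it != index.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        hits |= it->second;  // union of verse sets
    }
    return hits;
}
```

Each OR touches about 3.9K per matching word, which is why this stays fast even when a wildcard expands to many words.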
> > I have a working scheme whereby you can do a best match. You type
> > in your phrase, and it first looks up every word you typed in a
> > thesaurus and then searches for every match, returning the verse with
> > hopefully the most similar meaning.
> > I find it very useful, but for it to work you do need a blindingly fast
> > search mechanism.
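One possible reading of the best-match scheme described above, sketched under stated assumptions: each query word is expanded with its thesaurus entries, every verse scores one point per query word it matches (directly or through a synonym), and the highest-scoring verse wins. Joe's actual scoring is not described in the thread, so the counting rule, the tie-break (lowest verse index) and all names here are guesses for illustration only.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Verse = std::size_t;
using WordIndex = std::map<std::string, std::set<Verse>>;           // word -> verses
using Thesaurus = std::map<std::string, std::vector<std::string>>;  // word -> synonyms

// Returns true and writes the best-scoring verse to `best` if anything matched.
bool bestMatch(const WordIndex& index, const Thesaurus& thesaurus,
               const std::vector<std::string>& query, Verse& best) {
    std::map<Verse, int> score;
    for (const std::string& word : query) {
        // The word itself plus any synonyms listed for it.
        std::set<std::string> terms;
        terms.insert(word);
        auto syn = thesaurus.find(word);
        if (syn != thesaurus.end()) {
            terms.insert(syn->second.begin(), syn->second.end());
        }
        // Union of verses for all terms, so a query word counts once per verse.
        std::set<Verse> versesForWord;
        for (const std::string& term : terms) {
            auto hit = index.find(term);
            if (hit != index.end()) {
                versesForWord.insert(hit->second.begin(), hit->second.end());
            }
        }
        for (Verse v : versesForWord) {
            ++score[v];
        }
    }
    int bestScore = 0;
    bool found = false;
    for (const auto& entry : score) {  // ascending verse order
        if (entry.second > bestScore) {
            bestScore = entry.second;
            best = entry.first;
            found = true;
        }
    }
    return found;
}
```

Every query word fans out into several index lookups, which illustrates why such a scheme leans on a very fast underlying search.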
> > > I like your idea about the RangedPassage as well. It really makes
> > > the list of verses for certain "common" words much smaller.
> > >
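To illustrate why a ranged list shrinks the index for common words, and how it can be converted to a bitmap "when needed", here is a minimal sketch. VerseRange, toRanges and toBitmap are hypothetical names. This is not Joe's RangedPassage class, only the general run-of-consecutive-verses idea.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t kVerseCount = 31102;  // figure quoted in the thread
using VerseSet = std::bitset<kVerseCount>;

// Inclusive run of consecutive verse indices.
struct VerseRange {
    std::size_t first;
    std::size_t last;
};

// Collapse a bitmap into runs; a word that occurs in long stretches of
// consecutive verses needs only one entry per stretch.
std::vector<VerseRange> toRanges(const VerseSet& bits) {
    std::vector<VerseRange> ranges;
    for (std::size_t v = 0; v < bits.size(); ++v) {
        if (!bits.test(v)) continue;
        if (!ranges.empty() && ranges.back().last + 1 == v) {
            ranges.back().last = v;      // extend the current run
        } else {
            ranges.push_back({v, v});    // start a new run
        }
    }
    return ranges;
}

// Expand the stored runs back into a bitmap for fast AND/OR at search time.
VerseSet toBitmap(const std::vector<VerseRange>& ranges) {
    VerseSet bits;
    for (const VerseRange& r : ranges) {
        for (std::size_t v = r.first; v <= r.last && v < bits.size(); ++v) {
            bits.set(v);
        }
    }
    return bits;
}
```

The trade-off is that rare, scattered words gain nothing from ranges, so a real index would probably pick the smaller representation per word, as the thread suggests.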
> > > Where is your program located, Joe?
> > There was a servlet version on the web, but I think it is broken right
> > now. I've been working on a project for my brother (blood and in Christ)
> > that needed to be done before his wedding, so I've not done much on it
> > in the past few months.
> > If you want to look at code, then I can send you whatever you want
> > very quickly. If you want a working product then I'll need a few more
> > weeks.
> > I've tarred up the code in question. And I'll place it at:
> > http://www.eireneh.com/passage.tar.gz
> > It is all Java, and will only be of use for case A above.
> > Joe.