[sword-devel] Fast Search Framework [was: Another Important Issue]

Troy A. Griffitts sword-devel@crosswire.org
Tue, 29 Aug 2000 16:40:49 -0700


OK, I've heavily documented the code for the search framework.
Please get the latest rawtext.cpp from CVS; the comments should
make it easier to follow now.

		-Troy.

"Troy A. Griffitts" wrote:
> 
> Joe and others that asked,
> 
> The code for our first attempt at word indices and fast searches is in:
> sword/src/modules/texts/rawtext/rawtext.cpp
> 
> RawText::createSearchFramework  // creates the framework (done once)
> RawText::Search // uses the framework
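For anyone new to the idea, here is a minimal sketch of the build-once, query-many split that the two methods above describe. It is Python rather than C++ and does not reflect what rawtext.cpp actually does; the names build_word_index and search_index and the three-verse sample are hypothetical.

```python
import re
from collections import defaultdict

def build_word_index(verses):
    """Map each lowercased word to the sorted list of verse numbers containing it."""
    index = defaultdict(set)
    for verse_no, text in enumerate(verses):
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(verse_no)
    return {word: sorted(nums) for word, nums in index.items()}

def search_index(index, query):
    """Return the verse numbers that contain every word of the query (AND search)."""
    words = re.findall(r"[a-z']+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], ()))
    for word in words[1:]:
        result &= set(index.get(word, ()))
    return sorted(result)

# Hypothetical sample text; a real index would cover the whole module.
SAMPLE = [
    "In the beginning God created the heaven and the earth.",
    "And the earth was without form, and void;",
    "And God said, Let there be light: and there was light.",
]
INDEX = build_word_index(SAMPLE)
```

The index is built once and can be written to disk; each search then touches only the entries for the query words instead of scanning the text.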
> 
> Anyone want to reimplement these?
> 
> I know!  Let's have a contest! :)  Smallest indices with the fastest
> _accurate_ response time wins.
> 
>         :),
>                 -Troy.
> 
> Joe Walker wrote:
> >
> > Nathan wrote:
> > > In option 3, would the bitmap not be about 3.8K? (31102 verses / 8)
> > > Else it is a bytemap, not a bitmap :)
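Worked out, one bit per verse comes to 3888 bytes per word (roughly 3.8K), against 31102 bytes for one byte per verse. A small sketch of that arithmetic, assuming the 31102 verse count quoted above; the function names are hypothetical.

```python
VERSES_IN_KJV = 31102  # verse count quoted in the thread

def bitmap_bytes(num_verses):
    """Bytes needed for one bit per verse, rounded up to a whole byte."""
    return (num_verses + 7) // 8

def bytemap_bytes(num_verses):
    """Bytes needed for one byte per verse."""
    return num_verses

print(bitmap_bytes(VERSES_IN_KJV))   # 3888
print(bytemap_bytes(VERSES_IN_KJV))  # 31102
```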
> >
> > To put the size in perspective: if you take every word in the KJV,
> > search for it, store the results in whichever of the 3 approaches
> > mentioned is smallest, and store the lot in a big RandomAccessFile,
> > then the total size is 4.5MB.
> >
> > This is, in my opinion, a little on the large side if you want to
> > download a new version. However, since the index only duplicates data
> > already in the text, there is nothing to stop a clever installation
> > script from creating it, or even a very clever caching search that
> > creates it on the fly.
> >
> > > You are right that it is very fast. I use the same method.
> > > For wildcards it is also really fast (just OR a few bitmaps).
> > > The way to work around the huge size of the "bitmap index" is to
> > > store it in another format (like a list or Ranged list) and
> > > convert when needed.
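The wildcard trick described above can be sketched as follows: keep each word's hits as a compact verse list, convert to a bitmap only when combining, and OR the bitmaps of every word that matches the pattern. This is an illustration, not code from either program; Python integers stand in for the bitmaps, and wildcard_search and the sample WORD_LISTS are hypothetical.

```python
import fnmatch

def verses_to_bitmap(verse_numbers):
    """Set bit n for every verse number n in the list."""
    bitmap = 0
    for n in verse_numbers:
        bitmap |= 1 << n
    return bitmap

def bitmap_to_verses(bitmap):
    """Return the positions of the set bits, lowest first."""
    verses = []
    n = 0
    while bitmap:
        if bitmap & 1:
            verses.append(n)
        bitmap >>= 1
        n += 1
    return verses

def wildcard_search(index, pattern):
    """OR together the bitmaps of every indexed word matching the pattern."""
    combined = 0
    for word, verse_list in index.items():
        if fnmatch.fnmatchcase(word, pattern):
            combined |= verses_to_bitmap(verse_list)
    return bitmap_to_verses(combined)

# Hypothetical stored form: word -> sorted verse numbers (made-up values).
WORD_LISTS = {"love": [3, 10], "loved": [10, 42], "loveth": [7], "light": [2, 3]}
```

Here wildcard_search(WORD_LISTS, "lov*") combines three bitmaps with OR, so a verse hit by several matching words still appears only once in the result.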
> >
> > I have a working scheme whereby you can do a best match. You type
> > in your phrase; it first looks up every word you typed in a
> > thesaurus, then searches for every match, returning the verse with
> > hopefully the most similar meaning.
> > I find it very useful, but for it to work you do need a blindingly
> > fast search mechanism.
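One way such a best-match search could work is sketched below. This is not Joe's algorithm, whose scoring is not described here; it assumes a verse scores one point for each query word that it contains directly or through a synonym. The names best_match, THESAURUS and INDEX and their contents are hypothetical.

```python
from collections import Counter

def best_match(phrase, thesaurus, index):
    """Rank verses by how many query words (or their synonyms) they contain."""
    scores = Counter()
    for word in phrase.lower().split():
        # Expand the word with its synonyms, then collect every verse hit.
        candidates = {word} | set(thesaurus.get(word, ()))
        hits = set()
        for candidate in candidates:
            hits.update(index.get(candidate, ()))
        # Each query word contributes at most one point per verse.
        for verse in hits:
            scores[verse] += 1
    # Highest score first; ties broken by verse number for stable output.
    return sorted(scores, key=lambda v: (-scores[v], v))

# Hypothetical data: synonyms per word, and word -> verse numbers.
THESAURUS = {"joy": ["gladness", "rejoice"], "fear": ["dread", "afraid"]}
INDEX = {"joy": [5], "gladness": [5, 9], "rejoice": [12], "fear": [9], "afraid": [9, 20]}
```

Every query word fans out into several index lookups, which is why the underlying search has to be fast for this to be usable.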
> >
> > > I like your idea about the RangedPassage as well. It really makes
> > > the list of verses for certain "common" words much smaller.
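To illustrate why a ranged list helps with common words, here is a minimal sketch that collapses consecutive verse numbers into inclusive (start, end) runs and expands them again. It is not Joe's RangedPassage class; to_ranges and from_ranges are hypothetical names, and the input is assumed to be sorted with no duplicates.

```python
def to_ranges(verse_numbers):
    """Collapse a sorted, duplicate-free list into inclusive (start, end) runs."""
    ranges = []
    for n in verse_numbers:
        if ranges and n == ranges[-1][1] + 1:
            # n extends the current run by one verse.
            ranges[-1] = (ranges[-1][0], n)
        else:
            # n starts a new run.
            ranges.append((n, n))
    return ranges

def from_ranges(ranges):
    """Expand inclusive (start, end) runs back into individual verse numbers."""
    return [n for start, end in ranges for n in range(start, end + 1)]
```

A word that occurs in verses 0 through 999 without a gap is stored as the single pair (0, 999) instead of 1000 entries, while a rare word costs one pair per hit.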
> > >
> > > Where is your program located Joe?
> >
> > There was a servlet version on the web, but I think it is broken right
> > now. I've been working on a project for my brother (blood and in Christ)
> > that needed to be done before his wedding, so I've not done much on it
> > in the past few months.
> >
> > If you want to look at code, then I can send you whatever you want
> > very quickly. If you want a working product, then I'll need a few more
> > weeks.
> >
> > I've tarred up the code in question, and I'll place it at:
> >   http://www.eireneh.com/passage.tar.gz
> > It is all Java, and will only be of use for case A above.
> >
> > Joe.