[sword-devel] Re: [bt-devel] fast search

sword-devel@crosswire.org sword-devel@crosswire.org
Thu, 08 Feb 2001 08:05:31 -0600

Trevor Jenkins wrote:
> On Wed, 7 Feb 2001, kbrannen@gte.net <kbrannen@gte.net> wrote:
> > Trevor Jenkins wrote:
> > > > Yes, but the size of one given book or commentary would never change
> > > > once the book is finished.
> > >
> > > You're forgetting personal notes. These are expected to grow. After years
> > > of use I'd expect to be able to search my notes with the same speed I
> > > search the Bible text or someone else's notes ... uh book.
> >
> > I know that QuickVerse (and probably others) get around this by making the
> > "notes" a separate file; hence, the book really doesn't change sizes.
> Sorry I meant to say that my notes will eventually be a "book". Because
> I'm working on them at all the time the size of the book increases. But I
> expect the same search facility to be available for the Bible modules,
> extant books and for my constantly growing personal book(s).

OK, that's reasonable. :-)

> > ... After all, even bad algorithms run well/fast when the data is small
> > enough. :-)
> Have you read Bentley's "Programming Pearls" book? Makes for interesting
> reading especially the account of how a Radio Shack TRS-80 outperformed a
> Cray-1! Bad algorithms are just that bad. Good algorithms can be badly
> implemented but would always be preferred to bad algorithms goodly
> implemented.

No, but it (and its sequel) are on my "to-read" list.

I was thinking of my experiences where I had developed an algorithm and tested
it on something like 1000 rows of data, and it executed in < 1s.  After being
pleased with its correctness, I extrapolated its running time to 10-15 minutes
for 10 million rows of data, and finding that acceptable, let it rip while I
went off to lunch.  Upon coming back, I found the program had done < 1% of the
data!  I'd introduced an extra loop, giving it quadratic running time.  :-(
It wasn't going to finish for over 3 days.  Hence my observation that even bad
algorithms can show good performance on small data sets; but you are quite
correct, they are still bad algorithms. :-)
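A minimal sketch of the kind of bug described above (the function names and the
unit of "work" are hypothetical, not from the original program): an accidental
inner loop turns a single linear pass into a quadratic one, so a timing taken
on a small test set extrapolates very badly.

```python
def process_linear(rows):
    """Intended algorithm: touch each row once -- O(n)."""
    total = 0
    for row in rows:
        total += row  # one unit of work per row
    return total

def process_quadratic(rows):
    """Buggy version: an accidental inner loop makes it O(n^2)."""
    total = 0
    for row in rows:
        for _ in rows:   # the extra loop -- n units of work per row
            pass
        total += row
    return total

# Why the lunch-break extrapolation failed: going from 1,000 rows to
# 10,000,000 rows scales the input by 10,000x.  Linear work grows by the
# same 10,000x, but quadratic work grows by 10,000^2 = 100,000,000x.
small, large = 1_000, 10_000_000
linear_factor = large // small          # 10,000
quadratic_factor = (large // small) ** 2  # 100,000,000
```

So a sub-second test run that honestly predicts minutes for the linear version predicts days for the quadratic one, which matches the experience above.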

The story does end on a good note though.  I rewrote the algorithm and it
later ran the 10M rows through in about 20 minutes. :-)