[sword-devel] Coming soon: new improved sword searching

Jordan Wiens sword-devel@crosswire.org
Wed, 11 Sep 2002 10:37:56 -0500 (CDT)


No, ARM would probably be slow as well; I'm running Sword on an ARM-based
Linux handheld.  I definitely would ~not~ want to slow down the search;
it's slow enough as it is.  Memory matters as well, so a large search
index would not be useful to me.

-- 
jordan

On Wed, 11 Sep 2002 porton@narod.ru wrote:

> > On September 9, 2002 07:12, porton@narod.ru wrote:
> > > The Bible is 31102 verses (if I counted correctly). That is ~3.8Kbytes
> > > at one bit per verse.
> >
> > You counted all the verses in the Bible?! (grin)
> >
> > > Searching for "Christ & (God | Father)" we can construct 3 such bit vectors
> > > (~11.4Kbytes) and then perform logical operations over these.
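
A minimal sketch of the idea quoted above, assuming one bit per verse packed into 64-bit words. The names `VerseBits`, `setVerse`, `hasVerse` and `andOfOr` are made up for illustration and are not part of the Sword API; verse numbers are assumed to be a 0-based running index over the whole Bible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per verse; 31102 is the verse count quoted in the message.
const std::size_t kVerseCount = 31102;
// Number of 64-bit words needed to hold kVerseCount bits.
const std::size_t kWordCount = (kVerseCount + 63) / 64;

// Hypothetical type: bit i is set when verse i contains the word.
typedef std::vector<std::uint64_t> VerseBits;

// Returns a vector with no verses marked.
VerseBits makeEmpty() { return VerseBits(kWordCount, 0); }

// Marks verse number `verse` (0-based) as containing the word.
void setVerse(VerseBits &bits, std::size_t verse) {
    assert(verse < kVerseCount);
    bits[verse / 64] |= std::uint64_t(1) << (verse % 64);
}

// Tests whether verse number `verse` is marked.
bool hasVerse(const VerseBits &bits, std::size_t verse) {
    assert(verse < kVerseCount);
    return ((bits[verse / 64] >> (verse % 64)) & 1u) != 0;
}

// Evaluates a & (b | c), one 64-bit word (64 verses) per iteration.
VerseBits andOfOr(const VerseBits &a, const VerseBits &b, const VerseBits &c) {
    assert(a.size() == b.size() && b.size() == c.size());
    VerseBits out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] & (b[i] | c[i]);
    return out;
}
```

With these assumptions each vector is 486 words (3888 bytes), and `andOfOr(christ, god, father)` covers the whole query in 486 loop iterations. How the three input vectors get filled from an index is left open here.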
> >
> > Bit vectors have some nice properties such as the ability to do very fast
> > logical operations. However, they have some significant downsides as well:
> >
> > 1. They are very large to store for the Bible. I did a quick calculation and
> > figured the indexes I've built would grow by approximately 10x if I stored
> > them as bit vectors. The reason is that the average word occurs only 100
> > times, at least in the KJV (I assume other word-based languages are in the
> > same order of magnitude). This means that 4K bit vectors are very sparse.
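
The size argument can be checked with a little arithmetic. This is only a sketch: `bytesPerEntry` is an assumption about how a verse list might be stored, not the actual index format discussed in the thread, and it treats each of the 100 occurrences as a separate list entry.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a bit vector holding one bit per verse.
std::size_t bitVectorBytes(std::size_t verseCount) {
    return (verseCount + 7) / 8;
}

// Bytes needed for a plain list of verse numbers.
// bytesPerEntry is an assumption; it depends entirely on the index format.
std::size_t verseListBytes(std::size_t occurrences, std::size_t bytesPerEntry) {
    assert(bytesPerEntry > 0);
    return occurrences * bytesPerEntry;
}
```

For 31102 verses, `bitVectorBytes` gives 3888 bytes per word. A word with 100 occurrences stored at an assumed 4 bytes per entry takes 400 bytes, so the bit vector is a little under 10 times larger, which is in line with the estimate above. Under the same assumption, a bit vector only becomes the smaller form once a word has more than 972 entries.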
>
> I'm not suggesting storing everything this way, only the most frequently
> encountered words (like "the").
>
> > 2. Conversion to and from them can be computationally costly (especially
> > converting from them). Since storing bit vectors and returning bit vectors to
> > the frontends aren't options, this would have to be considered.
>
> If my memory is right, the 80386 has a special instruction for finding set
> bits in bit vectors. In any case, searching for non-zero bytes is fast.
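
A portable sketch of the conversion being discussed, from a bit vector back to a list of verse numbers. The instruction referred to is presumably BSF (bit scan forward), which the 80386 introduced; the code below does not use it and relies only on skipping zero words quickly. The function name `bitsToVerses` and the 64-verses-per-word layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a bit vector (one bit per verse, 64 verses per word) into a
// sorted list of 0-based verse numbers. Bits at or beyond verseCount
// are ignored.
std::vector<std::uint32_t> bitsToVerses(const std::vector<std::uint64_t> &bits,
                                        std::size_t verseCount) {
    assert(bits.size() * 64 >= verseCount);
    std::vector<std::uint32_t> verses;
    for (std::size_t w = 0; w < bits.size(); ++w) {
        std::uint64_t word = bits[w];
        // A zero word fails the loop condition immediately, so sparse
        // vectors cost roughly one comparison per 64 verses.
        for (std::size_t bit = 0; word != 0; ++bit, word >>= 1) {
            if (word & 1u) {
                std::size_t verse = w * 64 + bit;
                if (verse < verseCount)
                    verses.push_back(static_cast<std::uint32_t>(verse));
            }
        }
    }
    return verses;
}
```

The inner loop stops as soon as the remaining bits of a word are zero, so the cost is dominated by the number of words plus the position of the highest set bit in each non-zero word.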
>
> > 3. Perhaps most significantly, bit vectors are only really a big improvement
> > for logical operators. Verse and word proximity (i.e. within x verses, or
> > within y words) are better done other ways. This could easily lead to
> > multiple conversions to and from bit vectors just to complete one search
> > expression.
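
One possible "other way" for verse proximity, sketched under assumptions: each word's hits are kept as a sorted list of 0-based running verse numbers, and proximity ignores chapter and book boundaries. The function name `withinVerses` is hypothetical; this is not taken from the Sword sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the verses from `a` that have at least one verse from `b`
// within `maxDistance` verses. Both inputs must be sorted ascending.
std::vector<std::uint32_t> withinVerses(const std::vector<std::uint32_t> &a,
                                        const std::vector<std::uint32_t> &b,
                                        std::uint32_t maxDistance) {
    std::vector<std::uint32_t> hits;
    std::size_t j = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(i == 0 || a[i - 1] <= a[i]);
        // Skip entries of b that are too far behind a[i]. Because a is
        // sorted, they are too far behind every later a[i] as well.
        while (j < b.size() && std::uint64_t(b[j]) + maxDistance < a[i])
            ++j;
        // b[j] is now the first entry not too far behind; it matches
        // if it is also not too far ahead.
        if (j < b.size() && b[j] <= std::uint64_t(a[i]) + maxDistance)
            hits.push_back(a[i]);
    }
    return hits;
}
```

Both lists are walked once, so the cost is proportional to their combined length rather than to the number of verses in the Bible.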
>
> I'm not talking about verse proximity, but specifically about paragraphs
> with specified borders!
>
> > > I can even write the necessary algorithms (as I have time). If that is
> > > too slow on the 80386, I can recall its assembly language!
> >
> > Since Sword is a cross-platform library, assembler isn't really an option (I
> > know it is already compiled on at least 3 different CPU architectures). Plus,
> > do you really think hand-coded assembly would be much faster than what a good
> > compiler could produce for a series of bitwise logical operations on arrays?
>
> Isn't it only the 80386 that is slow?
>