[sword-devel] Search up to 5.8 times faster now :)

Wed Jun 2 16:22:59 MST 2004

Hi Joachim,

Matthew 25:21 :-)

Very Cool (if that isn't a totally outdated phrase :-)

Feeling like a wet blanket, but here is something else to consider ...

If it is acceptable to allocate a 4 meg buffer, the entire Bible text can be
parsed into such a buffer with ALL filtering done ONCE. This can be done in
a low priority thread while the front-end is loading (with
sync/sentinel/monitor ipc to avoid race conditions?) and not make app launch
take any longer than the current quick performance.
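
A rough sketch of the filter-once buffer idea, to make it concrete. All of
the names here (stripAll, StrippedCache, buildCache, buildCacheAsync, search)
are made up for illustration and are not part of the sword-api; stripAll is
only a stand-in for a module's real filter chain. std::async gives a
background thread but does not lower its priority, since that is
platform-specific, and the future's get() plays the role of the sync point
mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module's full strip-filter chain.
// Here it only drops everything between '<' and '>'.
std::string stripAll(const std::string& raw) {
    std::string out;
    bool inTag = false;
    for (char c : raw) {
        if (c == '<') inTag = true;
        else if (c == '>') inTag = false;
        else if (!inTag) out += c;
    }
    return out;
}

// All stripped verses packed back to back into one contiguous buffer.
struct StrippedCache {
    std::string buffer;
    std::vector<std::size_t> offsets;  // offsets[i] = start of verse i; last entry = end

    std::string verse(std::size_t i) const {
        assert(i + 1 < offsets.size());
        return buffer.substr(offsets[i], offsets[i + 1] - offsets[i]);
    }
};

// Run the filters exactly once per verse and store the results.
StrippedCache buildCache(const std::vector<std::string>& rawVerses) {
    StrippedCache cache;
    cache.buffer.reserve(4 * 1024 * 1024);  // the roughly 4 MB figure from above
    cache.offsets.reserve(rawVerses.size() + 1);
    for (const std::string& raw : rawVerses) {
        cache.offsets.push_back(cache.buffer.size());
        cache.buffer += stripAll(raw);
    }
    cache.offsets.push_back(cache.buffer.size());
    return cache;
}

// Build on another thread while the front-end loads. The caller waits on
// get() before the first search, so no search sees a half-built cache.
std::future<StrippedCache> buildCacheAsync(std::vector<std::string> rawVerses) {
    return std::async(std::launch::async,
                      [verses = std::move(rawVerses)] { return buildCache(verses); });
}

// Indices of verses whose already-stripped text contains `word`.
std::vector<std::size_t> search(const StrippedCache& cache, const std::string& word) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i + 1 < cache.offsets.size(); ++i) {
        if (cache.verse(i).find(word) != std::string::npos) hits.push_back(i);
    }
    return hits;
}
```

The front-end would call buildCacheAsync(...) at launch, keep the returned
future, and call get() on it once before the first search. Every search
after that touches only stripped text and never runs a filter.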

Since ALL filtering is done only ONCE, searches can be up to 100x faster
with the KJV since it has so many filters/tags embedded. Other Bible texts
with fewer embedded tags will still experience dramatic search improvement,
but not quite as much. Even rawtext Bibles with few or no embedded tags
will have searches done much faster (WEB and BBE).

This might also dramatically speed up searching of encrypted text, since my
observation is that each chapter is decrypted and defiltered for every
search.

Also, I have experienced much better search performance by putting together
a monolithic RawText::StripText function that uses state machine logic to
only have to make one pass thru each verse. This sounds similar, but not
quite the same as what you are describing.
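
To show what I mean by state machine logic, here is a minimal one-pass
stripper. It is only a sketch and not the actual RawText::StripText code: the
name stripOnePass is made up, and it assumes the markup consists of
<...> tags plus a handful of &...; entities.

```cpp
#include <string>

// The three states of the scanner.
enum class State { Text, Tag, Entity };

// Strip tags and decode a few entities in a single pass over the verse.
std::string stripOnePass(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    std::string entity;          // name of the entity being read, e.g. "amp"
    State state = State::Text;

    for (char c : raw) {
        switch (state) {
        case State::Text:
            if (c == '<') {
                state = State::Tag;
            } else if (c == '&') {
                entity.clear();
                state = State::Entity;
            } else {
                out += c;        // ordinary text is copied straight through
            }
            break;
        case State::Tag:
            if (c == '>') state = State::Text;   // tag content is discarded
            break;
        case State::Entity:
            if (c == ';') {
                if (entity == "amp") out += '&';
                else if (entity == "lt") out += '<';
                else if (entity == "gt") out += '>';
                else if (entity == "quot") out += '"';
                // unknown entities are simply dropped in this sketch
                state = State::Text;
            } else {
                entity += c;
            }
            break;
        }
    }
    return out;
}
```

Each character is looked at exactly once, instead of once per filter, which
is where the saving comes from on heavily tagged text.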

LcdBible uses the "experimental plug-in subset" of the sword-api, and is
generally 3x to 10x faster than the current BibleCS because it only filters
once with a monolithic StripTags within a modified RawText class. BerBible
(Berean Bible) uses both of the approaches described above, and is generally
10x to 100x faster than the current BibleCS on searches.

For His glory and honor,
Lynn A
paraclete at bibleinverse.org

----- Original Message ----- 
From: "Joachim Ansorg" <junkmail at joachim.ansorgs.de>
To: "SWORD Developers' Collaboration Forum" <sword-devel at crosswire.org>
Sent: Wednesday, June 02, 2004 4:25 PM
Subject: [sword-devel] Search up to 5.8 times faster now :)

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
> the standard search function is now up to 5 times faster than before.
>
> Let me explain.
> A search in a module did the following:
> 1. Get the text of a key by calling all the strip filters ()
> 2. Search the search words in the stripped down text
> 3. If it was found add it to the result
> We assume a module with 30000 keys and 6 strip filters.
> This means the expensive StripText() function got called 30000*6=180000
> times.
>
> Now we check for the words in the raw text and only check keys which had a
> valid match in the raw text if they match in the stripped down text.
> If we assume a normal query returns 100 results the StripText function gets
> called 100*6=600 times which saves a lot of time.
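
The raw-text-first check described here can be sketched roughly as follows.
This is not the code that went into CVS: stripText and searchPrefiltered are
made-up names, stripText only drops <...> tags as a stand-in for the real
filter chain, and the call counter exists only to make the saving visible.
It also assumes a single search word that no tag splits in the raw text,
since such a verse would be rejected by the raw check.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the expensive strip-filter chain.
// `calls` counts how often it runs.
std::string stripText(const std::string& raw, std::size_t& calls) {
    ++calls;
    std::string out;
    bool inTag = false;
    for (char c : raw) {
        if (c == '<') inTag = true;
        else if (c == '>') inTag = false;
        else if (!inTag) out += c;
    }
    return out;
}

// Stage 1: cheap substring test on the raw text.
// Stage 2: strip and re-check only the verses that survived stage 1.
std::vector<std::size_t> searchPrefiltered(const std::vector<std::string>& rawVerses,
                                           const std::string& word,
                                           std::size_t& stripCalls) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < rawVerses.size(); ++i) {
        if (rawVerses[i].find(word) == std::string::npos) continue;  // no strip needed
        if (stripText(rawVerses[i], stripCalls).find(word) != std::string::npos) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

The second stage is still needed because a raw match may sit inside a tag or
attribute and disappear once the text is stripped.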
>
> Old/new comparison:
> time ./old/examples/cmdline/search KJV Revelation
> real    0m18.912s
> user    0m18.090s
> sys     0m0.780s
>
> time ./new/examples/cmdline/search KJV Revelation
> real    0m3.396s
> user    0m2.540s
> sys     0m0.830s
> Which is an improvement factor of 5.6 :)
>
> ./new/examples/cmdline/search WEB God
> only takes 2.1 secs now.
>
> Another example:
> time ./old/examples/cmdline/search KJV God
> real    0m20.371s
> user    0m18.130s
> sys     0m0.950s
>
> time ./new/examples/cmdline/search KJV God
> real    0m5.566s
> user    0m4.730s
> sys     0m0.810s
> This is "only" 3.7 times faster, because searching in the raw text gives more
> hits which means more calls to StripText(). I tested it with a search for " "
> which matches all verses, and it's as slow as the old one. But the usual
> search queries are a lot faster than before.
>
> The fix is in CVS now.
>
> Joachim
> - -- 
> <>< Re: deemed!
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (GNU/Linux)
>
> iD8DBQFAvlP4EyRIb2AZBB0RAqF0AKC+VgR5O3Ex9kmgtP8U6vlOgD82GwCfTapO
> yCdN4G7E22dFk6oz09wAXXY=
> =gqKO
> -----END PGP SIGNATURE-----