[sword-devel] Fast search

Nathan sword-devel@crosswire.org
Sun, 17 Sep 2000 15:06:03 +0200

Good day Troy and others

On 8th September: Troy Griffitts wrote:
> I hope no one is getting discouraged as to what should be 
> developed for a fast search framework. 

I need to ask a few questions at this point.
I have only joined the list recently, so if I am asking
some things which have been decided, please tell me so :)

--- Start of questions ---

Question 1:
How much effort do we really want to put in to get the index
as small as possible? 
Is it really worth all the effort to do arithmetic coding
on some delta list, and do 3 conversions to save 400K
on the size of the index? (See my earlier answer to
Jerry Hastings) A few years ago I would have done
it, but with the smallest hard disks for PCs being 8 GB 
nowadays, is it really a big deal to have an index of 900K?

If you really want to save space and do compression, then 
why keep the text of the Bible uncompressed? You can save 
much more compressing that! :)
It is the usual question of speed vs. simplicity vs. space.

Question 2:
Sword and the programs built on it, like BibleTime, will
ultimately be for users as well, not just programmers.
If we look at the search from that perspective, we realize 
that these users of the program do not know what a regular 
expression is. They just want to type some words, and get 
results (preferably fast). They might know something about 
AND and OR and NOT and wildcards, and use these.
Why should they tell the program whether to use a multi-word,
phrase or regular expression search?
Should the program not be able to do all those from one 
input box?
(My apologies if I am questioning design decisions which 
you have made long ago.)
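
To make the one-input-box idea concrete, here is a minimal sketch
in Python of how a program might guess the search type from what
the user typed. The function name classify_query and the rules
(double quotes mean a phrase, upper-case AND/OR/NOT mean a boolean
query, * and ? mean wildcards) are my own assumptions for
illustration, not anything that exists in Sword.

```python
import re

# Hypothetical operator words; only upper-case forms count as operators.
BOOLEAN_WORDS = {"AND", "OR", "NOT"}

def classify_query(text):
    """Guess the search type from one input string (assumed rules)."""
    text = text.strip()
    if not text:
        return "empty"
    # A query wrapped in double quotes is treated as an exact phrase.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return "phrase"
    tokens = text.split()
    # Any AND / OR / NOT token makes it a boolean query.
    if any(tok in BOOLEAN_WORDS for tok in tokens):
        return "boolean"
    # '*' and '?' are treated as simple wildcards.
    if re.search(r"[*?]", text):
        return "wildcard"
    # Anything else is a plain multi-word search.
    return "multi-word"
```

The user just types; the program picks the search type. A real
version would have to handle mixed cases (a wildcard inside a
boolean query, for example), which this sketch simply reports
as "boolean".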

Question 3:
So, what is expected from the fast search? 
Do we want to keep the regular expressions (as Martin also
asked on 7th Sept), or do we build it into the fast search?
Must the index be small?
Must the fast search be the main search, or is it just an 
extra add-on?

Question 4:
Depending on the answer to 3: 
if the new fast search does everything, and we do not need
regular expressions, do we keep the text of the modules
as text, or do we compress it for space and speed reasons?
This could have some wider effects so you will have to be 
careful on this one :)

--- End of questions ---

Given some answers/decisions on these, the fast search should
actually not take all that long.
(I have some code for generating word-lists, bitmaps, 
verse-lists, parsing queries and their trees, bitmap 
operations, etc. Some of it uses VB, C and SQL, 
but I can at least give you some pseudo-code, which can 
be converted to C++ very quickly.)
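
As a rough idea of what I mean by word-lists and bitmaps, here is
a small sketch in Python. It is not my existing code and not
Sword code; the names build_index, lookup and bitmap_to_verses are
made up for this example, and each bitmap is simply a Python
integer with one bit per verse.

```python
import re

def build_index(verses):
    """Map each lower-cased word to a bitmap with one bit per verse."""
    index = {}
    for verse_no, text in enumerate(verses):
        # set() so a word repeated within a verse is handled once.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            index[word] = index.get(word, 0) | (1 << verse_no)
    return index

def lookup(index, word):
    """Return the bitmap for a word, or 0 if the word is not indexed."""
    return index.get(word.lower(), 0)

def bitmap_to_verses(bitmap):
    """Turn a bitmap back into a sorted list of verse numbers."""
    result = []
    verse_no = 0
    while bitmap:
        if bitmap & 1:
            result.append(verse_no)
        bitmap >>= 1
        verse_no += 1
    return result

verses = [
    "In the beginning God created the heaven and the earth",
    "And the earth was without form and void",
    "And God said Let there be light",
]
index = build_index(verses)
all_verses = (1 << len(verses)) - 1   # bitmap with every verse set

god = lookup(index, "God")
earth = lookup(index, "earth")

both = god & earth                     # God AND earth
either = god | earth                   # God OR earth
only_god = god & ~earth & all_verses   # God AND NOT earth
```

With these three verses, bitmap_to_verses(both) gives [0],
bitmap_to_verses(either) gives [0, 1, 2] and
bitmap_to_verses(only_god) gives [2]. AND, OR and NOT each become
a single bitwise operation, which is where the speed comes from.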

God bless,

(PS. The following is my personal opinion and is not to be 
seen as trying to influence anybody's decisions...
1. Keep it at 900K. No compression. No 3x conversions.
2. Keep it simple for the users
3. Build it in
4. Don't compress it yet)