[sword-devel] Another Important Issue - fast searching frame

Tsoloane Moahloli sword-devel@crosswire.org
Tue, 29 Aug 2000 09:15:13 +0200 (SAST)


Hi Nathan, 

I am based in Johannesburg.  I have the latest SWORD code from CVS and
can write it to a CD for you if you want.

Cheers,

T

On 29-Aug-2000 Nathan wrote:
> Good day Troy, Martin & others
> 
> (I only joined the list about 2 weeks ago, so I am still trying to
> find out who's who, who's doing what, what needs to be done, where
> is the code, etc etc)
> 
> I am busy doing something like this for my website at the moment.
> On a website, though, speed is even more important, as you
> have many users and many requests at the same time.
> I have developed some techniques for making it fast enough, and
> they also seem to work well with large result sets.
> It provides for most search requirements, including
> wildcards (mesch*), AND, OR, NOT, range, and I am also looking at
> ranking (most "relevant" at the top, if requested).
> Maybe we should talk about this?
> (I am still working on it, but more than 80% of it is finished, so
> I know it works.)
> 
> 1. In what format are the indexes that you are currently building?
> (I assume it is something like a list of pointers to verses)
> Are you also storing the number of times the word occurs in that
> verse?
> Are you working with ALL the words, or are you eliminating
> "stopwords"? (something I see some Bible programs are doing --
> most annoying imho)
> 
> 2. I have tried to look at where you are doing the new fast search
> in the Sword CVS, but time has not allowed me to explore this yet.
> Can you point me to where/what you are doing at the moment?
> (Or better, provide me with some quick high-level overview :-)
> 
> 3. This brings up another point. Not all users know regex, etc.
> But they will want to do complex searches. Are you looking at
> making the search user interface simpler?
> E.g. why ask the users to tell you that you must use regex when
> they type "mesch*"? The * should tell you that automatically.
> Or am I making it sound too easy?
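
A rough sketch of what point 3 could look like, in Python rather than
SWORD's C++. Nothing here is SWORD code: compile_query is a made-up
helper, and it assumes the words being tested are already lowercased.
A term containing * or ? is treated as a wildcard pattern, anything
else as a literal word, so the user never has to pick a search mode.

```python
import fnmatch
import re


def compile_query(query):
    """Return (kind, matcher) for a single search term.

    Hypothetical helper, not part of SWORD. A term containing * or ?
    is treated as a wildcard pattern; anything else is a literal word.
    """
    term = query.strip().lower()
    if any(ch in term for ch in "*?"):
        # fnmatch.translate turns shell-style wildcards into a regex
        # that must match the whole word.
        pattern = re.compile(fnmatch.translate(term))
        return "wildcard", lambda word: pattern.match(word) is not None
    return "literal", lambda word: word == term


kind, matches = compile_query("mesch*")
hits = [w for w in ["mescha", "meschar", "mose"] if matches(w)]
```

Here kind is "wildcard" and hits keeps only the two words that start
with "mesch". A plain term such as "mescha" comes back as "literal".
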
> 
> God bless you,
> nathan
> http://www.nathan.co.za
> 
> PS. Where can I get hold of a Sword CD? I am in South Africa,
> so I guess the normal outlets don't work. And the ISO image is
> too big to download. I tried it! <grin>
> 
> 
> 
> 
> -----Original Message-----
> From: owner-sword-devel@crosswire.org
> On Behalf Of Troy A. Griffitts
> Sent: 29 August 2000 03:15
> To: sword-devel@crosswire.org
> Subject: Re: [sword-devel] Another Important Issue
> 
> 
> Martin,
>       Thanks for the post.  This is exactly what we are doing with the
> reference implementation of a fast searching framework.  We do one
> search for each word in the text and create an index of every word with
> verse references for each.  We save this index and every time a search
> is performed, we ask the index for the references for the word.  And,
> yes, as you said, we do multiword searches this way also.
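
To make the description above concrete, here is a minimal sketch of a
word index in Python. It is not the SWORD implementation: build_index,
search_all and the sample data are assumptions for illustration, and
the index lives in memory instead of being saved to disk. Each word
maps to the set of verse references that contain it, and a multiword
search is the intersection of those sets.

```python
import re
from collections import defaultdict


def build_index(verses):
    """Map each lowercased word to the set of references containing it."""
    index = defaultdict(set)
    for ref, text in verses.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(ref)
    return index


def search_all(index, words):
    """Return the references that contain every word (AND search)."""
    postings = [index.get(w.lower(), set()) for w in words]
    if not postings:
        return set()
    return set.intersection(*postings)


# Sample data for illustration only.
SAMPLE_VERSES = {
    "Gen 1:1": "In the beginning God created the heaven and the earth.",
    "John 1:1": "In the beginning was the Word, and the Word was with "
                "God, and the Word was God.",
}

index = build_index(SAMPLE_VERSES)
both = search_all(index, ["beginning", "God"])
only_gen = search_all(index, ["created"])
```

With this data, both contains the two references and only_gen contains
just "Gen 1:1".
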
> 
> Problems come with large result sets.  You see, not only do we have to
> find verse references for the word[s], we also have to verify that the
> verse references are within the search range specified (valid for the
> key used to specify the search bounds).  This entails iterating through
> the search results and asking the key if each one is valid.  For
> extremely large result sets, this takes just as long as searching the
> entire text, actually sometimes longer than the default searching
> mechanism.
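
One possible angle on the large result set problem, sketched in Python
under two assumptions that may not hold for SWORD keys: every verse
reference can be mapped to an integer position, and the search range
is a single inclusive interval of positions. filter_linear is the
per-hit check described above; filter_sorted keeps each word's hits
sorted and finds the range bounds with two binary searches instead of
asking about every hit. Both function names are made up.

```python
from bisect import bisect_left, bisect_right


def filter_linear(postings, in_range):
    """Per-hit check: ask about each hit, one at a time."""
    return [pos for pos in postings if in_range(pos)]


def filter_sorted(postings, start, end):
    """Slice a sorted list of hits with two binary searches.

    Assumes postings is a sorted list of integer verse positions and
    the search range is the inclusive interval [start, end].
    """
    lo = bisect_left(postings, start)
    hi = bisect_right(postings, end)
    return postings[lo:hi]


hits = [3, 10, 42, 57, 99, 120]  # made-up positions
in_bounds = filter_sorted(hits, 10, 99)
```

Both functions return the same hits for a contiguous range. A search
range made of several intervals would need one slice per interval.
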
> 
> Any suggestions on how to speed up this process would be greatly
> appreciated.
> 
>       -Troy.
> 
> 
> 
> Martin Gruner wrote:
>>
>> Another feature request:
>>
>> At the moment you can use sword to retrieve text (a list of words) by a
>> key (bible reference).
>> Is it possible to retrieve keys (a list of) by a word? I am not talking
>> about searching. I am talking about something like a concordance. This
>> would involve creating a file for every module that contains information
>> about the location of every single word in the module.
>> For example, if I look up "mesch", sword tells me that this word is not in
>> the module, but the words "mescha", "meschar", "meschelemja" ....
>> If I look up "meschelemja", sword will give me 3 references to where this
>> word occurs in the bible.
>> Once this is implemented, searches for a single word would be
>> sped up amazingly, because sword would just look them up in the
>> concordance. You could even perform multi word searches using this
>> mechanism.
>> I do not know how realistic this is, but it is at least another
>> (discussable) idea.
>>
>> Martin
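
The "mesch" example could be sketched like this in Python. The layout
is an assumption, not anything SWORD provides: a dict from word to a
list of references, plus the same words kept in sorted order. The
references are placeholders. An exact hit returns its references; a
miss returns the words that begin with the term, found by a binary
search into the sorted word list.

```python
from bisect import bisect_left


def lookup(concordance, sorted_words, term):
    """Return (references, suggestions) for term.

    concordance: dict mapping word -> list of references (hypothetical)
    sorted_words: the concordance keys in sorted order
    """
    if term in concordance:
        return concordance[term], []
    # No exact entry: collect the words that begin with term.
    start = bisect_left(sorted_words, term)
    suggestions = []
    for word in sorted_words[start:]:
        if not word.startswith(term):
            break
        suggestions.append(word)
    return [], suggestions


# Placeholder references, for illustration only.
concordance = {
    "mescha": ["ref-1"],
    "meschar": ["ref-2"],
    "meschelemja": ["ref-3", "ref-4", "ref-5"],
    "mose": ["ref-6"],
}
sorted_words = sorted(concordance)

refs, near = lookup(concordance, sorted_words, "mesch")
exact, _ = lookup(concordance, sorted_words, "meschelemja")
```

Here refs is empty and near lists the three "mesch..." words, while
exact holds the three placeholder references for "meschelemja".
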

-- 
When was the last time you did something for the first time?

Tsoloane Moahloli
Zen Computing (Pty)Ltd.
phone +27 11 706 7054
email: tsoloane@zen.co.za               URL: http://www.zen.co.za