[sword-devel] Another Important Issue

Troy A. Griffitts sword-devel@crosswire.org
Mon, 28 Aug 2000 18:15:16 -0700


Martin,
	Thanks for the post.  This is exactly what we are doing with the
reference implementation of a fast searching framework.  We run one
search for each distinct word in the text and build an index that maps
every word to its verse references.  We save this index, and every time
a search is performed, we ask the index for the references for the word.
And, yes, as you said, we do multiword searches this way also.
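
As a rough illustration of the word index described above, here is a minimal
sketch in Python.  It is not the SWORD implementation: the names build_index
and search are hypothetical, verses are assumed to be (reference, text) pairs
in canonical order, and a multiword search is assumed to mean "all words
appear in the same verse".

```python
import re
from collections import defaultdict


def build_index(verses):
    """Map each lowercased word to a sorted list of verse positions.

    `verses` is a list of (reference, text) pairs; the position of a
    verse in that list stands in for its verse reference.
    """
    index = defaultdict(set)
    for pos, (_ref, text) in enumerate(verses):
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(pos)
    # Store sorted lists so later range checks can rely on ordering.
    return {word: sorted(positions) for word, positions in index.items()}


def search(index, words):
    """Return positions of verses that contain every word in `words`."""
    postings = [set(index.get(word.lower(), ())) for word in words]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


verses = [
    ("Gen 1:1", "In the beginning God created the heaven and the earth"),
    ("Gen 1:2", "And the earth was without form"),
    ("John 1:1", "In the beginning was the Word"),
]
index = build_index(verses)
hits = search(index, ["beginning", "earth"])
print([verses[pos][0] for pos in hits])
```

A single-word search is one dictionary lookup; a multiword search intersects
the per-word lists, so its cost depends on the size of those lists rather
than on the size of the text.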

Problems come with large result sets.  You see, not only do we have to
find verse references for the word[s], we also have to verify that the
verse references are within the search range specified (valid for the
key used to specify the search bounds).  This entails iterating through
the search results and asking the key if each one is valid.  For
extremely large result sets, this takes as long as searching the
entire text, and sometimes longer than the default search mechanism.
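
To make the cost concrete, the sketch below contrasts the per-result check
with one possible alternative.  It rests on assumptions that may not hold for
real SWORD keys: hits are stored as sorted integer verse positions, and the
search range can be expressed as a single contiguous interval of positions.
The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def filter_by_key(hits, is_in_range):
    """Per-result check: one call to the range test for every hit."""
    return [hit for hit in hits if is_in_range(hit)]


def filter_by_interval(sorted_hits, first, last):
    """Keep hits in [first, last] using two binary searches.

    Assumes `sorted_hits` is in ascending order and the search range
    is one contiguous interval of verse positions.
    """
    lo = bisect_left(sorted_hits, first)
    hi = bisect_right(sorted_hits, last)
    return sorted_hits[lo:hi]


hits = [3, 10, 42, 57, 99]
print(filter_by_key(hits, lambda pos: 10 <= pos <= 57))
print(filter_by_interval(hits, 10, 57))
```

The first function does work proportional to the number of hits; the second
only locates the two ends of the interval, but it applies only when the range
is contiguous in the same ordering as the stored hits.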

Any suggestions on how to speed up this process would be greatly
appreciated.

	-Troy.



Martin Gruner wrote:
> 
> Another feature request:
> 
> At the moment you can use sword to retrieve text (a list of words) by a key
> (bible reference).
> Is it possible to retrieve keys (a list of) by a word? I am not talking about
> searching. I am talking about something like a concordance. This would
> involve creating a file for every module that contains information about the
> location of every single word in the module.
> For example, if I look up "mesch", sword tells me that this word is not in
> the module, but that the words "mescha", "meschar", "meschelemja" ... are.
> If I look up "meschelemja", sword will give me 3 references to where this
> word occurs in the bible.
> Once this is implemented, searches for a single word would be sped
> up amazingly, because sword would just look them up in the concordance. You
> could even perform multi word searches using this mechanism.
> I do not know how realistic this is, but it is at least another (discussable)
> idea.
> 
> Martin