[bt-devel] change in search algo

Martin Gruner mg.pub at gmx.net
Sat Oct 21 12:48:05 MST 2006


Hi friends,

today I changed BibleTime's (CVS) search implementation from the 
StandardAnalyzer to the WhitespaceAnalyzer. The difference is that the 
StandardAnalyzer applies a set of default English stop words to both the text 
being indexed and the queries. That means words like "the", "they" and "then" 
could not be found, because they are assumed to produce too many results. For 
BibleTime this does not seem acceptable to me, so I changed it. The new 
analyzer just splits the text and the queries into words at whitespace. 
Everything will be indexed and can be queried. This means the index will be 
slightly bigger, but everything can be found.
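
To illustrate the difference, here is a minimal Python sketch. It is not 
CLucene code and does not use its API; the stop word list, the sample verses 
and all function names are assumptions made up for this illustration, and the 
real StandardAnalyzer does more than this.

```python
import re

# Hypothetical subset of English stop words, for illustration only.
# The analyzer's real default list may differ.
STOP_WORDS = {"the", "they", "then", "a", "and", "of"}


def whitespace_tokens(text):
    """Split on whitespace only and keep every token unchanged."""
    return text.split()


def stop_filtered_tokens(text):
    """Lowercase, split into runs of letters, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_index(verses, tokenize):
    """Map each token to the set of verse keys that contain it."""
    index = {}
    for key, text in verses.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(key)
    return index


# Sample data, chosen only for the demonstration.
VERSES = {
    "Gen 1:1": "In the beginning God created the heaven and the earth",
    "Gen 1:3": "And God said Let there be light",
}

filtered = build_index(VERSES, stop_filtered_tokens)
unfiltered = build_index(VERSES, whitespace_tokens)

# With stop word filtering, "the" never reaches the index and so
# cannot be found; with whitespace splitting it is indexed normally.
print("the" in filtered)
print(sorted(unfiltered.get("the", set())))
print(len(filtered), len(unfiltered))
```

The whitespace-only index has more entries, which is the size trade-off 
mentioned above. In this sketch part of that difference also comes from the 
whitespace split not lowercasing tokens.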

Is this OK, or does anybody disagree? Please let me know.

mg


P.S. I also improved our own search highlighting a bit so that it handles "*" 
more correctly. The best solution, however, would be to use CLucene for that 
as well...


