[bt-devel] Fwd: Re: clucene crash when searching

Eeli Kaikkonen eekaikko at mail.student.oulu.fi
Wed Nov 19 13:17:41 MST 2008


On Tue, 18 Nov 2008, Martin Gruner wrote:

> I do not see that we would gain much by adding support of Sword's non-indexed
> search engines, except for the ability to search for phrases.
>
> Searching in BT should be simple and consistent. That means that we should
> not, in my opinion, offer different search syntaxes to the user. Maybe one
> exception: a regexp-based search for power users, but all "normal" users
> should have one single search to work with (from a user's point of view).

Actually that is almost the same thing I was getting at. If we add a
tab for regexp search it wouldn't add any complexity for users - they
can always choose not to open the tab. The current search launchers
would open the current search UI; only the tab text would be different.
Regexp search would give power users all the power they could want,
including phrase search.
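
To illustrate why a regexp search covers phrase search, here is a
minimal Python sketch. It is not BibleTime or Sword code: the sample
verses and the regexp_search() helper are made up for illustration, and
it simply tests every verse in turn, as a non-indexed search would.

```python
import re

# Hypothetical sample data; a real module would be read through Sword.
verses = {
    "Gen 1:1": "In the beginning God created the heaven and the earth.",
    "John 1:1": "In the beginning was the Word, and the Word was with God.",
    "Ps 23:1": "The LORD is my shepherd; I shall not want.",
}

def regexp_search(pattern, texts):
    """Return the keys of all texts that match the regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    # No index is used: every text is tested against the pattern.
    return [key for key, text in texts.items() if rx.search(text)]

# A phrase is just a regexp with literal words in order.
phrase_hits = regexp_search(r"\bin the beginning\b", verses)
```

The same helper also handles patterns that a simple word search cannot
express, such as r"\bbegin\w*" for any word starting with "begin".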

>
> My suggestion would be to talk about the search engine we do use, clucene. I
> just checked - they released a bugfix 0.9.21 version recently, and 0.9.23,
> which is a beta-quality preview release of their next development branch,
> which is supposed to improve Lucene compatibility/feature coverage. Ben also
> told me that he was going to implement the wildcard operator in the beginning
> of words (like "*minded").
> But nobody can say how long this will take. So we may want to use another open
> source search engine which suits our needs better.
>

Good to hear clucene is moving forward. I'm pessimistic about finding a
better alternative, but it's always good to look around. I'm even more
pessimistic about the idea of using something other than what we and
Sword already use. Even though we don't depend on Sword for this, we may
still benefit each other (and other frontends) by helping (c)lucene.

A technical note about wildcard search: allowing a leading wildcard may
lead to very slow performance. It might be as slow as going through the
whole module, because every word in the index must be tested. Creating
another index would help, but I don't know if it's realistic. Even now,
searching for very common words (e.g. "and", "an") may take more than 5
seconds on a slow machine. Therefore a threaded search may be
necessary later even if we stick with indexed search only. (On the
other hand, the slowness I noticed may come from the graphical
UI, not from the engine - this should be researched further.)
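
A rough Python sketch of the two approaches to a leading wildcard, to
make the cost difference concrete. This is not how clucene works
internally; the term list and both function names are invented for
illustration. The "other index" is assumed here to be a sorted list of
the reversed terms, which turns a suffix query into a prefix query.

```python
import bisect

# Hypothetical term list standing in for the words of a search index.
terms = sorted(["kind", "minded", "mindful", "open-minded", "reminded", "word"])

def suffix_scan(suffix, terms):
    """Leading wildcard ("*minded") with no extra index: test every term."""
    return [t for t in terms if t.endswith(suffix)]

# Assumed second index: every term reversed, kept in sorted order.
reversed_terms = sorted(t[::-1] for t in terms)

def suffix_lookup(suffix, reversed_terms):
    """Leading wildcard via the reversed index.

    The reversed suffix is a prefix, and all entries sharing a prefix
    are adjacent in a sorted list, so a binary search finds the first
    one and only the matching entries are visited after that.
    """
    prefix = suffix[::-1]
    i = bisect.bisect_left(reversed_terms, prefix)
    hits = []
    while i < len(reversed_terms) and reversed_terms[i].startswith(prefix):
        hits.append(reversed_terms[i][::-1])  # restore the original term
        i += 1
    return sorted(hits)
```

The price of the second approach is the extra index itself: it roughly
doubles the stored term data and has to be built and kept up to date
together with the main index.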

> We could start a wiki page listing the specific problems that we see with
> clucene, and investigate if they can be solved. At the same time we can
> collect information about other search engines in a matrix of
> features/properties that we do need. Maybe we come up with something better,
> more stable and feature-rich than clucene?
>

I'm extremely pessimistic about this (creating a new engine). It's not
impossible, but given the manpower and time we have it's better to leave
it to others who have already done it. This may of course change if we
gain more interest and developers from the Windows community later. But
even then it may be better to help (c)lucene.

> A major problem that I see: What about our release roadmap? We should not
> start changing the search engine in the 1.7.x branch/release cycle. I'm
> unhappy with the status quo, we cannot stay in beta state for a long time and
> continue changing the internals of our software. We should release FIRST, and
> THEN start making major changes.

I wasn't going to start anything and still am not ATM. A wiki page might
really be a good idea. Time spent thinking about this is not wasted,
even if nothing ever comes of it. A wiki page could help the Sword
engine, too.

  Yours,
	Eeli Kaikkonen (Mr.), Oulu, Finland
	e-mail: eekaikko at mailx.studentx.oulux.fix (with no x)


