[bt-devel] [ bibletime-Feature Requests-2097655 ] Troubles with search engine in French

SourceForge.net noreply at sourceforge.net
Thu Sep 11 10:49:47 MST 2008


Feature Requests item #2097655, was opened at 2008-09-06 23:18
Message generated for change (Comment added) made by eelik
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=350954&aid=2097655&group_id=954

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Olivier Keshavjee (okeshavjee)
Assigned to: Nobody/Anonymous (nobody)
Summary: Troubles with search engine in French

Initial Comment:
When searching for a word in French, we do not get all the occurrences if there is a short article attached in front of it.

Example: searching for "insensé" will not return the occurrences of "l'insensé" or "d'insensé". It doesn't work even with "*insensé".

Thanks :)

Using BibleTime 1.6.5 



----------------------------------------------------------------------

Comment By: Eeli Kaikkonen (eelik)
Date: 2008-09-11 17:49

Message:
CLucene is a C++ port of Lucene, which is written in Java. This is from the Lucene
FAQ: "Leading wildcards (e.g. *ook) are not supported by the QueryParser by
default. As of Lucene 2.1, they can be enabled by calling
QueryParser.setAllowLeadingWildcard( true ). Note that this can be an
expensive operation: it requires scanning the list of tokens in the index
in its entirety to look for those that match the pattern."

Unfortunately CLucene lags too far behind Lucene, so we can't expect
this to work in the near future.
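
To illustrate the cost the FAQ describes, here is a minimal Python sketch.
It is not CLucene or Lucene code; the term list and the function names are
made up for illustration, and it assumes the index keeps "l'insensé" as a
single term. A trailing wildcard can jump into the sorted term list with a
binary search, while a leading wildcard has to test every term.

```python
import bisect

# Hypothetical sorted term dictionary, roughly as an index might store it.
TERMS = sorted(["d'insensé", "insensé", "insensés", "l'insensé", "sage", "sagesse"])


def prefix_matches(terms, prefix):
    """Trailing wildcard (prefix*): binary search to the first candidate,
    then walk forward until the prefix no longer matches."""
    start = bisect.bisect_left(terms, prefix)
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def suffix_matches(terms, suffix):
    """Leading wildcard (*suffix): the sort order does not help,
    so every term in the list is checked."""
    return [term for term in terms if term.endswith(suffix)]


print(prefix_matches(TERMS, "insens"))
print(suffix_matches(TERMS, "insensé"))
```

Under these assumptions only the suffix scan finds "l'insensé" and
"d'insensé", and its cost grows with the total number of terms in the index.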

----------------------------------------------------------------------

Comment By: Eeli Kaikkonen (eelik)
Date: 2008-09-10 14:06

Message:
We use CLucene, a third-party search engine. It indexes the words, and
currently we do not take the language into account, so it uses its basic,
language-independent algorithm. I'm just looking at the CLucene FAQ and it
states that CLucene is capable of using wildcards at the beginning of a
word, but we are apparently not using this feature. CLucene can also use
language-specific analyzers, but this may be too much work for us.

Enabling the wildcards would be the easiest solution and would be effective
for many languages. I think we have to look at it at some point.
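
For comparison, a language-specific analysis step for this particular
problem could be quite small. The following Python sketch is only an
illustration of the idea, not CLucene's analyzer API: the list of elided
forms and all function names are assumptions. It strips a leading elided
word such as "l'" or "d'" from each token before the token would be indexed
or searched.

```python
import re

# Assumed list of French elided forms; a real analyzer may use a different set.
ELISIONS = ("l", "d", "j", "m", "t", "s", "n", "c", "qu")
_ELISION_RE = re.compile(r"^(?:%s)['’]" % "|".join(ELISIONS), re.IGNORECASE)


def tokenize(text):
    """Lowercase, split on whitespace and trim surrounding punctuation."""
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(".,;:!?\"()«»")
        if token:
            tokens.append(token)
    return tokens


def strip_elision(token):
    """Remove one leading elided form, e.g. "l'insensé" -> "insensé"."""
    return _ELISION_RE.sub("", token)


def analyze(text):
    """Tokenize the text and strip elisions from every token."""
    return [strip_elision(token) for token in tokenize(text)]


print(analyze("L'insensé dit: point de Dieu!"))
```

With a step like this applied both at indexing time and to the query, a
search for "insensé" would also match text that contains "l'insensé" or
"d'insensé", without needing a leading wildcard.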

----------------------------------------------------------------------
