[bt-devel] [ bibletime-Feature Requests-2097655 ] Troubles with search engine in French

SourceForge.net noreply at sourceforge.net
Thu Sep 11 14:08:59 MST 2008


Feature Requests item #2097655, was opened at 2008-09-06 23:18
Message generated for change (Comment added) made by nobody
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=350954&aid=2097655&group_id=954

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Olivier Keshavjee (okeshavjee)
Assigned to: Nobody/Anonymous (nobody)
Summary: Troubles with search engine in French

Initial Comment:
When searching for a word in French, we do not get all the occurrences if the word has a short elided article in front of it.

Example: searching for "insens" does not return the occurrences of "l'insens" or "d'insens". It doesn't work even with "*insens".

Thanks :)

Using BibleTime 1.6.5 



----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-09-11 21:08

Message:
Of course. I understand.

I had some knowledge of C++, but I guess it was too long ago for me to be
able to contribute.
I'll pray, then :)

----------------------------------------------------------------------

Comment By: Eeli Kaikkonen (eelik)
Date: 2008-09-11 18:37

Message:
The java code is here:
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/fr/.
It would not be impossible to translate it to C++, but it's not reasonable.
Crosswire has Bibles for over 50 languages and I will not write an analyzer
for a language I don't know (even though French is admittedly more
important than most of them). Contributions are of course welcome if
someone is interested. An analyzer would serve not only the BibleTime users
but all clucene users.
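
To illustrate the kind of language-specific step such an analyzer would add, here is a minimal sketch in Python. It is not a port of the Lucene French analyzer and not CLucene code; the names ELIDED, strip_elision and analyze, and the list of elided particles, are assumptions made only for this example. The idea is to drop a leading elided article ("l'", "d'", ...) from each token before it is indexed, so that "l'insens" is stored as "insens".

```python
import re

# Assumed list of French particles that can be elided before a vowel.
# Illustrative only; a real analyzer may use a different list.
ELIDED = ("l", "d", "m", "t", "s", "n", "j", "c", "qu")

# Matches one elided particle plus an apostrophe at the start of a token.
# Both the ASCII apostrophe and the typographic one (U+2019) are accepted.
_ELISION = re.compile(r"^(?:%s)['\u2019]" % "|".join(ELIDED), re.IGNORECASE)


def strip_elision(token):
    """Return the token without a leading elided particle, if it has one."""
    return _ELISION.sub("", token)


def analyze(text):
    """Lowercase the text, split it into word tokens (keeping inner
    apostrophes), then strip elided particles from each token."""
    tokens = re.findall(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*", text.lower())
    return [strip_elision(t) for t in tokens]
```

With this sketch, analyze("L'insens dit") gives ["insens", "dit"], so a plain search for "insens" would also find the elided forms. A word such as "aujourd'hui" is left alone because the apostrophe does not directly follow one of the listed particles.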

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-09-11 18:16

Message:
Thanks for looking into it.

And what about the language-specific analyzers? Is it really too much work
to implement them?

----------------------------------------------------------------------

Comment By: Eeli Kaikkonen (eelik)
Date: 2008-09-11 17:49

Message:
CLucene is a C++ port of Lucene, which is written in Java. This is from the Lucene
FAQ: "Leading wildcards (e.g. *ook) are not supported by the QueryParser by
default. As of Lucene 2.1, they can be enabled by calling
QueryParser.setAllowLeadingWildcard( true ). Note that this can be an
expensive operation: it requires scanning the list of tokens in the index
in its entirety to look for those that match the pattern."

Unfortunately CLucene lags too far behind Lucene, so we can't expect
this to work in the near future.
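
The cost mentioned in the FAQ can be shown with a small sketch in Python. This is not Lucene or CLucene code; the TERMS list and both function names are hypothetical and stand in for a sorted term dictionary. A trailing wildcard can jump to the first candidate with a binary search, while a leading wildcard has to test every term.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase

# Hypothetical sorted term dictionary, standing in for the index's term list.
TERMS = sorted(["d'insens", "insens", "l'insens", "sage", "sagesse"])


def prefix_matches(terms, prefix):
    """Trailing wildcard (prefix*): binary-search to the first candidate,
    then walk forward while the terms still start with the prefix."""
    i = bisect_left(terms, prefix)
    found = []
    while i < len(terms) and terms[i].startswith(prefix):
        found.append(terms[i])
        i += 1
    return found


def leading_wildcard_matches(terms, pattern):
    """Leading wildcard (*suffix): the sort order does not help,
    so every term in the list is compared against the pattern."""
    return [t for t in terms if fnmatchcase(t, pattern)]
```

Here leading_wildcard_matches(TERMS, "*insens") finds "d'insens", "insens" and "l'insens", but only by looking at all five terms; on a full Bible index that means scanning the whole term list for each such query.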

----------------------------------------------------------------------

Comment By: Eeli Kaikkonen (eelik)
Date: 2008-09-10 14:06

Message:
We use CLucene, a third-party search engine. It indexes the words, and
currently we do not take the language into account, so it uses the basic
algorithm. I'm just looking at the CLucene FAQ, and it states that CLucene
is capable of using wildcards at the beginning of a word, but we are
apparently not using this feature. CLucene can also use language-specific
analyzers, but this may be too much work for us.

Enabling the wildcards would be the easiest solution and would be effective
for many languages. I think we have to look at it at some point.

----------------------------------------------------------------------
