[bt-devel] [ bibletime-Bugs-1619594 ] Use of wildcards not consistent

SourceForge.net noreply at sourceforge.net
Mon May 11 04:22:24 MST 2009


Bugs item #1619594, was opened at 2006-12-20 16:37
Message generated for change (Settings changed) made by eelik
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100954&aid=1619594&group_id=954

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Frontend / Search dialog
Group: None
Status: Open
Resolution: None
>Priority: 3
Private: No
Submitted By: Wolfgang Stradner (ewst)
Assigned to: Joachim Ansorg (joachim)
Summary: Use of wildcards not consistent

Initial Comment:
Wildcards ? and * can be used in BT (BT 1.6.2) with the clucene search engine (0.9.16a).

Here are some test results:
Searching in GermElb1905/Matthew for:

- her : finds all the words her: OK

- her* : finds
-- Herrn, Herodes, hervorkommen : OK
-- welcher, himmlischer: not OK

- *her : does not find anything

- he*r : finds
-- himmlischer, her, Heuchler, Herr : OK
-- herabgestiegen, herzu, Herodes: not OK

- h?er : finds all occurrences of hier, but they are not highlighted in yellow (by contrast, in Mat 28:6 it highlights her in yellow)

- her? : finds
-- Herr: unmarked
-- her: marked. I would not expect this hit, because I understand ? to be a single-character wildcard, in contrast to *, which matches 0, 1 or more characters. The following example shows ? behaving that way:

- ?ehr : finds
-- mehr, sehr: OK (except that they should be highlighted)
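To make the behaviour expected in these test results concrete, here is a minimal Python sketch. It assumes the semantics the reporter describes: ? matches exactly one character, * matches zero or more, the pattern must match a whole word, and matching ignores case. The function name `wildcard_matches` is hypothetical; this is not how clucene or BibleTime implement wildcard search.

```python
import re

def wildcard_matches(pattern: str, word: str) -> bool:
    """Hypothetical helper: does `word` match `pattern` as a whole word?

    Assumed semantics: '?' = exactly one character,
    '*' = zero or more characters, case-insensitive.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # literal character
    # fullmatch anchors the regex at both ends of the word
    return re.fullmatch("".join(parts), word, re.IGNORECASE) is not None

# Some hits from the report, checked against the assumed semantics
for pattern, word in [("her*", "Herrn"), ("her*", "welcher"),
                      ("he*r", "Heuchler"), ("he*r", "herabgestiegen"),
                      ("her?", "Herr"), ("her?", "her"), ("?ehr", "sehr")]:
    print(f"{pattern:5} {word:15} {wildcard_matches(pattern, word)}")
```

Under these assumed semantics welcher, herabgestiegen and the her hit for her? are rejected, which is the behaviour the report asks for. Note that ?ehr only works in this sketch because it matches words in memory rather than going through a clucene query.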



----------------------------------------------------------------------

Comment By: Eeli Kaikkonen (eelik)
Date: 2009-03-27 10:03

Message:
There has been some discussion about leading wildcards ("prepended joker
marks"). We need that feature for some languages, but it doesn't currently
exist in clucene. There's not much we can do unless someone volunteers to
change clucene. Note also that even if clucene allowed leading wildcards
with the existing indexes, searches would become much slower. For fast
searches we would need an additional, reversed index.
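A common way to make leading wildcards fast is a second index whose terms are stored spelled backwards, so that a suffix query such as *her becomes a prefix lookup (reh) on the reversed terms. The Python sketch below assumes that this is the kind of "reversed index" meant here; the class `ReversedTermIndex` is hypothetical and is not part of clucene or BibleTime. It keeps the reversed terms in a sorted list and uses binary search to find the matching prefix range.

```python
import bisect

class ReversedTermIndex:
    """Hypothetical sketch of a reversed-term index for suffix queries."""

    def __init__(self, terms):
        # Store each distinct lowercased term spelled backwards, sorted.
        self.reversed_terms = sorted({t.lower()[::-1] for t in terms})

    def suffix_search(self, suffix: str) -> list[str]:
        """Return the (lowercased) terms ending in `suffix`, i.e. '*suffix'."""
        prefix = suffix.lower()[::-1]
        # Binary search for the first reversed term >= prefix ...
        start = bisect.bisect_left(self.reversed_terms, prefix)
        hits = []
        # ... then walk forward while the prefix still matches.
        for rev in self.reversed_terms[start:]:
            if not rev.startswith(prefix):
                break
            hits.append(rev[::-1])
        return hits

index = ReversedTermIndex(["Herr", "welcher", "himmlischer", "her",
                           "Heuchler", "mehr", "sehr"])
print(index.suffix_search("her"))
```

The lookup cost grows with the logarithm of the vocabulary size plus the number of hits, instead of requiring a scan over every term; the price is storing each term a second time, which is why a separate index would be needed.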

See
https://sourceforge.net/tracker/?func=detail&aid=2097655&group_id=954&atid=350954
for discussion about clucene features.

The inline * works correctly if it means 0 or more characters. For ordinary
users that behaviour is of course wrong, but this too depends on clucene.
Someone could check whether clucene supports a 1-or-more-characters behaviour.
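The difference between the two readings of an inline * can be shown with a small Python sketch. The helper `star_regex` is hypothetical and only handles literal characters and *; it compares * as zero-or-more (the behaviour described above for clucene) with a one-or-more reading.

```python
import re

def star_regex(pattern: str, at_least_one: bool) -> re.Pattern:
    """Hypothetical helper: compile a whole-word, case-insensitive regex
    for a pattern containing only literal characters and '*'."""
    star = ".+" if at_least_one else ".*"   # one-or-more vs zero-or-more
    literals = (re.escape(part) for part in pattern.split("*"))
    return re.compile(star.join(literals), re.IGNORECASE)

zero_or_more = star_regex("he*r", at_least_one=False)
one_or_more = star_regex("he*r", at_least_one=True)

for word in ["her", "Herr", "Heuchler", "herabgestiegen"]:
    print(word, bool(zero_or_more.fullmatch(word)),
          bool(one_or_more.fullmatch(word)))
```

Only her separates the two readings here: with zero-or-more it matches he*r, with one-or-more it does not. Neither reading matches herabgestiegen, since that word does not end in r.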

----------------------------------------------------------------------

Comment By: Jonathan Marsden (jmarsden)
Date: 2009-03-27 08:51

Message:
Per http://clucene.wiki.sourceforge.net/Official_CLucene_FAQ
some of the examples given in this report are invalid.
In particular, wildcards may not be placed at the start
of a word.  So both *her and ?ehr are invalid.
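Since the FAQ disallows a wildcard at the start of a term, a frontend could check queries before passing them on. The Python function below is a hypothetical sketch of such a check; it assumes plain whitespace-separated terms and ignores Lucene query syntax such as quotes, field prefixes, or + and - operators.

```python
def leading_wildcard_terms(query: str) -> list[str]:
    """Hypothetical check: return the terms of `query` that begin with
    '*' or '?', i.e. those the CLucene FAQ says cannot be used."""
    # str.split() without arguments never yields empty strings,
    # so term[0] is always safe.
    return [term for term in query.split() if term[0] in "*?"]

print(leading_wildcard_terms("her* he*r *her ?ehr"))  # ['*her', '?ehr']
```

A frontend could then report these terms to the user instead of returning an empty or unexpected result set.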

The other unexpected results are still happening in Bibletime
2.0 alpha3 for me on Ubuntu 8.10 Intrepid x64.

The way he*r matches herabgestiegen and so forth is 
clearly incorrect.  While the examples used are simple
and so easy to spot, the real danger here is that a user
could rely on a complex search without realizing that one
or more completely incorrect results are being returned.

----------------------------------------------------------------------



