[sword-devel] search failing in Hebrew modules
dmsmith at crosswire.org
Thu Jul 30 19:35:35 MST 2009
A couple of thoughts, assuming the search is a Lucene search.
Unicode can have multiple possible representations (byte sequences)
for a single decorated character (a base letter plus accents or vowel
points). Search will work only if the request and the index use the
same representation.
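To make the multiple-representation point concrete, here is a minimal Python sketch using the standard unicodedata module. The letter and the two typing orders are illustrative assumptions, not taken from any particular SWORD module: the same pointed bet can be entered with dagesh before or after sheva, and the two strings differ code point by code point until they are normalized.

```python
import unicodedata

BET, DAGESH, SHEVA = "\u05D1", "\u05BC", "\u05B0"

typed = BET + DAGESH + SHEVA   # hypothetical typing order: dagesh, then sheva
stored = BET + SHEVA + DAGESH  # the same pointed letter with the marks swapped

def code_points(s):
    """Return the code points of s as 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in s]

print(code_points(typed))   # ['U+05D1', 'U+05BC', 'U+05B0']
print(code_points(stored))  # ['U+05D1', 'U+05B0', 'U+05BC']
print(typed == stored)      # False

# Normalization sorts combining marks by canonical combining class
# (sheva = 10, dagesh = 21), so both strings end up identical.
print(unicodedata.combining(SHEVA), unicodedata.combining(DAGESH))  # 10 21
print(unicodedata.normalize("NFC", typed) == unicodedata.normalize("NFC", stored))  # True

# U+FB31 (bet with dagesh, a presentation form) is excluded from
# composition, so NFC decomposes it into bet + dagesh.
print(unicodedata.normalize("NFC", "\uFB31") == BET + DAGESH)  # True
```

Because normalization puts the marks in a fixed order, normalizing both the indexed text and the query to one form (NFC here) is one common way to make them comparable.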
The index has a single representation of the text. The analyzer
assumes English as input and applies all kinds of transforms that may
not be appropriate for Hebrew.
When a search is performed, the same analyzer is used to transform the
search request. Generally this is sufficient to ensure that the search
will work. If the search request is not already in, or is not first
transformed into, the same Unicode representation, then the search will
fail, as it will not match the stored byte sequence. Typically, copying
displayed text into the search request will work; typically, typed
input will fail. It is just too difficult to type exactly the same byte
sequence as the stored text.
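The index/query symmetry described above can be sketched as a toy inverted index in Python. The names analyze, build_index and search are hypothetical, and the analyzer here only NFC-normalizes and splits on whitespace, which is far simpler than a real Lucene analyzer. The stored text is just the first word of Gen 1:1; the query is the same word with dagesh typed before sheva.

```python
import unicodedata

def analyze(text):
    """Hypothetical analyzer: NFC-normalize, then split on whitespace."""
    return unicodedata.normalize("NFC", text).split()

def build_index(verses):
    """Map each analyzed term to the set of verse keys that contain it."""
    index = {}
    for key, text in verses.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(key)
    return index

def search(index, query, analyze_query=True):
    """Return the verse keys that contain every query term."""
    terms = analyze(query) if analyze_query else query.split()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits) if hits else set()

# Stored text (first word of Gen 1:1 only): bet, sheva, dagesh, resh, tsere,
# alef, shin, shin dot, hiriq, yod, tav.
verses = {"Gen 1:1": "\u05D1\u05B0\u05BC\u05E8\u05B5\u05D0\u05E9\u05C1\u05B4\u05D9\u05EA"}
index = build_index(verses)

# Typed query: the same word, but with dagesh entered before sheva.
typed = "\u05D1\u05BC\u05B0\u05E8\u05B5\u05D0\u05E9\u05C1\u05B4\u05D9\u05EA"

print(search(index, typed))                       # {'Gen 1:1'}
print(search(index, typed, analyze_query=False))  # set()
```

Passing the query through the same analyzer finds the verse; handing the raw string to the index does not, even though both strings render identically.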
IIRC, SWORD will use the current filters (e.g. Remove accents) in
building the index. Searches that don't apply the same filters to the
request as used to build the index will fail.
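The filter mismatch can be sketched the same way. Here a hypothetical strip_cantillation function stands in for a "remove accents" filter and is assumed to drop the Hebrew accent range U+0591 to U+05AF (a simplification; the real SWORD filter may cover different characters). The index is built with the filter on; a query copied with its cantillation intact matches only if the same filter is applied to it.

```python
import unicodedata

def strip_cantillation(text):
    """Hypothetical 'remove accents' filter: drops code points in the Hebrew
    accent range U+0591..U+05AF and keeps letters and vowel points."""
    return "".join(c for c in text if not "\u0591" <= c <= "\u05AF")

def analyze(text, filters=()):
    """NFC-normalize, apply each filter in order, then split on whitespace."""
    text = unicodedata.normalize("NFC", text)
    for apply_filter in filters:
        text = apply_filter(text)
    return text.split()

# Filters assumed to have been active when the index was built.
INDEX_FILTERS = (strip_cantillation,)

# The word from the bug report, with the etnahta accent (U+0591) after hiriq.
verse = "\u05D0\u05B1\u05DC\u05B9\u05D4\u05B4\u0591\u05D9\u05DD"
index_terms = set(analyze(verse, INDEX_FILTERS))

query = verse  # copied from the display, accent included
print(set(analyze(query)) <= index_terms)                 # False
print(set(analyze(query, INDEX_FILTERS)) <= index_terms)  # True
```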
I don't know if that helps at all.
Sent from my phone
On Jul 30, 2009, at 9:45 PM, Karl Kleinpaste <karl at kleinpaste.org> wrote:
> We have a Xiphos bug report from a user who reads Hebrew, which
> complains that search fails in Hebrew modules. A couple of special
> cases work (e.g. אֱלֹהִ֑ים; that may not appear properly, as I
> handle mail using Gnus in XEmacs, which has trouble with Hebrew), but
> by far the general case fails. It fails even when text is copied from
> a verse and pasted into Xiphos' search boxes: that same verse is not
> retrieved as a result.
> Xiphos' use of search facilities is essentially blind. We take text
> from an input box, and hand that string verbatim to the Sword search
> call. To the extent that I can trace what happens here, Xiphos is
> doing exactly what is expected; I don't think we have a bug in how we
> are making the search query.
> So I'm left with what appears to be a Sword bug in handling Hebrew.
> Whether it's a Unicode problem or something else, I'm not qualified to
> say. But I'd appreciate it if those more in the know about this sort of
> problem could take a look and try to verify whether this is correct.
> sword-devel mailing list: sword-devel at crosswire.org