[sword-devel] search failing in Hebrew modules
ransom1982 at gmail.com
Thu Jul 30 21:43:12 MST 2009
On Fri, Jul 31, 2009 at 12:25 AM, Troy A. Griffitts<scribe at crosswire.org> wrote:
> Regarding languages with diacritics, accents, cantillation, etc...
> The SWORD M.O. is to have one set of StripFilters that massage both:
> o the body of the text being searched
> o the target search string
> so we can get sane results.
> With Greek we've been fairly intentional to strip accents and ms markup from
> both module and search text input for our searching. I would bet we still
> have some last minute code added somewhere which does special things if
> we're in a Greek text-- obviously this should be remedied. I doubt we've
> done the same for Hebrew. e.g., I would bet unaccented Greek searches would
> work fine in SWORDweb, but consonant-only Hebrew searches would not work.
> In any case, the proper way to make things work is to have appropriate
> StripFilter entries in the wlc.conf, and to be sure Xiphos is calling
> module.StripText(userInputSearchText) before calling SWORD's search
> mechanism to be sure we're comparing equivalent texts.
> Does this make sense?
For the most part. I'm still not sure what behavior is expected in the
case of Hebrew. I think that users would like to be able to search
both with and without vowel points (though I'm not sure about that).
However, I am sure that the user who reported this bug would like to
be able to search *without* vowel points, which is currently
impossible in the WLC. I'm not sure if you're saying that this is
because there is an entry missing in the .conf or if something is
wrong in the frontend or engine. Xiphos and swordweb behave
identically here, so if it's a problem in the frontends, both of them
have the same problem. An example word is מְאֹרֹת which returns 6
matches in Xiphos and 6 in swordweb. The unpointed מארת returns 0 in
(and again, there is the interesting case of diatheke which seems to
act exactly opposite to this)
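
To make the behavior I'm asking about concrete, here is a rough sketch in
Python of the "strip both sides" idea Troy describes. It is not SWORD's
StripFilter code and does not use any SWORD API; the function names
strip_hebrew_points and matches are made up for illustration, and I'm
assuming that dropping every nonspacing mark in the Hebrew block (vowel
points and cantillation) is the stripping we want.

```python
import unicodedata


def strip_hebrew_points(text: str) -> str:
    """Drop Hebrew vowel points and cantillation marks (hypothetical helper)."""
    # Decompose first so precomposed presentation forms split into
    # a base letter plus separate combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch
        for ch in decomposed
        # U+0591..U+05C7 holds the Hebrew accents and points; category "Mn"
        # keeps punctuation such as maqaf (U+05BE) from being removed.
        if not ("\u0591" <= ch <= "\u05C7" and unicodedata.category(ch) == "Mn")
    )


def matches(verse: str, query: str) -> bool:
    """Compare after stripping BOTH the module text and the search string."""
    return strip_hebrew_points(query) in strip_hebrew_points(verse)


# The example word from this thread, written with escapes to avoid bidi issues.
pointed = "\u05de\u05b0\u05d0\u05b9\u05e8\u05b9\u05ea"  # with vowel points
unpointed = "\u05de\u05d0\u05e8\u05ea"                  # consonants only

assert strip_hebrew_points(pointed) == unpointed
assert matches(pointed, unpointed)   # unpointed query finds pointed text
assert matches(pointed, pointed)     # pointed query still works
```

With something like this applied on both sides, a pointed and an unpointed
query would find the same verses. The cost is that a user could no longer
search for one particular pointing, which is the part I'm unsure users want.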