[sword-devel] Search bug & New Arabic Bible, Not Shaped SVD Version
dmsmith at crosswire.org
Mon Nov 26 07:12:05 MST 2012
Correct. JSword uses Lucene's analyzer for the module's language, which does more normalization than the StandardAnalyzer, which SWORD uses exclusively. The StandardAnalyzer should only be used for "unaccented" latinate text. The same goes for the SimpleAnalyzer. (In Lucene, an analyzer is a filter chain which normalizes text. Rule of thumb: the same analyzer should be used for both index construction and searching.)
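To illustrate the rule of thumb, here is a toy sketch in Python. It is not Lucene code: `simple_analyzer` and `folding_analyzer` are hypothetical stand-ins for an analyzer that leaves diacritics alone and one that strips them, and the in-memory index is only an assumption made for the example. It shows how a vocalized Arabic word is found only when indexing and searching normalize the text the same way.

```python
import unicodedata

# Hypothetical sample data: "bism" written with and without vowel marks.
VOCALIZED = "\u0628\u0650\u0633\u0652\u0645\u0650"  # beh+kasra, seen+sukun, meem+kasra
PLAIN = "\u0628\u0633\u0645"                        # beh, seen, meem


def simple_analyzer(text):
    # Toy stand-in for an analyzer with no diacritic handling:
    # lowercase and split on whitespace only.
    return text.lower().split()


def folding_analyzer(text):
    # Toy stand-in for a language-aware filter chain:
    # decompose, drop combining marks, lowercase, split on whitespace.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower().split()


def build_index(verses, analyzer):
    # Map each token to the set of verse keys that contain it.
    index = {}
    for key, text in verses.items():
        for token in analyzer(text):
            index.setdefault(token, set()).add(key)
    return index


def search(index, query, analyzer):
    # Return the verse keys that contain every token of the query.
    hits = None
    for token in analyzer(query):
        found = index.get(token, set())
        hits = found if hits is None else hits & found
    return hits or set()


verses = {"verse-1": VOCALIZED}
folded_index = build_index(verses, folding_analyzer)
raw_index = build_index(verses, simple_analyzer)

print(search(folded_index, PLAIN, folding_analyzer))     # same analyzer on both sides
print(search(raw_index, PLAIN, simple_analyzer))         # no normalization at all
print(search(folded_index, VOCALIZED, simple_analyzer))  # mismatched analyzers
```

Only the first search finds the verse. The other two return nothing, either because the index still contains the vowel marks or because the query does.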
Each release of Lucene adds and/or improves the filters for non-latin text.
The biggest problem with using a new version of Lucene is that it invalidates, without notice, prior indexes. An analyzer's behavior may change from release to release. This has happened with the StandardAnalyzer. The impact is that the number of search hits may be reduced, perhaps to 0.
Both SWORD and JSword need a mechanism to record the version of Lucene that is used in constructing an index and to refuse to search an index unless the versions of Lucene used for searching and indexing match.
Also of note, there have been some substantial changes to Unicode from release to release. So, if the version of Unicode used by the OS, Java, ICU, etc. changes, the index may no longer be valid. From what I can tell, this will mostly affect minority languages.
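A minimal sketch of such a version stamp, in Python for brevity. Neither SWORD nor JSword is known to work this way: the file name `index-stamp.json`, the function names and the `engine_version` string are all hypothetical. The idea is to write a small file next to the index recording the search engine version and the Unicode data version, and to treat the index as unusable when either differs at search time.

```python
import json
import unicodedata
from pathlib import Path

STAMP_NAME = "index-stamp.json"  # hypothetical file stored beside the index


def current_stamp(engine_version):
    # engine_version is supplied by the caller (e.g. the Lucene release string);
    # the Unicode version here is the one Python's own character database uses.
    return {"engine": engine_version, "unicode": unicodedata.unidata_version}


def write_stamp(index_dir, engine_version):
    # Call this once, right after the index has been built.
    path = Path(index_dir) / STAMP_NAME
    path.write_text(json.dumps(current_stamp(engine_version)), encoding="utf-8")


def index_is_usable(index_dir, engine_version):
    # A missing, unreadable, or mismatching stamp means: rebuild before searching.
    path = Path(index_dir) / STAMP_NAME
    if not path.exists():
        return False
    try:
        recorded = json.loads(path.read_text(encoding="utf-8"))
    except ValueError:
        return False
    return recorded == current_stamp(engine_version)
```

A front end would call `index_is_usable` before every search session and offer to rebuild the index when it returns False, rather than silently returning fewer hits.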
On Nov 26, 2012, at 7:22 AM, Peter von Kaehne <refdoc at gmx.net> wrote:
>> From: David Haslam <dfhmch at googlemail.com>
>> So a similar patch would be necessary in principle to JSword ???
> No. If And Bible does not have a problem, then JSword does its job correctly.