[sword-devel] Search in Chinese modules
dmsmith at crosswire.org
Wed Feb 10 18:47:12 MST 2010
Chinese needs a special analyzer. In Java Lucene there are three choices.
Two of them do some kind of bigram search. Basically the analyzer takes
every overlapping pair of adjacent characters and indexes it as a term, so
ABCD is indexed as AB BC CD. The same analyzer is then used to prepare the
search request, so the query is split into the same kind of terms.
From what I gather, spaces are not an appropriate "word" boundary in Chinese.
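
To make the bigram idea concrete, here is a minimal sketch in plain Python. It is not Lucene or JSword code; the function names `bigrams` and `matches` are made up for illustration, and the matching is a simple in-memory phrase check rather than a real index lookup.

```python
def bigrams(text):
    """Split text into overlapping two-character tokens, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # A lone character cannot form a pair; emit it as-is (assumption).
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def matches(query, verse):
    """True if the query's bigrams appear consecutively among the verse's bigrams."""
    q, v = bigrams(query), bigrams(verse)
    if not q:
        return False
    return any(v[i:i + len(q)] == q for i in range(len(v) - len(q) + 1))


verse = "起初神创造天地"
print(bigrams("ABCD"))          # ['AB', 'BC', 'CD']
print(matches("起初", verse))    # True: the bigram is an indexed term
print("起初" in verse.split())   # False: whitespace splitting sees one long "word"
```

Because the query goes through the same `bigrams` step as the text, a two-character query lines up with an indexed term even though the verse contains no spaces.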
In JSword we use the module's lang to pick an appropriate analyzer.
When we added it, we didn't worry about backward compatibility; we
considered it a bug fix. No one complained about having to rebuild
indexes. We did get thanks, though.
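
Picking an analyzer from the module's language can be pictured roughly like this. This is only a sketch in Python under assumed names (`ANALYZERS`, `analyzer_for`, and the two tokenizer functions are hypothetical), not how JSword actually implements it.

```python
def whitespace_tokens(text):
    """Default: split on whitespace, as a word-based analyzer would."""
    return text.split()


def bigram_tokens(text):
    """Overlapping two-character tokens, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]


# Hypothetical table: base language code -> tokenizer.
ANALYZERS = {"zh": bigram_tokens}


def analyzer_for(lang):
    """Choose a tokenizer from a language tag such as 'zh' or 'zh-Hans'."""
    base = lang.split("-")[0].lower()
    return ANALYZERS.get(base, whitespace_tokens)


print(analyzer_for("zh-Hans")("起初神"))        # ['起初', '初神']
print(analyzer_for("en")("In the beginning"))   # ['In', 'the', 'beginning']
```

Since the tokens stored in an index depend on the analyzer, switching a language to a different analyzer means existing indexes for that language have to be rebuilt.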
On Feb 10, 2010, at 8:12 PM, Nic Carter <niccarter at mac.com> wrote:
> Hi team.
> I received a question the other day about searching in Chinese
> Bibles. It appears that CLucene does word-based search & so if you
> search for a specific character in Chinese, it will only find it if
> there is a space before and after the character. To me, this sounds
> like the correct behaviour, but I'm not sure if it is? Should I be
> suggesting to this guy that he should do a C* search, where C is the
> Chinese character? or C~ ? or what do other people do when
> searching in Chinese texts?
> Any help anyone can give would be greatly appreciated. :)
> Thanks, ybic
> nic... :)
> Nic Carter
> PocketSword Developer - an iPhone Bible Study app