[sword-devel] Search in any language (was: Search in Chinese modules)

David Haslam d.haslam at ukonline.co.uk
Thu Feb 11 06:21:51 MST 2010


There are similar issues when it comes to searching the Burmese Judson module
(Myanmar script is very complex).

KS (a contact with a close interest in Myanmar languages) recently wrote
the following in an email to me:

"The normal Sword library has the option to use Lucene for searching.
However, the StandardAnalyzer assumes that words are space-based and seems
to ignore Unicode marks. This results in very bad search results for any
language based on the Myanmar script. I've therefore downloaded the CLucene
library 0.9.23 from git and patched it to call a Myanmar-specific tokenizer
if the LanguageBasedAnalyzer is used. The LanguageBasedAnalyzer defaults to
the StandardTokenizer if no language-specific tokenizer is found. Once I've
tested this some more, I hope to submit it to the CLucene project and see if
they will incorporate it. Would Crosswire be interested in accepting patches
to use CLucene 0.9.23 (rather than 0.9.21 as seems to be the case at
present) and the LanguageBasedAnalyzer?"
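
To make the fallback behaviour described above concrete, here is a minimal
sketch in Python (not CLucene's C++ API, and not KS's patch). All names in it
(whitespace_tokenize, cluster_tokenize, TOKENIZERS, tokenizer_for, SAMPLE) are
hypothetical. The "Myanmar" tokenizer is only a crude stand-in that keeps
Unicode combining marks attached to their base character; real Myanmar
syllable or word segmentation is considerably more involved.

```python
import unicodedata


def whitespace_tokenize(text):
    """Default tokenizer: assumes words are separated by whitespace."""
    return text.split()


def cluster_tokenize(text):
    """Crude stand-in for a script-aware tokenizer (an assumption, not a
    real Myanmar segmenter): each base character is grouped with the
    combining marks that follow it, so marks are never dropped or split."""
    clusters = []
    prev_space = True
    for ch in text:
        if ch.isspace():
            prev_space = True
            continue
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters and not prev_space:
            clusters[-1] += ch      # attach the mark to its base character
        else:
            clusters.append(ch)     # start a new cluster
        prev_space = False
    return clusters


# Hypothetical registry keyed by language code ("my" = Burmese).
TOKENIZERS = {"my": cluster_tokenize}


def tokenizer_for(lang):
    """Return the language-specific tokenizer, or the default if none
    is registered for that language."""
    return TOKENIZERS.get(lang, whitespace_tokenize)


# The word "Myanmar" in Myanmar script, written with escapes; it has no spaces.
SAMPLE = "\u1019\u103c\u1014\u103a\u1019\u102c"

if __name__ == "__main__":
    print(len(whitespace_tokenize(SAMPLE)))   # one undivided token
    print(len(tokenizer_for("my")(SAMPLE)))   # several smaller units
```

With a space-based tokenizer the whole unspaced run is indexed as a single
token, so a search for any part of it finds nothing; a script-aware tokenizer
chosen by language code avoids that, while other languages keep the default.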

David


