public class SmartChineseLuceneAnalyzer extends AbstractBookAnalyzer
Wraps SmartChineseAnalyzer, which segments Chinese text into words. This differs from
org.apache.lucene.analysis.cjk.CJKAnalyzer, whose overlapping two-character tokenization
leads to a larger index. This analyzer's stop list consists only of punctuation.
It stems English words.
See the GNU Lesser General Public License for details.
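The overlapping two-character tokenization mentioned above, and why it tends to enlarge an index, can be sketched in plain Java. This is an illustration only, not Lucene's implementation: the class `BigramSketch` and the method `overlappingBigrams` are hypothetical names introduced here, and the sketch ignores script detection, stop words, and stemming.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a stand-in for overlapping two-character tokenization.
// The names here are hypothetical and are not part of Lucene or of this class.
final class BigramSketch {

    // Returns every pair of adjacent characters in the text, so each interior
    // character appears in two tokens. A lone character is returned as-is.
    static List<String> overlappingBigrams(String text) {
        int[] codePoints = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (codePoints.length == 1) {
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < codePoints.length; i++) {
            tokens.add(new String(codePoints, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four characters produce three overlapping tokens.
        System.out.println(overlappingBigrams("中文分词"));
    }
}
```

A text of n characters yields n - 1 overlapping tokens, whereas a word-based segmentation of the same text generally yields fewer, which is one way to see where the larger index comes from.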
Modifier and Type | Field and Description |
---|---|
private org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer | myAnalyzer |

Fields inherited from class AbstractBookAnalyzer: book, doStemming, doStopWords, stopSet
Constructor and Description |
---|
SmartChineseLuceneAnalyzer() |
Modifier and Type | Method and Description |
---|---|
org.apache.lucene.analysis.TokenStream | reusableTokenStream(String fieldName, Reader reader) |
org.apache.lucene.analysis.TokenStream | tokenStream(String fieldName, Reader reader) |

Methods inherited from class AbstractBookAnalyzer: getBook, getDoStopWords, setBook, setDoStemming, setDoStopWords, setStopWords
public final org.apache.lucene.analysis.TokenStream tokenStream(String fieldName, Reader reader)
Specified by: tokenStream in class org.apache.lucene.analysis.Analyzer
public final org.apache.lucene.analysis.TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException
Overrides: reusableTokenStream in class org.apache.lucene.analysis.Analyzer
Throws: IOException