public class ConfigurableSnowballAnalyzer extends AbstractBookAnalyzer
Builds a TokenStream from a LowerCaseTokenizer, optionally filtered with SnowballFilter and StopFilter.
Default behavior: stemming is performed and stop words are not removed.
A Snowball stemmer is configured according to the language of the Book. It currently accepts the following stemmer names (the stemmers available in the Lucene Snowball package net.sf.snowball.ext):
Danish
Dutch
English
Finnish
French
German2
German
Italian
Kp
Lovins
Norwegian
Porter
Portuguese
Russian
Spanish
Swedish
This list is expected to expand as the Snowball project adds support for more languages.

See the GNU Lesser General Public License for details.

| Modifier and Type | Field and Description |
|---|---|
| private static HashMap<String,Set<?>> | defaultStopWordMap |
| private static Map<String,String> | languageCodeToStemmerLanguageNameMap |
| private org.apache.lucene.util.Version | matchVersion |
| private String | stemmerName: The name of the stemmer to use. |

Fields inherited from class AbstractBookAnalyzer: book, doStemming, doStopWords, stopSet

| Constructor and Description |
|---|
| ConfigurableSnowballAnalyzer() |

| Modifier and Type | Method and Description |
|---|---|
| void | pickStemmer(String languageCode): Given the name of a stemmer, use that one. |
| org.apache.lucene.analysis.TokenStream | reusableTokenStream(String fieldName, Reader reader) |
| void | setBook(Book newBook): The book for which analysis is being performed. |
| org.apache.lucene.analysis.TokenStream | tokenStream(String fieldName, Reader reader): Filters LowerCaseTokenizer with StopFilter if enabled and SnowballFilter. |
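
The filter chain described for tokenStream can be modelled in plain Java. The following is a minimal sketch, not the Lucene implementation: `AnalyzerChainSketch`, `analyze`, and `toyStem` are hypothetical names, the letter-based splitting only approximates LowerCaseTokenizer, and `toyStem` is a placeholder that strips a trailing "s" rather than a real Snowball stemmer. The two boolean parameters stand in for the inherited doStemming and doStopWords switches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerChainSketch {
    // Placeholder stemmer (assumption): drops a trailing "s" from words longer than 3 letters.
    static String toyStem(String token) {
        return (token.length() > 3 && token.endsWith("s"))
                ? token.substring(0, token.length() - 1)
                : token;
    }

    // Tokenize, then apply the optional stop filter, then the optional stemmer.
    static List<String> analyze(String text, Set<String> stopSet,
                                boolean doStemming, boolean doStopWords) {
        List<String> tokens = new ArrayList<>();
        // Split on runs of non-letters and lowercase each piece.
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue; // leading separator produces an empty first piece
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (doStopWords && stopSet.contains(token)) {
                continue; // stop-word removal is off unless enabled
            }
            if (doStemming) {
                token = toyStem(token);
            }
            tokens.add(token);
        }
        return tokens;
    }
}
```

With the documented defaults (stemming on, stop words kept), `analyze("The Books, the READER", Set.of("the"), true, false)` yields `[the, book, the, reader]`; enabling stop-word removal as well yields `[book, reader]`.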

Methods inherited from class AbstractBookAnalyzer: getBook, getDoStopWords, setDoStemming, setDoStopWords, setStopWords

private String stemmerName
The name of the stemmer to use.

private static Map<String,String> languageCodeToStemmerLanguageNameMap

private final org.apache.lucene.util.Version matchVersion

public final org.apache.lucene.analysis.TokenStream tokenStream(String fieldName, Reader reader)
Filters LowerCaseTokenizer with StopFilter if enabled and SnowballFilter.
Specified by: tokenStream in class org.apache.lucene.analysis.Analyzer

public org.apache.lucene.analysis.TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException
Overrides: reusableTokenStream in class org.apache.lucene.analysis.Analyzer
Throws: IOException

public void setBook(Book newBook)
Description copied from class: AbstractBookAnalyzer
The book for which analysis is being performed.
Overrides: setBook in class AbstractBookAnalyzer

public void pickStemmer(String languageCode)
Given the name of a stemmer, use that one.
Parameters: languageCode -
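
The relationship between pickStemmer, languageCodeToStemmerLanguageNameMap, and stemmerName might look roughly like the sketch below. It rests on assumptions: the map entries are illustrative (this page does not list the real contents), `StemmerPickerSketch` and `getStemmerName` are hypothetical names, and rejecting an unknown code with an IllegalArgumentException is a guess at reasonable behavior rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

class StemmerPickerSketch {
    // Illustrative entries only (assumption); values are stemmer names from the list above.
    private static final Map<String, String> CODE_TO_STEMMER = new HashMap<>();
    static {
        CODE_TO_STEMMER.put("da", "Danish");
        CODE_TO_STEMMER.put("nl", "Dutch");
        CODE_TO_STEMMER.put("en", "English");
        CODE_TO_STEMMER.put("fr", "French");
        CODE_TO_STEMMER.put("de", "German");
    }

    // The name of the stemmer to use; null until pickStemmer succeeds.
    private String stemmerName;

    // Looks up the stemmer name for a language code.
    // Assumed behavior: unknown codes are rejected instead of silently ignored.
    void pickStemmer(String languageCode) {
        String name = CODE_TO_STEMMER.get(languageCode);
        if (name == null) {
            throw new IllegalArgumentException("No stemmer for language code: " + languageCode);
        }
        stemmerName = name;
    }

    String getStemmerName() {
        return stemmerName;
    }
}
```

Keeping the lookup in a static map means a newly supported Snowball language needs only one new entry, which fits the expectation that the stemmer list will grow.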