org.crosswire.jsword.index.lucene.analysis
Class ConfigurableSnowballAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.crosswire.jsword.index.lucene.analysis.AbstractBookAnalyzer
org.crosswire.jsword.index.lucene.analysis.ConfigurableSnowballAnalyzer
- All Implemented Interfaces:
- Closeable
public class ConfigurableSnowballAnalyzer
- extends AbstractBookAnalyzer
An Analyzer whose TokenStream is built from a LowerCaseTokenizer filtered with SnowballFilter (optional) and StopFilter (optional). Default behavior: stemming is done; stop words are not removed.
A Snowball stemmer is configured according to the language of the Book. It currently accepts the following stemmer names (the stemmers available in the Lucene snowball package net.sf.snowball.ext):
Danish
Dutch
English
Finnish
French
German2
German
Italian
Kp
Lovins
Norwegian
Porter
Portuguese
Russian
Spanish
Swedish
This list is expected to grow as the Snowball project supports more languages.
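The order of the analysis steps can be modeled without Lucene. The following is a minimal sketch under stated assumptions, not the JSword or Lucene implementation: the class SnowballChainSketch, its constructor, and the toy suffix-stripping stemmer are hypothetical stand-ins (real Snowball stemmers are far more elaborate). The order, with stop words removed before stemming, follows the tokenStream description. Passing null for the stop-word set mirrors the default behavior, in which stop words are kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

/**
 * Hypothetical, dependency-free model of the chain described above:
 * lower-case tokenizing, then an optional stop-word filter, then an optional stemmer.
 * This is not the Lucene or JSword API.
 */
class SnowballChainSketch {
    private final Set<String> stopWords;         // null: stop words are kept (the default behavior)
    private final UnaryOperator<String> stemmer; // null: no stemming

    SnowballChainSketch(Set<String> stopWords, UnaryOperator<String> stemmer) {
        this.stopWords = stopWords;
        this.stemmer = stemmer;
    }

    /** Splits on non-letters and lower-cases each character, roughly as LowerCaseTokenizer does. */
    static List<String> lowerCaseTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Runs each token through the stop-word step (if enabled), then the stemming step (if enabled). */
    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : lowerCaseTokenize(text)) {
            if (stopWords != null && stopWords.contains(token)) {
                continue; // stop-word step: drop the token
            }
            out.add(stemmer != null ? stemmer.apply(token) : token); // stemming step
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy stemmer that strips a trailing "s" from longer words; not a Snowball algorithm.
        UnaryOperator<String> toyStemmer =
            w -> w.endsWith("s") && w.length() > 3 ? w.substring(0, w.length() - 1) : w;
        SnowballChainSketch defaults = new SnowballChainSketch(null, toyStemmer);
        // prints [in, the, beginning, god, created, the, heaven]
        System.out.println(defaults.analyze("In the beginning God created the Heavens"));
    }
}
```

Passing a non-null stop-word set enables the stop-word step, and passing null as the stemmer disables stemming; keeping the two switches independent mirrors the two "(optional)" filters in the class description.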
- Author:
- sijo cherian
- See Also:
The GNU Lesser General Public License for details.
Fields inherited from class org.apache.lucene.analysis.Analyzer: overridesTokenStreamMethod
Method Summary
| Return type | Method | Description |
| --- | --- | --- |
| void | pickStemmer(String languageCode) | Given a language code, select the stemmer mapped to it. |
| org.apache.lucene.analysis.TokenStream | reusableTokenStream(String fieldName, Reader reader) | |
| void | setBook(Book newBook) | The book for which analysis is being performed. |
| org.apache.lucene.analysis.TokenStream | tokenStream(String fieldName, Reader reader) | Filters LowerCaseTokenizer with StopFilter if enabled and SnowballFilter. |
Methods inherited from class org.apache.lucene.analysis.Analyzer: close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setOverridesTokenStreamMethod, setPreviousTokenStream
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
stemmerName
private String stemmerName
- The name of the stemmer to use.
languageCodeToStemmerLanguageNameMap
private static Map<String,String> languageCodeToStemmerLanguageNameMap
defaultStopWordMap
private static HashMap<String,Set<?>> defaultStopWordMap
matchVersion
private final org.apache.lucene.util.Version matchVersion
ConfigurableSnowballAnalyzer
public ConfigurableSnowballAnalyzer()
tokenStream
public final org.apache.lucene.analysis.TokenStream tokenStream(String fieldName,
Reader reader)
- Filters LowerCaseTokenizer with StopFilter if enabled and SnowballFilter.
- Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer
reusableTokenStream
public org.apache.lucene.analysis.TokenStream reusableTokenStream(String fieldName,
Reader reader)
throws IOException
- Overrides:
reusableTokenStream in class org.apache.lucene.analysis.Analyzer
- Throws:
IOException
setBook
public void setBook(Book newBook)
- Description copied from class: AbstractBookAnalyzer
- The book for which analysis is being performed.
- Overrides:
setBook in class AbstractBookAnalyzer
pickStemmer
public void pickStemmer(String languageCode)
- Given a language code, select the stemmer mapped to it.
- Parameters:
languageCode - the language code for which to select a stemmer
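The class description says the stemmer is chosen from the Book's language, and the private fields suggest a lookup from language code to stemmer name plus a table of per-language default stop words. A hypothetical sketch of such a lookup follows. The class StemmerPicker, the two-letter language codes, the sample stop words, and the choice to reject unknown codes with IllegalArgumentException are all assumptions, not taken from JSword. The class-name helper assumes the Snowball convention of naming stemmer classes <Name>Stemmer in the package named in the class description.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of choosing a Snowball stemmer and default stop words from a language code. */
class StemmerPicker {
    // Assumed language-code -> stemmer-name table (codes are illustrative, not JSword's data).
    private static final Map<String, String> CODE_TO_STEMMER = new HashMap<>();
    // Assumed per-language default stop words (a tiny illustrative sample).
    private static final Map<String, Set<String>> DEFAULT_STOP_WORDS = new HashMap<>();

    static {
        String[][] pairs = {
            {"da", "Danish"}, {"nl", "Dutch"}, {"en", "English"}, {"fi", "Finnish"},
            {"fr", "French"}, {"de", "German"}, {"it", "Italian"}, {"no", "Norwegian"},
            {"pt", "Portuguese"}, {"ru", "Russian"}, {"es", "Spanish"}, {"sv", "Swedish"}
        };
        for (String[] pair : pairs) {
            CODE_TO_STEMMER.put(pair[0], pair[1]);
        }
        DEFAULT_STOP_WORDS.put("en", new HashSet<>(Arrays.asList("the", "and", "of")));
    }

    private String stemmerName;
    private Set<String> stopSet = Collections.emptySet();

    /** Selects the stemmer mapped to languageCode; a null code leaves the configuration unchanged. */
    void pickStemmer(String languageCode) {
        if (languageCode == null) {
            return;
        }
        String name = CODE_TO_STEMMER.get(languageCode);
        if (name == null) {
            // Assumption: unknown codes are rejected rather than silently ignored.
            throw new IllegalArgumentException("No Snowball stemmer for language code: " + languageCode);
        }
        stemmerName = name;
        stopSet = DEFAULT_STOP_WORDS.getOrDefault(languageCode, Collections.<String>emptySet());
    }

    String getStemmerName() {
        return stemmerName;
    }

    Set<String> getStopSet() {
        return stopSet;
    }

    /** Assumes Snowball's "<Name>Stemmer" class naming in the package named in the class description. */
    String stemmerClassName() {
        if (stemmerName == null) {
            throw new IllegalStateException("pickStemmer has not been called");
        }
        return "net.sf.snowball.ext." + stemmerName + "Stemmer";
    }

    public static void main(String[] args) {
        StemmerPicker picker = new StemmerPicker();
        picker.pickStemmer("en");
        // prints English -> net.sf.snowball.ext.EnglishStemmer
        System.out.println(picker.getStemmerName() + " -> " + picker.stemmerClassName());
    }
}
```

Keeping the code-to-name table separate from the stop-word table lets a language have a stemmer without any default stop words, as "fr" does in this sketch.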