org.apache.lucene.analysis.ja
Class JapaneseAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.ReusableAnalyzerBase
org.apache.lucene.analysis.StopwordAnalyzerBase
org.apache.lucene.analysis.ja.JapaneseAnalyzer
- All Implemented Interfaces:
- Closeable
public class JapaneseAnalyzer
- extends org.apache.lucene.analysis.StopwordAnalyzerBase
Analyzer for Japanese that uses the "Sen" morphological analyzer.
| Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase |
org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents |
| Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase |
matchVersion, stopwords |
Constructor Summary |
JapaneseAnalyzer(org.apache.lucene.util.Version version)
Create a JapaneseAnalyzer with the default stopwords and stoptags and no stemExclusionSet |
JapaneseAnalyzer(org.apache.lucene.util.Version version,
Set<?> stopwords,
Set<String> stoptags,
Set<?> stemExclusionSet,
String dictionaryDir)
Create a JapaneseAnalyzer with the specified stopwords, stoptags, and stemExclusionSet |
JapaneseAnalyzer(org.apache.lucene.util.Version version,
String dictionaryDir)
Create a JapaneseAnalyzer with the default stopwords and stoptags, no stemExclusionSet,
and the dictionary directory given by dictionaryDir. |
| Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase |
getStopwordSet, loadStopwordSet |
| Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase |
initReader, reusableTokenStream, tokenStream |
| Methods inherited from class org.apache.lucene.analysis.Analyzer |
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
JapaneseAnalyzer
public JapaneseAnalyzer(org.apache.lucene.util.Version version)
- Create a JapaneseAnalyzer with the default stopwords and stoptags and no stemExclusionSet
JapaneseAnalyzer
public JapaneseAnalyzer(org.apache.lucene.util.Version version,
String dictionaryDir)
- Create a JapaneseAnalyzer with the default stopwords and stoptags, no stemExclusionSet,
and the dictionary directory given by dictionaryDir.
JapaneseAnalyzer
public JapaneseAnalyzer(org.apache.lucene.util.Version version,
Set<?> stopwords,
Set<String> stoptags,
Set<?> stemExclusionSet,
String dictionaryDir)
- Create a JapaneseAnalyzer with the specified stopwords, stoptags, and stemExclusionSet
- Parameters:
version - Lucene compatibility version
stopwords - a stopword set: words matching these (Surf
stoptags - a stoptags set: words containing these parts of speech will be removed from the stream.
stemExclusionSet - a stemming exclusion set: these words are ignored by JapaneseBasicFormFilter and JapaneseKatakanaStemFilter
dictionaryDir - the directory containing the dictionary
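As a rough illustration of the arguments described above, the following sketch builds the three sets using only java.util. The class name JapaneseAnalyzerArgs and all values are hypothetical: the stopwords are the particles "no" (U+306E) and "wa" (U+306F), the stem exclusion is the katakana word for "computer", and the part-of-speech tag string is an assumed format that may not match the tags used by the dictionary. The constructor call itself appears only in a comment because it needs the analyzer library on the classpath.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the library: prepares the set arguments
// for the five-argument JapaneseAnalyzer constructor.
class JapaneseAnalyzerArgs {

    // Surface forms to remove from the stream (illustrative values).
    static Set<String> stopwords() {
        return unmodifiable("\u306e", "\u306f");
    }

    // Part-of-speech tags to remove. The tag format is an assumption and
    // depends on the dictionary in use.
    static Set<String> stoptags() {
        return unmodifiable("\u52a9\u8a5e-\u683c\u52a9\u8a5e-\u4e00\u822c");
    }

    // Terms the stemming filters should leave untouched (illustrative value).
    static Set<String> stemExclusions() {
        return unmodifiable("\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u30fc");
    }

    // Wraps the values in a read-only set so callers cannot alter them later.
    private static Set<String> unmodifiable(String... values) {
        return Collections.unmodifiableSet(new HashSet<String>(Arrays.asList(values)));
    }

    public static void main(String[] args) {
        // With the library available, the sets would be passed like this
        // (matchVersion and dictionaryDir are placeholders):
        // new JapaneseAnalyzer(matchVersion, stopwords(), stoptags(),
        //                      stemExclusions(), dictionaryDir);
        System.out.println(stopwords().size() + " stopwords, "
                + stoptags().size() + " stoptags, "
                + stemExclusions().size() + " stem exclusions");
    }
}
```

Passing read-only sets is a design choice of this sketch, not a requirement stated by the API.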
getDefaultStopSet
public static Set<?> getDefaultStopSet()
getDefaultStopTags
public static Set<String> getDefaultStopTags()
createComponents
protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String field,
Reader reader)
- Creates the
org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents
used to tokenize all the text in the provided Reader.
- Specified by:
createComponents in class org.apache.lucene.analysis.ReusableAnalyzerBase
- Returns:
org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents
built from a JapaneseTokenizer filtered with
JapaneseWidthFilter, JapanesePunctuationFilter,
JapanesePartOfSpeechStopFilter, JapaneseStopFilter,
KeywordMarkerFilter if a stem exclusion set is provided,
JapaneseBasicFormFilter, JapaneseKatakanaStemFilter,
and LowerCaseFilter
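The ordering of the chain matters: stopwords are dropped before stemming, and terms in the stem exclusion set are marked so that the stemming filters skip them. The following library-free model sketches that ordering. None of the names in it are Lucene classes. The stem method is a stand-in that strips a trailing prolonged sound mark (U+30FC) from terms of four or more characters, which is an assumption and not a description of what JapaneseBasicFormFilter or JapaneseKatakanaStemFilter actually do. The width, punctuation, and part-of-speech filters are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of the filter ordering; not the real Lucene token stream API.
class FilterChainSketch {

    static List<String> analyze(List<String> tokens,
                                Set<String> stopwords,
                                Set<String> stemExclusions) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            // Stop filtering: matching tokens never reach the later stages.
            if (stopwords.contains(token)) {
                continue;
            }
            // Keyword marking: excluded terms bypass the stemming stage.
            boolean keyword = stemExclusions.contains(token);
            String term = keyword ? token : stem(token);
            // Lowercasing runs last, as in the documented chain.
            out.add(term.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Stand-in stemmer (assumption): drops one trailing U+30FC from terms
    // that are at least four characters long.
    static String stem(String term) {
        int last = term.length() - 1;
        if (term.length() >= 4 && term.charAt(last) == '\u30fc') {
            return term.substring(0, last);
        }
        return term;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "Lucene",
                "\u30b5\u30fc\u30d0\u30fc",                 // katakana "server"
                "\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u30fc", // katakana "computer"
                "\u306e");                                   // particle "no"
        Set<String> stop = Collections.singleton("\u306e");
        Set<String> excl = Collections.singleton("\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u30fc");
        System.out.println(analyze(tokens, stop, excl));
    }
}
```

In this model the excluded term keeps its trailing mark while the other katakana term loses it, which is the effect a stem exclusion set is meant to have.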
Copyright © 2012. All Rights Reserved.