org.apache.lucene.analysis.ja
Class JapaneseAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.ReusableAnalyzerBase
          extended by org.apache.lucene.analysis.StopwordAnalyzerBase
              extended by org.apache.lucene.analysis.ja.JapaneseAnalyzer
All Implemented Interfaces:
Closeable

public class JapaneseAnalyzer
extends org.apache.lucene.analysis.StopwordAnalyzerBase

Analyzer for Japanese that uses the "Sen" morphological analyzer.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
matchVersion, stopwords
 
Constructor Summary
JapaneseAnalyzer(org.apache.lucene.util.Version version)
          Create a JapaneseAnalyzer with the default stopwords and stoptags and no stemExclusionSet
JapaneseAnalyzer(org.apache.lucene.util.Version version, Set<?> stopwords, Set<String> stoptags, Set<?> stemExclusionSet, String dictionaryDir)
          Create a JapaneseAnalyzer with the specified stopwords, stoptags, stemExclusionSet, and dictionaryDir
JapaneseAnalyzer(org.apache.lucene.util.Version version, String dictionaryDir)
          Create a JapaneseAnalyzer with the default stopwords and stoptags, no stemExclusionSet, and the given dictionaryDir.
 
Method Summary
protected  org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String field, Reader reader)
          Creates org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents used to tokenize all the text in the provided Reader.
static Set<?> getDefaultStopSet()
           
static Set<String> getDefaultStopTags()
           
 
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet
 
Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
initReader, reusableTokenStream, tokenStream
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

JapaneseAnalyzer

public JapaneseAnalyzer(org.apache.lucene.util.Version version)
Create a JapaneseAnalyzer with the default stopwords and stoptags and no stemExclusionSet


JapaneseAnalyzer

public JapaneseAnalyzer(org.apache.lucene.util.Version version,
                        String dictionaryDir)
Create a JapaneseAnalyzer with the default stopwords and stoptags, no stemExclusionSet, and the given dictionaryDir.


JapaneseAnalyzer

public JapaneseAnalyzer(org.apache.lucene.util.Version version,
                        Set<?> stopwords,
                        Set<String> stoptags,
                        Set<?> stemExclusionSet,
                        String dictionaryDir)
Create a JapaneseAnalyzer with the specified stopwords, stoptags, stemExclusionSet, and dictionaryDir

Parameters:
version - Lucene compatibility version
stopwords - a stopword set: words matching these (Surf
stoptags - a stoptags set: words containing these parts of speech will be removed from the stream.
stemExclusionSet - a stemming exclusion set: these words are ignored by JapaneseBasicFormFilter and JapaneseKatakanaStemFilter
dictionaryDir - the directory containing the dictionary
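
The sketch below is a plain-Java model of how the stopwords, stoptags, and stemExclusionSet parameters might act on a single token. It does not use the Lucene or Sen API. The Token class, the method names, the romanized words, and the part-of-speech tag names are all hypothetical, and it assumes that both stop checks are exact string matches against the token's surface form and part-of-speech tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the parameter semantics; nothing here is Lucene or Sen API.
class StopParameterSketch {

    // Hypothetical token: a surface form plus one part-of-speech tag.
    static final class Token {
        final String surface;
        final String partOfSpeech;

        Token(String surface, String partOfSpeech) {
            this.surface = surface;
            this.partOfSpeech = partOfSpeech;
        }
    }

    // Returns the surface forms that survive both stop sets.
    // Assumption: both checks are exact string matches.
    static List<String> keptSurfaces(List<Token> tokens,
                                     Set<String> stopwords,
                                     Set<String> stoptags) {
        List<String> kept = new ArrayList<String>();
        for (Token token : tokens) {
            boolean stopped = stopwords.contains(token.surface)
                    || stoptags.contains(token.partOfSpeech);
            if (!stopped) {
                kept.add(token.surface);
            }
        }
        return kept;
    }

    // True when the stemming filters would leave this word untouched.
    static boolean isStemmingSkipped(String surface, Set<String> stemExclusionSet) {
        return stemExclusionSet.contains(surface);
    }

    public static void main(String[] args) {
        // Illustrative tokens and tag names only; not Sen's real tag set.
        List<Token> tokens = Arrays.asList(
                new Token("watashi", "pronoun"),
                new Token("wa", "particle"),
                new Token("gakusei", "noun"),
                new Token("desu", "auxiliary-verb"));
        Set<String> stopwords = new HashSet<String>(Arrays.asList("watashi"));
        Set<String> stoptags = new HashSet<String>(Arrays.asList("particle", "auxiliary-verb"));
        Set<String> stemExclusionSet = new HashSet<String>(Arrays.asList("gakusei"));

        System.out.println(keptSurfaces(tokens, stopwords, stoptags));
        System.out.println(isStemmingSkipped("gakusei", stemExclusionSet));
    }
}
```

In this model a token is dropped as soon as either stop set matches it, and the exclusion set only decides whether a surviving word is protected from the stemming filters.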
Method Detail

getDefaultStopSet

public static Set<?> getDefaultStopSet()

getDefaultStopTags

public static Set<String> getDefaultStopTags()

createComponents

protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String field,
                                                                                                 Reader reader)
Creates org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents used to tokenize all the text in the provided Reader.

Specified by:
createComponents in class org.apache.lucene.analysis.ReusableAnalyzerBase
Returns:
org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents built from a JapaneseTokenizer filtered with JapaneseWidthFilter, JapanesePunctuationFilter, JapanesePartOfSpeechStopFilter, JapaneseStopFilter, KeywordMarkerFilter if a stem exclusion set is provided, JapaneseBasicFormFilter, JapaneseKatakanaStemFilter, and LowerCaseFilter
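
The stage order listed under Returns can be modelled with a small plain-Java sketch. It only assembles the stage names in the documented order and does not build a real Lucene TokenStream. The FilterChainSketch class and the stageNames method are hypothetical, and treating "provided" as "non-null and non-empty" is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Models only the order of the analysis stages; no Lucene classes are used.
class FilterChainSketch {

    // Returns the stage names in the order given by the Returns description.
    // Assumption: the exclusion set counts as "provided" when non-null and non-empty.
    static List<String> stageNames(Set<String> stemExclusionSet) {
        List<String> stages = new ArrayList<String>();
        stages.add("JapaneseTokenizer");
        stages.add("JapaneseWidthFilter");
        stages.add("JapanesePunctuationFilter");
        stages.add("JapanesePartOfSpeechStopFilter");
        stages.add("JapaneseStopFilter");
        if (stemExclusionSet != null && !stemExclusionSet.isEmpty()) {
            // Marks excluded words before the two stemming stages run.
            stages.add("KeywordMarkerFilter");
        }
        stages.add("JapaneseBasicFormFilter");
        stages.add("JapaneseKatakanaStemFilter");
        stages.add("LowerCaseFilter");
        return stages;
    }

    public static void main(String[] args) {
        Set<String> none = Collections.<String>emptySet();
        Set<String> some = new HashSet<String>(Arrays.asList("gakusei"));
        System.out.println(stageNames(none));
        System.out.println(stageNames(some));
    }
}
```

Placing the marker stage directly before JapaneseBasicFormFilter and JapaneseKatakanaStemFilter matches the documented order, so excluded words are flagged before either stemming stage sees them.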


Copyright © 2012. All Rights Reserved.