org.apache.lucene.analysis.ja
Class JapaneseKatakanaStemFilter

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.TokenFilter
              extended by org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter
All Implemented Interfaces:
Closeable

public final class JapaneseKatakanaStemFilter
extends org.apache.lucene.analysis.TokenFilter

Converts a katakana word to a normalized form by removing a KATAKANA-HIRAGANA PROLONGED SOUND MARK (U+30FC) at the end of the string. Most Japanese full-text search engines use more sophisticated, dictionary-based methods. Those methods generally give better results than this filter, but they require a well-tuned dictionary. In contrast, this filter is simple and maintenance-free.
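
The rule described above can be sketched in plain Java, without any Lucene dependency. This is an illustration only, not the filter's actual source: the class name KatakanaStemSketch, the method stem, the check that every character lies in the Unicode KATAKANA block, and the decision to leave a lone prolonged sound mark untouched are all assumptions made for the example. Under those assumptions, コンピューター would become コンピュータ.

```java
final class KatakanaStemSketch {
    // KATAKANA-HIRAGANA PROLONGED SOUND MARK
    private static final char PROLONGED_SOUND_MARK = '\u30FC';

    // Assumption: a "katakana word" is a term whose characters all fall
    // in the Unicode KATAKANA block (U+30A0..U+30FF).
    private static boolean isKatakana(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (Character.UnicodeBlock.of(term.charAt(i)) != Character.UnicodeBlock.KATAKANA) {
                return false;
            }
        }
        return !term.isEmpty();
    }

    // Removes one trailing U+30FC from a katakana term; other terms are
    // returned unchanged. Assumption: a term consisting only of the mark
    // is left as-is so the result is never empty.
    static String stem(String term) {
        int len = term.length();
        if (len > 1 && term.charAt(len - 1) == PROLONGED_SOUND_MARK && isKatakana(term)) {
            return term.substring(0, len - 1);
        }
        return term;
    }

    public static void main(String[] args) {
        // "konpyuutaa" written in katakana, using escapes to stay ASCII-only
        String input = "\u30B3\u30F3\u30D4\u30E5\u30FC\u30BF\u30FC";
        System.out.println(stem(input).length() == input.length() - 1);
    }
}
```

Only a single trailing mark is removed in this sketch, and a mark in the middle of the word is kept; whether the real filter applies further conditions is not stated on this page.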

Note: This filter does not support hankaku (half-width) katakana characters, so you must convert them to full-width form before using this filter. It also supports only precomposed characters.
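
One possible way to meet both requirements before text reaches the filter is Unicode NFKC normalization, which maps half-width katakana to their full-width forms and composes a base character with a following voiced sound mark into a single precomposed character. The sketch below uses java.text.Normalizer from the JDK; the class name HankakuNormalizeSketch and the method toFullWidth are hypothetical and are not part of Lucene.

```java
import java.text.Normalizer;

final class HankakuNormalizeSketch {
    // Applies NFKC: half-width katakana become full-width, and
    // base + combining voiced sound mark sequences are composed.
    static String toFullWidth(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Half-width "saabaa" (server), written with escapes to stay ASCII-only
        String halfWidth = "\uFF7B\uFF70\uFF8A\uFF9E\uFF70";
        String fullWidth = toFullWidth(halfWidth);
        // Compare against the full-width, precomposed spelling
        System.out.println(fullWidth.equals("\u30B5\u30FC\u30D0\u30FC"));
    }
}
```

NFKC also rewrites other compatibility characters, such as full-width Latin letters, so it may change more than the katakana; apply it only if that is acceptable for the index.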

To prevent terms from being stemmed, place an instance of KeywordMarkerFilter, or a custom TokenFilter that sets the KeywordAttribute, before this TokenStream.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
 
Constructor Summary
JapaneseKatakanaStemFilter(org.apache.lucene.analysis.TokenStream in)
           
 
Method Summary
 boolean incrementToken()
          Returns the next input token after it has been stemmed.
 
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

JapaneseKatakanaStemFilter

public JapaneseKatakanaStemFilter(org.apache.lucene.analysis.TokenStream in)

Method Detail

incrementToken

public boolean incrementToken()
                       throws IOException
Returns the next input token after it has been stemmed.

Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
IOException


Copyright © 2012. All Rights Reserved.