org.apache.jackrabbit.oak.plugins.index.lucene.util
Class CompoundWordTokenFilterBase
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.jackrabbit.oak.plugins.index.lucene.util.CompoundWordTokenFilterBase
- All Implemented Interfaces:
- Closeable
- Direct Known Subclasses:
- OakWordTokenFilter
public abstract class CompoundWordTokenFilterBase
- extends org.apache.lucene.analysis.TokenFilter
Base class for decomposition token filters.
You must specify the required Version compatibility (the matchVersion constructor argument) when creating a
CompoundWordTokenFilterBase:
- As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0
supplementary characters in strings and char arrays provided as compound word
dictionaries.
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
- org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
Fields inherited from class org.apache.lucene.analysis.TokenFilter
- input
Constructor Summary
protected CompoundWordTokenFilterBase(org.apache.lucene.util.Version matchVersion,
org.apache.lucene.analysis.TokenStream input,
org.apache.lucene.analysis.util.CharArraySet dictionary)
protected CompoundWordTokenFilterBase(org.apache.lucene.util.Version matchVersion,
org.apache.lucene.analysis.TokenStream input,
org.apache.lucene.analysis.util.CharArraySet dictionary,
boolean onlyLongestMatch)
protected CompoundWordTokenFilterBase(org.apache.lucene.util.Version matchVersion,
org.apache.lucene.analysis.TokenStream input,
org.apache.lucene.analysis.util.CharArraySet dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Methods inherited from class org.apache.lucene.analysis.TokenFilter
- close, end
Methods inherited from class org.apache.lucene.util.AttributeSource
- addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
DEFAULT_MIN_WORD_SIZE
public static final int DEFAULT_MIN_WORD_SIZE
- The default minimum length a word must have to be decomposed
- See Also:
- Constant Field Values
DEFAULT_MIN_SUBWORD_SIZE
public static final int DEFAULT_MIN_SUBWORD_SIZE
- The default minimum length of subwords that are propagated to the output of this filter
- See Also:
- Constant Field Values
DEFAULT_MAX_SUBWORD_SIZE
public static final int DEFAULT_MAX_SUBWORD_SIZE
- The default maximum length of subwords that are propagated to the output of this filter
- See Also:
- Constant Field Values
dictionary
protected final org.apache.lucene.analysis.util.CharArraySet dictionary
tokens
protected final LinkedList<CompoundWordTokenFilterBase.CompoundToken> tokens
minWordSize
protected final int minWordSize
minSubwordSize
protected final int minSubwordSize
maxSubwordSize
protected final int maxSubwordSize
onlyLongestMatch
protected final boolean onlyLongestMatch
termAtt
protected final org.apache.lucene.analysis.tokenattributes.CharTermAttribute termAtt
offsetAtt
protected final org.apache.lucene.analysis.tokenattributes.OffsetAttribute offsetAtt
CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(org.apache.lucene.util.Version matchVersion,
org.apache.lucene.analysis.TokenStream input,
org.apache.lucene.analysis.util.CharArraySet dictionary,
boolean onlyLongestMatch)
CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(org.apache.lucene.util.Version matchVersion,
org.apache.lucene.analysis.TokenStream input,
org.apache.lucene.analysis.util.CharArraySet dictionary)
CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(org.apache.lucene.util.Version matchVersion,
org.apache.lucene.analysis.TokenStream input,
org.apache.lucene.analysis.util.CharArraySet dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
incrementToken
public final boolean incrementToken()
throws IOException
- Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
decompose
protected abstract void decompose()
- Decomposes the current termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the tokens list.
Do not add the original token to the list; this filter passes it through automatically.
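To make the roles of dictionary, minWordSize, minSubwordSize, maxSubwordSize and onlyLongestMatch concrete, here is a minimal stand-alone sketch of how a dictionary-based decompose() might select subwords. It is an illustration under stated assumptions, not the Oak implementation: the class DecomposeSketch and its static helper are hypothetical, plain String and Set stand in for termAtt, CharArraySet and CompoundToken, and the minWordSize check is folded into the helper for brevity. The sizes 5, 2 and 15 are example arguments only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; it does not use any Lucene or Oak classes.
class DecomposeSketch {

    /** Returns the dictionary subwords found in term, in order of their start offset. */
    static List<String> decompose(String term, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> tokens = new ArrayList<>();
        // Assumption: words shorter than minWordSize are not decomposed at all.
        if (term.length() < minWordSize) {
            return tokens;
        }
        // Try every start offset that still leaves room for a minimal subword.
        for (int start = 0; start <= term.length() - minSubwordSize; start++) {
            String longest = null;
            for (int len = minSubwordSize; len <= maxSubwordSize; len++) {
                if (start + len > term.length()) {
                    break; // candidate would run past the end of the term
                }
                String candidate = term.substring(start, start + len);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // len only grows, so the last hit is the longest
                } else {
                    tokens.add(candidate);
                }
            }
            if (longest != null) {
                tokens.add(longest);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("soft", "ball", "so"));
        // The original term "softball" is never added to the result.
        System.out.println(decompose("softball", dict, 5, 2, 15, false)); // [so, soft, ball]
        System.out.println(decompose("softball", dict, 5, 2, 15, true));  // [soft, ball]
    }
}
```

With onlyLongestMatch set to true, at most one subword is kept per start offset, which is why "so" is dropped in favour of "soft" in the second call. A real subclass would instead read the term from termAtt and append CompoundToken instances to tokens.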
reset
public void reset()
throws IOException
- Overrides:
reset in class org.apache.lucene.analysis.TokenFilter
- Throws:
IOException
Copyright © 2012-2014 The Apache Software Foundation. All Rights Reserved.