A

accept() - Method in class org.apache.lucene.analysis.ja.JapanesePartOfSpeechKeepFilter
 
accept() - Method in class org.apache.lucene.analysis.ja.JapanesePartOfSpeechStopFilter
 
accept() - Method in class org.apache.lucene.analysis.ja.JapanesePunctuationFilter
 
add(String, CToken) - Method in class net.java.sen.compiler.VirtualTupleList
Adds a StringCTokenTuple to the list.
addFilter(int, ReadingFilter) - Method in class net.java.sen.ReadingProcessor
Adds a reading filter to be applied during processing
addFilter(StreamFilter) - Method in class net.java.sen.StreamTagger
Adds a StreamFilter
addFilter(StreamFilter) - Method in class net.java.sen.StringTagger
Add a StreamFilter to be applied during analysis
analyze(String, List<Token>) - Method in class net.java.sen.StringTagger
Decompose a string into its most likely constituent morphemes
analyze(String) - Method in class net.java.sen.StringTagger
Deprecated. Use StringTagger.analyze(String, List) instead.
analyze(char[], List<Token>) - Method in class net.java.sen.StringTagger
Decompose a string into its most likely constituent morphemes
analyze(char[]) - Method in class net.java.sen.StringTagger
Deprecated. Use StringTagger.analyze(char[], List) instead.
append(String) - Method in class net.java.sen.util.CSVData
Appends a value to the line

B

baseReadings - Variable in class net.java.sen.filter.ReadingNode
A sorted list of readings within the covered range of morphemes.
BasicFormAttribute - Interface in org.apache.lucene.analysis.ja.tokenAttributes
Attribute for Morpheme.getBasicForm().
BasicFormAttributeImpl - Class in org.apache.lucene.analysis.ja.tokenAttributes
 
BasicFormAttributeImpl() - Constructor for class org.apache.lucene.analysis.ja.tokenAttributes.BasicFormAttributeImpl
 
bosNode - Variable in class net.java.sen.dictionary.Tokenizer
A Node representing a beginning-of-string
build(String) - Method in class net.java.sen.compiler.IpadicPreprocessor
Preprocesses the dictionary
build(String) - Method in class net.java.sen.trie.TrieBuilder
Builds the trie data file
buildConnectionCSV(String) - Method in class net.java.sen.compiler.IpadicPreprocessor
Builds a connection CSV file from an unpacked ipadic
buildTable(BufferedReader, int, int, String) - Static method in class net.java.sen.tools.CompoundWordTableCompiler
Builds a compound word table

C

CharIterator - Interface in net.java.sen.trie
An iterator interface for chars
clear() - Method in class net.java.sen.util.CSVData
Removes all values from the line
clear() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.BasicFormAttributeImpl
 
clear() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.ConjugationAttributeImpl
 
clear() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.CostAttributeImpl
 
clear() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.PartOfSpeechAttributeImpl
 
clear() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.PronunciationsAttributeImpl
 
clear() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.ReadingsAttributeImpl
 
clear() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.SentenceStartAttributeImpl
 
clearFilters() - Method in class net.java.sen.ReadingProcessor
Removes any previously set reading filters
clone() - Method in class net.java.sen.dictionary.CToken
 
clone() - Method in class net.java.sen.dictionary.Node
 
CommentFilter - Class in net.java.sen.filter.stream
A filter to ignore delimited comments in the input sentence
CommentFilter() - Constructor for class net.java.sen.filter.stream.CommentFilter
 
commonPrefixSearch(CharIterator) - Method in class net.java.sen.dictionary.Dictionary
Searches for possible morphemes starting at the current position of a CharIterator.
commonPrefixSearch(IntBuffer, CharIterator, int[]) - Static method in class net.java.sen.trie.TrieSearcher
Searches for Trie keys forming a complete substring of the given sentence, starting at the given position within the sentence
compareTo(StringCTokenTuple) - Method in class net.java.sen.compiler.StringCTokenTuple
 
CompositeTokenFilter - Class in net.java.sen.filter.stream
A Filter that replaces multiple similar Tokens with a single composite Token
CompositeTokenFilter() - Constructor for class net.java.sen.filter.stream.CompositeTokenFilter
 
CompoundWordFilter - Class in net.java.sen.filter.stream
A Filter that replaces a single Token with one or more alternative Tokens.
CompoundWordFilter(String) - Constructor for class net.java.sen.filter.stream.CompoundWordFilter
Creates a CompoundWordFilter from the given file
CompoundWordTableCompiler - Class in net.java.sen.tools
Compiles a table for the CompoundWordFilter
CompoundWordTableCompiler() - Constructor for class net.java.sen.tools.CompoundWordTableCompiler
 
ConjugationAttribute - Interface in org.apache.lucene.analysis.ja.tokenAttributes
Attribute for Morpheme.getConjugationalForm() and Morpheme.getConjugationalType().
ConjugationAttributeImpl - Class in org.apache.lucene.analysis.ja.tokenAttributes
 
ConjugationAttributeImpl() - Constructor for class org.apache.lucene.analysis.ja.tokenAttributes.ConjugationAttributeImpl
 
copyTo(AttributeImpl) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.BasicFormAttributeImpl
 
copyTo(AttributeImpl) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.ConjugationAttributeImpl
 
copyTo(AttributeImpl) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.CostAttributeImpl
 
copyTo(AttributeImpl) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.PartOfSpeechAttributeImpl
 
copyTo(AttributeImpl) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.PronunciationsAttributeImpl
 
copyTo(AttributeImpl) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.ReadingsAttributeImpl
 
copyTo(AttributeImpl) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.SentenceStartAttributeImpl
 
cost - Variable in class net.java.sen.dictionary.CToken
The cost of this CToken
cost - Variable in class net.java.sen.dictionary.Node
The cost of the best path through this Node, comprising this.prev, this Node, and this.next.
CostAttribute - Interface in org.apache.lucene.analysis.ja.tokenAttributes
Attribute for Token.getCost().
CostAttributeImpl - Class in org.apache.lucene.analysis.ja.tokenAttributes
 
CostAttributeImpl() - Constructor for class org.apache.lucene.analysis.ja.tokenAttributes.CostAttributeImpl
 
create(TokenStream) - Method in class org.apache.solr.analysis.JapaneseBasicFormFilterFactory
 
create(TokenStream) - Method in class org.apache.solr.analysis.JapaneseKatakanaStemFilterFactory
 
create(TokenStream) - Method in class org.apache.solr.analysis.JapanesePartOfSpeechKeepFilterFactory
 
create(TokenStream) - Method in class org.apache.solr.analysis.JapanesePartOfSpeechStopFilterFactory
 
create(TokenStream) - Method in class org.apache.solr.analysis.JapanesePunctuationFilterFactory
 
create(Reader) - Method in class org.apache.solr.analysis.JapaneseTokenizerFactory
 
create(TokenStream) - Method in class org.apache.solr.analysis.JapaneseWidthFilterFactory
 
createComponents(String, Reader) - Method in class org.apache.lucene.analysis.ja.JapaneseAnalyzer
Creates org.apache.lucene.analysis.util.ReusableAnalyzerBase.TokenStreamComponents used to tokenize all the text in the provided Reader.
CSVData - Class in net.java.sen.util
A class used to build a line of CSV data
CSVData() - Constructor for class net.java.sen.util.CSVData
 
CSVParser - Class in net.java.sen.util
Parses a CSV file and extracts tokens.
CSVParser(InputStream, String) - Constructor for class net.java.sen.util.CSVParser
Constructor for a parser that reads lines from an InputStream
CSVParser(String) - Constructor for class net.java.sen.util.CSVParser
Constructor for a parser that reads lines from a String
CToken - Class in net.java.sen.dictionary
Represents an entry in the token file.
CToken() - Constructor for class net.java.sen.dictionary.CToken
 
current() - Method in interface net.java.sen.dictionary.SentenceIterator
Returns the character at the current character cursor position
currentLine() - Method in class net.java.sen.util.CSVParser
Returns the unparsed current line of text
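
The commonPrefixSearch entries above describe finding every dictionary key that begins at a given position in a sentence. A minimal sketch of that idea follows, using a plain map-based trie. The class PrefixSearchSketch and its methods are hypothetical illustrations, not the net.java.sen.trie API, and the real TrieSearcher works on an IntBuffer whose layout is not described in this index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the Sen implementation.
final class PrefixSearchSketch {
    private static final class TrieNode {
        final Map<Character, TrieNode> children = new HashMap<>();
        boolean isKey; // true when the path from the root spells a complete key
    }

    private final TrieNode root = new TrieNode();

    /** Adds a key to the trie, creating nodes as needed. */
    void insert(String key) {
        TrieNode node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.computeIfAbsent(key.charAt(i), ch -> new TrieNode());
        }
        node.isKey = true;
    }

    /** Returns the lengths of all keys that match the sentence starting at 'start'. */
    List<Integer> commonPrefixSearch(String sentence, int start) {
        List<Integer> lengths = new ArrayList<>();
        TrieNode node = root;
        for (int i = start; i < sentence.length(); i++) {
            node = node.children.get(sentence.charAt(i));
            if (node == null) {
                break; // no key continues with this character
            }
            if (node.isKey) {
                lengths.add(i - start + 1);
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        PrefixSearchSketch trie = new PrefixSearchSketch();
        trie.insert("a");
        trie.insert("ab");
        trie.insert("abcd");
        System.out.println(trie.commonPrefixSearch("abcde", 0)); // prints [1, 2, 4]
    }
}
```

Each returned length corresponds to one candidate morpheme starting at the given position, which is the kind of candidate set a lattice-based analyser needs before costs are compared.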

D

Dictionary - Class in net.java.sen.dictionary
The Dictionary class wraps access to a compiled Sen dictionary
Dictionary(ShortBuffer, ByteBuffer, ByteBuffer, IntBuffer, String[], String[], String[]) - Constructor for class net.java.sen.dictionary.Dictionary
 
dictionary - Variable in class net.java.sen.dictionary.Tokenizer
The Dictionary used to find possible morphemes
DictionaryBuilder - Class in net.java.sen.compiler
Compiles CSV source data into the data files used for analysis
DictionaryBuilder(String[]) - Constructor for class net.java.sen.compiler.DictionaryBuilder
Compiles CSV source data into the data files used for analysis
DictionaryCompiler - Class in net.java.sen.tools
Compiles source CSV data into the dictionary data files used for analysis
DictionaryCompiler() - Constructor for class net.java.sen.tools.DictionaryCompiler
 
dictionaryCost - Variable in class net.java.sen.dictionary.Node
 
DictionaryPreprocessor - Class in net.java.sen.tools
Preprocesses an input dictionary into the intermediate CSV format used by the dictionary compiler.
DictionaryPreprocessor() - Constructor for class net.java.sen.tools.DictionaryPreprocessor
 
DictionaryUtil - Class in net.java.sen.dictionary
Encoding methods for packing the POS file (mostly from Lucene)
DictionaryUtil() - Constructor for class net.java.sen.dictionary.DictionaryUtil
 
displayReadings - Variable in class net.java.sen.filter.ReadingNode
A sorted list of visible reading fragments within the covered range of morphemes

E

elements - Variable in class net.java.sen.util.CSVData
The values comprising the line
end() - Method in class net.java.sen.dictionary.Token
Gets the end of the character range of this Token within the underlying sentence
end() - Method in class org.apache.lucene.analysis.ja.JapaneseTokenizer
 
end() - Method in class org.apache.lucene.analysis.ja.StreamTagger2
 
enquote(String) - Method in class net.java.sen.util.CSVData
Surrounds a string with double quotes if it contains either a double quote or a comma; replaces double quotes with a pair of double quotes
eosNode - Variable in class net.java.sen.dictionary.Tokenizer
A Node representing an end-of-string
equals(Object) - Method in class net.java.sen.dictionary.Morpheme
 
equals(Object) - Method in class net.java.sen.dictionary.Reading
 
equals(Object) - Method in class net.java.sen.dictionary.Token
 
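
The quoting rule described for CSVData.enquote(String) above can be sketched as a standalone function. This is an illustration written from the one-line description only; the class name CsvQuoteSketch is hypothetical and the actual method may differ in details such as null handling.

```java
// Hypothetical sketch of the rule described for CSVData.enquote(String).
final class CsvQuoteSketch {
    /** Quotes a value only when it contains a comma or a double quote. */
    static String enquote(String value) {
        if (value.indexOf('"') < 0 && value.indexOf(',') < 0) {
            return value; // nothing to escape, leave the value as it is
        }
        // Double every embedded quote, then wrap the whole value in quotes.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(enquote("plain"));      // prints plain
        System.out.println(enquote("a,b"));        // prints "a,b"
        System.out.println(enquote("say \"hi\"")); // prints "say ""hi"""
    }
}
```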

F

filterReadings(List<Token>, ReadingNode) - Method in class net.java.sen.filter.reading.NumberFilter
 
filterReadings(List<Token>, ReadingNode) - Method in class net.java.sen.filter.reading.OverrideFilter
 
filterReadings(List<Token>, ReadingNode) - Method in interface net.java.sen.filter.ReadingFilter
Filters readings
firstToken - Variable in class net.java.sen.filter.ReadingNode
The index of the first token covered by this node

G

get(int) - Method in class net.java.sen.compiler.VirtualTupleList
Retrieves an entry from the list.
getAdditionalInformation() - Method in class net.java.sen.dictionary.Morpheme
Gets the additional information string
getBaseReadings() - Method in class net.java.sen.ReadingProcessor.ReadingResult
Gets the base readings resulting from processing of the result's text.
getBasicForm() - Method in class net.java.sen.dictionary.Morpheme
Gets the unconjugated form of the morpheme
getBasicForm() - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.BasicFormAttribute
 
getBasicForm() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.BasicFormAttributeImpl
 
getBestTokens(Sentence, List<Token>) - Method in class net.java.sen.dictionary.Viterbi
Analyses a sentence to find the most likely sequence of morphemes
getBestTokens(Sentence) - Method in class net.java.sen.dictionary.Viterbi
Deprecated. Use Viterbi.getBestTokens(Sentence, List) instead.
getBOSNode() - Method in class net.java.sen.dictionary.Tokenizer
Creates a unique beginning-of-string Node.
getBOSToken() - Method in class net.java.sen.dictionary.Dictionary
Gets a unique beginning-of-string CToken.
getCharacters() - Method in class net.java.sen.dictionary.Sentence
Returns the underlying characters of this Sentence
getConjFormTranslation(String) - Static method in class org.apache.lucene.analysis.ja.ToStringUtil
Get the English form of a conjugated form
getConjTypeTranslation(String) - Static method in class org.apache.lucene.analysis.ja.ToStringUtil
Get the English form of a conjugational type
getConjugationalForm() - Method in class net.java.sen.dictionary.Morpheme
Gets the conjugation form of the morpheme
getConjugationalForm() - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.ConjugationAttribute
 
getConjugationalForm() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.ConjugationAttributeImpl
 
getConjugationalType() - Method in class net.java.sen.dictionary.Morpheme
Gets the conjugation type of the morpheme
getConjugationalType() - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.ConjugationAttribute
 
getConjugationalType() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.ConjugationAttributeImpl
 
getCost(Node, Node, Node) - Method in class net.java.sen.dictionary.Dictionary
Retrieves the cost between three Nodes from the connection cost matrix
getCost() - Method in class net.java.sen.dictionary.Token
Gets the Viterbi cost of this Token
getCost() - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.CostAttribute
 
getCost() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.CostAttributeImpl
 
getDefaultStopSet() - Static method in class org.apache.lucene.analysis.ja.JapaneseAnalyzer
 
getDefaultStopTags() - Static method in class org.apache.lucene.analysis.ja.JapaneseAnalyzer
 
getDictionary() - Method in class net.java.sen.dictionary.Tokenizer
 
getDisplayReadings() - Method in class net.java.sen.ReadingProcessor
Returns a list of readings generated from the current text.
getDisplayReadings() - Method in class net.java.sen.ReadingProcessor.ReadingResult
Gets the visible reading fragments resulting from processing of the result's text.
getEOSNode() - Method in class net.java.sen.dictionary.Tokenizer
Creates a unique end-of-string Node.
getEOSToken() - Method in class net.java.sen.dictionary.Dictionary
Gets a unique end-of-string CToken.
getFilters() - Method in class net.java.sen.ReadingProcessor
Returns the complete set of reading filters currently applied during processing
getInstance(String) - Static method in class net.java.sen.SenFactory
Get the singleton factory instance
getLength() - Method in class net.java.sen.dictionary.Token
Gets the length of the character range of this Token within the underlying sentence
getMorpheme() - Method in class net.java.sen.dictionary.Token
Gets the morpheme data for this Token
getPartOfSpeech() - Method in class net.java.sen.dictionary.Morpheme
Gets the part-of-speech in Chasen format
getPartOfSpeech() - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.PartOfSpeechAttribute
 
getPartOfSpeech() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.PartOfSpeechAttributeImpl
 
getPossibleTokens(Sentence, int) - Method in class net.java.sen.dictionary.Viterbi
Gets the possible tokens from a Sentence at a given position.
getPossibleTokens(int) - Method in class net.java.sen.ReadingProcessor.ReadingResult
Searches for possible tokens starting at the given position within the result's text
getPOSTranslation(String) - Static method in class org.apache.lucene.analysis.ja.ToStringUtil
Get the English form of a POS tag
getPronunciations() - Method in class net.java.sen.dictionary.Morpheme
Gets the pronunciations of the morpheme
getPronunciations() - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.PronunciationsAttribute
 
getPronunciations() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.PronunciationsAttributeImpl
 
getReadingConstraint(int) - Method in class net.java.sen.dictionary.Sentence
Gets the reading constraint at the given position, if any
getReadingConstraint(int) - Method in class net.java.sen.ReadingProcessor
Gets a reading constraint set on the currently analysed text
getReadings() - Method in class net.java.sen.dictionary.Morpheme
Gets the readings of the morpheme
getReadings() - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.ReadingsAttribute
 
getReadings() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.ReadingsAttributeImpl
 
getRomanization(String) - Static method in class org.apache.lucene.analysis.ja.ToStringUtil
Romanize katakana with modified Hepburn
getSentenceStart() - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.SentenceStartAttribute
 
getSentenceStart() - Method in class org.apache.lucene.analysis.ja.tokenAttributes.SentenceStartAttributeImpl
 
getStart() - Method in class net.java.sen.dictionary.Token
Gets the start of the character range of this Token within the underlying sentence
getStringTagger(String) - Static method in class net.java.sen.SenFactory
Creates a StringTagger from the given configuration
getSurface() - Method in class net.java.sen.dictionary.Token
Gets the character range of this Token within the underlying sentence
getTokens() - Method in class net.java.sen.ReadingProcessor.ReadingResult
Gets the tokens resulting from analysis of the result's text.
getUnknownNode(char[], int, int, int) - Method in class net.java.sen.dictionary.Tokenizer
Creates an "unknown morpheme" Node with the specified characteristics.
getUnknownToken() - Method in class net.java.sen.dictionary.Dictionary
Gets a unique unknown-morpheme CToken.
getVisibleTokens() - Method in class net.java.sen.ReadingProcessor.ReadingResult
Gets the set of tokens that contain at least one visible reading.
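
Viterbi.getBestTokens above is described as finding the most likely sequence of morphemes. The sketch below shows the general shape of such a minimum-cost search over a lattice of candidate words. It is a deliberately simplified illustration: ViterbiSketch, the wordCosts map, and unknownCost are all hypothetical, and it uses per-word costs only, whereas the Dictionary.getCost(Node, Node, Node) entry suggests Sen also uses connection costs between neighbouring nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: word costs only, no connection cost matrix.
final class ViterbiSketch {
    /** Splits the sentence into the sequence of words with the lowest total cost. */
    static List<String> bestPath(String sentence, Map<String, Integer> wordCosts, int unknownCost) {
        int n = sentence.length();
        long[] best = new long[n + 1]; // best[i] = lowest cost of any path ending at position i
        int[] prev = new int[n + 1];   // prev[i] = start of the last word on that path
        Arrays.fill(best, Long.MAX_VALUE);
        best[0] = 0;
        for (int start = 0; start < n; start++) {
            for (int end = start + 1; end <= n; end++) {
                Integer cost = wordCosts.get(sentence.substring(start, end));
                if (cost == null) {
                    if (end != start + 1) {
                        continue; // only single characters may be unknown words
                    }
                    cost = unknownCost;
                }
                long total = best[start] + cost;
                if (total < best[end]) {
                    best[end] = total;
                    prev[end] = start;
                }
            }
        }
        // Walk the back-pointers from the end of the sentence to recover the words.
        List<String> tokens = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            tokens.add(sentence.substring(prev[end], end));
        }
        Collections.reverse(tokens);
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, Integer> costs = new HashMap<>();
        costs.put("ab", 1);
        costs.put("abc", 10);
        costs.put("c", 2);
        System.out.println(bestPath("abc", costs, 100)); // prints [ab, c]
    }
}
```

Because every single character is accepted as an unknown word, every position is reachable and the back-pointer walk always terminates at position 0.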

H

hasNext() - Method in class net.java.sen.StreamTagger
Tests if more Tokens are available
hasNext() - Method in interface net.java.sen.trie.CharIterator
Reports whether more characters are available
hasNextOrigin() - Method in interface net.java.sen.dictionary.SentenceIterator
Reports whether the sentence has any more origins

I

incrementToken() - Method in class org.apache.lucene.analysis.ja.JapaneseBasicFormFilter
 
incrementToken() - Method in class org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter
Returns the next input Token, after being stemmed
incrementToken() - Method in class org.apache.lucene.analysis.ja.JapaneseTokenizer
 
incrementToken() - Method in class org.apache.lucene.analysis.ja.JapaneseWidthFilter
 
inform(ResourceLoader) - Method in class org.apache.solr.analysis.JapanesePartOfSpeechKeepFilterFactory
 
inform(ResourceLoader) - Method in class org.apache.solr.analysis.JapanesePartOfSpeechStopFilterFactory
 
inform(ResourceLoader) - Method in class org.apache.solr.analysis.JapaneseTokenizerFactory
 
init(Map<String, String>) - Method in class org.apache.solr.analysis.JapanesePunctuationFilterFactory
 
init(Map<String, String>) - Method in class org.apache.solr.analysis.JapaneseTokenizerFactory
 
insert(int, String) - Method in class net.java.sen.util.CSVData
Inserts a value into the line at a given index
invertKanaCase(String) - Static method in class net.java.sen.util.TextUtil
Swap hiragana and katakana
IpadicPreprocessor - Class in net.java.sen.compiler
Preprocesses an unpacked Ipadic dictionary into the CSV form used for compilation
IpadicPreprocessor(String, String) - Constructor for class net.java.sen.compiler.IpadicPreprocessor
Creates a new preprocessor for the unpacked dictionary in the given directory
isSentenceStart() - Method in class net.java.sen.dictionary.Token
Returns whether or not this Token begins a new sentence.
iterator() - Method in class net.java.sen.dictionary.Sentence
Returns a SentenceIterator that obeys the defined breaking ignore spans, reading constraints, and skips space characters
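
TextUtil.invertKanaCase(String) above is described as swapping hiragana and katakana. One common way to do this relies on the two Unicode blocks being laid out in parallel, 0x60 code points apart. The sketch below takes that approach; KanaSwapSketch is a hypothetical name, and the exact character ranges the real method handles are an assumption.

```java
// Hypothetical sketch: assumes the ranges U+3041..U+3096 and U+30A1..U+30F6.
final class KanaSwapSketch {
    private static final int KANA_OFFSET = 0x60; // distance between the two blocks

    /** Converts hiragana to katakana and katakana to hiragana; other characters pass through. */
    static String invertKanaCase(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '\u3041' && c <= '\u3096') {
                out.append((char) (c + KANA_OFFSET)); // hiragana to katakana
            } else if (c >= '\u30A1' && c <= '\u30F6') {
                out.append((char) (c - KANA_OFFSET)); // katakana to hiragana
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hiragana A followed by katakana A comes back with the scripts swapped.
        System.out.println(invertKanaCase("\u3042\u30A2").equals("\u30A2\u3042")); // prints true
    }
}
```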

J

JapaneseAnalyzer - Class in org.apache.lucene.analysis.ja
Analyzer for Japanese that uses the "Sen" morphological analyzer.
JapaneseAnalyzer(Version) - Constructor for class org.apache.lucene.analysis.ja.JapaneseAnalyzer
Create a JapaneseAnalyzer with the default stopwords and stoptags and no stemExclusionSet
JapaneseAnalyzer(Version, String) - Constructor for class org.apache.lucene.analysis.ja.JapaneseAnalyzer
Create a JapaneseAnalyzer with the default stopwords and stoptags, no stemExclusionSet, and the given dictionaryDir.
JapaneseAnalyzer(Version, Set<?>, Set<String>, Set<?>, String) - Constructor for class org.apache.lucene.analysis.ja.JapaneseAnalyzer
Create a JapaneseAnalyzer with the specified stopwords, stoptags, and stemExclusionSet
JapaneseBasicFormFilter - Class in org.apache.lucene.analysis.ja
Replaces term text with the BasicFormAttribute.
JapaneseBasicFormFilter(TokenStream) - Constructor for class org.apache.lucene.analysis.ja.JapaneseBasicFormFilter
 
JapaneseBasicFormFilterFactory - Class in org.apache.solr.analysis
Factory for JapaneseBasicFormFilter.
JapaneseBasicFormFilterFactory() - Constructor for class org.apache.solr.analysis.JapaneseBasicFormFilterFactory
 
JapaneseKatakanaStemFilter - Class in org.apache.lucene.analysis.ja
Converts a katakana word to a normalized form by stemming a KATAKANA-HIRAGANA PROLONGED SOUND MARK (U+30FC) at the end of the string.
JapaneseKatakanaStemFilter(TokenStream) - Constructor for class org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter
 
JapaneseKatakanaStemFilterFactory - Class in org.apache.solr.analysis
Factory for JapaneseKatakanaStemFilter.
JapaneseKatakanaStemFilterFactory() - Constructor for class org.apache.solr.analysis.JapaneseKatakanaStemFilterFactory
 
JapanesePartOfSpeechKeepFilter - Class in org.apache.lucene.analysis.ja
Removes tokens that do NOT match a set of POS tags.
JapanesePartOfSpeechKeepFilter(boolean, TokenStream, Set<String>) - Constructor for class org.apache.lucene.analysis.ja.JapanesePartOfSpeechKeepFilter
 
JapanesePartOfSpeechKeepFilterFactory - Class in org.apache.solr.analysis
Factory for JapanesePartOfSpeechKeepFilter.
JapanesePartOfSpeechKeepFilterFactory() - Constructor for class org.apache.solr.analysis.JapanesePartOfSpeechKeepFilterFactory
 
JapanesePartOfSpeechStopFilter - Class in org.apache.lucene.analysis.ja
Removes tokens that match a set of POS tags.
JapanesePartOfSpeechStopFilter(boolean, TokenStream, Set<String>) - Constructor for class org.apache.lucene.analysis.ja.JapanesePartOfSpeechStopFilter
 
JapanesePartOfSpeechStopFilterFactory - Class in org.apache.solr.analysis
Factory for JapanesePartOfSpeechStopFilter.
JapanesePartOfSpeechStopFilterFactory() - Constructor for class org.apache.solr.analysis.JapanesePartOfSpeechStopFilterFactory
 
JapanesePunctuationFilter - Class in org.apache.lucene.analysis.ja
Removes punctuation tokens
JapanesePunctuationFilter(boolean, TokenStream) - Constructor for class org.apache.lucene.analysis.ja.JapanesePunctuationFilter
 
JapanesePunctuationFilterFactory - Class in org.apache.solr.analysis
Factory for JapanesePunctuationFilter.
JapanesePunctuationFilterFactory() - Constructor for class org.apache.solr.analysis.JapanesePunctuationFilterFactory
 
JapaneseTokenizer - Class in net.java.sen.tokenizers.ja
A Tokenizer for Japanese text
JapaneseTokenizer(Dictionary, String) - Constructor for class net.java.sen.tokenizers.ja.JapaneseTokenizer
Creates a JapaneseTokenizer with the given Dictionary
JapaneseTokenizer - Class in org.apache.lucene.analysis.ja
A Japanese tokenizer that uses the "Sen" morphological analyzer.
JapaneseTokenizer(Reader) - Constructor for class org.apache.lucene.analysis.ja.JapaneseTokenizer
 
JapaneseTokenizer(Reader, StreamFilter) - Constructor for class org.apache.lucene.analysis.ja.JapaneseTokenizer
 
JapaneseTokenizer(Reader, StreamFilter, String) - Constructor for class org.apache.lucene.analysis.ja.JapaneseTokenizer
 
JapaneseTokenizerFactory - Class in org.apache.solr.analysis
Factory for JapaneseTokenizer.
JapaneseTokenizerFactory() - Constructor for class org.apache.solr.analysis.JapaneseTokenizerFactory
 
JapaneseWidthFilter - Class in org.apache.lucene.analysis.ja
A TokenFilter that normalizes CJK width differences: folds fullwidth ASCII variants into the equivalent Basic Latin, and folds halfwidth Katakana variants into the equivalent kana. NOTE: this filter can be viewed as a (practical) subset of NFKC/NFKD Unicode normalization.
JapaneseWidthFilter(TokenStream) - Constructor for class org.apache.lucene.analysis.ja.JapaneseWidthFilter
 
JapaneseWidthFilterFactory - Class in org.apache.solr.analysis
Factory for JapaneseWidthFilter.
JapaneseWidthFilterFactory() - Constructor for class org.apache.solr.analysis.JapaneseWidthFilterFactory
 
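
The two foldings described for JapaneseWidthFilter above can be approximated on plain strings. The sketch below is not the filter itself: WidthFoldSketch is a hypothetical name, fullwidth ASCII is folded by a fixed offset, and runs of halfwidth katakana are handed to the JDK's java.text.Normalizer with NFKC, in line with the note that the filter behaves like a subset of NFKC/NFKD.

```java
import java.text.Normalizer;

// Hypothetical sketch: string-level approximation of the two foldings.
final class WidthFoldSketch {
    /** Folds fullwidth ASCII to Basic Latin and halfwidth katakana to regular kana. */
    static String foldWidth(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                // Fullwidth ASCII sits exactly 0xFEE0 above its Basic Latin counterpart.
                out.append((char) (c - 0xFEE0));
                i++;
            } else if (isHalfwidthKana(c)) {
                // Normalize the whole run so that voiced sound marks can combine
                // with the preceding kana.
                int start = i;
                while (i < text.length() && isHalfwidthKana(text.charAt(i))) {
                    i++;
                }
                out.append(Normalizer.normalize(text.substring(start, i), Normalizer.Form.NFKC));
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHalfwidthKana(char c) {
        return c >= '\uFF65' && c <= '\uFF9F';
    }

    public static void main(String[] args) {
        System.out.println(foldWidth("\uFF21\uFF22\uFF23")); // prints ABC
    }
}
```

Normalizing a run rather than single characters matters for pairs such as halfwidth KA followed by the halfwidth voiced sound mark, which NFKC composes into a single GA.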

K

key - Variable in class net.java.sen.compiler.StringCTokenTuple
The tuple's String

L

lastToken - Variable in class net.java.sen.filter.ReadingNode
The index of the last token covered by this node
lcAttr - Variable in class net.java.sen.dictionary.CToken
Used in Viterbi path cost calculation
lcAttr - Variable in class net.java.sen.dictionary.Node
Used in Viterbi path cost calculation
length - Variable in class net.java.sen.dictionary.CToken
The length of the morpheme this CToken wraps
length - Variable in class net.java.sen.dictionary.Node
The number of characters this Node covers
length - Variable in class net.java.sen.dictionary.Reading
The number of characters of the sentence covered by the reading
length() - Method in interface net.java.sen.dictionary.SentenceIterator
Returns the length of the underlying character range being iterated over, including any ignored characters
lnext - Variable in class net.java.sen.dictionary.Node
The next Node returned for the same ending position within the sentence by the Dictionary
lookup(SentenceIterator, char[]) - Method in class net.java.sen.dictionary.Tokenizer
Searches for possible morphemes from the given SentenceIterator.
lookup(SentenceIterator, char[]) - Method in class net.java.sen.tokenizers.ja.JapaneseTokenizer
 

M

main(String[]) - Static method in class net.java.sen.tools.CompoundWordTableCompiler
Main method
main(String[]) - Static method in class net.java.sen.tools.DictionaryCompiler
Main method
main(String[]) - Static method in class net.java.sen.tools.DictionaryPreprocessor
Precompiles a dictionary into the intermediate form used by the dictionary compiler
Morpheme - Class in net.java.sen.dictionary
A class representing part-of-speech data for a morpheme.
Morpheme(Dictionary, int) - Constructor for class net.java.sen.dictionary.Morpheme
Builds a lazy proxy onto a part-of-speech stored in a Dictionary
Morpheme(String, String, String, String, String[], String[], String) - Constructor for class net.java.sen.dictionary.Morpheme
Creates a literal Morpheme that does not link to any Dictionary
Morpheme() - Constructor for class net.java.sen.dictionary.Morpheme
Creates a blank, modifiable Morpheme that does not link to any Dictionary
morpheme - Variable in class net.java.sen.dictionary.Node
The Morpheme that is contained within this Node

N

net.java.sen - package net.java.sen
 
net.java.sen.compiler - package net.java.sen.compiler
 
net.java.sen.dictionary - package net.java.sen.dictionary
 
net.java.sen.filter - package net.java.sen.filter
 
net.java.sen.filter.reading - package net.java.sen.filter.reading
 
net.java.sen.filter.stream - package net.java.sen.filter.stream
 
net.java.sen.tokenizers.ja - package net.java.sen.tokenizers.ja
 
net.java.sen.tools - package net.java.sen.tools
 
net.java.sen.trie - package net.java.sen.trie
 
net.java.sen.util - package net.java.sen.util
 
next - Variable in class net.java.sen.dictionary.Node
The next node on the best path through the Node lattice
next - Variable in class net.java.sen.filter.ReadingNode
The next node in the list
next() - Method in class net.java.sen.StreamTagger
Returns the next available token
next() - Method in interface net.java.sen.trie.CharIterator
Returns the next available character
next() - Method in class org.apache.lucene.analysis.ja.StreamTagger2
 
nextOrigin() - Method in interface net.java.sen.dictionary.SentenceIterator
Moves the origin forward to the next available position.
nextRow() - Method in class net.java.sen.util.CSVParser
Advances to the next line of CSV data, if any, skipping any remaining values on the current row
nextToken() - Method in class net.java.sen.util.CSVParser
Reads the next value from the current line
nextTokens() - Method in class net.java.sen.util.CSVParser
Returns an array of all values from the next line of the input
Node - Class in net.java.sen.dictionary
A node within the Viterbi cost lattice
Node() - Constructor for class net.java.sen.dictionary.Node
 
NumberFilter - Class in net.java.sen.filter.reading
A ReadingFilter that adapts the basic dictionary-based reading output to account for the reading behaviour of number kanji and numeric suffixes
NumberFilter() - Constructor for class net.java.sen.filter.reading.NumberFilter
 

O

org.apache.lucene.analysis.ja - package org.apache.lucene.analysis.ja
 
org.apache.lucene.analysis.ja.tokenAttributes - package org.apache.lucene.analysis.ja.tokenAttributes
 
org.apache.solr.analysis - package org.apache.solr.analysis
 
origin() - Method in interface net.java.sen.dictionary.SentenceIterator
Returns the current origin position.
OverrideFilter - Class in net.java.sen.filter.reading
A reading filter that overrides decisions on reading visibility made by earlier filters.
OverrideFilter() - Constructor for class net.java.sen.filter.reading.OverrideFilter
 

P

PartOfSpeechAttribute - Interface in org.apache.lucene.analysis.ja.tokenAttributes
Attribute for Morpheme.getPartOfSpeech().
PartOfSpeechAttributeImpl - Class in org.apache.lucene.analysis.ja.tokenAttributes
 
PartOfSpeechAttributeImpl() - Constructor for class org.apache.lucene.analysis.ja.tokenAttributes.PartOfSpeechAttributeImpl
 
partOfSpeechIndex - Variable in class net.java.sen.dictionary.CToken
The file index in the part-of-speech information file of the morpheme data this CToken wraps
postProcess(List<Token>) - Method in class net.java.sen.filter.stream.CommentFilter
 
postProcess(List<Token>) - Method in class net.java.sen.filter.stream.CompositeTokenFilter
 
postProcess(List<Token>) - Method in class net.java.sen.filter.stream.CompoundWordFilter
 
postProcess(List<Token>) - Method in interface net.java.sen.filter.StreamFilter
Post-processes analysed tokens
preProcess(Sentence) - Method in class net.java.sen.filter.stream.CommentFilter
 
preProcess(Sentence) - Method in class net.java.sen.filter.stream.CompositeTokenFilter
 
preProcess(Sentence) - Method in class net.java.sen.filter.stream.CompoundWordFilter
 
preProcess(Sentence) - Method in interface net.java.sen.filter.StreamFilter
Pre-processes a sentence
prev - Variable in class net.java.sen.dictionary.Node
The previous node on the best path through the Node lattice
prev - Variable in class net.java.sen.filter.ReadingNode
The previous node in the list
process() - Method in class net.java.sen.ReadingProcessor
Performs full reading processing and returns a ReadingProcessor.ReadingResult object isolated from further changes to the reading processor
PronunciationsAttribute - Interface in org.apache.lucene.analysis.ja.tokenAttributes
Attribute for Morpheme.getPronunciations().
PronunciationsAttributeImpl - Class in org.apache.lucene.analysis.ja.tokenAttributes
 
PronunciationsAttributeImpl() - Constructor for class org.apache.lucene.analysis.ja.tokenAttributes.PronunciationsAttributeImpl
 

R

rcAttr1 - Variable in class net.java.sen.dictionary.CToken
Used in Viterbi path cost calculation
rcAttr1 - Variable in class net.java.sen.dictionary.Node
Used in Viterbi path cost calculation
rcAttr2 - Variable in class net.java.sen.dictionary.CToken
Used in Viterbi path cost calculation
rcAttr2 - Variable in class net.java.sen.dictionary.Node
Used in Viterbi path cost calculation
read(ByteBuffer) - Method in class net.java.sen.dictionary.CToken
Read a CToken from a ByteBuffer
Reading - Class in net.java.sen.dictionary
A class representing a reading applied to a set of characters within a sentence
Reading(int, int, String) - Constructor for class net.java.sen.dictionary.Reading
 
ReadingFilter - Interface in net.java.sen.filter
An interface to filters used during reading processing
ReadingNode - Class in net.java.sen.filter
A class used by reading filters during reading processing
ReadingNode() - Constructor for class net.java.sen.filter.ReadingNode
 
ReadingProcessor - Class in net.java.sen
A text processor that builds reading data suitable for application as furigana.
ReadingProcessor(Tokenizer) - Constructor for class net.java.sen.ReadingProcessor
 
ReadingProcessor.ReadingResult - Class in net.java.sen
The result of reading processing.
ReadingsAttribute - Interface in org.apache.lucene.analysis.ja.tokenAttributes
Attribute for Morpheme.getReadings().
ReadingsAttributeImpl - Class in org.apache.lucene.analysis.ja.tokenAttributes
 
ReadingsAttributeImpl() - Constructor for class org.apache.lucene.analysis.ja.tokenAttributes.ReadingsAttributeImpl
 
readKatakana(ByteBuffer, char[], int, int) - Static method in class net.java.sen.dictionary.DictionaryUtil
 
readRules(BufferedReader) - Method in class net.java.sen.filter.stream.CommentFilter
Reads the rules to apply as space-delimited text
readRules(BufferedReader) - Method in class net.java.sen.filter.stream.CompositeTokenFilter
Reads the rules to apply as space-delimited text
readString(ByteBuffer, char[], int, int) - Static method in class net.java.sen.dictionary.DictionaryUtil
 
readVInt(ByteBuffer) - Static method in class net.java.sen.dictionary.DictionaryUtil
Reads an int stored in variable-length format.
reflectWith(AttributeReflector) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.BasicFormAttributeImpl
 
reflectWith(AttributeReflector) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.ConjugationAttributeImpl
 
reflectWith(AttributeReflector) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.PartOfSpeechAttributeImpl
 
reflectWith(AttributeReflector) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.PronunciationsAttributeImpl
 
reflectWith(AttributeReflector) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.ReadingsAttributeImpl
 
remove(int) - Method in class net.java.sen.util.CSVData
Removes the value at the given index of the line
removeFilter(int) - Method in class net.java.sen.ReadingProcessor
Removes the filter with the given priority, if it exists
removeFilters() - Method in class net.java.sen.StringTagger
Remove all current StreamFilters
removeReadingConstraint(int) - Method in class net.java.sen.dictionary.Sentence
Removes the reading constraint at the given position, if any
removeReadingConstraint(int) - Method in class net.java.sen.ReadingProcessor
Remove the reading constraint at the given position
reset() - Method in class net.java.sen.filter.reading.NumberFilter
 
reset() - Method in class net.java.sen.filter.reading.OverrideFilter
 
reset() - Method in interface net.java.sen.filter.ReadingFilter
Resets any sentence specific state held by the filter.
reset(Reader) - Method in class org.apache.lucene.analysis.ja.JapaneseTokenizer
 
reset() - Method in class org.apache.lucene.analysis.ja.StreamTagger2
 
reset(Reader) - Method in class org.apache.lucene.analysis.ja.StreamTagger2
 
rewindToOrigin() - Method in interface net.java.sen.dictionary.SentenceIterator
Returns to the current origin position.
rnext - Variable in class net.java.sen.dictionary.Node
The next Node returned for the same starting position within the sentence by the Dictionary
ruleList - Variable in class net.java.sen.filter.stream.CommentFilter
The list of rules defining the start and end of comments, and the part-of-speech code to be used in the Tokens used to replace them

S

SenFactory - Class in net.java.sen
A factory to manage creation of Viterbi, StringTagger, and ReadingProcessor objects. Thread Safety: This class and all its public methods are thread safe.
Sentence - Class in net.java.sen.dictionary
A Sentence represents a character array to be morphologically analysed.
Sentence(char[]) - Constructor for class net.java.sen.dictionary.Sentence
Creates a sentence with the given characters
Sentence(String) - Constructor for class net.java.sen.dictionary.Sentence
Creates a sentence with the given string
SentenceIterator - Interface in net.java.sen.dictionary
An iterator over a sequence of characters, consisting of subsequences that may overlap, and that do not necessarily cover every character in the underlying sequence.
SentenceStartAttribute - Interface in org.apache.lucene.analysis.ja.tokenAttributes
Specifies if this token starts a new sentence: this can be useful if you want to adjust position increment to prevent phrase queries from matching across sentence boundaries without slop.
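The position-increment adjustment mentioned above can be sketched roughly as follows. This is only an illustration of the idea, not code from this package: the class SentenceGapSketch, the method positions and the gap parameter are hypothetical names, and a real token filter would set the increment on Lucene's position increment attribute instead of returning an array.

```java
// Hypothetical illustration; not part of org.apache.lucene.analysis.ja.
final class SentenceGapSketch {

    // Returns the absolute position of each token. Every token advances the
    // position by 1; a token flagged as a sentence start (other than the very
    // first token) advances it by an extra `gap` positions.
    static int[] positions(boolean[] sentenceStart, int gap) {
        int[] positions = new int[sentenceStart.length];
        int position = -1;
        for (int i = 0; i < sentenceStart.length; i++) {
            int increment = 1;
            if (sentenceStart[i] && i > 0) {
                increment += gap;
            }
            position += increment;
            positions[i] = position;
        }
        return positions;
    }
}
```

With a gap of 10, the last token of one sentence and the first token of the next end up 11 positions apart, so an exact phrase query (or one with a small slop) no longer matches across the boundary.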
SentenceStartAttributeImpl - Class in org.apache.lucene.analysis.ja.tokenAttributes
 
SentenceStartAttributeImpl() - Constructor for class org.apache.lucene.analysis.ja.tokenAttributes.SentenceStartAttributeImpl
 
set(int, String) - Method in class net.java.sen.util.CSVData
Replaces the value at the index of the line with a new value
setAdditionalInformation(String) - Method in class net.java.sen.dictionary.Morpheme
Sets an arbitrary string of additional information
setBreakingIgnoreSpan(int, short) - Method in class net.java.sen.dictionary.Sentence
Sets a breaking ignore span.
setCost(int) - Method in class net.java.sen.dictionary.Token
Sets the Viterbi cost of this Token
setCost(int) - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.CostAttribute
 
setCost(int) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.CostAttributeImpl
 
setCToken(CToken) - Method in class net.java.sen.dictionary.Node
 
setFilters(Map<Integer, ReadingFilter>) - Method in class net.java.sen.ReadingProcessor
Sets all reading filters to be applied during processing.
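The index shows reading filters keyed by an integer priority (addFilter(int, ReadingFilter), removeFilter(int), setFilters(Map<Integer, ReadingFilter>)). One plausible way to model that, shown here as a sketch only, is a sorted map that applies filters in ascending key order. The class FilterChainSketch, the use of UnaryOperator<String> in place of ReadingFilter, and the ascending order itself are assumptions; the index does not say in which order priorities are applied.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a priority-keyed filter chain.
final class FilterChainSketch {

    // TreeMap iterates its keys in ascending order, so a lower priority
    // number runs first in this sketch (an assumption, see lead-in).
    private final TreeMap<Integer, UnaryOperator<String>> filters = new TreeMap<>();

    // Adds a filter, replacing any filter already stored at this priority.
    void addFilter(int priority, UnaryOperator<String> filter) {
        filters.put(priority, filter);
    }

    // Removes the filter with the given priority; does nothing if absent.
    void removeFilter(int priority) {
        filters.remove(priority);
    }

    // Replaces the whole set of filters.
    void setFilters(Map<Integer, UnaryOperator<String>> newFilters) {
        filters.clear();
        filters.putAll(newFilters);
    }

    // Applies every filter in priority order to the input.
    String process(String input) {
        String result = input;
        for (UnaryOperator<String> filter : filters.values()) {
            result = filter.apply(result);
        }
        return result;
    }
}
```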
setLength(int) - Method in class net.java.sen.dictionary.Token
Sets the length of the character range of this Token within the underlying sentence
setMorpheme(Morpheme) - Method in class net.java.sen.dictionary.Token
Sets the morpheme data for this Token
setMorpheme(Morpheme) - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.BasicFormAttribute
 
setMorpheme(Morpheme) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.BasicFormAttributeImpl
 
setMorpheme(Morpheme) - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.ConjugationAttribute
 
setMorpheme(Morpheme) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.ConjugationAttributeImpl
 
setMorpheme(Morpheme) - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.PartOfSpeechAttribute
 
setMorpheme(Morpheme) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.PartOfSpeechAttributeImpl
 
setMorpheme(Morpheme) - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.PronunciationsAttribute
 
setMorpheme(Morpheme) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.PronunciationsAttributeImpl
 
setMorpheme(Morpheme) - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.ReadingsAttribute
 
setMorpheme(Morpheme) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.ReadingsAttributeImpl
 
setReadingConstraint(Reading) - Method in class net.java.sen.dictionary.Sentence
Sets a reading constraint on the Sentence starting at position; any existing constraints that overlap the new constraint will be removed.
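The "overlapping constraints are removed" rule can be illustrated with a small sketch. It is not the Sentence implementation: ConstraintSketch and Span are hypothetical names, and Span assumes that the Reading(int, int, String) constructor takes a start, a length and the reading text, which the index does not confirm.

```java
import java.util.TreeMap;

// Hypothetical illustration of replacing overlapping constraints.
final class ConstraintSketch {

    // Assumed shape of a reading constraint: a character span plus its reading.
    static final class Span {
        final int start;
        final int length;
        final String text;

        Span(int start, int length, String text) {
            this.start = start;
            this.length = length;
            this.text = text;
        }

        // Exclusive end offset of the span.
        int end() {
            return start + length;
        }
    }

    // Constraints keyed by their start offset.
    private final TreeMap<Integer, Span> byStart = new TreeMap<>();

    // Stores the new constraint after dropping every existing constraint
    // whose span shares at least one character with it.
    void setConstraint(Span added) {
        byStart.values().removeIf(s -> s.start < added.end() && added.start < s.end());
        byStart.put(added.start, added);
    }

    int size() {
        return byStart.size();
    }

    // Returns the constraint starting at the given offset, or null.
    Span startingAt(int start) {
        return byStart.get(start);
    }
}
```

Adjacent spans (one ending exactly where the next begins) do not overlap under this test, so both are kept.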
setReadingConstraint(Reading) - Method in class net.java.sen.ReadingProcessor
Sets a reading constraint on the currently analysed text.
setSentenceStart(boolean) - Method in class net.java.sen.dictionary.Token
Sets whether or not this token begins a new sentence.
setSentenceStart(boolean) - Method in interface org.apache.lucene.analysis.ja.tokenAttributes.SentenceStartAttribute
 
setSentenceStart(boolean) - Method in class org.apache.lucene.analysis.ja.tokenAttributes.SentenceStartAttributeImpl
 
setStart(int) - Method in class net.java.sen.dictionary.Token
Sets the start of the character range of this Token within the underlying sentence
setSurface(String) - Method in class net.java.sen.dictionary.Token
Sets the character range of this Token within the underlying sentence
setText(String) - Method in class net.java.sen.ReadingProcessor
Sets the currently analysed text.
setVisible(int, Boolean) - Method in class net.java.sen.filter.reading.OverrideFilter
Sets a visibility override at a given character index
size() - Method in class net.java.sen.compiler.VirtualTupleList
Returns the number of entries in the list
SIZE - Static variable in class net.java.sen.dictionary.CToken
The length in bytes of a stored CToken
skippedCharCount() - Method in interface net.java.sen.dictionary.SentenceIterator
Returns the number of characters skipped between the previous and current character spans
sort() - Method in class net.java.sen.compiler.VirtualTupleList
Sorts the list
span - Variable in class net.java.sen.dictionary.Node
The number of characters between the end of the previous Node and the end of this one, including any ignored characters that do not form part of the Morpheme
start - Variable in class net.java.sen.dictionary.Node
The index of the first character of this Node within the surface
start - Variable in class net.java.sen.dictionary.Reading
The starting point within the sentence
StreamFilter - Interface in net.java.sen.filter
Represents a Node filter capable of both pre- and post-processing.
StreamTagger - Class in net.java.sen
Tokenizes text read from a java.io.Reader. See examples.StreamTaggerDemo in the Sen source for an example of how to use this class. Thread Safety: Objects of this class are NOT thread safe and should not be accessed simultaneously by multiple threads.
StreamTagger(StringTagger, Reader) - Constructor for class net.java.sen.StreamTagger
 
StreamTagger2 - Class in org.apache.lucene.analysis.ja
Breaks text into sentences according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/)
StreamTagger2(StringTagger, Reader) - Constructor for class org.apache.lucene.analysis.ja.StreamTagger2
Construct a new StreamTagger2 that breaks text into words from the given Reader.
StringCTokenTuple - Class in net.java.sen.compiler
A tuple comprising a String and a CToken
StringCTokenTuple(String, CToken) - Constructor for class net.java.sen.compiler.StringCTokenTuple
 
StringTagger - Class in net.java.sen
Tokenizes strings. See examples.StringTaggerDemo in the Sen source for an example of how to use this class. Thread Safety: Objects of this class are NOT thread safe and should not be accessed simultaneously by multiple threads.
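Because several classes in this index are documented as not thread safe, a common pattern is to give each thread its own instance. The sketch below shows that pattern with java.lang.ThreadLocal; PerThreadSketch and its nested Analyzer are hypothetical stand-ins, not Sen classes, and how a real tagger instance is obtained is left out.

```java
// Hypothetical illustration of one-instance-per-thread usage.
final class PerThreadSketch {

    // Stand-in for a non-thread-safe analyser: it reuses mutable scratch
    // state between calls, so two threads must not share one instance.
    static final class Analyzer {
        private final StringBuilder scratch = new StringBuilder();

        String analyze(String text) {
            scratch.setLength(0);
            scratch.append(text).reverse();
            return scratch.toString();
        }
    }

    // Each thread lazily creates, and then keeps, its own Analyzer.
    private static final ThreadLocal<Analyzer> LOCAL =
            ThreadLocal.withInitial(Analyzer::new);

    // Returns the calling thread's private instance.
    static Analyzer current() {
        return LOCAL.get();
    }
}
```

Callers use PerThreadSketch.current().analyze(...) and never pass the returned object to another thread.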
StringTagger(Tokenizer) - Constructor for class net.java.sen.StringTagger
 

T

terminator - Variable in class net.java.sen.dictionary.CToken
 
text - Variable in class net.java.sen.dictionary.Reading
The reading text applied to the covered span
TextUtil - Class in net.java.sen.util
Miscellaneous text utilities
TextUtil() - Constructor for class net.java.sen.util.TextUtil
 
Token - Class in net.java.sen.dictionary
A single token from an analysed sentence. Thread Safety: Objects of this class are NOT thread safe and should not be accessed simultaneously by multiple threads.
Token(String, Node) - Constructor for class net.java.sen.dictionary.Token
Creates a Token from a Node
Token(String, int, int, int, Morpheme) - Constructor for class net.java.sen.dictionary.Token
Creates a Token with explicit parameters
Token() - Constructor for class net.java.sen.dictionary.Token
Creates a blank Token
Tokenizer - Class in net.java.sen.dictionary
A String Tokenizer. The Tokenizer uses a Dictionary to assist the decomposition of strings into potential morphemes.
Tokenizer(Dictionary, String) - Constructor for class net.java.sen.dictionary.Tokenizer
Constructs a new Tokenizer that uses the specified Dictionary to find possible morphemes within a given string
toString() - Method in class net.java.sen.dictionary.Morpheme
 
toString() - Method in class net.java.sen.dictionary.Reading
 
toString() - Method in class net.java.sen.dictionary.Token
Returns the character range of this Token within the underlying sentence
toString() - Method in class net.java.sen.util.CSVData
Returns the line of CSV data represented by this class
ToStringUtil - Class in org.apache.lucene.analysis.ja
 
ToStringUtil() - Constructor for class org.apache.lucene.analysis.ja.ToStringUtil
 
TrieBuilder - Class in net.java.sen.trie
Constructs a Trie from the supplied sorted key and value arrays
TrieBuilder(String[], int[], int) - Constructor for class net.java.sen.trie.TrieBuilder
Creates a TrieBuilder to build the given data
TrieSearcher - Class in net.java.sen.trie
Searches a Trie data file
TrieSearcher() - Constructor for class net.java.sen.trie.TrieSearcher
 

U

unconstrainedIterator(int) - Method in class net.java.sen.dictionary.Sentence
Returns a SentenceIterator that obeys the defined breaking ignore spans, skips space characters, but ignores reading constraints
unknownCToken - Variable in class net.java.sen.dictionary.Tokenizer
A CToken representing an unknown morpheme
unknownPartOfSpeechDescription - Variable in class net.java.sen.dictionary.Tokenizer
The part-of-speech code to use for unknown tokens
unknownPOS - Static variable in class net.java.sen.SenFactory
 

V

value - Variable in class net.java.sen.compiler.StringCTokenTuple
The tuple's CToken
VirtualTupleList - Class in net.java.sen.compiler
A file-mapped list of StringCTokenTuples.
VirtualTupleList() - Constructor for class net.java.sen.compiler.VirtualTupleList
 
visible - Variable in class net.java.sen.filter.ReadingNode
true if the stored readings, if any, are to be shown, otherwise false
Viterbi - Class in net.java.sen.dictionary
An implementation of the Viterbi algorithm used to find the most likely sequence of morphemes comprising a sentence. Thread Safety: Objects of this class are NOT thread safe and should not be accessed simultaneously by multiple threads.
Viterbi(Tokenizer) - Constructor for class net.java.sen.dictionary.Viterbi
Creates a Viterbi instance using the given Tokenizer
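As a rough illustration of the lowest-cost path idea behind this class, the sketch below segments a string using a toy dictionary of word costs. It is a deliberate simplification and not the Sen implementation: it uses per-word costs only and ignores the connection costs between adjacent morphemes that a full Viterbi search over a morpheme lattice would also include. LatticeSketch, bestPath and the dictionary contents are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified best-path segmentation (word costs only).
final class LatticeSketch {

    // Splits text into dictionary words so that the summed cost is lowest.
    // Returns an empty list if the text cannot be fully segmented.
    static List<String> bestPath(String text, Map<String, Integer> wordCosts) {
        int n = text.length();
        long[] best = new long[n + 1]; // best[i]: lowest cost to reach offset i
        int[] prev = new int[n + 1];   // prev[i]: start of the word ending at i
        Arrays.fill(best, Long.MAX_VALUE);
        best[0] = 0;

        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                Integer cost = wordCosts.get(text.substring(start, end));
                if (cost == null || best[start] == Long.MAX_VALUE) {
                    continue; // not a word, or its start offset is unreachable
                }
                if (best[start] + cost < best[end]) {
                    best[end] = best[start] + cost;
                    prev[end] = start;
                }
            }
        }

        List<String> words = new ArrayList<>();
        if (best[n] == Long.MAX_VALUE) {
            return words;
        }
        // Walk the back-pointers from the end, then restore forward order.
        for (int end = n; end > 0; end = prev[end]) {
            words.add(text.substring(prev[end], end));
        }
        Collections.reverse(words);
        return words;
    }
}
```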

W

write(DataOutput, CToken) - Static method in class net.java.sen.dictionary.CToken
Write a CToken to a DataOutput
writeKatakana(DataOutput, String) - Static method in class net.java.sen.dictionary.DictionaryUtil
 
writeVInt(DataOutput, int) - Static method in class net.java.sen.dictionary.DictionaryUtil
Writes an int in a variable-length format.
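The index does not describe the variable-length format used by readVInt and writeVInt. As an illustration only, the sketch below assumes the widely used scheme of 7 payload bits per byte, low-order bits first, with the high bit set on every byte except the last (the layout Lucene uses for its VInt type). The class VIntSketch, the encode helper and this byte layout are assumptions, not the documented DictionaryUtil behaviour.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical stand-in; not the actual DictionaryUtil implementation.
final class VIntSketch {

    // Writes value as 1 to 5 bytes: 7 payload bits per byte, low-order bits
    // first. The high bit of a byte is set when more bytes follow.
    static void writeVInt(DataOutput out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    // Reads an int written by writeVInt, advancing the buffer position.
    static int readVInt(ByteBuffer buffer) {
        byte b = buffer.get();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buffer.get();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Convenience helper for the sketch: encodes one value to a byte array.
    static byte[] encode(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writeVInt(new DataOutputStream(bytes), value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Under this layout small non-negative values take a single byte, while negative values always take five bytes because the shift is unsigned.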


Copyright © 2012. All Rights Reserved.