net.java.sen.tokenizers.ja
Class JapaneseTokenizer

java.lang.Object
  extended by net.java.sen.dictionary.Tokenizer
      extended by net.java.sen.tokenizers.ja.JapaneseTokenizer

public class JapaneseTokenizer
extends Tokenizer

A Tokenizer for Japanese text.


Field Summary
 
Fields inherited from class net.java.sen.dictionary.Tokenizer
bosNode, dictionary, eosNode, unknownCToken, unknownPartOfSpeechDescription
 
Constructor Summary
JapaneseTokenizer(Dictionary dictionary, String unknownPartOfSpeechDescription)
          Creates a JapaneseTokenizer with the given Dictionary
 
Method Summary
 Node lookup(SentenceIterator iterator, char[] surface)
          Searches for possible morphemes from the given SentenceIterator.
 
Methods inherited from class net.java.sen.dictionary.Tokenizer
getBOSNode, getDictionary, getEOSNode, getUnknownNode
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

JapaneseTokenizer

public JapaneseTokenizer(Dictionary dictionary,
                         String unknownPartOfSpeechDescription)
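Creates a JapaneseTokenizer that searches the given Dictionary for possible morphemes and assigns the given part-of-speech description to unknown tokens.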

Parameters:
dictionary - The Dictionary in which to search for possible morphemes
unknownPartOfSpeechDescription - The part-of-speech code to use for unknown tokens

Method Detail

lookup

public Node lookup(SentenceIterator iterator,
                   char[] surface)
Description copied from class: Tokenizer
Searches for possible morphemes from the given SentenceIterator. The returned Node is the head of a chain linked through Node.rnext; the matches in the chain may be of varying lengths.

Specified by:
lookup in class Tokenizer
Parameters:
iterator - The iterator to search from
surface - The underlying character surface
Returns:
The head of a chain of Nodes representing the possible morphemes beginning at the given index
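
The chain described above can be walked by following rnext from the returned head. The sketch below is illustrative only and does not use the Sen classes: the nested Node class is a hypothetical stand-in that models just the rnext link named in this page, plus an assumed length field for the number of characters a candidate covers. It also assumes the chain ends with a null rnext.

```java
import java.util.ArrayList;
import java.util.List;

public class RnextChainSketch {

    /** Hypothetical stand-in for net.java.sen.dictionary.Node (not the real class). */
    static final class Node {
        final int length; // assumed field: characters covered by this candidate
        Node rnext;       // next candidate in the chain, or null at the end (assumed)

        Node(int length) {
            this.length = length;
        }
    }

    /** Collects the length of every candidate reachable from head through rnext. */
    static List<Integer> candidateLengths(Node head) {
        List<Integer> lengths = new ArrayList<>();
        for (Node n = head; n != null; n = n.rnext) {
            lengths.add(n.length);
        }
        return lengths;
    }

    public static void main(String[] args) {
        // Build a three-element chain by hand, standing in for a lookup result.
        Node head = new Node(1);
        head.rnext = new Node(2);
        head.rnext.rnext = new Node(4);

        System.out.println(candidateLengths(head)); // prints [1, 2, 4]
    }
}
```

In real use the head would come from lookup rather than being built by hand; the loop shape stays the same as long as the chain is terminated by null.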


Copyright © 2012. All Rights Reserved.