java.lang.Object
  extended by net.java.sen.dictionary.Tokenizer
public abstract class Tokenizer
A String tokenizer.
The Tokenizer uses a Dictionary to assist in decomposing strings
into potential morphemes.
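The sketch below shows what dictionary-assisted decomposition produces. It is a minimal illustration, not Sen's implementation. It assumes a plain Set<String> in place of the Dictionary class and uses a brute-force substring scan instead of whatever lookup structure Sen actually uses. The class name CandidateLattice and its methods are hypothetical. For every position in the input, it collects the dictionary words that start there. The result is the set of overlapping candidate morphemes that a tokenizer chooses among.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: for every start position in the text, list the
// dictionary words that begin there. A Set<String> stands in for Sen's
// Dictionary, and a brute-force substring scan stands in for its lookup.
class CandidateLattice {
    private final Set<String> dictionary;
    private final int maxWordLength;

    CandidateLattice(Set<String> dictionary, int maxWordLength) {
        this.dictionary = dictionary;
        this.maxWordLength = maxWordLength;
    }

    // Element i holds the candidate morphemes starting at index i,
    // shortest first; an empty list means no dictionary word starts there.
    List<List<String>> build(String text) {
        List<List<String>> lattice = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            List<String> found = new ArrayList<>();
            int limit = Math.min(text.length(), start + maxWordLength);
            for (int end = start + 1; end <= limit; end++) {
                String piece = text.substring(start, end);
                if (dictionary.contains(piece)) {
                    found.add(piece);
                }
            }
            lattice.add(found);
        }
        return lattice;
    }
}
```

Consider "unable" with the words un, unable, a, able and ble. Position 0 yields [un, unable], position 2 yields [a, able], and position 1 yields an empty list. Positions with no candidates are where a tokenizer needs a fallback, such as the unknown-morpheme Node described under getUnknownNode.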
| Field Summary | |
|---|---|
| protected Node | bosNode: A Node representing a beginning-of-string |
| protected Dictionary | dictionary: The Dictionary used to find possible morphemes |
| protected Node | eosNode: A Node representing an end-of-string |
| protected CToken | unknownCToken: A CToken representing an unknown morpheme |
| protected String | unknownPartOfSpeechDescription: The part-of-speech code to use for unknown tokens |
| Constructor Summary | |
|---|---|
| Tokenizer(Dictionary dictionary, String unknownPartOfSpeechDescription) | Constructs a new Tokenizer that uses the specified Dictionary to find possible morphemes within a given string |
| Method Summary | |
|---|---|
| Node | getBOSNode(): Creates a unique beginning-of-string Node. |
| Dictionary | getDictionary(): Returns the Dictionary used to find possible morphemes. |
| Node | getEOSNode(): Creates a unique end-of-string Node. |
| Node | getUnknownNode(char[] surface, int start, int length, int span): Creates an "unknown morpheme" Node with the specified characteristics. |
| abstract Node | lookup(SentenceIterator iterator, char[] surface): Searches for possible morphemes from the given SentenceIterator. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected final Dictionary dictionary
The Dictionary used to find possible morphemes.
protected final CToken unknownCToken
A CToken representing an unknown morpheme.
protected final Node bosNode
A Node representing a beginning-of-string.
protected final Node eosNode
A Node representing an end-of-string.
protected final String unknownPartOfSpeechDescription
The part-of-speech code to use for unknown tokens.
| Constructor Detail |
|---|
public Tokenizer(Dictionary dictionary,
String unknownPartOfSpeechDescription)
Constructs a new Tokenizer that uses the specified
Dictionary to find possible morphemes within a given string.
Parameters:
dictionary - The Dictionary to search within
unknownPartOfSpeechDescription - The part-of-speech code to use for unknown tokens
| Method Detail |
|---|
public Dictionary getDictionary()
Returns the Dictionary used to find possible morphemes.
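public Node getBOSNode()
Creates a unique beginning-of-string Node. The Node
returned by this method is freshly cloned and not an alias of any
other Node.
Returns: a beginning-of-string Node
public Node getEOSNode()
Creates a unique end-of-string Node. The Node returned by
this method is freshly cloned and not an alias of any other Node.
Returns: an end-of-string Node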
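The getBOSNode() and getEOSNode() descriptions, together with the protected final bosNode and eosNode fields, suggest a prototype pattern. In this pattern the class keeps one template Node per boundary kind and returns a clone of it on every call. The sketch below illustrates that pattern under that assumption. CloneableNode and BoundaryNodeFactory are hypothetical stand-ins, not Sen classes, and the real Node has more fields than shown.

```java
// Hypothetical stand-in for a lattice node; not Sen's actual Node class.
class CloneableNode implements Cloneable {
    String surface;       // text this node represents
    CloneableNode rnext;  // next node in a candidate list, if any

    CloneableNode(String surface) {
        this.surface = surface;
    }

    // Shallow copy: a new object whose fields hold the same values and references.
    @Override
    public CloneableNode clone() {
        try {
            return (CloneableNode) super.clone();
        } catch (CloneNotSupportedException e) {
            // Unreachable because this class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}

// Hypothetical sketch: one template per boundary kind, held in a final
// field, with each getter returning a fresh clone instead of the template.
class BoundaryNodeFactory {
    protected final CloneableNode bosNode = new CloneableNode("BOS");
    protected final CloneableNode eosNode = new CloneableNode("EOS");

    public CloneableNode getBOSNode() {
        return bosNode.clone();
    }

    public CloneableNode getEOSNode() {
        return eosNode.clone();
    }
}
```

Object.clone() is shallow. It therefore only guarantees a distinct Node object: any mutable objects the Node refers to would still be shared unless clone() copied them as well.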
public Node getUnknownNode(char[] surface,
int start,
int length,
int span)
Creates an "unknown morpheme" Node with the specified
characteristics. The Node returned by this method is freshly
cloned and not an alias of any other Node.
Parameters:
surface - The underlying surface of which the Node is part
start - The index within the surface of the Node's first character
length - The length of the Node
span - The span of the Node
Returns: an "unknown morpheme" Node
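The next example sketches what an unknown-morpheme factory with this parameter list might look like. It assumes a node that references the shared surface array and records start, length, span and the part-of-speech code given to the constructor. MorphemeNode and UnknownNodeFactory are hypothetical names, and the bounds check is an added assumption. The sketch constructs a new object on each call rather than cloning a template, which gives the same non-aliasing guarantee.

```java
// Hypothetical stand-in for a lattice node (not Sen's Node class).
class MorphemeNode {
    char[] surface;       // shared underlying character array (not copied)
    int start;            // index of the node's first character in surface
    int length;           // number of characters the node covers
    int span;             // stored as given; its exact meaning in Sen is assumed
    String partOfSpeech;  // part-of-speech code assigned to the node

    // The characters this node covers, as a String.
    String text() {
        return new String(surface, start, length);
    }
}

// Hypothetical sketch of a factory in the spirit of
// Tokenizer.getUnknownNode(char[], int, int, int).
class UnknownNodeFactory {
    private final String unknownPartOfSpeechDescription;

    UnknownNodeFactory(String unknownPartOfSpeechDescription) {
        this.unknownPartOfSpeechDescription = unknownPartOfSpeechDescription;
    }

    MorphemeNode getUnknownNode(char[] surface, int start, int length, int span) {
        // Assumed validation: the node must lie inside the surface.
        if (start < 0 || length < 0 || start + length > surface.length) {
            throw new IndexOutOfBoundsException("node lies outside surface");
        }
        MorphemeNode node = new MorphemeNode(); // a new object on every call
        node.surface = surface;
        node.start = start;
        node.length = length;
        node.span = span;
        node.partOfSpeech = unknownPartOfSpeechDescription;
        return node;
    }
}
```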
public abstract Node lookup(SentenceIterator iterator,
char[] surface)
throws IOException
Searches for possible morphemes from the given SentenceIterator.
The Node that is returned links through Node.rnext to a list of
matches, which may be of varying lengths.
Parameters:
iterator - The iterator to search from
surface - The underlying character surface
Returns: a list of Nodes representing the possible
morphemes beginning at the given index
Throws:
IOException
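To illustrate the rnext contract, here is a minimal sketch of a lookup that returns the head of a singly linked list of matches of varying lengths, plus the usual loop for walking that list. It assumes a Set<String> in place of the Dictionary and a plain start index in place of the SentenceIterator. MatchNode, PrefixLookup and countMatches are hypothetical. The sketch performs no I/O, so it does not declare IOException.

```java
import java.util.Set;

// Hypothetical stand-in for a node in a lookup result list.
class MatchNode {
    final String text;
    MatchNode rnext; // next candidate starting at the same position, or null

    MatchNode(String text) {
        this.text = text;
    }
}

// Hypothetical sketch: returns the head of a list of candidates of varying
// lengths linked through rnext. A Set<String> and a start index stand in
// for Sen's Dictionary and SentenceIterator.
class PrefixLookup {
    private final Set<String> words;
    private final int maxWordLength;

    PrefixLookup(Set<String> words, int maxWordLength) {
        this.words = words;
        this.maxWordLength = maxWordLength;
    }

    // Returns the head of the rnext chain, or null if nothing matches.
    MatchNode lookup(char[] surface, int start) {
        MatchNode head = null;
        int limit = Math.min(surface.length, start + maxWordLength);
        // Walk from the longest candidate down so that prepending each match
        // leaves the finished list ordered shortest-first.
        for (int end = limit; end > start; end--) {
            String piece = new String(surface, start, end - start);
            if (words.contains(piece)) {
                MatchNode node = new MatchNode(piece);
                node.rnext = head;
                head = node;
            }
        }
        return head;
    }

    // Counts the candidates by following rnext until it is null.
    static int countMatches(MatchNode head) {
        int count = 0;
        for (MatchNode n = head; n != null; n = n.rnext) {
            count++;
        }
        return count;
    }
}
```

With the words un, unable and able, a lookup at index 0 of "unable" returns a two-node list: un, whose rnext is unable. A caller would typically walk such a list with the same for-loop that countMatches uses.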