| Interface | Description |
|---|---|
| TextProcessor | TextProcessor allows applying pre-processing to input tokens for natural language applications. |
| Tokenizer | The Tokenizer interface provides the ability to break sentences down into embeddable tokens. |
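
As a rough illustration of how these two roles fit together, the sketch below defines hypothetical stand-ins for both interfaces. The method names `tokenize` and `preprocess`, their signatures, and the `run` helper are assumptions made for this example, not the library's documented API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class PreprocessSketch {

    /** Hypothetical stand-in: splits a sentence into tokens. */
    interface Tokenizer {
        List<String> tokenize(String sentence);
    }

    /** Hypothetical stand-in: one pre-processing step applied to a token list. */
    interface TextProcessor {
        List<String> preprocess(List<String> tokens);
    }

    /** Tokenizes the sentence, then applies each processor in order. */
    static List<String> run(
            String sentence, Tokenizer tokenizer, List<TextProcessor> processors) {
        List<String> tokens = tokenizer.tokenize(sentence);
        for (TextProcessor processor : processors) {
            tokens = processor.preprocess(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Split on a single space, as a delimiter-based tokenizer might.
        Tokenizer bySpace = sentence -> Arrays.asList(sentence.split(" "));

        // A lambda-defined processor that lower-cases every token.
        TextProcessor lowerCase = tokens -> tokens.stream()
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());

        // Prints [hello, big, world]
        System.out.println(run("Hello Big World", bySpace, List.of(lowerCase)));
    }
}
```

Keeping tokenization separate from token-level processing lets individual steps be reordered or swapped without touching the others.
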

| Class | Description |
|---|---|
| HyphenNormalizer | Unicode normalization does not take care of "exotic" hyphens, which are normally unwanted in NLP input. |
| LambdaProcessor | TextProcessor that applies a user-defined lambda function to input tokens. |
| LowerCaseConvertor | LowerCaseConvertor converts every character of the input tokens to its respective lower-case character. |
| PunctuationSeparator | PunctuationSeparator separates punctuation into a separate token. |
| SimpleTokenizer | SimpleTokenizer is an implementation of the Tokenizer interface that converts sentences into tokens by splitting them on a given delimiter. |
| TextCleaner | Removes or replaces certain characters based on a condition. |
| TextTerminator | A TextProcessor that adds a beginning-of-string and an end-of-string token. |
| TextTruncator | TextProcessor that truncates text to a maximum size. |
| UnicodeNormalizer | Applies Unicode normalization to input strings. |
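
To make the behaviour of several of these steps concrete, here is a minimal, self-contained sketch of lower-casing, punctuation separation, truncation, and termination chained over a token list. It does not use the classes above; the helper names, the rule that any character other than a letter or digit counts as punctuation, and the `<bos>`/`<eos>` marker strings are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PipelineSketch {

    /** Lower-cases every token. */
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Emits each character that is not a letter or digit as its own token. */
    static List<String> separatePunctuation(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            StringBuilder word = new StringBuilder();
            for (char c : token.toCharArray()) {
                if (Character.isLetterOrDigit(c)) {
                    word.append(c);
                } else {
                    if (word.length() > 0) {
                        out.add(word.toString());
                        word.setLength(0);
                    }
                    out.add(String.valueOf(c));
                }
            }
            if (word.length() > 0) {
                out.add(word.toString());
            }
        }
        return out;
    }

    /** Keeps at most maxSize tokens from the start of the list. */
    static List<String> truncate(List<String> tokens, int maxSize) {
        return new ArrayList<>(tokens.subList(0, Math.min(maxSize, tokens.size())));
    }

    /** Wraps the tokens in assumed begin/end marker tokens. */
    static List<String> terminate(List<String> tokens) {
        List<String> out = new ArrayList<>();
        out.add("<bos>");
        out.addAll(tokens);
        out.add("<eos>");
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Hello,", "World!", "Bye");
        tokens = lowerCase(tokens);
        tokens = separatePunctuation(tokens);
        tokens = truncate(tokens, 4);
        tokens = terminate(tokens);
        System.out.println(tokens);
    }
}
```

Order matters in a chain like this: truncating before adding the markers guarantees the end marker is never cut off, at the cost of the final list being two tokens longer than the truncation limit.
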