| Class | Description |
|---|---|
| AnnotationFilterAnnotator | Filters Keeps that enclose the configured Annotations. |
| EtAlAnnotator | Text-level filtering. |
| FilterFrameworkUtils | Filtering framework. |
| FrequencyFilterAnnotator | Removes Keep annotations that are too frequent or not frequent enough (aka "hapax"), based on a frequency list generated with FrequencyFilterWriter. |
| FrequencyFilterWriter | Generates a frequency list, based on Keep.getNormalizedText(), and on whether to filter the whole document with BlueCasUtil.keepDoc(JCas). |
| KeepsCleaner | Performs cleaning on Keep.getNormalizedText(). |
| KeepsDumper | Dumps all Keeps to standard output; useful for debugging. |
| KeepsWriter | Writes all Keeps to a file. |
| LeaveOnlyKeepsEnclosedAnnotationsAnnotator | Removes all Annotations except the ones referenced by Keep annotations. |
| MeasureNormalizerAnnotator | Normalizes the Keep.getNormalizedText() of Keeps that cover a Measure by removing the numeric part and leaving only the unit. |
| PunctuationFilterAnnotator | |
| ReferencesFinderAnnotator | Deprecated |
| SectionAnnotator | Annotates DocumentBlocks with a ContentSection such as Acknowledgments, Correspondence, etc. |
| SectionFilterAnnotator | Removes Keep annotations that are located in a DocumentBlock whose ContentSection is non-content, e.g. |
| SectionRegexAnnotator | Annotates DocumentBlocks with a ContentSection such as Acknowledgments, Correspondence, etc. |
| SentenceFilterAnnotator | Removes sentences that do not contain a given regex. |
| SimpleNormalizerAnnotator | Normalizes every Keep annotation by (just) trimming the annotation's text and passing the result to Keep#setNormalizedText(). |
| SnowballStemmerNormalizerAnnotator | Stems every Keep annotation and sets its Keep#setNormalizedText(), using Snowball/Porter's algorithm. |
| StopwordFilterAnnotator | Removes Keep annotations whose Keep.getNormalizedText() belongs to a stopword list. Stoplist format: one word per line. |
| Tokens2KeepAnnotator | |
| TooFewTokensFilterAnnotator | Document-level filtering. Flags a document that has too few tokens per page (< 50 on average). |
| TooMuchOOVFilterAnnotator | Document-level filtering. |
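
The two-step frequency filtering described for FrequencyFilterWriter and FrequencyFilterAnnotator can be illustrated with a minimal, UIMA-free sketch. It is not the package's implementation: the class `FrequencyFilterSketch`, its methods, and the `minCount`/`maxCount` thresholds are hypothetical names chosen for illustration, and plain strings stand in for the values returned by Keep.getNormalizedText().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of frequency-based filtering; not part of this package. */
final class FrequencyFilterSketch {

    /** "Writer" step: counts how often each normalized text occurs. */
    static Map<String, Integer> buildFrequencyList(List<String> normalizedTexts) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : normalizedTexts) {
            counts.merge(text, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * "Filter" step: keeps only texts whose count lies in [minCount, maxCount].
     * Texts below minCount are too rare, texts above maxCount are too frequent.
     */
    static List<String> filterByFrequency(List<String> normalizedTexts,
                                          Map<String, Integer> counts,
                                          int minCount, int maxCount) {
        List<String> kept = new ArrayList<>();
        for (String text : normalizedTexts) {
            int count = counts.getOrDefault(text, 0);
            if (count >= minCount && count <= maxCount) {
                kept.add(text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for normalized Keep texts.
        List<String> corpus = Arrays.asList(
                "neuron", "the", "the", "the", "axon", "neuron", "zyx");
        Map<String, Integer> counts = buildFrequencyList(corpus);
        // "the" (3 occurrences) is too frequent; "axon" and "zyx" (1 each) are too rare.
        System.out.println(filterByFrequency(corpus, counts, 2, 2)); // prints [neuron, neuron]
    }
}
```

Separating the counting pass from the filtering pass mirrors the writer/annotator split in the table: the frequency list can be computed once over a corpus and then reused by later filtering runs.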
Copyright © 2015 Bluebrain Project. All rights reserved.