- All Implemented Interfaces:
- org.apache.uima.analysis_component.AnalysisComponent
public class KeepsCleaner
extends org.apache.uima.fit.component.JCasAnnotator_ImplBase
Cleans the text returned by Keep.getNormalizedText(). Should be run at
the end of the filtering process. Detailed steps:
Remove the Keep if its normalizedText is shorter than minLength, does not
contain balanced parentheses (e.g. "(hello"), starts with 'www.', or consists
only of punctuation and numbers.
Then optionally lowercase the text, and finally remove any leading or
trailing punctuation.
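The steps above can be sketched in plain Java, without any UIMA dependency. This is an illustrative sketch only, not the actual KeepsCleaner source: the class name KeepsCleanerSketch, the method names, the lowercase flag, and the use of the ASCII-only \p{Punct} class are all assumptions made for the example. An empty Optional stands in for "remove the Keep".

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical stand-alone sketch of the cleaning steps; not the real annotator.
final class KeepsCleanerSketch {

    /** Returns true if every '(' is closed by a later ')' and no ')' comes too early. */
    static boolean hasBalancedParentheses(String text) {
        int depth = 0;
        for (char c : text.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth < 0) {
                return false; // closing parenthesis without a matching opening one
            }
        }
        return depth == 0;
    }

    /**
     * Applies the filter rules, then the optional lowercasing, then the
     * punctuation trimming. Returns Optional.empty() when the text should be dropped.
     */
    static Optional<String> clean(String text, int minLength, boolean lowercase) {
        if (text.length() < minLength
                || !hasBalancedParentheses(text)
                || text.startsWith("www.")
                || text.matches("[\\p{Punct}\\d]+")) { // only punctuation and digits
            return Optional.empty();
        }
        String result = lowercase ? text.toLowerCase(Locale.ROOT) : text;
        // Assumption: "removes punctuation" means stripping leading and trailing runs.
        result = result.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
        return Optional.of(result);
    }
}
```

Under these assumptions, clean("(hello", 3, true) yields an empty Optional, while clean("Hello World.", 3, true) yields "hello world". Note that trimming also strips a leading or trailing parenthesis, since parentheses count as punctuation in this sketch.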
- Author:
- renaud.richardet@epfl.ch