public class FilterCoocurrencesInLongSentences
extends org.apache.uima.fit.component.JCasAnnotator_ImplBase
Filters out cooccurrences whose enclosing scope (e.g. a Sentence) is longer than PARAM_MAXIMUM_SCOPE_LENGTH.
Here are some sentence length statistics from a random sample of ~50k abstracts and PDFs:

| Corpus | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Full text | 1.0 | 22.0 | 81.0 | 109.8 | 158.0 | 16380.0 |
| Abstracts | 2.0 | 96.0 | 136.0 | 145.5 | 185.0 | 1215.0 |

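The filtering rule described above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the component's actual code. The `Span` type and the `filter` method are hypothetical stand-ins for UIMA annotations (begin/end character offsets) and for whatever covering lookup the real `process(JCas)` performs; in uimaFIT, `JCasUtil.selectCovering` would be one plausible way to find the enclosing annotation. The sketch also assumes three things: scope length is `end - begin`, "larger than" means strictly greater (so a scope exactly at the threshold is kept), and a cooccurrence with no enclosing scope is kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the filter; the real component works on
// UIMA annotations inside a JCas, not on these stand-in spans.
public class LongScopeFilterSketch {

    /** Hypothetical stand-in for a UIMA Annotation: [begin, end) character offsets. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        int length() {
            return end - begin;
        }

        /** True if this span fully contains the other span. */
        boolean covers(Span other) {
            return begin <= other.begin && other.end <= end;
        }
    }

    /**
     * Keeps a cooccurrence unless some scope covering it is strictly longer
     * than maxScopeLength (assumed reading of "larger than").
     * Cooccurrences without any covering scope are kept (assumption).
     */
    static List<Span> filter(List<Span> scopes, List<Span> cooccurrences, int maxScopeLength) {
        List<Span> kept = new ArrayList<>();
        for (Span cooc : cooccurrences) {
            boolean inLongScope = false;
            for (Span scope : scopes) {
                if (scope.covers(cooc) && scope.length() > maxScopeLength) {
                    inLongScope = true;
                    break;
                }
            }
            if (!inLongScope) {
                kept.add(cooc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Two "sentences": one 100 units long, one 2000 units long.
        List<Span> sentences = List.of(new Span(0, 100), new Span(101, 2101));
        // One cooccurrence inside each sentence.
        List<Span> coocs = List.of(new Span(10, 40), new Span(200, 260));
        List<Span> kept = filter(sentences, coocs, 1000);
        System.out.println("kept " + kept.size() + " of " + coocs.size()); // prints "kept 1 of 2"
    }
}
```

With a threshold of 1000, the cooccurrence inside the 2000-unit sentence is dropped and the one inside the 100-unit sentence survives. The nested loop is quadratic in the number of annotations; on long documents an index of covering annotations would be the more practical choice.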
| Modifier and Type | Field and Description |
|---|---|
| protected Class<? extends org.apache.uima.jcas.tcas.Annotation> | enclosingScope |
| protected String | enclosingScopeStr |
| protected int | maximumEnclosingScopeLength |
| static String | PARAM_MAXIMUM_SCOPE_LENGTH |

| Constructor and Description |
|---|
| FilterCoocurrencesInLongSentences() |

| Modifier and Type | Method and Description |
|---|---|
| void | initialize(org.apache.uima.UimaContext context) |
| void | process(org.apache.uima.jcas.JCas jCas) |

Inherited methods: getRequiredCasInterface, process, getCasInstancesRequired, hasNext, next

protected String enclosingScopeStr
protected Class<? extends org.apache.uima.jcas.tcas.Annotation> enclosingScope
public static final String PARAM_MAXIMUM_SCOPE_LENGTH
protected int maximumEnclosingScopeLength
public void initialize(org.apache.uima.UimaContext context)
throws org.apache.uima.resource.ResourceInitializationException
Specified by: initialize in interface org.apache.uima.analysis_component.AnalysisComponent
Overrides: initialize in class org.apache.uima.fit.component.JCasAnnotator_ImplBase
Throws: org.apache.uima.resource.ResourceInitializationException

public void process(org.apache.uima.jcas.JCas jCas)
throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: process in class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException

Copyright © 2015 Bluebrain Project. All rights reserved.