Class TaggedFilterConfiguration
- java.lang.Object
-
- net.sf.okapi.filters.abstractmarkup.config.TaggedFilterConfiguration
-
public class TaggedFilterConfiguration extends Object
Defines extraction rules useful for markup languages such as HTML and XML.
Extraction rules can handle the following cases:
Default rule - don't extract it.
INLINE - Elements that are included with text.
EXCLUDED - Elements and their children that should be excluded from extraction.
INCLUDED - Elements and children within EXCLUDED ranges that should be extracted.
GROUP - Elements that are grouped together structurally such as lists, tables etc.
ATTRIBUTES - Attributes on specific elements which should be extracted. May be translatable or localizable.
ATTRIBUTES ANY ELEMENT - Convenience rule for attributes which can occur on any element. May be translatable or localizable.
TEXT UNIT - Elements whose start and end tags become part of a TextUnit rather than a DocumentPart.
Any of the above rules may have conditional rules based on attribute names and/or values. Conditional rules may be attached to both elements and attributes. When a rule has more than one condition, the conditions are evaluated as an OR expression. For example, "type=button" OR "type=default".
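The OR evaluation of conditional rules can be illustrated with a small, self-contained sketch. It does not use the Okapi classes: ConditionalRuleSketch, Condition, anyConditionMatches and sampleRuleApplies are hypothetical names introduced only for this example, and the case-insensitive value comparison is an assumption, not documented behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these classes are NOT part of the Okapi API.
class ConditionalRuleSketch {

    /** Hypothetical condition: the named attribute must equal the given value. */
    static final class Condition {
        final String attributeName;
        final String expectedValue;

        Condition(String attributeName, String expectedValue) {
            this.attributeName = attributeName;
            this.expectedValue = expectedValue;
        }

        boolean matches(Map<String, String> attributes) {
            // Assumption: attribute values are compared ignoring case.
            String actual = attributes.get(attributeName);
            return actual != null && actual.equalsIgnoreCase(expectedValue);
        }
    }

    /** OR semantics: the rule applies as soon as one condition matches. */
    static boolean anyConditionMatches(List<Condition> conditions,
                                       Map<String, String> attributes) {
        for (Condition condition : conditions) {
            if (condition.matches(attributes)) {
                return true;
            }
        }
        return false;
    }

    /** Evaluates "type=button" OR "type=default" against one "type" value. */
    static boolean sampleRuleApplies(String typeValue) {
        List<Condition> conditions = Arrays.asList(
                new Condition("type", "button"),
                new Condition("type", "default"));
        Map<String, String> attributes = new HashMap<>();
        if (typeValue != null) {
            attributes.put("type", typeValue);
        }
        return anyConditionMatches(conditions, attributes);
    }

    public static void main(String[] args) {
        System.out.println(sampleRuleApplies("button"));  // true
        System.out.println(sampleRuleApplies("default")); // true
        System.out.println(sampleRuleApplies("hidden"));  // false
    }
}
```

In this sketch an element with no "type" attribute never satisfies the rule, because neither condition can match a missing value.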
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class    TaggedFilterConfiguration.RULE_TYPE    AbstractMarkupFilter rule types.
-
Field Summary
Fields
Modifier and Type    Field    Description
static EnumSet<TaggedFilterConfiguration.RULE_TYPE>    ATTRIBUTE_ON_ELEMENT_RULES
static EnumSet<TaggedFilterConfiguration.RULE_TYPE>    FAILED
static EnumSet<TaggedFilterConfiguration.RULE_TYPE>    INLINE_AND_EXCLUDE
static EnumSet<TaggedFilterConfiguration.RULE_TYPE>    INLINE_AND_EXCLUDE_FAIL
static EnumSet<TaggedFilterConfiguration.RULE_TYPE>    INLINE_AND_INCLUDE
static EnumSet<TaggedFilterConfiguration.RULE_TYPE>    INLINE_AND_INCLUDE_FAIL
-
Constructor Summary
Constructors
Constructor    Description
TaggedFilterConfiguration()
TaggedFilterConfiguration(File configurationFile)
TaggedFilterConfiguration(String configurationScript)
TaggedFilterConfiguration(URL configurationPathAsResource)
-
Method Summary
-
-
-
Field Detail
-
ATTRIBUTE_ON_ELEMENT_RULES
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> ATTRIBUTE_ON_ELEMENT_RULES
-
INLINE_AND_EXCLUDE
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> INLINE_AND_EXCLUDE
-
INLINE_AND_EXCLUDE_FAIL
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> INLINE_AND_EXCLUDE_FAIL
-
INLINE_AND_INCLUDE
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> INLINE_AND_INCLUDE
-
INLINE_AND_INCLUDE_FAIL
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> INLINE_AND_INCLUDE_FAIL
-
FAILED
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> FAILED
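The fields above are predefined EnumSet combinations of rule types, but this page does not list what each set contains. The following sketch only shows the general pattern of comparing a matched set against such a constant. The enum RuleType, its values, and the contents of the local INLINE_AND_EXCLUDE set are hypothetical stand-ins, not the real TaggedFilterConfiguration.RULE_TYPE values.

```java
import java.util.EnumSet;

// Illustrative only: a stand-in enum, not the real TaggedFilterConfiguration.RULE_TYPE.
class RuleTypeSetSketch {

    /** Hypothetical rule types; the real enum's constants are not listed on this page. */
    enum RuleType { INLINE, EXCLUDED, INCLUDED, GROUP }

    /** Hypothetical counterpart of a predefined combination such as INLINE_AND_EXCLUDE. */
    static final EnumSet<RuleType> INLINE_AND_EXCLUDE =
            EnumSet.of(RuleType.INLINE, RuleType.EXCLUDED);

    /** True when every type in the predefined combination is present in the matched set. */
    static boolean isInlineAndExcluded(EnumSet<RuleType> matched) {
        return matched.containsAll(INLINE_AND_EXCLUDE);
    }

    public static void main(String[] args) {
        EnumSet<RuleType> matched = EnumSet.of(RuleType.INLINE, RuleType.EXCLUDED);
        System.out.println(isInlineAndExcluded(matched));                      // true
        System.out.println(isInlineAndExcluded(EnumSet.of(RuleType.INLINE))); // false
    }
}
```

Because an EnumSet is mutable, code that shares a constant like this should treat it as read-only.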
-
-
Constructor Detail
-
TaggedFilterConfiguration
public TaggedFilterConfiguration()
-
TaggedFilterConfiguration
public TaggedFilterConfiguration(URL configurationPathAsResource)
-
TaggedFilterConfiguration
public TaggedFilterConfiguration(File configurationFile)
-
TaggedFilterConfiguration
public TaggedFilterConfiguration(String configurationScript)
-
-
Method Detail
-
getConfigReader
public YamlConfigurationReader getConfigReader()
-
isGlobalPreserveWhitespace
public boolean isGlobalPreserveWhitespace()
-
isGlobalExcludeByDefault
public boolean isGlobalExcludeByDefault()
-
isWellformed
public boolean isWellformed()
-
isInlineCdata
public boolean isInlineCdata()
-
isUseCodeFinder
public boolean isUseCodeFinder()
-
getBooleanParameter
public boolean getBooleanParameter(String parameterName)
-
getIntegerParameter
public int getIntegerParameter(String parameterName)
-
getGlobalPCDATASubfilter
public String getGlobalPCDATASubfilter()
-
getGlobalCDATASubfilter
public String getGlobalCDATASubfilter()
-
getCodeFinderRules
public String getCodeFinderRules()
-
getElementType
public String getElementType(net.htmlparser.jericho.Tag element)
-
getAttributeRuleTypes
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getAttributeRuleTypes(String attribute, String tag, Map<String,String> attributes)
-
getAttributeRuleTypes
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getAttributeRuleTypes(String attribute)
-
getAttributeRuleTypes
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getAttributeRuleTypes(String attribute, String tag)
-
getAttributeOnElementRuleTypes
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getAttributeOnElementRuleTypes(String tag, String attribute, Map<String,String> attributes)
Get all the TaggedFilterConfiguration.RULE_TYPEs for attributes found on element rules.
Parameters:
tag -
attribute -
attributes -
Returns:
-
getElementRuleTypes
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getElementRuleTypes(String tag, Map<String,String> attributes, boolean isStartTag)
-
getElementRuleTypes
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getElementRuleTypes(String tag, boolean isStartTag)
Go through all matched rules (including regex) and record the TaggedFilterConfiguration.RULE_TYPE. Any rules with conditions are automatically false since we have no attributes.
Parameters:
tag - the markup tag (converted to lowercase for search)
isEndTag - is this tag an ending tag?
Returns:
all matching TaggedFilterConfiguration.RULE_TYPE as an EnumSet
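The lookup described above (lowercase the tag, match it against rule names including regular expressions, and treat conditional rules as not matching when no attributes are supplied) can be sketched as follows. This is a simplified model under stated assumptions, not the Okapi implementation: ElementRuleLookupSketch, Rule, RuleType, addRule and sample are hypothetical, every rule name is treated as a regular expression, the start/end tag flag is ignored, and an empty set is returned when nothing matches.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative only: a simplified model, not the Okapi implementation.
class ElementRuleLookupSketch {

    /** Hypothetical rule types for this example. */
    enum RuleType { INLINE, EXCLUDED, GROUP }

    /** Hypothetical rule: a tag pattern, a rule type, and whether a condition is attached. */
    static final class Rule {
        final Pattern tagPattern;
        final RuleType type;
        final boolean hasCondition;

        Rule(String tagRegex, RuleType type, boolean hasCondition) {
            this.tagPattern = Pattern.compile(tagRegex);
            this.type = type;
            this.hasCondition = hasCondition;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void addRule(String tagRegex, RuleType type, boolean hasCondition) {
        rules.add(new Rule(tagRegex, type, hasCondition));
    }

    /** Collects the types of all unconditional rules whose pattern matches the tag. */
    EnumSet<RuleType> getElementRuleTypes(String tag) {
        String key = tag.toLowerCase(Locale.ROOT); // tag is lowercased for the search
        EnumSet<RuleType> matched = EnumSet.noneOf(RuleType.class);
        for (Rule rule : rules) {
            if (rule.hasCondition) {
                continue; // no attributes available, so the condition counts as false
            }
            if (rule.tagPattern.matcher(key).matches()) {
                matched.add(rule.type);
            }
        }
        return matched;
    }

    /** Builds a small hypothetical rule table. */
    static ElementRuleLookupSketch sample() {
        ElementRuleLookupSketch config = new ElementRuleLookupSketch();
        config.addRule("b", RuleType.INLINE, false);
        config.addRule("[uo]l", RuleType.GROUP, false);  // regex rule: ul or ol
        config.addRule("input", RuleType.INLINE, true);  // conditional rule
        return config;
    }

    public static void main(String[] args) {
        ElementRuleLookupSketch config = sample();
        System.out.println(config.getElementRuleTypes("B"));     // [INLINE]
        System.out.println(config.getElementRuleTypes("OL"));    // [GROUP]
        System.out.println(config.getElementRuleTypes("input")); // []
    }
}
```

The last call returns an empty set because the only rule for "input" carries a condition, which cannot be satisfied without attributes.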
-
doesElementRuleConditionApply
public boolean doesElementRuleConditionApply(Map elementRule, Map<String,String> attributes)
-
isTranslatableAttribute
public boolean isTranslatableAttribute(String tag, String attribute, Map<String,String> attributes)
-
isReadOnlyLocalizableAttribute
public boolean isReadOnlyLocalizableAttribute(String tag, String attribute, Map<String,String> attributes)
-
isWritableLocalizableAttribute
public boolean isWritableLocalizableAttribute(String tag, String attribute, Map<String,String> attributes)
-
isIdAttribute
public boolean isIdAttribute(String tag, String attribute, Map<String,String> attributes)
-
isPreserveWhitespaceCondition
public boolean isPreserveWhitespaceCondition(String attribute, Map<String,String> attributes)
-
isDefaultWhitespaceCondition
public boolean isDefaultWhitespaceCondition(String attribute, Map<String,String> attributes)
-
getSimplifierRules
public String getSimplifierRules()
-
setSimplifierRules
public void setSimplifierRules(String rules)
-
getQuoteModeDefined
public boolean getQuoteModeDefined()
-
setQuoteModeDefined
public void setQuoteModeDefined(boolean defined)
-
getQuoteMode
public int getQuoteMode()
-
setQuoteMode
public void setQuoteMode(String quoteMode)
-
-