Package org.htmlunit.cyberneko
Class HTMLTagBalancer
- java.lang.Object
  - org.htmlunit.cyberneko.HTMLTagBalancer
- All Implemented Interfaces:
HTMLComponent, XMLComponent, XMLDocumentFilter, XMLDocumentSource, XMLDocumentHandler
public class HTMLTagBalancer extends Object implements XMLDocumentFilter, HTMLComponent
Balances tags in an HTML document. This component receives document events and tries to correct many common mistakes that human (and computer) HTML document authors make. This tag balancer can:
- add missing parent elements;
- automatically close elements with optional end tags; and
- handle mis-matched inline element tags.
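The three behaviours listed above can be illustrated with a small, self-contained sketch. It is not the HTMLTagBalancer implementation and does not use its API: the class ToyBalancer, its string token format and its two tiny element tables are hypothetical and exist only for illustration. The real element metadata lives in HTMLElements and is far more detailed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy balancer; a sketch of the idea, not the library's algorithm.
final class ToyBalancer {
    // Assumed tables for the sketch only.
    private static final Map<String, String> REQUIRED_PARENT = Map.of("td", "tr", "tr", "table");
    private static final Set<String> OPTIONAL_END = Set.of("p", "li", "td", "tr");

    static List<String> balance(List<String> tokens) {
        List<String> out = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // innermost open element first
        for (String token : tokens) {
            if (token.startsWith("</")) {
                String name = token.substring(2, token.length() - 1);
                if (open.contains(name)) {
                    // Close everything nested inside the matching element, then the element itself.
                    String closed;
                    do {
                        closed = open.pop();
                        out.add("</" + closed + ">");
                    } while (!closed.equals(name));
                }
                // An end tag with no matching open element is dropped.
            } else if (token.startsWith("<")) {
                start(token.substring(1, token.length() - 1), open, out);
            } else {
                out.add(token); // character data passes through unchanged
            }
        }
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">"); // close whatever is still open at the end
        }
        return out;
    }

    private static void start(String name, Deque<String> open, List<String> out) {
        // Optional end tag: a new <td> implicitly closes a <td> that is still open.
        if (OPTIONAL_END.contains(name) && name.equals(open.peek())) {
            out.add("</" + open.pop() + ">");
        }
        // Missing parent: synthesize it (recursively) before opening the element.
        String parent = REQUIRED_PARENT.get(name);
        if (parent != null && !open.contains(parent)) {
            start(parent, open, out);
        }
        open.push(name);
        out.add("<" + name + ">");
    }

    public static void main(String[] args) {
        System.out.println(balance(List.of("<b>", "<i>", "x", "</b>", "</i>")));
        System.out.println(balance(List.of("<td>", "a", "<td>", "b")));
    }
}
```

In this sketch a mis-nested end tag closes the inner elements first, and the stray end tag that follows is discarded. The real component may treat inline elements differently, for example by tracking them on a separate inline stack.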
This component recognizes the following features:
- http://cyberneko.org/html/features/augmentations
- http://cyberneko.org/html/features/report-errors
- http://cyberneko.org/html/features/balance-tags/document-fragment
- http://cyberneko.org/html/features/balance-tags/ignore-outside-content
This component recognizes the following properties:
- http://cyberneko.org/html/properties/names/elems
- http://cyberneko.org/html/properties/names/attrs
- http://cyberneko.org/html/properties/error-reporter
- http://cyberneko.org/html/properties/balance-tags/current-stack
- Author:
- Andy Clark, Marc Guillemot, Ronald Brill
- See Also:
HTMLElements
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class: Description)
- static class HTMLTagBalancer.Info: Element info for each start element.
- static class HTMLTagBalancer.InfoStack: Unsynchronized stack of element information.
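As a rough illustration of what an unsynchronized stack usually means in practice, the following sketch shows an array-backed stack that grows on demand and performs no locking. SimpleStack and its methods are hypothetical names and are not the InfoStack API; the sketch only assumes that such a stack is meant to be used from a single thread.

```java
import java.util.Arrays;

// Hypothetical array-backed stack without synchronization (single-threaded use assumed).
final class SimpleStack<T> {
    private Object[] data = new Object[4];
    private int top; // number of elements currently on the stack

    void push(T item) {
        if (top == data.length) {
            data = Arrays.copyOf(data, top * 2); // grow when full
        }
        data[top++] = item;
    }

    @SuppressWarnings("unchecked")
    T peek() {
        return top == 0 ? null : (T) data[top - 1];
    }

    @SuppressWarnings("unchecked")
    T pop() {
        if (top == 0) {
            return null;
        }
        T item = (T) data[--top];
        data[top] = null; // drop the reference so it can be collected
        return item;
    }

    int size() {
        return top;
    }
}
```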
-
Field Summary
Fields (Modifier and Type, Field: Description)
- protected static String AUGMENTATIONS: Include infoset augmentations.
- protected static String DOCUMENT_FRAGMENT: Document fragment balancing only.
- protected static String ERROR_REPORTER: Error reporter.
- protected boolean fAllowSelfclosingIframe: Allows self closing iframe tags.
- protected boolean fAllowSelfclosingScript: Allows self closing script tags.
- protected boolean fAllowSelfclosingTags: Allows self closing tags.
- protected boolean fAugmentations: Include infoset augmentations.
- protected boolean fDocumentFragment: Document fragment balancing only.
- protected HTMLTagBalancer.InfoStack fElementStack: The element stack.
- protected HTMLErrorReporter fErrorReporter: Error reporter.
- protected boolean fIgnoreOutsideContent: Ignore outside content.
- protected HTMLTagBalancer.InfoStack fInlineStack: The inline stack.
- protected short fNamesElems: Modify HTML element names.
- protected boolean fNamespaces: Namespaces.
- protected boolean fOpenedForm: True if a form is in the stack (allows discarding the opening of nested forms).
- protected boolean fOpenedSelect: True if a select is in the stack.
- protected boolean fOpenedSvg: True if an svg is in the stack (no parent checking takes place).
- static String FRAGMENT_CONTEXT_STACK: EXPERIMENTAL: may change in next release. Name of the property holding the stack of elements in which context a document fragment should be parsed.
- protected boolean fReportErrors: Report errors.
- protected boolean fSeenAnything: True if seen anything.
- protected boolean fSeenBodyElement: True if seen body element.
- protected boolean fSeenDoctype: True if a doctype declaration has been seen.
- protected boolean fSeenHeadElement: True if seen head element.
- protected boolean fSeenRealHtmlElement: True if a real (non-synthesized) html element has been seen.
- protected boolean fSeenRootElement: True if root element has been seen.
- protected boolean fSeenRootElementEnd: True if seen the end of the document element.
- protected boolean fTemplateFragment: Template document fragment balancing only.
- protected static String IGNORE_OUTSIDE_CONTENT: Ignore outside content.
- protected static String NAMES_ATTRS: Modify HTML attribute names: { "upper", "lower", "default" }.
- protected static String NAMES_ELEMS: Modify HTML element names: { "upper", "lower", "default" }.
- protected static String NAMESPACES: Namespaces.
- protected static String REPORT_ERRORS: Report errors.
- protected HTMLTagBalancingListener tagBalancingListener
-
Method Summary
Methods (Modifier and Type, Method: Description)
- protected void callEndElement(QName element, Augmentations augs)
- protected void callStartElement(QName element, XMLAttributes attrs, Augmentations augs)
- void characters(XMLString text, Augmentations augs): Characters.
- void comment(XMLString text, Augmentations augs): Comment.
- void doctypeDecl(String rootElementName, String publicId, String systemId, Augmentations augs): Doctype declaration.
- void emptyElement(QName element, XMLAttributes attrs, Augmentations augs): Empty element.
- void endCDATA(Augmentations augs): End CDATA section.
- void endDocument(Augmentations augs): End document.
- void endElement(QName element, Augmentations augs): End element.
- XMLDocumentHandler getDocumentHandler(): Returns the document handler.
- XMLDocumentSource getDocumentSource()
- protected HTMLElements.Element getElement(QName elementName)
- protected int getElementDepth(HTMLElements.Element element)
- Boolean getFeatureDefault(String featureId): Returns the default state for a feature.
- protected static short getNamesValue(String value)
- protected int getParentDepth(HTMLElements.Element element)
- Object getPropertyDefault(String propertyId): Returns the default state for a property.
- String[] getRecognizedFeatures(): Returns recognized features.
- String[] getRecognizedProperties(): Returns recognized properties.
- protected static String modifyName(String name, short mode)
- void processingInstruction(String target, XMLString data, Augmentations augs): Processing instruction.
- void reset(XMLComponentManager manager): Resets the component.
- void setDocumentHandler(XMLDocumentHandler handler): Sets the document handler.
- void setDocumentSource(XMLDocumentSource source): Sets the document source.
- void setFeature(String featureId, boolean state): Sets a feature.
- void setProperty(String propertyId, Object value): Sets a property.
- void startCDATA(Augmentations augs): Start CDATA section.
- void startDocument(XMLLocator locator, String encoding, NamespaceContext nscontext, Augmentations augs): Start document.
- void startElement(QName elem, XMLAttributes attrs, Augmentations augs): Start element.
- protected Augmentations synthesizedAugs()
- void xmlDecl(String version, String encoding, String standalone, Augmentations augs): XML declaration.
-
-
-
Field Detail
-
NAMESPACES
protected static final String NAMESPACES
Namespaces.
- See Also:
- Constant Field Values
-
AUGMENTATIONS
protected static final String AUGMENTATIONS
Include infoset augmentations.
- See Also:
- Constant Field Values
-
REPORT_ERRORS
protected static final String REPORT_ERRORS
Report errors.
- See Also:
- Constant Field Values
-
DOCUMENT_FRAGMENT
protected static final String DOCUMENT_FRAGMENT
Document fragment balancing only.
- See Also:
- Constant Field Values
-
IGNORE_OUTSIDE_CONTENT
protected static final String IGNORE_OUTSIDE_CONTENT
Ignore outside content.
- See Also:
- Constant Field Values
-
NAMES_ELEMS
protected static final String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
- See Also:
- Constant Field Values
-
NAMES_ATTRS
protected static final String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
- See Also:
- Constant Field Values
-
ERROR_REPORTER
protected static final String ERROR_REPORTER
Error reporter.
- See Also:
- Constant Field Values
-
FRAGMENT_CONTEXT_STACK
public static final String FRAGMENT_CONTEXT_STACK
EXPERIMENTAL: may change in next release.
Name of the property holding the stack of elements in which context a document fragment should be parsed.
- See Also:
- Constant Field Values
-
fNamespaces
protected boolean fNamespaces
Namespaces.
-
fAugmentations
protected boolean fAugmentations
Include infoset augmentations.
-
fReportErrors
protected boolean fReportErrors
Report errors.
-
fDocumentFragment
protected boolean fDocumentFragment
Document fragment balancing only.
-
fTemplateFragment
protected boolean fTemplateFragment
Template document fragment balancing only.
-
fIgnoreOutsideContent
protected boolean fIgnoreOutsideContent
Ignore outside content.
-
fAllowSelfclosingIframe
protected boolean fAllowSelfclosingIframe
Allows self closing iframe tags.
-
fAllowSelfclosingScript
protected boolean fAllowSelfclosingScript
Allows self closing script tags.
-
fAllowSelfclosingTags
protected boolean fAllowSelfclosingTags
Allows self closing tags.
-
fNamesElems
protected short fNamesElems
Modify HTML element names.
-
fErrorReporter
protected HTMLErrorReporter fErrorReporter
Error reporter.
-
fElementStack
protected final HTMLTagBalancer.InfoStack fElementStack
The element stack.
-
fInlineStack
protected final HTMLTagBalancer.InfoStack fInlineStack
The inline stack.
-
fSeenAnything
protected boolean fSeenAnything
True if seen anything. Important for xml declaration.
-
fSeenDoctype
protected boolean fSeenDoctype
True if a doctype declaration has been seen.
-
fSeenRootElement
protected boolean fSeenRootElement
True if root element has been seen.
-
fSeenRootElementEnd
protected boolean fSeenRootElementEnd
True if seen the end of the document element. In other words, this variable is set to false until the end </HTML> tag is seen (or synthesized). This is used to ensure that extraneous events after the end of the document element do not make the document stream ill-formed.
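One way such a flag can protect the event stream is sketched below. The class RootEndGuard and its string events are hypothetical, and the sketch assumes that events arriving after the end of the document element are simply dropped; the real component's handling may be more nuanced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical guard: forwards events until the root end tag, then drops the rest.
final class RootEndGuard {
    private boolean seenRootElementEnd; // false until the end </html> tag has been forwarded
    private final List<String> forwarded = new ArrayList<>();

    void event(String event) {
        if (seenRootElementEnd) {
            return; // extraneous event after the document element: would make the stream ill-formed
        }
        forwarded.add(event);
        if (event.equalsIgnoreCase("</html>")) {
            seenRootElementEnd = true;
        }
    }

    List<String> forwarded() {
        return forwarded;
    }
}
```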
-
fSeenRealHtmlElement
protected boolean fSeenRealHtmlElement
True if a real (non-synthesized) html element has been seen.
-
fSeenHeadElement
protected boolean fSeenHeadElement
True if seen head element.
-
fSeenBodyElement
protected boolean fSeenBodyElement
True if seen body element.
-
fOpenedForm
protected boolean fOpenedForm
True if a form is in the stack (allows discarding the opening of nested forms).
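A minimal sketch of how a single boolean can be used to discard nested form openings. NestedFormSketch and its string tags are hypothetical; the sketch also assumes that an end tag without an open form is discarded, which the documentation above does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter: keeps the outer form and discards start tags of nested forms.
final class NestedFormSketch {
    static List<String> filter(List<String> tags) {
        List<String> out = new ArrayList<>();
        boolean openedForm = false;
        for (String tag : tags) {
            if (tag.equals("<form>")) {
                if (openedForm) {
                    continue; // a form is already open: discard the nested opening
                }
                openedForm = true;
            } else if (tag.equals("</form>")) {
                if (!openedForm) {
                    continue; // assumption: nothing to close, so the end tag is dropped
                }
                openedForm = false;
            }
            out.add(tag);
        }
        return out;
    }
}
```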
-
fOpenedSvg
protected boolean fOpenedSvg
True if an svg is in the stack (no parent checking takes place).
-
fOpenedSelect
protected boolean fOpenedSelect
True if a select is in the stack.
-
tagBalancingListener
protected HTMLTagBalancingListener tagBalancingListener
-
-
Method Detail
-
getFeatureDefault
public Boolean getFeatureDefault(String featureId)
Returns the default state for a feature.
- Specified by:
getFeatureDefault in interface HTMLComponent
- Specified by:
getFeatureDefault in interface XMLComponent
- Parameters:
featureId - The feature identifier.
- Returns:
- the default state for a feature, or null if this component does not want to report a default value for this feature.
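Because the return type is Boolean and may be null, a caller should not unbox the result directly. The sketch below shows one null-safe way to consume such a lookup. FeatureDefaultsSketch, its defaultOr helper and the feature identifier http://example.org/features/demo are hypothetical; the sketch deliberately does not reproduce this component's real defaults.

```java
import java.util.Map;

// Hypothetical lookup table with a null-safe accessor; not this component's real defaults.
final class FeatureDefaultsSketch {
    static final String DEMO_FEATURE = "http://example.org/features/demo"; // made-up identifier

    private static final Map<String, Boolean> DEFAULTS = Map.of(DEMO_FEATURE, Boolean.TRUE);

    /** Returns the default, or null when no default is reported for the identifier. */
    static Boolean getFeatureDefault(String featureId) {
        return DEFAULTS.get(featureId);
    }

    /** Null-safe helper: falls back to the given value instead of unboxing null. */
    static boolean defaultOr(String featureId, boolean fallback) {
        Boolean value = getFeatureDefault(featureId);
        return value != null ? value : fallback;
    }
}
```

Writing boolean b = getFeatureDefault(id); directly would throw a NullPointerException whenever no default is reported, which is why the helper checks for null first.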
-
getPropertyDefault
public Object getPropertyDefault(String propertyId)
Returns the default state for a property.
- Specified by:
getPropertyDefault in interface HTMLComponent
- Specified by:
getPropertyDefault in interface XMLComponent
- Parameters:
propertyId - The property identifier.
- Returns:
- the default state for a property, or null if this component does not want to report a default value for this property.
-
getRecognizedFeatures
public String[] getRecognizedFeatures()
Returns recognized features.
- Specified by:
getRecognizedFeatures in interface XMLComponent
- Returns:
- an array of feature identifiers that are recognized by this component. This method may return null if no features are recognized by this component.
-
getRecognizedProperties
public String[] getRecognizedProperties()
Returns recognized properties.
- Specified by:
getRecognizedProperties in interface XMLComponent
- Returns:
- an array of property identifiers that are recognized by this component. This method may return null if no properties are recognized by this component.
-
reset
public void reset(XMLComponentManager manager) throws XMLConfigurationException
Resets the component.
- Specified by:
reset in interface XMLComponent
- Parameters:
manager - The component manager.
- Throws:
XMLConfigurationException
-
setFeature
public void setFeature(String featureId, boolean state) throws XMLConfigurationException
Sets a feature.
- Specified by:
setFeature in interface XMLComponent
- Parameters:
featureId - The feature identifier.
state - The state of the feature.
- Throws:
XMLConfigurationException - Thrown for configuration error. In general, components should only throw this exception if it is really a critical error.
-
setProperty
public void setProperty(String propertyId, Object value) throws XMLConfigurationException
Sets a property.
- Specified by:
setProperty in interface XMLComponent
- Parameters:
propertyId - The property identifier.
value - The value of the property.
- Throws:
XMLConfigurationException - Thrown for configuration error. In general, components should only throw this exception if it is really a critical error.
-
setDocumentHandler
public void setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.
- Specified by:
setDocumentHandler in interface XMLDocumentSource
- Parameters:
handler - the new handler
-
getDocumentHandler
public XMLDocumentHandler getDocumentHandler()
Returns the document handler.
- Specified by:
getDocumentHandler in interface XMLDocumentSource
- Returns:
- the document handler
-
setDocumentSource
public void setDocumentSource(XMLDocumentSource source)
Sets the document source.
- Specified by:
setDocumentSource in interface XMLDocumentHandler
- Parameters:
source - the new source
-
getDocumentSource
public XMLDocumentSource getDocumentSource()
- Specified by:
getDocumentSource in interface XMLDocumentHandler
- Returns:
- the document source.
-
startDocument
public void startDocument(XMLLocator locator, String encoding, NamespaceContext nscontext, Augmentations augs) throws XNIException
Start document.
- Specified by:
startDocument in interface XMLDocumentHandler
- Parameters:
locator - The document locator, or null if the document location cannot be reported during the parsing of this document. However, it is strongly recommended that a locator be supplied that can at least report the system identifier of the document.
encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
nscontext - The namespace context in effect at the start of this document. This object represents the current context. Implementors of this class are responsible for copying the namespace bindings from the current context (and its parent contexts) if that information is important.
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by handler to signal an error.
-
xmlDecl
public void xmlDecl(String version, String encoding, String standalone, Augmentations augs) throws XNIException
XML declaration.
- Specified by:
xmlDecl in interface XMLDocumentHandler
- Parameters:
version - The XML version.
encoding - The IANA encoding name of the document, or null if not specified.
standalone - The standalone value, or null if not specified.
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by handler to signal an error.
-
doctypeDecl
public void doctypeDecl(String rootElementName, String publicId, String systemId, Augmentations augs) throws XNIException
Doctype declaration.
- Specified by:
doctypeDecl in interface XMLDocumentHandler
- Parameters:
rootElementName - The name of the root element.
publicId - The public identifier if an external DTD or null if the external DTD is specified using SYSTEM.
systemId - The system identifier if an external DTD, null otherwise.
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by handler to signal an error.
-
endDocument
public void endDocument(Augmentations augs) throws XNIException
End document.
- Specified by:
endDocument in interface XMLDocumentHandler
- Parameters:
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by handler to signal an error.
-
comment
public void comment(XMLString text, Augmentations augs) throws XNIException
Comment.
- Specified by:
comment in interface XMLDocumentHandler
- Parameters:
text - The text in the comment.
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by application to signal an error.
-
processingInstruction
public void processingInstruction(String target, XMLString data, Augmentations augs) throws XNIException
Processing instruction.
- Specified by:
processingInstruction in interface XMLDocumentHandler
- Parameters:
target - The target.
data - The data or null if none specified.
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by handler to signal an error.
-
startElement
public void startElement(QName elem, XMLAttributes attrs, Augmentations augs) throws XNIException
Start element.
- Specified by:
startElement in interface XMLDocumentHandler
- Parameters:
elem - The name of the element.
attrs - The element attributes.
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by handler to signal an error.
-
emptyElement
public void emptyElement(QName element, XMLAttributes attrs, Augmentations augs) throws XNIException
Empty element.
- Specified by:
emptyElement in interface XMLDocumentHandler
- Parameters:
element - The name of the element.
attrs - The element attributes.
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by handler to signal an error.
-
startCDATA
public void startCDATA(Augmentations augs) throws XNIException
Start CDATA section.
- Specified by:
startCDATA in interface XMLDocumentHandler
- Parameters:
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by handler to signal an error.
-
endCDATA
public void endCDATA(Augmentations augs) throws XNIException
End CDATA section.
- Specified by:
endCDATA in interface XMLDocumentHandler
- Parameters:
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by handler to signal an error.
-
characters
public void characters(XMLString text, Augmentations augs) throws XNIException
Characters.
- Specified by:
characters in interface XMLDocumentHandler
- Parameters:
text - The content.
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by handler to signal an error.
-
endElement
public void endElement(QName element, Augmentations augs) throws XNIException
End element.
- Specified by:
endElement in interface XMLDocumentHandler
- Parameters:
element - The name of the element.
augs - Additional information that may include infoset augmentations
- Throws:
XNIException - Thrown by handler to signal an error.
-
getElement
protected HTMLElements.Element getElement(QName elementName)
-
callStartElement
protected final void callStartElement(QName element, XMLAttributes attrs, Augmentations augs) throws XNIException
- Throws:
XNIException
-
callEndElement
protected final void callEndElement(QName element, Augmentations augs) throws XNIException
- Throws:
XNIException
-
getElementDepth
protected final int getElementDepth(HTMLElements.Element element)
- Parameters:
element - The element.
- Returns:
- the depth of the open tag associated with the specified element name or -1 if no matching element is found.
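The description above does not say how depth is counted. The following sketch assumes that depth is measured from the top of the element stack, with 1 meaning the innermost open element, and that the search runs from the innermost element outward. DepthSketch and depthOf are hypothetical names, and open elements are represented by plain strings rather than HTMLElements.Element.

```java
import java.util.List;

// Hypothetical depth lookup over a list of open element names (outermost first).
final class DepthSketch {
    static int depthOf(List<String> open, String name) {
        // Search from the innermost open element outward.
        for (int i = open.size() - 1; i >= 0; i--) {
            if (open.get(i).equals(name)) {
                return open.size() - i; // assumption: 1 = innermost open element
            }
        }
        return -1; // no matching open element
    }
}
```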
-
getParentDepth
protected int getParentDepth(HTMLElements.Element element)
- Parameters:
element - the element to get the parents from.
- Returns:
- the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
-
synthesizedAugs
protected final Augmentations synthesizedAugs()
-
getNamesValue
protected static short getNamesValue(String value)
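The names properties accept "upper", "lower" and "default", and getNamesValue and modifyName suggest a two-step design: map the string to a short mode once, then apply the mode to each name. The sketch below illustrates that idea under assumptions. The class NamesSketch and the constant values are hypothetical, and treating "default" (or any other value) as "leave the name unchanged" is an assumption, not something the documentation above states.

```java
import java.util.Locale;

// Hypothetical mirror of the two-step name handling; constants and semantics are assumed.
final class NamesSketch {
    static final short NAMES_NO_CHANGE = 0, NAMES_UPPERCASE = 1, NAMES_LOWERCASE = 2;

    /** Maps a property value to a mode; anything other than "upper"/"lower" means no change. */
    static short getNamesValue(String value) {
        if ("lower".equals(value)) {
            return NAMES_LOWERCASE;
        }
        if ("upper".equals(value)) {
            return NAMES_UPPERCASE;
        }
        return NAMES_NO_CHANGE;
    }

    /** Applies the mode to an element or attribute name. */
    static String modifyName(String name, short mode) {
        switch (mode) {
            case NAMES_UPPERCASE:
                return name.toUpperCase(Locale.ROOT);
            case NAMES_LOWERCASE:
                return name.toLowerCase(Locale.ROOT);
            default:
                return name;
        }
    }
}
```

Locale.ROOT is used so that case conversion does not depend on the platform's default locale.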
-
-