Class HTMLTagBalancer

  • All Implemented Interfaces:
    HTMLComponent, XMLComponent, XMLDocumentFilter, XMLDocumentSource, XMLDocumentHandler

    public class HTMLTagBalancer
    extends Object
    implements XMLDocumentFilter, HTMLComponent
    Balances tags in an HTML document. This component receives document events and tries to correct many common mistakes that human (and computer) HTML document authors make. This tag balancer can:
    • add missing parent elements;
    • automatically close elements with optional end tags; and
    • handle mismatched inline element tags.
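As a rough illustration of the last two behaviors, the following is a minimal sketch of stack-based balancing. It is not the algorithm used by HTMLTagBalancer: the class `TagBalancerSketch`, its method names, and the small set of elements with optional end tags are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

/** Toy tag balancer; a simplified illustration, not the HTMLTagBalancer implementation. */
class TagBalancerSketch {
    // Assumption: a tiny, hand-picked set of elements whose end tag is optional.
    private static final Set<String> OPTIONAL_END = Set.of("li", "p", "option");

    private final Deque<String> open = new ArrayDeque<>(); // innermost element first
    private final List<String> out = new ArrayList<>();

    void startElement(String name) {
        // Close an open element with an optional end tag when a sibling of the same name starts.
        if (OPTIONAL_END.contains(name) && name.equals(open.peek())) {
            endElement(name);
        }
        open.push(name);
        out.add("<" + name + ">");
    }

    void endElement(String name) {
        if (!open.contains(name)) {
            return; // stray end tag with no matching open element: ignore it
        }
        // Close every element opened after the match, then the match itself.
        String top;
        do {
            top = open.pop();
            out.add("</" + top + ">");
        } while (!top.equals(name));
    }

    void endDocument() {
        // Close whatever is still open, innermost first.
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
    }

    String result() {
        return String.join("", out);
    }
}
```

Feeding this sketch the events for `<ul><li><li><b></ul>` yields a stream in which the first `li` is closed before the second one starts and the dangling `b` and `li` are closed before `ul`.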

    This component recognizes the following features:

    • http://cyberneko.org/html/features/augmentations
    • http://cyberneko.org/html/features/report-errors
    • http://cyberneko.org/html/features/balance-tags/document-fragment
    • http://cyberneko.org/html/features/balance-tags/ignore-outside-content
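The feature identifiers above are plain strings passed to a `setFeature(String, boolean)` style method. The sketch below only shows one way to keep them in constants and apply them through a generic setter; the class `BalancerFeaturesSketch`, the constant names, and the chosen states are hypothetical and do not reflect the component's defaults.

```java
import java.util.function.BiConsumer;

/** Hypothetical helper that groups some of the feature identifiers listed above. */
class BalancerFeaturesSketch {
    static final String REPORT_ERRORS =
            "http://cyberneko.org/html/features/report-errors";
    static final String DOCUMENT_FRAGMENT =
            "http://cyberneko.org/html/features/balance-tags/document-fragment";
    static final String IGNORE_OUTSIDE_CONTENT =
            "http://cyberneko.org/html/features/balance-tags/ignore-outside-content";

    /**
     * Applies example states through any (featureId, state) setter.
     * The states chosen here are arbitrary and only serve the illustration.
     */
    static void configure(BiConsumer<String, Boolean> setFeature) {
        setFeature.accept(DOCUMENT_FRAGMENT, true);
        setFeature.accept(IGNORE_OUTSIDE_CONTENT, true);
        setFeature.accept(REPORT_ERRORS, false);
    }
}
```

The setter can be anything that accepts an identifier and a state, for example `someMap::put` while experimenting, or a lambda that forwards to a component's `setFeature` method.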

    This component recognizes the following properties:

    • http://cyberneko.org/html/properties/names/elems
    • http://cyberneko.org/html/properties/names/attrs
    • http://cyberneko.org/html/properties/error-reporter
    • http://cyberneko.org/html/properties/balance-tags/current-stack
    Author:
    Andy Clark, Marc Guillemot, Ronald Brill
    See Also:
    HTMLElements
    • Field Detail

      • NAMES_ELEMS

        protected static final String NAMES_ELEMS
        Modify HTML element names: { "upper", "lower", "default" }.
        See Also:
        Constant Field Values
      • NAMES_ATTRS

        protected static final String NAMES_ATTRS
        Modify HTML attribute names: { "upper", "lower", "default" }.
        See Also:
        Constant Field Values
      • FRAGMENT_CONTEXT_STACK

        public static final String FRAGMENT_CONTEXT_STACK
        EXPERIMENTAL: may change in the next release. Name of the property holding the stack of elements in whose context a document fragment should be parsed.
        See Also:
        Constant Field Values
      • fNamespaces

        protected boolean fNamespaces
        Namespaces.
      • fAugmentations

        protected boolean fAugmentations
        Include infoset augmentations.
      • fReportErrors

        protected boolean fReportErrors
        Report errors.
      • fDocumentFragment

        protected boolean fDocumentFragment
        Document fragment balancing only.
      • fTemplateFragment

        protected boolean fTemplateFragment
        Template document fragment balancing only.
      • fIgnoreOutsideContent

        protected boolean fIgnoreOutsideContent
        Ignore outside content.
      • fAllowSelfclosingIframe

        protected boolean fAllowSelfclosingIframe
        Allows self-closing iframe tags.
      • fAllowSelfclosingScript

        protected boolean fAllowSelfclosingScript
        Allows self-closing script tags.
      • fAllowSelfclosingTags

        protected boolean fAllowSelfclosingTags
        Allows self-closing tags.
      • fNamesElems

        protected short fNamesElems
        Modify HTML element names.
      • fSeenAnything

        protected boolean fSeenAnything
        True if any content has been seen. Important for handling the XML declaration.
      • fSeenDoctype

        protected boolean fSeenDoctype
        True if the doctype declaration has been seen.
      • fSeenRootElement

        protected boolean fSeenRootElement
        True if root element has been seen.
      • fSeenRootElementEnd

        protected boolean fSeenRootElementEnd
        True if seen the end of the document element. In other words, this variable is set to false until the end </HTML> tag is seen (or synthesized). This is used to ensure that extraneous events after the end of the document element do not make the document stream ill-formed.
      • fSeenRealHtmlElement

        protected boolean fSeenRealHtmlElement
        True if a real (not synthesized) html element has been seen.
      • fSeenHeadElement

        protected boolean fSeenHeadElement
        True if seen head element.
      • fSeenBodyElement

        protected boolean fSeenBodyElement
        True if seen body element.
      • fOpenedForm

        protected boolean fOpenedForm
        True if a form is in the stack (allows discarding the opening of nested forms).
      • fOpenedSvg

        protected boolean fOpenedSvg
        True if an svg is in the stack (no parent checking takes place).
      • fOpenedSelect

        protected boolean fOpenedSelect
        True if a select is in the stack.
    • Method Detail

      • getFeatureDefault

        public Boolean getFeatureDefault(String featureId)
        Returns the default state for a feature.
        Specified by:
        getFeatureDefault in interface HTMLComponent
        Specified by:
        getFeatureDefault in interface XMLComponent
        Parameters:
        featureId - The feature identifier.
        Returns:
        the default state for a feature, or null if this component does not want to report a default value for this feature.
      • getPropertyDefault

        public Object getPropertyDefault(String propertyId)
        Returns the default state for a property.
        Specified by:
        getPropertyDefault in interface HTMLComponent
        Specified by:
        getPropertyDefault in interface XMLComponent
        Parameters:
        propertyId - The property identifier.
        Returns:
        the default state for a property, or null if this component does not want to report a default value for this property
      • getRecognizedFeatures

        public String[] getRecognizedFeatures()
        Returns recognized features.
        Specified by:
        getRecognizedFeatures in interface XMLComponent
        Returns:
        an array of feature identifiers that are recognized by this component. This method may return null if no features are recognized by this component.
      • getRecognizedProperties

        public String[] getRecognizedProperties()
        Returns recognized properties.
        Specified by:
        getRecognizedProperties in interface XMLComponent
        Returns:
        an array of property identifiers that are recognized by this component. This method may return null if no properties are recognized by this component.
      • setFeature

        public void setFeature(String featureId,
                               boolean state)
                        throws XMLConfigurationException
        Sets a feature.
        Specified by:
        setFeature in interface XMLComponent
        Parameters:
        featureId - The feature identifier.
        state - The state of the feature.
        Throws:
        XMLConfigurationException - Thrown for configuration error. In general, components should only throw this exception if it is really a critical error.
      • startDocument

        public void startDocument(XMLLocator locator,
                                  String encoding,
                                  NamespaceContext nscontext,
                                  Augmentations augs)
                           throws XNIException
        Start document.
        Specified by:
        startDocument in interface XMLDocumentHandler
        Parameters:
        locator - The document locator, or null if the document location cannot be reported during the parsing of this document. However, it is strongly recommended that a locator be supplied that can at least report the system identifier of the document.
        encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
        nscontext - The namespace context in effect at the start of this document. This object represents the current context. Implementors of this class are responsible for copying the namespace bindings from the current context (and its parent contexts) if that information is important.
        augs - Additional information that may include infoset augmentations
        Throws:
        XNIException - Thrown by handler to signal an error.
      • xmlDecl

        public void xmlDecl(String version,
                            String encoding,
                            String standalone,
                            Augmentations augs)
                     throws XNIException
        XML declaration.
        Specified by:
        xmlDecl in interface XMLDocumentHandler
        Parameters:
        version - The XML version.
        encoding - The IANA encoding name of the document, or null if not specified.
        standalone - The standalone value, or null if not specified.
        augs - Additional information that may include infoset augmentations
        Throws:
        XNIException - Thrown by handler to signal an error.
      • doctypeDecl

        public void doctypeDecl(String rootElementName,
                                String publicId,
                                String systemId,
                                Augmentations augs)
                         throws XNIException
        Doctype declaration.
        Specified by:
        doctypeDecl in interface XMLDocumentHandler
        Parameters:
        rootElementName - The name of the root element.
        publicId - The public identifier if an external DTD or null if the external DTD is specified using SYSTEM.
        systemId - The system identifier if an external DTD, null otherwise.
        augs - Additional information that may include infoset augmentations
        Throws:
        XNIException - Thrown by handler to signal an error.
      • getElementDepth

        protected final int getElementDepth(HTMLElements.Element element)
        Parameters:
        element - The element.
        Returns:
        the depth of the open tag associated with the specified element name or -1 if no matching element is found.
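To make the "depth or -1" contract concrete, here is a minimal sketch of a depth lookup over a stack of open element names. It assumes that depth is counted from the innermost open element (1 for the innermost, 2 for its parent, and so on), which this page does not state; the class `ElementDepthSketch` and the use of plain strings instead of `HTMLElements.Element` are also assumptions.

```java
import java.util.List;

/** Hypothetical depth lookup; not the HTMLTagBalancer implementation. */
class ElementDepthSketch {
    /**
     * @param openElements open element names, ordered from outermost to innermost
     * @param name         the element name to look for
     * @return 1 for the innermost open element, 2 for its parent, and so on,
     *         or -1 if no open element has the given name
     */
    static int elementDepth(List<String> openElements, String name) {
        // Search from the innermost element outwards so the nearest match wins.
        for (int i = openElements.size() - 1; i >= 0; i--) {
            if (openElements.get(i).equalsIgnoreCase(name)) {
                return openElements.size() - i;
            }
        }
        return -1;
    }
}
```

With the open elements `html`, `body`, `ul`, `li`, a lookup for `li` gives 1 under this convention, `body` gives 3, and `table` gives -1.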
      • getParentDepth

        protected int getParentDepth(HTMLElements.Element element)
        Parameters:
        element - the element to get the parents from.
        Returns:
        the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
      • synthesizedAugs

        protected final Augmentations synthesizedAugs()
      • modifyName

        protected static String modifyName(String name,
                                           short mode)
      • getNamesValue

        protected static short getNamesValue(String value)
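
Neither modifyName nor getNamesValue is documented on this page. Based on the "upper", "lower" and "default" values listed for NAMES_ELEMS and NAMES_ATTRS, a plausible reading is that getNamesValue maps the property string to a numeric mode and modifyName applies that mode to a name. The sketch below follows that reading; the class `NameModeSketch`, the constant names, and their numeric values are assumptions, not taken from the library.

```java
import java.util.Locale;

/** Hypothetical name-case handling; the constants and their values are illustrative only. */
class NameModeSketch {
    static final short NAMES_NO_CHANGE = 0;
    static final short NAMES_UPPERCASE = 1;
    static final short NAMES_LOWERCASE = 2;

    /** Maps "upper" and "lower" to a mode; anything else (including "default") means no change. */
    static short getNamesValue(String value) {
        if ("upper".equals(value)) {
            return NAMES_UPPERCASE;
        }
        if ("lower".equals(value)) {
            return NAMES_LOWERCASE;
        }
        return NAMES_NO_CHANGE;
    }

    /** Returns the name converted according to the mode, or unchanged for any other mode. */
    static String modifyName(String name, short mode) {
        switch (mode) {
            case NAMES_UPPERCASE:
                return name.toUpperCase(Locale.ROOT);
            case NAMES_LOWERCASE:
                return name.toLowerCase(Locale.ROOT);
            default:
                return name;
        }
    }
}
```

Using `Locale.ROOT` keeps the conversion independent of the platform default locale, which matters for names containing characters such as `i` under Turkish locale rules.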