Package org.htmlunit.cyberneko
Class HTMLNamedEntitiesParser
- java.lang.Object
  - org.htmlunit.cyberneko.HTMLNamedEntitiesParser
public final class HTMLNamedEntitiesParser extends Object
This is a very specialized class for recognizing HTML named entities with the ability to look them up in stages. It is stateless and hence memory friendly. Additionally, it is not generated code; rather, it sets itself up from a file at first use and stays fixed from then on.
Technically, it is not a parser anymore, because it does not have a state that matches the HTML standard: 12.2.5.72 Character reference state. Because it is stateless, it delegates the state handling to the user, who has to keep track of how many characters have been seen and decide when to stop.
- Author:
- René Schwietzke, Ronald Brill
Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
|---|---|---|
| protected static class | HTMLNamedEntitiesParser.RootState | This is our initial state and has a special optimization applied. |
| static class | HTMLNamedEntitiesParser.State | Our "level" in the treeish structure that keeps its static state and the next level underneath. |
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| static HTMLNamedEntitiesParser | get() | Returns the singleton. |
| HTMLNamedEntitiesParser.State | lookup(int character, HTMLNamedEntitiesParser.State state) | Pseudo-parses an entity character by character. |
| HTMLNamedEntitiesParser.State | lookup(String entityName) | Utility method, mostly for testing, that allows us to look up an entity from a string instead of from single characters. |
| String | lookupEntityRefFor(String key) | |
Method Detail
-
get
public static HTMLNamedEntitiesParser get()
Returns the singleton. The singleton is stateless and can safely be used in a multi-threaded context.
- Returns:
- the singleton instance of the parser, can never be null
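The "set up once at first use, then fixed and safe to share" behavior can be illustrated with a minimal sketch. This is not the library's implementation: the class `EntityTableSketch`, its `resolve` method, and the hard-coded three-entry table (standing in for the file the real class loads) are all hypothetical. It uses the initialization-on-demand holder idiom, which is one common way to get a lazily built, thread-safe singleton in Java.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of org.htmlunit.cyberneko.
final class EntityTableSketch {
    // Read-only after construction, so concurrent readers need no locking.
    private final Map<String, String> entities;

    private EntityTableSketch() {
        // Assumption: a tiny in-memory table replaces loading from a file.
        Map<String, String> table = new HashMap<>();
        table.put("amp;", "&");
        table.put("lt;", "<");
        table.put("gt;", ">");
        entities = Collections.unmodifiableMap(table);
    }

    // The JVM initializes Holder (and thus INSTANCE) on first access to get(),
    // and class initialization is guaranteed to be thread-safe.
    private static final class Holder {
        static final EntityTableSketch INSTANCE = new EntityTableSketch();
    }

    static EntityTableSketch get() {
        return Holder.INSTANCE;
    }

    // Returns the replacement text, or null if the name is unknown.
    String resolve(String name) {
        return entities.get(name);
    }
}
```

Every call to `EntityTableSketch.get()` returns the same instance, and because the table never changes after construction, the instance can be shared across threads without synchronization.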
-
lookup
public HTMLNamedEntitiesParser.State lookup(String entityName)
Utility method, mostly for testing, that allows us to look up an entity from a string instead of from single characters.
- Parameters:
entityName - the entity to look up
- Returns:
- a state that represents the result, will never be null
-
lookup
public HTMLNamedEntitiesParser.State lookup(int character, HTMLNamedEntitiesParser.State state)
Pseudo-parses an entity character by character. We assume that we are presented with the characters after the starting ampersand. This parser does not support Unicode entities, hence these have to be handled differently.
- Parameters:
character - the next character, should never be the ampersand
state - the last known state, or null in case we start to parse
- Returns:
- the current state, which might be a valid final result, see HTMLNamedEntitiesParser.State
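To make the idea of a stateless, staged lookup concrete, here is a minimal self-contained sketch. It is not the library's code: `StagedEntityLookupSketch`, `Node`, and `resolveLongest` are hypothetical names, the five-entry table is made up for illustration, and returning null on a dead end is a simplification chosen for this sketch rather than the documented behavior of the real class. What it shows is the division of labor described above: the lookup step keeps no state of its own, and the caller carries the last state forward and decides when to stop.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of org.htmlunit.cyberneko.
final class StagedEntityLookupSketch {
    // One level of the tree; filled during setup and only read afterwards.
    static final class Node {
        final Map<Integer, Node> next = new HashMap<>();
        String resolved; // non-null if the characters so far form a known entity
    }

    private final Node root = new Node();

    StagedEntityLookupSketch() {
        // Assumption: a tiny table; legacy names are listed without ';'.
        add("amp;", "&");
        add("amp", "&");
        add("lt;", "<");
        add("not", "\u00AC");
        add("notin;", "\u2209");
    }

    private void add(String name, String value) {
        Node node = root;
        for (int i = 0; i < name.length(); i++) {
            node = node.next.computeIfAbsent((int) name.charAt(i), k -> new Node());
        }
        node.resolved = value;
    }

    // Stateless step: pass null to start, otherwise the previously returned node.
    // In this sketch, null is returned when no entity continues with 'character'.
    Node lookup(int character, Node state) {
        Node current = (state == null) ? root : state;
        return current.next.get(character);
    }

    // The caller owns the state: feed the characters after '&' one by one
    // and remember the longest match seen before hitting a dead end.
    String resolveLongest(String afterAmpersand) {
        Node state = null;
        String best = null;
        for (int i = 0; i < afterAmpersand.length(); i++) {
            state = lookup(afterAmpersand.charAt(i), state);
            if (state == null) {
                break; // dead end, keep whatever matched earlier
            }
            if (state.resolved != null) {
                best = state.resolved;
            }
        }
        return best;
    }
}
```

With this table, `resolveLongest("notit")` yields the value for `not`, because the walk continues past `not` toward `notin;`, fails at the second `t`, and falls back to the last complete match. This fallback is exactly the kind of bookkeeping that a stateless lookup leaves to its caller.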