public interface HtmlParsingContext
extends com.univocity.parsers.common.Context
An interface that provides information about the HtmlParser's parsing process.
See Also: HtmlParser, Context, HtmlElement

| Modifier and Type | Method and Description |
|---|---|
| Set<String> | binaryFields(): Returns the names of the fields set to download content, i.e. names of fields defined using FieldContentTransform.download(). |
| HtmlElement | currentElement(): Returns the element that the parser is currently visiting. |
| int | currentNodeDepth(): Returns the current node depth of the parser. |
| Object | documentSource(): Returns the source of the current document being parsed. |
| String | entityName(): Returns the name of the HTML entity that the HtmlParser is using to parse the HTML document. |
| File | getFile(String binaryFieldName): Returns the file that was last downloaded for a given binary field. |
| Map<String,HtmlElement[]> | getMatchedElements(): Returns a Map of fields associated with the current sequence of HtmlElements that have been matched by the parser, i.e. the parser collected a value for each field in this map. |
| Map<String,String> | matchedData(): Returns a Map of matched data where the key is the field name and the value is the data that was matched. |
| HtmlElement | pageRoot(): Returns the root element of the HTML tree being processed by the parser. |
| com.univocity.parsers.common.ResultRecordMetaData | recordMetaData(): Returns the metadata information associated with records produced by the current parsing process. |
| com.univocity.api.net.HttpResponse | response(): If the HtmlParser is reading from a web page, returns the HttpResponse that it is using to process the input HTML. |
| HtmlRecord | toRecord(String[] row): Converts the given parsed row to an HtmlRecord. |
com.univocity.api.net.HttpResponse response()
If the HtmlParser is reading from a web page, returns the HttpResponse that it is using to process the input HTML. Otherwise, returns null.
Returns: the HttpResponse that the parser is using, or null if parsing a local file.
Map<String,String> matchedData()
Returns a Map of matched data where the key is the field name and the value is the data that was matched. Values are matched when the HtmlParser encounters a value defined by the path of a field added to an HtmlEntitySettings.
Returns: a Map of the data that was matched for the current record of the current HTML entity while parsing.
Map<String,HtmlElement[]> getMatchedElements()
Returns a Map of fields associated with the current sequence of HtmlElements that have been matched by the parser, i.e. the parser collected a value for each field in this map.
The map contains field names and their matched element sequences only when this method is called from inside an HtmlParserListener.elementMatched(HtmlElement, HtmlParsingContext) callback; otherwise an empty map is returned.
Returns: a Map of fields whose paths have been matched by a given HtmlElement.
String entityName()
Returns the name of the HTML entity that the HtmlParser is using to parse the HTML document.
See Also: HtmlParser
int currentNodeDepth()
Returns the current node depth of the parser. Node depth is how many layers deep the currently visited HTML element is. For example, given a simple HTML document like <div><span></span></div>, when the parser visits the span element the current node depth is 1 (the node depth of div is 0).
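The depth rule described above can be illustrated with a minimal, self-contained sketch. It does not use the parser at all: SketchNode and depthOf are hypothetical stand-ins introduced only for this example, and the sketch assumes that div is the root of the tree, as in the example document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an HTML element; this is NOT the library's HtmlElement.
final class SketchNode {
    final String tag;
    final SketchNode parent;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String tag, SketchNode parent) {
        this.tag = tag;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }
}

final class NodeDepthSketch {
    // Depth is the number of ancestors above the node; the root has depth 0.
    static int depthOf(SketchNode node) {
        int depth = 0;
        for (SketchNode p = node.parent; p != null; p = p.parent) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        // Mirrors the document <div><span></span></div>.
        SketchNode div = new SketchNode("div", null);
        SketchNode span = new SketchNode("span", div);
        System.out.println(div.tag + " -> " + depthOf(div));   // prints "div -> 0"
        System.out.println(span.tag + " -> " + depthOf(span)); // prints "span -> 1"
    }
}
```

In real code the value would simply be read from currentNodeDepth() while the parser visits each element.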
HtmlElement currentElement()
Returns the element that the parser is currently visiting.
HtmlElement pageRoot()
Returns the root element of the HTML tree being processed by the parser. Typically this is the <html> element.
File getFile(String binaryFieldName)
Returns the file that was last downloaded for a given binary field. Binary fields are defined using FieldContentTransform.download().
Parameters: binaryFieldName - name that identifies a field configured to download binary content.
Set<String> binaryFields()
Returns the names of the fields set to download content, i.e. names of fields defined using FieldContentTransform.download().
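One way binaryFields() and getFile(String) might be combined is sketched below. To keep the example self-contained, it uses a hypothetical BinaryFieldSource interface that copies only these two method signatures. Real code would call them on the HtmlParsingContext itself. The null check is an assumption: this page does not state what getFile returns when nothing has been downloaded yet.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in exposing only the two methods documented above.
interface BinaryFieldSource {
    Set<String> binaryFields();
    File getFile(String binaryFieldName);
}

final class BinaryFieldsSketch {
    // Collects the last downloaded file of every binary field.
    static Map<String, File> downloadedFiles(BinaryFieldSource context) {
        Map<String, File> files = new LinkedHashMap<>();
        for (String field : context.binaryFields()) {
            File file = context.getFile(field);
            if (file != null) { // assumption: null means no file is available yet
                files.put(field, file);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        // Fake data standing in for a real parsing context.
        Map<String, File> fake = new LinkedHashMap<>();
        fake.put("logo", new File("logo.png"));
        fake.put("brochure", null);

        BinaryFieldSource context = new BinaryFieldSource() {
            public Set<String> binaryFields() { return fake.keySet(); }
            public File getFile(String name) { return fake.get(name); }
        };

        System.out.println(downloadedFiles(context).keySet()); // prints "[logo]"
    }
}
```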
HtmlRecord toRecord(String[] row)
Converts the given parsed row to an HtmlRecord.
Specified by: toRecord in interface com.univocity.parsers.common.Context
Returns: HtmlRecord representing the given row.
com.univocity.parsers.common.ResultRecordMetaData recordMetaData()
Returns the metadata information associated with records produced by the current parsing process.
Specified by: recordMetaData in interface com.univocity.parsers.common.Context
Object documentSource()
Returns the source of the current document being parsed. If running locally, returns the File whose contents are being read; if remote, returns the UrlReaderProvider. Otherwise returns the given input back, i.e. a Reader implementation, or the HtmlElement if the parser is running over an HTML tree.
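Because documentSource() is declared to return Object, callers typically need to inspect the runtime type. The sketch below shows one possible way to branch on it using only JDK types. DocumentSourceSketch and describe are hypothetical names, and the library types (UrlReaderProvider, HtmlElement) are deliberately left to the fallback branch so the example compiles without the library.

```java
import java.io.File;
import java.io.Reader;
import java.io.StringReader;

final class DocumentSourceSketch {
    // Describes a value such as the one returned by documentSource().
    static String describe(Object source) {
        if (source == null) {
            return "no source";
        }
        if (source instanceof File) {
            return "local file: " + ((File) source).getName();
        }
        if (source instanceof Reader) {
            return "reader input";
        }
        // With the library on the classpath, UrlReaderProvider and HtmlElement
        // could be tested here with further instanceof checks.
        return "other source: " + source.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new File("page.html")));            // prints "local file: page.html"
        System.out.println(describe(new StringReader("<html></html>"))); // prints "reader input"
    }
}
```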
Copyright © 2018 uniVocity Software Pty Ltd. All rights reserved.