public interface ElementContentHandler
An ElementContentHandler lets values captured through a ContentReader’s methods be returned directly as actual values, instead of requiring you to define an EntitySettings and retrieve them as a Result with rows.
| Modifier and Type | Method and Description |
|---|---|
| void | download(com.univocity.api.net.HttpResponseReader contentReader): Specifies that the parser will download content from the URL in the HTML element defined by the path. |
| void | download(com.univocity.api.net.UrlReaderProvider baseUrlProvider, com.univocity.api.net.HttpResponseReader contentReader): Specifies that the parser will download content from the URL in the HTML element defined by the path. |
| String | getValue(): Get the first value from the first node matched using a specific ElementPath. |
| List<String> | getValues(): Get all values from the nodes matched using a specific ElementPath. |
| T | transform(com.univocity.api.common.StringTransformation transformation): Assigns a StringTransformation to the current field. |
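The transform method assigns a StringTransformation that is invoked on each value the parser collects, and the result is what ends up in the field. The following is only a minimal sketch of that idea in plain Java. It does not use the library: `TransformSketch`, `collect`, and the use of `UnaryOperator<String>` as a stand-in for StringTransformation are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

final class TransformSketch {

    // Hypothetical stand-in: UnaryOperator<String> plays the role of a
    // StringTransformation. It is NOT the library's type.
    static List<String> collect(List<String> rawValues, UnaryOperator<String> transformation) {
        List<String> field = new ArrayList<>();
        for (String raw : rawValues) {
            // The transformation runs once per collected value, and its
            // result (not the raw value) is stored in the field.
            field.add(transformation.apply(raw));
        }
        return field;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("  First P tag ", "Second P tag");
        // Trim surrounding whitespace, then upper-case in a locale-independent way.
        List<String> field = collect(raw, s -> s.trim().toUpperCase(Locale.ROOT));
        System.out.println(field); // [FIRST P TAG, SECOND P TAG]
    }
}
```

The point of the sketch is the ordering: the raw value is collected first, the transformation is applied second, and only the transformed result is kept.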
List<String> getValues()
Get all values from the nodes matched using a specific ElementPath.
For example, to get the text of all p elements from an input such as:
<h1>Header</h1>
<p>First P tag</p>
<span>Strong text</span>
<p>Second P tag</p>
<p>Third P tag</p>
We can write the following code:
String input = "<h1>Header</h1><p>First P tag</p><span>Strong text</span><p>Second P tag</p><p>Third P tag</p>";
HtmlElement root = HtmlParser.parse(new StringReaderProvider(input));
List<String> allPTagText = root.query().match("p").getText().getValues();
The resulting allPTagText list will contain the values:
["First P tag", "Second P tag", "Third P tag"]
String getValue()
Get the first value from the first node matched using a specific ElementPath.
For example, to get the text of the first p element from an input such as:
<h1>Header</h1>
<p>First P tag</p>
<span>Strong text</span>
<p>Second P tag</p>
<p>Third P tag</p>
We can write the following code:
String input = "<h1>Header</h1><p>First P tag</p><span>Strong text</span><p>Second P tag</p><p>Third P tag</p>";
HtmlElement root = HtmlParser.parse(new StringReaderProvider(input));
String value = root.query().match("p").getText().getValue();
The resulting value will contain:
"First P tag"
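The two examples suggest that getValue() yields the first entry of what getValues() would return for the same path. The sketch below illustrates that relationship in self-contained Java without the library. Everything in it is an assumption made for illustration: `FirstVersusAllSketch`, `allValues`, and `firstValue` are hypothetical names, the regular expression is a toy replacement for match("p").getText() rather than a real HTML parser, and returning an empty Optional when nothing matches is a choice made here, since this page does not say what getValue() does in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstVersusAllSketch {

    // Toy matcher: captures the text between <p> and </p>, in document order.
    // A regex is not a real HTML parser; it only stands in for the path match.
    private static final Pattern P_TEXT = Pattern.compile("<p>(.*?)</p>");

    // Analogous to getValues(): every matched value, in document order.
    static List<String> allValues(String html) {
        List<String> values = new ArrayList<>();
        Matcher matcher = P_TEXT.matcher(html);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    // Analogous to getValue(): only the first matched value.
    // Assumption: an empty Optional represents "no node matched".
    static Optional<String> firstValue(String html) {
        List<String> values = allValues(html);
        return values.isEmpty() ? Optional.empty() : Optional.of(values.get(0));
    }

    public static void main(String[] args) {
        String input = "<h1>Header</h1><p>First P tag</p><span>Strong text</span>"
                + "<p>Second P tag</p><p>Third P tag</p>";
        System.out.println(allValues(input));                       // [First P tag, Second P tag, Third P tag]
        System.out.println(firstValue(input).orElse("(no match)")); // First P tag
    }
}
```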
void download(com.univocity.api.net.HttpResponseReader contentReader)
Specifies that the parser will download content from the URL in the HTML element defined by the path. The downloaded content is processed by an HttpResponseReader, provided by the user.
contentReader - a user-provided callback to process the remote content.
void download(com.univocity.api.net.UrlReaderProvider baseUrlProvider, com.univocity.api.net.HttpResponseReader contentReader)
Specifies that the parser will download content from the URL in the HTML element defined by the path. The downloaded content is processed by an HttpResponseReader, provided by the user.
baseUrlProvider - the base URL and associated configuration to be used for downloading the content. Required for downloading content while parsing data from local files.
contentReader - a user-provided callback to process the remote content.
T transform(com.univocity.api.common.StringTransformation transformation)
Assigns a StringTransformation to the current field. Once the parser collects a value for the field, it will invoke Transformation.transform(Object) to modify it. The result of the transformation will be assigned to the field.
transformation - the transformation to be applied over the content parsed for a given field.
Copyright © 2018 uniVocity Software Pty Ltd. All rights reserved.