public interface FieldDefinition
Provides the options available for adding fields to an HTML entity. Fields are defined with the help of HtmlEntitySettings, a Group, or a PartialPath associated with the given entity.
See Also: Group, PartialPath, PathStart, HtmlEntitySettings

| Modifier and Type | Method and Description |
|---|---|
| PathStart | addField(String fieldName): Associates a regular field with an entity. |
| void | addField(String fieldName, String constantValue): Creates a field with a specified constant value. |
| PathStart | addPersistentField(String fieldName): Associates a persistent field with an entity. |
| PathStart | addSilentField(String fieldName): Associates a “silent” field with an entity. |
PathStart addField(String fieldName)
Associates a regular field with an entity. Regular fields are used by the parser to retain values for a row. When all values of a row are collected, the parser submits the row to the output, and clears all values collected for all fields. If the parser collects a value for a field that already contains data, the record will be submitted to the output and the incoming value will be associated with the given field in a new row.
For example, you could define a field called “headings” and match h1 elements to get their text. When the parser runs, the text of each h1 element found in the HTML document will be available in the field “headings”, e.g.:
HtmlEntityList entityList = new HtmlEntityList();
entityList.configureEntity("heading")
.addField("headings")
.match("h1")
.getText();
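The row-submission rule described for regular fields can be modelled in a few lines of plain Java. This is only a sketch of the documented behaviour, not the parser's implementation: the class RegularFieldSketch, its collect method, and the {fieldName, value} event pairs are hypothetical names introduced for illustration, and the sketch assumes a row is submitted when a field is hit a second time and once more at the end of the input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the documented rule; it does not use the parser API.
class RegularFieldSketch {

    // fields: column names in declaration order.
    // events: {fieldName, value} pairs in the order values would be collected.
    static List<List<String>> collect(String[] fields, String[][] events) {
        Map<String, String> row = new LinkedHashMap<>();
        for (String field : fields) {
            row.put(field, null);
        }
        List<List<String>> output = new ArrayList<>();
        boolean pending = false;
        for (String[] event : events) {
            String field = event[0];
            if (row.get(field) != null) {
                // The field already holds data: submit the row and clear all values.
                output.add(new ArrayList<>(row.values()));
                row.replaceAll((name, value) -> null);
            }
            row.put(field, event[1]);
            pending = true;
        }
        if (pending) {
            // Assumption: the last partially filled row is submitted at end of input.
            output.add(new ArrayList<>(row.values()));
        }
        return output;
    }

    public static void main(String[] args) {
        String[] fields = {"title", "text"};
        String[][] events = {
            {"title", "first"}, {"text", "lorem"},
            {"title", "second"}, {"text", "ipsum"}
        };
        // prints [[first, lorem], [second, ipsum]]
        System.out.println(collect(fields, events));
    }
}
```

The second "title" value finds the field already populated, so the first row is submitted before the new value is stored.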
PathStart addPersistentField(String fieldName)
Associates a persistent field with an entity. A persistent field is a field that retains its value until it is overwritten by the parser. When all values of a row are collected, the parser submits the row to the output, and clears the values collected for all fields, except the persistent ones, so they will be reused in subsequent records.
The following HTML illustrates the use of persistent fields:
<div id="55">
<article>
<h1>first</h1>
<p>lorem</p>
</article>
<article>
<h1>second</h1>
<p>ipsum</p>
</article>
</div>
In this example, we want to get two rows with three columns: [55, first, lorem] and [55, second, ipsum]. The value “55” in both records should come from the id of the div. The following rules can be defined to produce this output:
HtmlEntityList entities = new HtmlEntityList();
HtmlEntitySettings entity = entities.configureEntity("test");
entity.addPersistentField("persistentID").match("div").getAttribute("id");
entity.addField("title").match("h1").getText();
entity.addField("text").match("p").getText();
As the “persistentID” field was created as a persistent field, it will retain its value and the parser will reapply it to subsequent rows. If a regular addField(String) were used instead, the output would be [55, first, lorem] and [null, second, ipsum], as the div and its id would be matched only once.
NOTE: A persistent field is also “silent” and does not trigger new rows (see addSilentField(String)). If a persistent field’s path finds another match while processing the same record, the first value will be replaced by the new one, and no new records will be generated.
A RecordTrigger can be used to force new rows to be generated.
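The difference between a persistent and a regular field in the example above can be reproduced with a small plain-Java model. It is a sketch under stated assumptions rather than the parser's code: PersistentFieldSketch, collect, and the {fieldName, value} event pairs are hypothetical, and rows are assumed to be submitted when a regular field is hit a second time and once more at the end of the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of persistent fields; it does not use the parser API.
class PersistentFieldSketch {

    static List<List<String>> collect(String[] fields, String[] persistentFields, String[][] events) {
        Set<String> persistent = new HashSet<>(Arrays.asList(persistentFields));
        Map<String, String> row = new LinkedHashMap<>();
        for (String field : fields) {
            row.put(field, null);
        }
        List<List<String>> output = new ArrayList<>();
        boolean pending = false;
        for (String[] event : events) {
            String field = event[0];
            if (persistent.contains(field)) {
                // Persistent fields are overwritten in place and never trigger a row.
                row.put(field, event[1]);
                continue;
            }
            if (row.get(field) != null) {
                // Submit the row, then clear every value except the persistent ones.
                output.add(new ArrayList<>(row.values()));
                row.replaceAll((name, value) -> persistent.contains(name) ? value : null);
            }
            row.put(field, event[1]);
            pending = true;
        }
        if (pending) {
            output.add(new ArrayList<>(row.values()));
        }
        return output;
    }

    public static void main(String[] args) {
        String[] fields = {"persistentID", "title", "text"};
        String[][] events = {
            {"persistentID", "55"},
            {"title", "first"}, {"text", "lorem"},
            {"title", "second"}, {"text", "ipsum"}
        };
        // prints [[55, first, lorem], [55, second, ipsum]]
        System.out.println(collect(fields, new String[] {"persistentID"}, events));
        // With no persistent fields: prints [[55, first, lorem], [null, second, ipsum]]
        System.out.println(collect(fields, new String[] {}, events));
    }
}
```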
PathStart addSilentField(String fieldName)
Associates a “silent” field with an entity. A silent field does not trigger new records when values of a field are overwritten, i.e. if the parser collects a value for a field that already contains data, and the field is silent, it won’t submit a new record. The parser will simply replace the previously collected value with the newly parsed value.
A RecordTrigger can be used to force new rows to be generated.
The following HTML document illustrates the use of silent fields:
<div>
<article class="feature">
<h1>first</h1>
<p>lorem</p>
<h1>second</h1>
</article>
</div>
To get the text of the p element along with the second header:
HtmlEntityList entities = new HtmlEntityList();
HtmlEntitySettings entity = entities.configureEntity("test");
entity.addSilentField("silent")
.match("h1")
.containedBy("article")
.getText();
entity.addField("text").match("article").match("p").getText();
The parser will return [second, lorem]. When the parser finishes parsing the p element, the row will actually be [first, lorem]. As soon as the parser finds the second h1 element, instead of creating a new row with this value, it will replace the “first” String with “second” generating the row [second, lorem].
If addField were used in this example instead of addSilentField, two rows would be produced: [first, lorem] and [second, null].
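The overwrite behaviour of silent fields in this example can be modelled the same way in plain Java. Again this is a sketch, not the parser's implementation: SilentFieldSketch, collect, and the {fieldName, value} event pairs are hypothetical names, and the model assumes that a row is submitted only when a non-silent field is hit a second time or when the input ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of silent fields; it does not use the parser API.
class SilentFieldSketch {

    static List<List<String>> collect(String[] fields, String[] silentFields, String[][] events) {
        Set<String> silent = new HashSet<>(Arrays.asList(silentFields));
        Map<String, String> row = new LinkedHashMap<>();
        for (String field : fields) {
            row.put(field, null);
        }
        List<List<String>> output = new ArrayList<>();
        boolean pending = false;
        for (String[] event : events) {
            String field = event[0];
            // Only a non-silent field that already holds data triggers a new row.
            if (!silent.contains(field) && row.get(field) != null) {
                output.add(new ArrayList<>(row.values()));
                row.replaceAll((name, value) -> null);
            }
            row.put(field, event[1]); // a silent field is simply overwritten
            pending = true;
        }
        if (pending) {
            output.add(new ArrayList<>(row.values()));
        }
        return output;
    }

    public static void main(String[] args) {
        String[] fields = {"silent", "text"};
        String[][] events = {{"silent", "first"}, {"text", "lorem"}, {"silent", "second"}};
        // prints [[second, lorem]]
        System.out.println(collect(fields, new String[] {"silent"}, events));
        // Treating the field as regular: prints [[first, lorem], [second, null]]
        System.out.println(collect(fields, new String[] {}, events));
    }
}
```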
void addField(String fieldName, String constantValue)
Creates a field with a specified constant value. The following HTML document illustrates its use:
<div>
<article>
<h1>first</h1>
<p>lorem</p>
</article>
<article>
<h1>second</h1>
<p>ipsum</p>
</article>
<article>
<h1>third</h1>
<p>lol</p>
</article>
</div>
And the following code:
HtmlEntityList entities = new HtmlEntityList();
HtmlEntitySettings entity = entities.configureEntity("test");
// creates a constant field
entity.addField("constant","cool article");
// regular fields
entity.addField("title").match("h1").getText();
entity.addField("content").match("p").getText();
When the parser runs, it will get the text from each article heading and p element. It will also attach the constant “cool article” to the first column of each row, producing:
[cool article, first, lorem]
[cool article, second, ipsum]
[cool article, third, lol]
Parameters:
fieldName - name of the field to be created
constantValue - a constant value associated with the given field

Copyright © 2018 uniVocity Software Pty Ltd. All rights reserved.