- childOf(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should be a child of a given element.
- children() - Method in interface com.univocity.api.entity.html.HtmlElement
-
Returns a copy of all children of this element in an array.
- classes(String, String...) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should contain the given CSS class names.
- classes() - Method in interface com.univocity.api.entity.html.HtmlElement
-
Returns the set of CSS classes of this element, or an empty set if this element has no class attribute defined.
- clone() - Method in class com.univocity.api.entity.html.FetchOptions
-
- clone() - Method in class com.univocity.api.entity.html.HtmlEntitySettings
-
- clone() - Method in class com.univocity.api.entity.html.HtmlParserSettings
-
- ColumnProcessor - Class in com.univocity.api.entity.html.processor
-
A simple
RowProcessor implementation that stores values of columns.
- ColumnProcessor() - Constructor for class com.univocity.api.entity.html.processor.ColumnProcessor
-
Constructs a column processor, pre-allocating room for 1000 rows.
- ColumnProcessor(int) - Constructor for class com.univocity.api.entity.html.processor.ColumnProcessor
-
Constructs a column processor, pre-allocating room for the expected number of rows to be processed.
- com.univocity.api.entity.html - package com.univocity.api.entity.html
-
- com.univocity.api.entity.html.builders - package com.univocity.api.entity.html.builders
-
- com.univocity.api.entity.html.builders.annotations - package com.univocity.api.entity.html.builders.annotations
-
- com.univocity.api.entity.html.processor - package com.univocity.api.entity.html.processor
-
- containedBy(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should be in the hierarchy of a given element.
- containedBy(String, int) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should be in the hierarchy of a given element, up to a given limit of parent nodes to visit.
- containing(String, String...) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should contain one or more of the given elements in its hierarchy.
- containing(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should contain a given element in its hierarchy.
- containing(String, int) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should contain a given element in its hierarchy, provided it occurs within a given search depth.
- containsElementInHierarchy(HtmlElement) - Method in interface com.univocity.api.entity.html.HtmlElement
-
Returns true if the specified element is a descendant of the current element.
- ContentReader<T extends com.univocity.api.entity.html.builders.ContentHandler> - Interface in com.univocity.api.entity.html.builders
-
A
ContentReader defines what content will be read from the last element matched in a
FieldPath by the
HtmlParser.
- createGlobalSettings() - Method in class com.univocity.api.entity.html.HtmlParserSettings
-
- currentElement() - Method in interface com.univocity.api.entity.html.HtmlParsingContext
-
Returns the element that the parser is currently visiting.
- currentNodeDepth() - Method in interface com.univocity.api.entity.html.HtmlParsingContext
-
Returns the current node depth of the parser.
- FetchOptions - Class in com.univocity.api.entity.html
-
- FetchOptions() - Constructor for class com.univocity.api.entity.html.FetchOptions
-
Default constructor for FetchOptions. Defaults to not flattening directories and accepting any String.
- FetchOutput - Class in com.univocity.api.entity.html
-
- FetchOutput(HtmlElement, File, Map<File, URL>) - Constructor for class com.univocity.api.entity.html.FetchOutput
-
- fetchResources(FileProvider, FetchOptions) - Method in interface com.univocity.api.entity.html.HtmlElement
-
Saves the element to a local file using the FileProvider, searching all child nodes for external resources (e.g.
- fetchResources(File, FetchOptions) - Method in interface com.univocity.api.entity.html.HtmlElement
-
Save the element to a local file using the
File, searching all child nodes for external resources (e.g.
- fetchResources(File, String, FetchOptions) - Method in interface com.univocity.api.entity.html.HtmlElement
-
Save the element to a local file using the
File, searching all child nodes for external resources (e.g.
- fetchResources(File, Charset, FetchOptions) - Method in interface com.univocity.api.entity.html.HtmlElement
-
Save the element to a local file using the
File, searching all child nodes for external resources (e.g.
- fetchResources(String, FetchOptions) - Method in interface com.univocity.api.entity.html.HtmlElement
-
Save the element to a local file at the path pathToFile, searching all child nodes for external resources (e.g.
- fetchResources(String, String, FetchOptions) - Method in interface com.univocity.api.entity.html.HtmlElement
-
Save the element to a local file at the path pathToFile, searching all child nodes for external resources (e.g.
- fetchResources(String, Charset, FetchOptions) - Method in interface com.univocity.api.entity.html.HtmlElement
-
Save the element to a local file at the path pathToFile, searching all child nodes for external resources (e.g.
- fetchResourcesBeforeParsing(FetchOptions) - Method in class com.univocity.api.entity.html.HtmlParserSettings
-
- fetchResourcesBeforeParsingEnabled() - Method in class com.univocity.api.entity.html.HtmlParserSettings
-
- FieldContentTransform - Interface in com.univocity.api.entity.html.builders
-
Allows the content captured for a given field, by a
ContentReader, to be transformed by a
StringTransformation to clean up or transform values or to obtain very specific textual content from the original value.
- FieldDefinition - Interface in com.univocity.api.entity.html.builders
-
Provides the options available for adding fields into a HTML entity, which are defined with the help of
HtmlEntitySettings, a
Group or a
PartialPath associated with the given entity.
- FieldPath - Interface in com.univocity.api.entity.html.builders
-
A path to a field of an entity.
- filter(HtmlElementMatcher) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
- flattenDirectories(boolean) - Method in class com.univocity.api.entity.html.FetchOptions
-
Option to flatten the path section of a fetched resource into the new filename.
- flattenDirectoryStructure() - Method in class com.univocity.api.entity.html.FetchOptions
-
Whether or not the resource filenames should be ‘flattened’.
- followedBy(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should have a given element placed after it, at any distance.
- followedBy(String, int) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should have a given element at a given distance after it.
- followedByText(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element must have a given text placed in an element after it.
- followedImmediatelyBy(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should have a given element placed directly after it.
- followLink() - Method in interface com.univocity.api.entity.html.builders.FieldContentTransform
-
Creates a
HtmlLinkFollower that will parse linked pages; the URL of each linked page is defined by the values retrieved by this field.
- followLink(UrlReaderProvider) - Method in interface com.univocity.api.entity.html.builders.FieldContentTransform
-
Creates a
HtmlLinkFollower that will parse linked pages; the URL of each linked page is defined by inserting the value retrieved by this field into the supplied
UrlReaderProvider as a parameter.
- followLink(String, UrlReaderProvider) - Method in class com.univocity.api.entity.html.HtmlEntitySettings
-
- getAttribute(String) - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Captures the value of an attribute of the HTML elements matched by the path.
- getBaseUri() - Method in class com.univocity.api.entity.html.FetchOptions
-
The current base URI associated with the document whose resources are being fetched.
- getCurrentPageRoot() - Method in interface com.univocity.api.entity.html.HtmlPaginationContext
-
Returns the root element of the HTML tree being processed by the parser.
- getDefaultFileExtension() - Method in class com.univocity.api.entity.html.HtmlParserSettings
-
- getDownloadHandler() - Method in class com.univocity.api.entity.html.FetchOptions
-
Returns the
DownloadHandler callback to be used by the fetch resources operation.
- getElement(HtmlElementTransformation) - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Gets the
HtmlElement described by the path and passes it to a custom
Transformation.transform(Object).
- getElement() - Method in interface com.univocity.api.entity.html.builders.ElementContentReader
-
Gets the first matching
HtmlElement when applying the previously defined matching rules.
- getElements() - Method in interface com.univocity.api.entity.html.builders.ElementContentReader
-
Gets all matching
HtmlElements when applying the previously defined matching rules.
- getFetchOptions() - Method in class com.univocity.api.entity.html.HtmlParserSettings
-
- getFieldNames() - Method in class com.univocity.api.entity.html.HtmlEntitySettings
-
- getFile(String) - Method in interface com.univocity.api.entity.html.HtmlParsingContext
-
Returns the file that was last downloaded for a given binary field.
- getFollowingText() - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Gets the text from the HTML element that is placed directly after the HTML elements matched by the path.
- getFollowingText(int) - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Gets the text from the specified number of HTML elements following the HTML element matched by the path.
- getHeadingText() - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Used to get the text of a table header above a matched element.
- getHeadingText(int) - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Captures the text in the same column of the matched element, but in another row of the same table.
- getInternalSettings() - Method in class com.univocity.api.entity.html.HtmlEntitySettings
-
- getLinkedEntityData(int) - Method in interface com.univocity.api.entity.html.HtmlParserResult
-
- getLinkedEntityData() - Method in interface com.univocity.api.entity.html.HtmlRecord
-
- getLinkedFieldData(int) - Method in interface com.univocity.api.entity.html.HtmlParserResult
-
- getLinkedFieldData() - Method in interface com.univocity.api.entity.html.HtmlRecord
-
- getListener() - Method in class com.univocity.api.entity.html.HtmlEntitySettings
-
- getMatchedElements() - Method in interface com.univocity.api.entity.html.HtmlParsingContext
-
Returns a
Map of fields associated with the current sequence of
HtmlElements that have been matched by the parser, i.e.
- getOwnText() - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Specifies that the parser will return the text from the HTML element specified by the path without including the text of its child nodes.
- getPaginationContext() - Method in class com.univocity.api.entity.html.HtmlParser
-
- getPaginationContext() - Method in interface com.univocity.api.entity.html.HtmlParserInterface
-
- getPaginator() - Method in class com.univocity.api.entity.html.HtmlEntityList
-
Returns the
HtmlPaginator associated with the
HtmlParserSettings of this
HtmlEntityList
- getPaginator() - Method in class com.univocity.api.entity.html.HtmlLinkFollower
-
Returns the
HtmlPaginator associated with the
HtmlParserSettings of this
HtmlEntityList
- getPaginator() - Method in class com.univocity.api.entity.html.HtmlParserSettings
-
Returns the
HtmlPaginator associated with this
HtmlParserSettings
- getParserSettings() - Method in class com.univocity.api.entity.html.HtmlEntityList
-
- getParserThreadCount() - Method in class com.univocity.api.entity.html.HtmlParserSettings
-
Returns the maximum number of threads used by the parser when processing data of multiple entities from the same HTML input.
- getPrecedingText() - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Specifies that the parser will return the text from the node that appears before the HTML element specified by the path.
- getPrecedingText(int) - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Collects the text from the specified number of HTML elements placed before the element that is matched by the path.
- getRemoteInterval() - Method in class com.univocity.api.entity.html.FetchOptions
-
Returns the minimum interval of time to wait between each download request.
- getResourceMap() - Method in class com.univocity.api.entity.html.FetchOutput
-
Returns the mapping of each local File that has been downloaded to its original remote URL.
- getSharedResourceDir() - Method in class com.univocity.api.entity.html.FetchOptions
-
Returns the shared resource directory used to store files referenced by one or more HTML pages and CSS files.
- getText() - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Specifies that the parser will return the text contained within the HTML element defined by the path.
- getText(int) - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Specifies that the parser will return the text contained within the HTML elements matched by the path in addition to the text in the specified amount of following siblings.
- getTextAbove() - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Captures the text contained in the row and column above the HTML element matched by the path.
- getTextAbove(int) - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Specifies that the parser will return the text contained in the HTML element at a given distance above a matched element.
- getTextAbove(String, String...) - Method in interface com.univocity.api.entity.html.builders.ContentReader
-
Specifies that the parser will return the content of a row, given it contains some expected text, above a matched element.
- getTreeHtmlFile() - Method in class com.univocity.api.entity.html.FetchOutput
-
Returns the File pointing to where the new HTML has been saved.
- getTreeRoot() - Method in class com.univocity.api.entity.html.FetchOutput
-
Returns the root
HtmlElement of the new HTML structure.
- getValue() - Method in interface com.univocity.api.entity.html.builders.ElementContentHandler
-
Get the first value from the first node matched using a specific
ElementPath.
- getValues() - Method in interface com.univocity.api.entity.html.builders.ElementContentHandler
-
Get all values from the nodes matched using a specific
ElementPath.
- Group - Interface in com.univocity.api.entity.html.builders
-
A group defines the boundaries where a given set of fields should be processed.
- GroupStart - Interface in com.univocity.api.entity.html.builders
-
Defines the first step in the creation of a
Group.
- id(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should contain an id attribute with a given value.
- id() - Method in interface com.univocity.api.entity.html.HtmlElement
-
Returns the id of this element or an empty String if the element does not have an id attribute.
- inputValues() - Method in interface com.univocity.api.entity.html.HtmlElement
-
Runs through the hierarchy of this element and collects the values of any input elements, including select lists, radio buttons and checkboxes.
- inputValuesById() - Method in interface com.univocity.api.entity.html.HtmlElement
-
Runs through the hierarchy of this element and collects the values of any input elements, including select lists, radio buttons and checkboxes.
- InputValueSwitch - Class in com.univocity.api.entity.html.processor
-
A concrete implementation of
RowProcessorSwitch that allows switching among different implementations of
RowProcessor based on values found on the rows parsed from the input.
- InputValueSwitch() - Constructor for class com.univocity.api.entity.html.processor.InputValueSwitch
-
Creates a switch that will analyze the first column of rows found in the input to determine which
RowProcessor to use for each parsed row.
- InputValueSwitch(int) - Constructor for class com.univocity.api.entity.html.processor.InputValueSwitch
-
Creates a switch that will analyze a column of rows parsed from the input to determine which
RowProcessor to use.
- InputValueSwitch(String) - Constructor for class com.univocity.api.entity.html.processor.InputValueSwitch
-
Creates a switch that will analyze a column in rows parsed from the input to determine which
RowProcessor to use.
- isComment() - Method in interface com.univocity.api.entity.html.HtmlElement
-
Returns true if this HtmlElement consists of comments, i.e.
- isData() - Method in interface com.univocity.api.entity.html.HtmlElement
-
Returns true if this HtmlElement consists of data, i.e.
- isDownloadBlacklistingEnabled() - Method in class com.univocity.api.entity.html.FetchOptions
-
Indicates whether URLs of resources that resulted in a download failure (such as a 404) should be blacklisted while the parser is running, so no further attempts to access the same URL will be made.
- isOverwriteSharedResources() - Method in class com.univocity.api.entity.html.FetchOptions
-
Returns a flag indicating whether resources that have been downloaded and are shared among multiple pages should be overwritten during a new fetch resources operation.
- isText() - Method in interface com.univocity.api.entity.html.HtmlElement
-
Returns true if this HtmlElement consists solely of text and false otherwise.
- MasterDetailListProcessor - Class in com.univocity.api.entity.html.processor
-
A convenience
MasterDetailProcessor implementation for storing all
MasterDetailRecord instances generated from the parsed input into a list.
- MasterDetailListProcessor(RowPlacement, AbstractObjectListProcessor) - Constructor for class com.univocity.api.entity.html.processor.MasterDetailListProcessor
-
Creates a MasterDetailListProcessor
- MasterDetailListProcessor(AbstractObjectListProcessor) - Constructor for class com.univocity.api.entity.html.processor.MasterDetailListProcessor
-
Creates a MasterDetailListProcessor
- MasterDetailProcessor - Class in com.univocity.api.entity.html.processor
-
A
RowProcessor implementation for associating rows extracted from any implementation of
HtmlParser into
MasterDetailRecord instances.
- MasterDetailProcessor(RowPlacement, ObjectRowListProcessor) - Constructor for class com.univocity.api.entity.html.processor.MasterDetailProcessor
-
Creates a MasterDetailProcessor
- MasterDetailProcessor(ObjectRowListProcessor) - Constructor for class com.univocity.api.entity.html.processor.MasterDetailProcessor
-
Creates a MasterDetailProcessor.
- match(String) - Method in interface com.univocity.api.entity.html.builders.ElementFilterStart
-
Matches a given tag name at any distance from the current element.
- match(String, int) - Method in interface com.univocity.api.entity.html.builders.ElementFilterStart
-
Matches a given tag name and its occurrence index among neighboring nodes within the same parent.
- match(HtmlElementMatcher) - Method in interface com.univocity.api.entity.html.builders.ElementFilterStart
-
Specifies what element the parser must match based on the return value supplied by the given
HtmlElementMatcher.
- match(HtmlElement, HtmlElement) - Method in interface com.univocity.api.entity.html.HtmlElementMatcher
-
Used to determine if the currentElement should be matched by the parser.
- matchCurrent() - Method in interface com.univocity.api.entity.html.builders.ElementFilterStart
-
Matches the current node defined in the path.
- matchedData() - Method in interface com.univocity.api.entity.html.HtmlParsingContext
-
Returns a Map of matched data where the key is the field name and the value is the data that was matched.
- Matcher - Annotation Type in com.univocity.api.entity.html.builders.annotations
-
Basic annotation used internally to classify methods of the public API based on their purpose.
- Matcher.Type - Enum in com.univocity.api.entity.html.builders.annotations
-
The general type of matching algorithm associated with the method.
- matchFirst(String) - Method in interface com.univocity.api.entity.html.builders.ElementFilterStart
-
Matches the first occurrence of the given tag name among neighboring nodes within the same parent.
- matchLast(String) - Method in interface com.univocity.api.entity.html.builders.ElementFilterStart
-
Matches the last occurrence of the given tag name among neighboring nodes within the same parent.
- matchNext(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Matches an element that must immediately follow the previously matched element, i.e.
- MultiBeanListProcessor - Class in com.univocity.api.entity.html.processor
-
A
RowProcessor implementation for converting rows extracted from the
HtmlParser into Java objects, storing them in lists.
- MultiBeanListProcessor(Class...) - Constructor for class com.univocity.api.entity.html.processor.MultiBeanListProcessor
-
Creates a processor for Java beans of multiple types.
- MultiBeanProcessor - Class in com.univocity.api.entity.html.processor
-
A
RowProcessor implementation for converting rows extracted from any implementation of
HtmlParser into Java objects.
- MultiBeanProcessor(Class...) - Constructor for class com.univocity.api.entity.html.processor.MultiBeanProcessor
-
Creates a processor for Java beans of multiple types.
- MultiBeanRowProcessor - Class in com.univocity.api.entity.html.processor
-
- MultiBeanRowProcessor(Class...) - Constructor for class com.univocity.api.entity.html.processor.MultiBeanRowProcessor
-
Creates a processor for Java beans of multiple types.
- pageRoot() - Method in interface com.univocity.api.entity.html.HtmlParsingContext
-
Returns the root element of the HTML tree being processed by the parser.
- PaginationGroup - Interface in com.univocity.api.entity.html.builders
-
- PaginationGroupStart - Interface in com.univocity.api.entity.html.builders
-
- PaginationParams - Interface in com.univocity.api.entity.html.builders
-
Methods to enable the specification of internal fields of the
HtmlPaginator.
- PaginationPath - Interface in com.univocity.api.entity.html.builders
-
- PaginationPathStart - Interface in com.univocity.api.entity.html.builders
-
- parent() - Method in interface com.univocity.api.entity.html.HtmlElement
-
Returns the parent of this Element.
- parentCssFile() - Method in interface com.univocity.api.entity.html.DownloadContext
-
Returns the CSS file that is going to be updated after the fetch resources operation is complete.
- parentDir() - Method in interface com.univocity.api.entity.html.DownloadContext
-
- parentHtmlFile() - Method in interface com.univocity.api.entity.html.DownloadContext
-
Returns the HTML file that is going to be updated/generated after the fetch resources operation is complete.
- parentOf(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should be the parent of a given element.
- parse(ReaderProvider) - Method in class com.univocity.api.entity.html.HtmlParser
-
Given an input, made available from a
ReaderProvider, parses all records of all entities defined in the
HtmlEntityList of the
HtmlParserSettings object provided in the constructor of this class, submitting them to the
Processor implementation associated with each entity (through
EntitySettings.setProcessor(Processor)).
- parse(FileProvider) - Method in class com.univocity.api.entity.html.HtmlParser
-
Given an input, made available from a
FileProvider, parses all records of all entities defined in the
HtmlEntityList of the
HtmlParserSettings object provided in the constructor of this class, submitting them to the
Processor implementation associated with each entity (through
EntitySettings.setProcessor(Processor)).
- parse(File) - Method in class com.univocity.api.entity.html.HtmlParser
-
Given a
File, parses all records of all entities defined in the
HtmlEntityList of the
HtmlParserSettings object provided in the constructor of this class, submitting them to the
Processor implementation associated with each entity (through
EntitySettings.setProcessor(Processor)).
- parse(File, Charset) - Method in class com.univocity.api.entity.html.HtmlParser
-
Given a
File, parses all records of all entities defined in the
HtmlEntityList of the
HtmlParserSettings object provided in the constructor of this class, submitting them to the
Processor implementation associated with each entity (through
EntitySettings.setProcessor(Processor)).
- parse(File, String) - Method in class com.univocity.api.entity.html.HtmlParser
-
Given a
File, parses all records of all entities defined in the
HtmlEntityList of the
HtmlParserSettings object provided in the constructor of this class, submitting them to the
Processor implementation associated with each entity (through
EntitySettings.setProcessor(Processor)).
- parse(Reader) - Method in class com.univocity.api.entity.html.HtmlParser
-
Given a
Reader, parses all records of all entities defined in the
HtmlEntityList of the
HtmlParserSettings object provided in the constructor of this class, submitting them to the
Processor implementation associated with each entity (through
EntitySettings.setProcessor(Processor)).
- parse(InputStream) - Method in class com.univocity.api.entity.html.HtmlParser
-
Given an
InputStream, parses all records of all entities defined in the
HtmlEntityList of the
HtmlParserSettings object provided in the constructor of this class, submitting them to the
Processor implementation associated with each entity (through
EntitySettings.setProcessor(Processor)).
- parse(InputStream, Charset) - Method in class com.univocity.api.entity.html.HtmlParser
-
Given an
InputStream, parses all records of all entities defined in the
HtmlEntityList of the
HtmlParserSettings object provided in the constructor of this class, submitting them to the
Processor implementation associated with each entity (through
EntitySettings.setProcessor(Processor)).
- parse(InputStream, String) - Method in class com.univocity.api.entity.html.HtmlParser
-
Given an
InputStream, parses all records of all entities defined in the
HtmlEntityList of the
HtmlParserSettings object provided in the constructor of this class, submitting them to the
Processor implementation associated with each entity (through
EntitySettings.setProcessor(Processor)).
- parse(HtmlElement) - Method in class com.univocity.api.entity.html.HtmlParser
-
Given a
HtmlElement, parses all records of all entities defined in the
EntityList of this parser, submitting them to the
Processor implementation associated with each entity (through
EntitySettings.setProcessor(Processor)).
- parse(HtmlElement) - Method in interface com.univocity.api.entity.html.HtmlParserInterface
-
Given a
HtmlElement, parses all records of all entities defined in the
EntityList of this parser, and returns them in a map.
- parseTree(ReaderProvider) - Static method in class com.univocity.api.entity.html.HtmlParser
-
Generates a DOM tree from the input made available by a ReaderProvider.
- parseTree(FileProvider) - Static method in class com.univocity.api.entity.html.HtmlParser
-
Generates a DOM tree from the input made available by a FileProvider.
- parseTree(Reader) - Static method in class com.univocity.api.entity.html.HtmlParser
-
Generates a DOM tree from the input made available by a
Reader.
- parseTree(InputStream) - Static method in class com.univocity.api.entity.html.HtmlParser
-
Generates a DOM tree from the input made available by an
InputStream.
- parseTree(InputStream, Charset) - Static method in class com.univocity.api.entity.html.HtmlParser
-
Generates a DOM tree from the input made available by an
InputStream.
- parseTree(InputStream, String) - Static method in class com.univocity.api.entity.html.HtmlParser
-
Generates a DOM tree from the input made available by an
InputStream.
- parseTree(File) - Static method in class com.univocity.api.entity.html.HtmlParser
-
Generates a DOM tree from the input made available by a
File.
- parseTree(File, Charset) - Static method in class com.univocity.api.entity.html.HtmlParser
-
Generates a DOM tree from the input made available by a
File.
- parseTree(File, String) - Static method in class com.univocity.api.entity.html.HtmlParser
-
Generates a DOM tree from the input made available by a
File.
- parseTree(ReaderProvider) - Method in interface com.univocity.api.entity.html.HtmlTreeParser
-
Generates a HTML tree from the input made available by a ReaderProvider.
- parseTree(FileProvider) - Method in interface com.univocity.api.entity.html.HtmlTreeParser
-
Generates a HTML tree from the input made available by a FileProvider.
- parseTree(Reader) - Method in interface com.univocity.api.entity.html.HtmlTreeParser
-
Generates a HTML tree from the input made available by a
Reader.
- parseTree(InputStream) - Method in interface com.univocity.api.entity.html.HtmlTreeParser
-
Generates a HTML tree from the input made available by an
InputStream.
- parseTree(InputStream, Charset) - Method in interface com.univocity.api.entity.html.HtmlTreeParser
-
Generates a HTML tree from the input made available by an
InputStream.
- parseTree(InputStream, String) - Method in interface com.univocity.api.entity.html.HtmlTreeParser
-
Generates a HTML tree from the input made available by an
InputStream.
- parseTree(File) - Method in interface com.univocity.api.entity.html.HtmlTreeParser
-
Generates a HTML tree from the input made available by a
File.
- parseTree(File, Charset) - Method in interface com.univocity.api.entity.html.HtmlTreeParser
-
Generates a HTML tree from the input made available by a
File.
- parseTree(File, String) - Method in interface com.univocity.api.entity.html.HtmlTreeParser
-
Generates a HTML tree from the input made available by a
File.
- parsingEnded(HtmlParsingContext) - Method in class com.univocity.api.entity.html.HtmlParserListener
-
A method that runs when the parsing process has ended.
- parsingStarted(HtmlParsingContext) - Method in class com.univocity.api.entity.html.HtmlParserListener
-
A method that runs when the
HtmlParser begins parsing a web page.
- PartialGroup - Interface in com.univocity.api.entity.html.builders
-
Allows further specification of exactly which element a
Group starts at, as well as where the group will end.
- PartialPaginationGroup - Interface in com.univocity.api.entity.html.builders
-
A class that allows further specification of exactly which element the
PartialPaginationGroup starts at, as well as where the group will end.
- PartialPath - Interface in com.univocity.api.entity.html.builders
-
- PartialPathStart - Interface in com.univocity.api.entity.html.builders
-
- PathStart - Interface in com.univocity.api.entity.html.builders
-
- precededBy(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should have a given element placed before it, at any distance.
- precededBy(String, int) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should have a given element at a given distance before it.
- precededByText(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element must have a given text placed in an element before it.
- precededImmediatelyBy(String) - Method in interface com.univocity.api.entity.html.builders.BasicElementFilter
-
Establishes that the matched HTML element should have a given element placed directly before it.
- previousSibling() - Method in interface com.univocity.api.entity.html.HtmlElement
-
Returns the HtmlElement that is located just before this element.
- processEnded(HtmlParsingContext) - Method in interface com.univocity.api.entity.html.processor.RowProcessor
-
This method will be invoked by the parser once, after the parsing process has stopped and all resources have been closed.
- processStarted(HtmlParsingContext) - Method in interface com.univocity.api.entity.html.processor.RowProcessor
-
This method will be invoked by the parser once, when it is ready to start processing the input.
- select(String) - Method in interface com.univocity.api.entity.html.builders.ElementFilterStart
-
Selects what HTML element the parser must match using a CSS query.
- setBaseUri(String) - Method in interface com.univocity.api.entity.html.DownloadContext
-
Modifies the current base URI associated with the document whose resources are being fetched.
- setBaseUri(String) - Method in class com.univocity.api.entity.html.FetchOptions
-
Modifies the current base URI associated with the document whose resources are being fetched.
- setCurrentPage() - Method in interface com.univocity.api.entity.html.builders.PaginationParams
-
Creates a new field for the current page and returns a
PathStart which can be used to define the path to the ‘current page’ element.
- setCurrentPage() - Method in class com.univocity.api.entity.html.HtmlPaginator
-
Creates a new field for the current page and returns a
PathStart which can be used to define the path to the ‘current page’ element.
- setCurrentPageNumber() - Method in interface com.univocity.api.entity.html.builders.PaginationParams
-
Creates a new field for the current page and returns a
PathStart which can be used to define the path to the ‘current page’ element as a number.
- setCurrentPageNumber() - Method in class com.univocity.api.entity.html.HtmlPaginator
-
Creates a new field for the current page and returns a
PathStart which can be used to define the path to the ‘current page’ element as a number.
- setDownloadBlacklistingEnabled(boolean) - Method in class com.univocity.api.entity.html.FetchOptions
-
Configures whether URLs of resources that resulted in a download failure (such as a 404) should be blacklisted while the parser is running, so no further attempts to access the same URL will be made.
- setDownloadHandler(DownloadHandler) - Method in class com.univocity.api.entity.html.FetchOptions
-
Defines a
DownloadHandler to manipulate the downloads performed by the fetch resources operation.
- setListener(HtmlParserListener) - Method in class com.univocity.api.entity.html.HtmlEntitySettings
-
- setNextPage() - Method in interface com.univocity.api.entity.html.builders.PaginationParams
-
Creates a new field for the next page and returns a
PathStart which can be used to define the path to the next page element.
- setNextPage() - Method in class com.univocity.api.entity.html.HtmlPaginator
-
Creates a new field for the next page and returns a
PathStart which can be used to define the path to the next page element.
- setNextPageNumber() - Method in interface com.univocity.api.entity.html.builders.PaginationParams
-
Creates a new field for the next page number and returns a
PathStart which can be used to define the path to the next page number element.
- setNextPageNumber() - Method in class com.univocity.api.entity.html.HtmlPaginator
-
Creates a new field for the next page number and returns a
PathStart which can be used to define the path to the next page number element.
- setOverwriteSharedResources(boolean) - Method in class com.univocity.api.entity.html.FetchOptions
-
Defines whether resources that have been downloaded and are shared among multiple pages should be overwritten during a new fetch resources operation.
- setPaginator(HtmlPaginator) - Method in class com.univocity.api.entity.html.HtmlParserSettings
-
Configures an
HtmlPaginator to handle multiple pages of remote content that need to be parsed.
- setParserThreadCount(int) - Method in class com.univocity.api.entity.html.HtmlParserSettings
-
Explicitly defines a maximum number of threads that should be used by the parser when processing data of multiple entities from the same HTML input.
- setRemoteInterval(long) - Method in class com.univocity.api.entity.html.FetchOptions
-
Defines the minimum interval of time to wait between each download request.
- setRequestParameter(String, String) - Method in interface com.univocity.api.entity.html.builders.PaginationGroup
-
Associates a constant value to a request parameter.
- setRequestParameter(String, String) - Method in class com.univocity.api.entity.html.HtmlPaginator
-
Associates a constant value to a request parameter.
- setRequestParameterData(String, Object) - Method in class com.univocity.api.entity.html.HtmlPaginator
-
Defines a request parameter name and data value to be used when requesting the next page.
- setSharedResourceDir(String) - Method in class com.univocity.api.entity.html.FetchOptions
-
Defines the shared resource directory used to store files referenced by one or more HTML pages and CSS files.
- setSharedResourceDir(File) - Method in class com.univocity.api.entity.html.FetchOptions
-
Defines the shared resource directory used to store files referenced by one or more HTML pages and CSS files.
- setTargetFile(File) - Method in interface com.univocity.api.entity.html.DownloadContext
-
Changes the download destination to a new location.
- skipDownload() - Method in interface com.univocity.api.entity.html.DownloadContext
-
Skips this download and moves on to the next.
- sourceElement() - Method in interface com.univocity.api.entity.html.DownloadContext
-
Returns the specific
HtmlElement of the HTML that has a reference to the resource being downloaded.
- startAt(String) - Method in interface com.univocity.api.entity.html.builders.GroupStart
-
Specifies where in the HTML the group will start.
- startAt(String) - Method in interface com.univocity.api.entity.html.builders.PaginationGroupStart
-
Specifies where in the HTML the group will start.
- stopAllDownloads() - Method in interface com.univocity.api.entity.html.DownloadContext
-
Skips this download and stops any active downloads, finalizing the fetch operation.