public final class HtmlParser extends Object implements HtmlParserInterface
A very fast HTML parser.
See Also: HtmlParserSettings, ReaderProvider, Record, HtmlEntitySettings

| Constructor and Description |
|---|
| HtmlParser(HtmlEntityList entityList): Creates a new HtmlParser with the entity configuration provided by an HtmlEntityList. |
| Modifier and Type | Method and Description |
|---|---|
| HtmlPaginationContext | getPaginationContext(): Returns the HtmlPaginationContext object with information collected for the configured HtmlPaginator, if any. |
| com.univocity.parsers.common.Results<HtmlParserResult> | parse(File file): Given a File, parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class, submitting them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). |
| com.univocity.parsers.common.Results<HtmlParserResult> | parse(File file, Charset encoding): Given a File, parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class, submitting them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). |
| com.univocity.parsers.common.Results<HtmlParserResult> | parse(com.univocity.api.io.FileProvider fileProvider): Given an input made available from a FileProvider, parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class, submitting them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). |
| com.univocity.parsers.common.Results<HtmlParserResult> | parse(File file, String encoding): Given a File, parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class, submitting them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). |
| com.univocity.parsers.common.Results<HtmlParserResult> | parse(HtmlElement htmlTree): Given an HtmlElement, parses all records of all entities defined in the HtmlEntityList of this parser, submitting them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). |
| com.univocity.parsers.common.Results<HtmlParserResult> | parse(InputStream inputStream): Given an InputStream, parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class, submitting them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). |
| com.univocity.parsers.common.Results<HtmlParserResult> | parse(InputStream inputStream, Charset encoding): Given an InputStream, parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class, submitting them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). |
| com.univocity.parsers.common.Results<HtmlParserResult> | parse(InputStream inputStream, String encoding): Given an InputStream, parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class, submitting them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). |
| com.univocity.parsers.common.Results<HtmlParserResult> | parse(Reader reader): Given a Reader, parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class, submitting them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). |
| com.univocity.parsers.common.Results<HtmlParserResult> | parse(com.univocity.api.io.ReaderProvider readerProvider): Given an input made available from a ReaderProvider, parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class, submitting them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). |
| static HtmlElement | parseTree(File file): Generates a DOM tree from the input made available by a File. |
| static HtmlElement | parseTree(File file, Charset encoding): Generates a DOM tree from the input made available by a File. |
| static HtmlElement | parseTree(com.univocity.api.io.FileProvider fileProvider): Generates a DOM tree from the input made available by a FileProvider. |
| static HtmlElement | parseTree(File file, String encoding): Generates a DOM tree from the input made available by a File. |
| static HtmlElement | parseTree(InputStream inputStream): Generates a DOM tree from the input made available by an InputStream. |
| static HtmlElement | parseTree(InputStream inputStream, Charset encoding): Generates a DOM tree from the input made available by an InputStream. |
| static HtmlElement | parseTree(InputStream inputStream, String encoding): Generates a DOM tree from the input made available by an InputStream. |
| static HtmlElement | parseTree(Reader reader): Generates a DOM tree from the input made available by a Reader. |
| static HtmlElement | parseTree(com.univocity.api.io.ReaderProvider readerProvider): Generates a DOM tree from the input made available by a ReaderProvider. |
public HtmlParser(HtmlEntityList entityList)
Creates a new HtmlParser with the entity configuration provided by an HtmlEntityList. The HtmlParser obtains all of its configuration from this list and from HtmlEntityList.getParserSettings().
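The constructor takes a single configuration root. The entity list carries the entity definitions and also exposes the parser settings, so the parser does not receive settings separately. The sketch below shows that design in isolation. The classes EntityListSketch, ParserSettingsSketch and ParserSketch, the addEntity method and the maxPages field are all hypothetical stand-ins. They are not the univocity HtmlEntityList, HtmlParserSettings or HtmlParser API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for parser-wide settings (not the univocity HtmlParserSettings class).
class ParserSettingsSketch {
    int maxPages = 1;
}

// Hypothetical entity list: entity definitions plus the settings object they share.
class EntityListSketch {
    private final Map<String, String> entities = new LinkedHashMap<>();
    private final ParserSettingsSketch settings = new ParserSettingsSketch();

    // Registers an entity name with a purely illustrative selector string.
    void addEntity(String name, String selector) {
        entities.put(name, selector);
    }

    Map<String, String> getEntities() {
        return entities;
    }

    ParserSettingsSketch getParserSettings() {
        return settings;
    }
}

// Hypothetical parser: its only constructor argument is the entity list,
// and the parser settings are read from that list rather than passed separately.
class ParserSketch {
    private final EntityListSketch entityList;
    private final ParserSettingsSketch settings;

    ParserSketch(EntityListSketch entityList) {
        this.entityList = entityList;
        this.settings = entityList.getParserSettings();
    }

    // Summarizes the configuration the parser ended up with.
    String describe() {
        return entityList.getEntities().keySet() + " (maxPages=" + settings.maxPages + ")";
    }
}
```

In this sketch, settings changed through the list before construction are the ones the parser sees. That matches the single-source behavior the description states.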
Parameters:
entityList - the list of entities to be parsed by the HtmlParser, and their configuration

public final com.univocity.parsers.common.Results<HtmlParserResult> parse(com.univocity.api.io.ReaderProvider readerProvider)
Given an input made available from a ReaderProvider, this method parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class. It submits them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). The Processor implementation handles rows as they are produced, in its Processor.rowProcessed(String[], Context) method, which can accumulate or transform them on demand. How results are collected depends on the Processor implementation used.
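A provider differs from a plain Reader because the consumer opens the input when it needs it. In the sketch below, each open() call also yields a fresh Reader. ReaderSource and DeferredInput are hypothetical names for this illustration only. They are not the univocity ReaderProvider API, and whether the real provider re-opens its input is not stated here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical provider abstraction (not the univocity ReaderProvider class):
// every call to open() returns a fresh Reader over the same content.
interface ReaderSource {
    Reader open() throws IOException;
}

class DeferredInput {
    // Wraps in-memory HTML as a source; nothing is read until open() is called.
    static ReaderSource fromString(String html) {
        return () -> new StringReader(html);
    }

    // The consumer decides when to open the input and closes it when done.
    static List<String> readLines(ReaderSource source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source.open())) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```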
Specified by: parse in interface com.univocity.parsers.common.EntityParserInterface<HtmlRecord,HtmlParsingContext,HtmlParserResult>
Parameters:
readerProvider - an input provider with content to be parsed

public final com.univocity.parsers.common.Results<HtmlParserResult> parse(com.univocity.api.io.FileProvider fileProvider)
Given an input made available from a FileProvider, this method parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class. It submits them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). The Processor implementation handles rows as they are produced, in its Processor.rowProcessed(String[], Context) method, which can accumulate or transform them on demand. How results are collected depends on the Processor implementation used.
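A file-based provider usually pairs the file with the charset needed to decode it, so the consumer can open it correctly on its own. The sketch below shows that pairing with standard java.nio calls. FileSource is a hypothetical class and does not reproduce the constructors or methods of com.univocity.api.io.FileProvider.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;

// Hypothetical pairing of a file with the charset needed to decode it
// (an illustration only, not the univocity FileProvider class).
final class FileSource {
    private final File file;
    private final Charset encoding;

    FileSource(File file, Charset encoding) {
        this.file = file;
        this.encoding = encoding;
    }

    // Opens a new reader each time, decoding the file with the stored charset.
    BufferedReader open() throws IOException {
        return Files.newBufferedReader(file.toPath(), encoding);
    }

    // Reads the whole file as text; checked I/O errors are rethrown unchecked for brevity.
    String readAll() {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = open()) {
            char[] buffer = new char[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```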
Specified by: parse in interface com.univocity.parsers.common.EntityParserInterface<HtmlRecord,HtmlParsingContext,HtmlParserResult>
Parameters:
fileProvider - the input file with content to be parsed

public final com.univocity.parsers.common.Results<HtmlParserResult> parse(File file)
Given a File, this method parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class. It submits them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). The Processor implementation handles rows as they are produced, in its Processor.rowProcessed(String[], Context) method, which can accumulate or transform them on demand. How results are collected depends on the Processor implementation used.
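Rows are pushed to the processor one at a time rather than returned in a batch. The sketch below shows an accumulating processor under that push model. RowCallback, CollectingCallback and FakeProducer are hypothetical. The real Context parameter is reduced here to a plain row index, and none of this is the univocity Processor API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback contract, loosely modeled on rowProcessed(String[], Context):
// the producer calls it once per record, as soon as the record is available.
interface RowCallback {
    void rowProcessed(String[] row, long rowIndex);
}

// Accumulates every row it receives; results are read after parsing ends.
class CollectingCallback implements RowCallback {
    private final List<String[]> rows = new ArrayList<>();

    @Override
    public void rowProcessed(String[] row, long rowIndex) {
        rows.add(row.clone()); // copy in case the producer reuses its array
    }

    List<String[]> getRows() {
        return rows;
    }
}

// Stand-in producer that pushes pre-split rows to the callback one by one.
class FakeProducer {
    static void push(List<String[]> records, RowCallback callback) {
        long index = 0;
        for (String[] record : records) {
            callback.rowProcessed(record, index++);
        }
    }
}
```

Because the callback decides what to keep, a different implementation could stream rows elsewhere instead of holding them in memory.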
Specified by: parse in interface com.univocity.parsers.common.EntityParserInterface<HtmlRecord,HtmlParsingContext,HtmlParserResult>
Parameters:
file - the input with content to be parsed

public final com.univocity.parsers.common.Results<HtmlParserResult> parse(File file, Charset encoding)
Given a File and its encoding, this method parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class. It submits them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). The Processor implementation handles rows as they are produced, in its Processor.rowProcessed(String[], Context) method, which can accumulate or transform them on demand. How results are collected depends on the Processor implementation used.
Specified by: parse in interface com.univocity.parsers.common.EntityParserInterface<HtmlRecord,HtmlParsingContext,HtmlParserResult>
Parameters:
file - the input with content to be parsed
encoding - the encoding to be used when reading text from the given input.

public final com.univocity.parsers.common.Results<HtmlParserResult> parse(File file, String encoding)
Given a File and the name of its encoding, this method parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class. It submits them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). The Processor implementation handles rows as they are produced, in its Processor.rowProcessed(String[], Context) method, which can accumulate or transform them on demand. How results are collected depends on the Processor implementation used.
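The String and Charset overloads differ only in how the encoding is identified. A common way to turn an encoding name into a java.nio.charset.Charset is Charset.forName, which also accepts registered aliases. The sketch below assumes this resolution step for illustration. Whether the parser resolves names exactly this way is not documented here, and EncodingNames is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

class EncodingNames {
    // Resolves an encoding name such as "UTF-8" or "windows-1252" to a Charset.
    // Lookup is case-insensitive and accepts aliases (e.g. "latin1" for ISO-8859-1).
    static Charset resolve(String encoding) {
        try {
            return Charset.forName(encoding);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }
}
```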
Specified by: parse in interface com.univocity.parsers.common.EntityParserInterface<HtmlRecord,HtmlParsingContext,HtmlParserResult>
Parameters:
file - the input with content to be parsed
encoding - the encoding to be used when reading text from the given input.

public final com.univocity.parsers.common.Results<HtmlParserResult> parse(Reader reader)
Given a Reader, this method parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class. It submits them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). The Processor implementation handles rows as they are produced, in its Processor.rowProcessed(String[], Context) method, which can accumulate or transform them on demand. How results are collected depends on the Processor implementation used.
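Besides accumulating rows, a processor can transform each row on demand before passing it on. The sketch below expresses that with java.util.function types. Each row goes through a transformation, which here trims values and turns blanks into null, before it reaches a downstream consumer. RowTransforms and trimToNull are hypothetical names, not part of the univocity API.

```java
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

class RowTransforms {
    // Wraps a downstream row consumer so every row is transformed before it is passed on.
    static Consumer<String[]> transforming(UnaryOperator<String[]> transform, Consumer<String[]> downstream) {
        return row -> downstream.accept(transform.apply(row));
    }

    // Example transformation: trims each value and turns blank values into null.
    static String[] trimToNull(String[] row) {
        String[] out = new String[row.length];
        for (int i = 0; i < row.length; i++) {
            String value = row[i] == null ? null : row[i].trim();
            out[i] = (value == null || value.isEmpty()) ? null : value;
        }
        return out;
    }
}
```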
Specified by: parse in interface com.univocity.parsers.common.EntityParserInterface<HtmlRecord,HtmlParsingContext,HtmlParserResult>
Parameters:
reader - the input with content to be parsed

public final com.univocity.parsers.common.Results<HtmlParserResult> parse(InputStream inputStream)
Given an InputStream, this method parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class. It submits them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). The Processor implementation handles rows as they are produced, in its Processor.rowProcessed(String[], Context) method, which can accumulate or transform them on demand. How results are collected depends on the Processor implementation used.
Specified by: parse in interface com.univocity.parsers.common.EntityParserInterface<HtmlRecord,HtmlParsingContext,HtmlParserResult>
Parameters:
inputStream - the input with content to be parsed

public final com.univocity.parsers.common.Results<HtmlParserResult> parse(InputStream inputStream, Charset encoding)
Given an InputStream and its encoding, this method parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class. It submits them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). The Processor implementation handles rows as they are produced, in its Processor.rowProcessed(String[], Context) method, which can accumulate or transform them on demand. How results are collected depends on the Processor implementation used.
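The encoding argument matters because an InputStream delivers bytes, and those bytes only become text once decoded with a charset. The sketch below uses the standard InputStreamReader to decode a stream with an explicit Charset. StreamDecoding is a hypothetical helper, not part of the parser. In this sketch, decoding UTF-8 bytes for "café" as ISO-8859-1 yields two garbled characters in place of "é", which is the kind of error an explicit encoding avoids.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class StreamDecoding {
    // Decodes an entire InputStream into text using the given charset.
    static String decode(InputStream input, Charset encoding) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(input, encoding)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```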
Specified by: parse in interface com.univocity.parsers.common.EntityParserInterface<HtmlRecord,HtmlParsingContext,HtmlParserResult>
Parameters:
inputStream - the input with content to be parsed
encoding - the encoding to be used when reading text from the given input.

public final com.univocity.parsers.common.Results<HtmlParserResult> parse(InputStream inputStream, String encoding)
Given an InputStream and the name of its encoding, this method parses all records of all entities defined in the HtmlEntityList provided to the constructor of this class. It submits them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). The Processor implementation handles rows as they are produced, in its Processor.rowProcessed(String[], Context) method, which can accumulate or transform them on demand. How results are collected depends on the Processor implementation used.
Specified by: parse in interface com.univocity.parsers.common.EntityParserInterface<HtmlRecord,HtmlParsingContext,HtmlParserResult>
Parameters:
inputStream - the input with content to be parsed
encoding - the encoding to be used when reading text from the given input.

public final com.univocity.parsers.common.Results<HtmlParserResult> parse(HtmlElement htmlTree)
Given an HtmlElement, this method parses all records of all entities defined in the HtmlEntityList of this parser. It submits them to the Processor implementation associated with each entity (through EntitySettings.setProcessor(Processor)). The Processor implementation handles rows as they are produced, in its Processor.rowProcessed(String[], Context) method, which can accumulate or transform them on demand. How results are collected depends on the Processor implementation used.
Specified by: parse in interface HtmlParserInterface
Parameters:
htmlTree - the HTML tree with content to be parsed

public HtmlPaginationContext getPaginationContext()
Returns the HtmlPaginationContext object with information collected for the configured HtmlPaginator, if any. The information returned comes from the last input processed. It may have been modified by a NextInputHandler, if one was associated with the HtmlPaginator through Paginator.setPaginationHandler(NextInputHandler).
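The pagination context reflects only the last input processed, and a handler may have altered it along the way. The sketch below simulates that behavior. A loop follows "next page" links, each page overwrites a shared state object, and an optional handler can rewrite the next link, for example to stop early. PageState, PaginationLoop and their fields are hypothetical. They do not mirror the fields of HtmlPaginationContext or the NextInputHandler interface.

```java
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical pagination state: each page overwrites it, so after the loop
// it only describes the last page that was processed.
class PageState {
    int pagesVisited;
    String currentUrl;
    String nextPageUrl;
}

class PaginationLoop {
    // Follows "next page" links starting at firstUrl. After each page, the optional handler
    // may inspect or rewrite the state (e.g. clear nextPageUrl to stop early).
    // Assumes the links do not form a cycle.
    static PageState run(String firstUrl, Map<String, String> nextLinks, Consumer<PageState> handler) {
        PageState state = new PageState();
        String url = firstUrl;
        while (url != null) {
            state.pagesVisited++;
            state.currentUrl = url;
            state.nextPageUrl = nextLinks.get(url);
            if (handler != null) {
                handler.accept(state);
            }
            url = state.nextPageUrl;
        }
        return state;
    }
}
```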
Specified by: getPaginationContext in interface HtmlParserInterface
Specified by: getPaginationContext in interface com.univocity.parsers.remote.RemoteEntityParserInterface<HtmlRecord,HtmlParsingContext,HtmlParserResult>
Returns: the PaginationContext with pagination information captured after parsing a given input.

public static final HtmlElement parseTree(com.univocity.api.io.ReaderProvider readerProvider)
Generates a DOM tree from the input made available by a ReaderProvider. Users can navigate the HTML tree and use CSS selectors against the HtmlElements returned to target any specific HTML node.
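The returned HtmlElement is the entry point for navigating the whole document. The univocity HtmlElement API and its CSS selector support are not reproduced here. Instead, the sketch below uses a minimal hypothetical ElementNode tree and a depth-first search by tag name, a simplified stand-in for what a type selector such as "li" matches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal element node: a tag name, optional text and child elements.
class ElementNode {
    final String tag;
    final String text;
    final List<ElementNode> children = new ArrayList<>();

    ElementNode(String tag, String text) {
        this.tag = tag;
        this.text = text;
    }

    // Appends a child and returns this node so trees can be built fluently.
    ElementNode add(ElementNode child) {
        children.add(child);
        return this;
    }

    // Depth-first, document-order search for every descendant with the given tag.
    List<ElementNode> findAll(String tagName) {
        List<ElementNode> found = new ArrayList<>();
        collect(this, tagName, found);
        return found;
    }

    private static void collect(ElementNode node, String tagName, List<ElementNode> found) {
        for (ElementNode child : node.children) {
            if (child.tag.equalsIgnoreCase(tagName)) {
                found.add(child);
            }
            collect(child, tagName, found);
        }
    }
}
```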
Parameters:
readerProvider - an input provider with content to be parsed
Returns: the HtmlElement representing the entire HTML document.

public static final HtmlElement parseTree(com.univocity.api.io.FileProvider fileProvider)
Generates a DOM tree from the input made available by a FileProvider. Users can navigate the HTML tree and use CSS selectors against the HtmlElements returned to target any specific HTML node.
Parameters:
fileProvider - the input file with content to be parsed
Returns: the HtmlElement representing the entire HTML document.

public static final HtmlElement parseTree(Reader reader)
Generates a DOM tree from the input made available by a Reader. Users can navigate the HTML tree and use CSS selectors against the HtmlElements returned to target any specific HTML node.
Parameters:
reader - the input with content to be parsed
Returns: the HtmlElement representing the entire HTML document.

public static final HtmlElement parseTree(InputStream inputStream)
Generates a DOM tree from the input made available by an InputStream. Users can navigate the HTML tree and use CSS selectors against the HtmlElements returned to target any specific HTML node.
Parameters:
inputStream - the input with content to be parsed
Returns: the HtmlElement representing the entire HTML document.

public static final HtmlElement parseTree(InputStream inputStream, Charset encoding)
Generates a DOM tree from the input made available by an InputStream. Users can navigate the HTML tree and use CSS selectors against the HtmlElements returned to target any specific HTML node.
Parameters:
inputStream - the input with content to be parsed
encoding - the encoding to be used when reading text from the given input.
Returns: the HtmlElement representing the entire HTML document.

public static final HtmlElement parseTree(InputStream inputStream, String encoding)
Generates a DOM tree from the input made available by an InputStream. Users can navigate the HTML tree and use CSS selectors against the HtmlElements returned to target any specific HTML node.
Parameters:
inputStream - the input with content to be parsed
encoding - the encoding to be used when reading text from the given input.
Returns: the HtmlElement representing the entire HTML document.

public static final HtmlElement parseTree(File file)
Generates a DOM tree from the input made available by a File. Users can navigate the HTML tree and use CSS selectors against the HtmlElements returned to target any specific HTML node.
Parameters:
file - the input with content to be parsed
Returns: the HtmlElement representing the entire HTML document.

public static final HtmlElement parseTree(File file, Charset encoding)
Generates a DOM tree from the input made available by a File. Users can navigate the HTML tree and use CSS selectors against the HtmlElements returned to target any specific HTML node.
Parameters:
file - the input with content to be parsed
encoding - the encoding to be used when reading text from the given input.
Returns: the HtmlElement representing the entire HTML document.

public static final HtmlElement parseTree(File file, String encoding)
Generates a DOM tree from the input made available by a File. Users can navigate the HTML tree and use CSS selectors against the HtmlElements returned to target any specific HTML node.
Parameters:
file - the input with content to be parsed
encoding - the encoding to be used when reading text from the given input.
Returns: the HtmlElement representing the entire HTML document.

Copyright © 2018 uniVocity Software Pty Ltd. All rights reserved.