public class FetchOptions extends Object implements Cloneable
Configuration class for use in the HtmlElement.fetchResources(com.univocity.api.io.FileProvider, com.univocity.api.entity.html.FetchOptions) methods. Setters return this instance to enable method chaining during initialization.
| Constructor and Description |
|---|
| FetchOptions(): Default constructor for FetchOptions. Defaults to not flattening directories and accepting any String. |
| Modifier and Type | Method and Description |
|---|---|
| protected FetchOptions | clone() |
| void | flattenDirectories(boolean flatten): Option to flatten the path section of a fetched resource into the new filename. |
| boolean | flattenDirectoryStructure(): Whether or not the resource filenames should be ‘flattened’. |
| String | getBaseUri(): The current base URI associated with the document whose resources are being fetched. |
| DownloadHandler | getDownloadHandler(): Returns the DownloadHandler callback to be used by the fetch resources operation. |
| long | getRemoteInterval(): Returns the minimum interval of time to wait between each download request. |
| com.univocity.api.io.FileProvider | getSharedResourceDir(): Returns the shared resource directory used to store files referenced by one or more HTML pages and CSS files. |
| boolean | isDownloadBlacklistingEnabled(): Indicates whether URLs of resources that resulted in a download failure (such as a 404) should be blacklisted while the parser is running, so no further attempts to access the same URL will be made. |
| boolean | isOverwriteSharedResources(): Returns a flag indicating whether resources that have been downloaded and are shared among multiple pages should be overwritten during a new fetch resources operation. |
| void | setBaseUri(String baseUri): Modifies the current base URI associated with the document whose resources are being fetched. |
| void | setDownloadBlacklistingEnabled(boolean downloadBlacklistingEnabled): Configures whether URLs of resources that resulted in a download failure (such as a 404) should be blacklisted while the parser is running, so no further attempts to access the same URL will be made. |
| void | setDownloadHandler(DownloadHandler downloadHandler): Defines a DownloadHandler to manipulate the downloads performed by the fetch resources operation. |
| void | setOverwriteSharedResources(boolean overwriteSharedResources): Defines whether resources that have been downloaded and are shared among multiple pages should be overwritten during a new fetch resources operation. |
| void | setRemoteInterval(long remoteInterval): Defines the minimum interval of time to wait between each download request. |
| void | setSharedResourceDir(File sharedResourceDir): Defines the shared resource directory used to store files referenced by one or more HTML pages and CSS files. |
| void | setSharedResourceDir(String sharedResourceDir): Defines the shared resource directory used to store files referenced by one or more HTML pages and CSS files. |
public FetchOptions()
Default constructor for FetchOptions. Defaults to not flattening directories and accepting any String.
public String getBaseUri()
The current base URI associated with the document whose resources are being fetched. Used to “build” the full URL used to download a given resource. For example, if a link such as <a href="/Images/Icons/garage.svg"></a> is being processed, and the base URI is set to http://www.univocity.com, the download URL will be http://www.univocity.com/Images/Icons/garage.svg
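The resolution step can be sketched with the JDK's java.net.URI, assuming the parser combines a link with the base URI the way URI.resolve does (an assumption; the library's own resolution rules are not documented here). BaseUriSketch and its resolve helper are hypothetical names, not part of the uniVocity API.

```java
import java.net.URI;

// Hypothetical helper illustrating how a resource link is combined with a base URI.
// Assumption: resolution follows java.net.URI semantics; the parser may differ in edge cases.
class BaseUriSketch {

    // Resolves a (possibly relative) link against the given base URI.
    static String resolve(String baseUri, String link) {
        return URI.create(baseUri).resolve(link).toString();
    }

    public static void main(String[] args) {
        // prints http://www.univocity.com/Images/Icons/garage.svg
        System.out.println(resolve("http://www.univocity.com", "/Images/Icons/garage.svg"));
    }
}
```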
public void setBaseUri(String baseUri)
Modifies the current base URI associated with the document whose resources are being fetched. Used to “build” the full URL used to download a given resource. For example, if a link such as <a href="/Images/Icons/garage.svg"></a> is being processed, and the base URI is set to http://www.univocity.com, the download URL will be http://www.univocity.com/Images/Icons/garage.svg
baseUri - base URI to use for generating absolute download URL paths.
public void flattenDirectories(boolean flatten)
Option to flatten the path section of a fetched resource into the new filename.
A file with a relative path such as ./path/to/resource/image.png would normally be saved as a file named image.png in the ./path/to/resource/ directory.
When flattened, it will instead be saved as path_to_resource_image.png in the . directory.
flatten - whether to flatten the path of a resource into the saved name.
public boolean flattenDirectoryStructure()
Whether or not the resource filenames should be ‘flattened’, that is, whether the directories are condensed into the filename so that all resource files are stored in the same directory while remaining uniquely named. For example:
A file with a relative path such as ./path/to/resource/image.png would normally be saved as a file named image.png in the ./path/to/resource/ directory.
When flattened, it will instead be saved as path_to_resource_image.png in the . directory.
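A minimal sketch of the flattening rule, assuming directory separators are simply replaced with underscores and a leading ./ is dropped, which is what the example above implies. FlattenSketch is a hypothetical class; how the library treats .. segments, query strings, or name collisions is not covered here.

```java
// Hypothetical illustration of the flattening rule; not the library's implementation.
class FlattenSketch {

    // Turns a relative resource path into a single flattened file name.
    static String flatten(String relativePath) {
        String path = relativePath.replace('\\', '/'); // normalize Windows separators
        while (path.startsWith("./")) {
            path = path.substring(2); // drop leading "./" segments
        }
        return path.replace('/', '_'); // fold the remaining directories into the name
    }

    public static void main(String[] args) {
        // prints path_to_resource_image.png
        System.out.println(flatten("./path/to/resource/image.png"));
    }
}
```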
public DownloadHandler getDownloadHandler()
Returns the DownloadHandler callback to be used by the fetch resources operation.
public void setDownloadHandler(DownloadHandler downloadHandler)
Defines a DownloadHandler to manipulate the downloads performed by the fetch resources operation.
downloadHandler - the download handler to use
public boolean isOverwriteSharedResources()
Returns a flag indicating whether resources that have been downloaded and are shared among multiple pages should be overwritten during a new fetch resources operation.
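A sketch of what the overwrite flag controls, assuming a shared resource is left untouched when it already exists locally and overwriting is disabled. SharedResourceWriter and store are hypothetical names; only standard java.nio.file calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper showing the effect of an overwriteSharedResources-style flag.
class SharedResourceWriter {

    // Returns true if the file was written, false if an existing copy was kept.
    static boolean store(Path target, byte[] content, boolean overwriteSharedResources) {
        if (!overwriteSharedResources && Files.exists(target)) {
            return false; // reuse the file saved by an earlier fetch
        }
        try {
            Path parent = target.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent); // create missing directories
            }
            Files.write(target, content); // creates the file or truncates an existing one
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempDirectory("shared").resolve("logo.png");
        System.out.println(store(target, new byte[] {1}, false)); // true: file did not exist
        System.out.println(store(target, new byte[] {2}, false)); // false: existing copy kept
        System.out.println(store(target, new byte[] {3}, true));  // true: overwritten
    }
}
```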
public void setOverwriteSharedResources(boolean overwriteSharedResources)
Defines whether resources that have been downloaded and are shared among multiple pages should be overwritten during a new fetch resources operation.
overwriteSharedResources - flag indicating that local files that already exist should be overwritten
public com.univocity.api.io.FileProvider getSharedResourceDir()
Returns the shared resource directory used to store files referenced by one or more HTML pages and CSS files. Use it to prevent downloading the same images and CSS files over and over again for each HTML page you want to store.
If unspecified (i.e. null), a directory named after the HTML file with the _files suffix appended will be created, and all resources used by that HTML file will be stored in it. This emulates what most browsers do when their “File -> Save Page As…” action is executed.
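The String overload of setSharedResourceDir accepts system variables enclosed within { and }, such as {user.home}. A minimal sketch of that expansion follows, assuming placeholders are looked up as Java system properties first and then as environment variables, with unknown names left untouched; both fallbacks are assumptions, and PathVariables is a hypothetical class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical expansion of {name} placeholders in a directory path.
class PathVariables {

    private static final Pattern VARIABLE = Pattern.compile("\\{([^{}]+)\\}");

    static String expand(String path) {
        Matcher matcher = VARIABLE.matcher(path);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = System.getProperty(name); // e.g. user.home
            if (value == null) {
                value = System.getenv(name); // assumption: fall back to environment variables
            }
            if (value == null) {
                value = matcher.group(); // assumption: leave unknown placeholders unchanged
            }
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("{user.home}/Downloads"));
    }
}
```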
public void setSharedResourceDir(String sharedResourceDir)
Defines the shared resource directory used to store files referenced by one or more HTML pages and CSS files. Use it to prevent downloading the same images and CSS files over and over again for each HTML page you want to store.
If unspecified (i.e. null), a directory named after the HTML file with the _files suffix appended will be created, and all resources used by that HTML file will be stored in it. This emulates what most browsers do when their “File -> Save Page As…” action is executed.
sharedResourceDir - the path to a shared resource directory to use. It can contain system variables enclosed within { and } (e.g. {user.home}/Downloads). Subdirectories that don’t exist will be created if required.
public void setSharedResourceDir(File sharedResourceDir)
Defines the shared resource directory used to store files referenced by one or more HTML pages and CSS files. Use it to prevent downloading the same images and CSS files over and over again for each HTML page you want to store.
If unspecified (i.e. null), a directory named after the HTML file with the _files suffix appended will be created, and all resources used by that HTML file will be stored in it. This emulates what most browsers do when their “File -> Save Page As…” action is executed.
sharedResourceDir - the path to a shared resource directory to use. Subdirectories that don’t exist will be created if required.
public boolean isDownloadBlacklistingEnabled()
Indicates whether URLs of resources that resulted in a download failure (such as a 404) should be blacklisted while the parser is running, so no further attempts to access the same URL will be made. Enabled by default to improve speed when fetching resources of multiple pages, especially when link following is used.
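A minimal sketch of per-run blacklisting, assuming a failed URL is recorded in an in-memory set and skipped on later requests. BlacklistingDownloader and its fetch predicate are hypothetical stand-ins for the real download code; the set is not thread-safe, so a concurrent crawler would need a concurrent set instead.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical downloader illustrating blacklisting of failed URLs for the duration of a run.
class BlacklistingDownloader {

    private final boolean blacklistingEnabled;
    private final Predicate<String> fetch; // returns false when a download fails (e.g. HTTP 404)
    private final Set<String> blacklist = new HashSet<>();
    private int attempts;

    BlacklistingDownloader(boolean blacklistingEnabled, Predicate<String> fetch) {
        this.blacklistingEnabled = blacklistingEnabled;
        this.fetch = fetch;
    }

    boolean download(String url) {
        if (blacklistingEnabled && blacklist.contains(url)) {
            return false; // known bad URL: skip the network request
        }
        attempts++;
        boolean ok = fetch.test(url);
        if (!ok && blacklistingEnabled) {
            blacklist.add(url); // remember the failure for the rest of this run
        }
        return ok;
    }

    int attempts() {
        return attempts;
    }

    public static void main(String[] args) {
        Predicate<String> fakeFetch = url -> !url.endsWith("missing.png"); // hypothetical server
        BlacklistingDownloader downloader = new BlacklistingDownloader(true, fakeFetch);
        for (int page = 0; page < 3; page++) {
            downloader.download("http://example.com/missing.png");
        }
        System.out.println(downloader.attempts()); // prints 1
    }
}
```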
public void setDownloadBlacklistingEnabled(boolean downloadBlacklistingEnabled)
Configures whether URLs of resources that resulted in a download failure (such as a 404) should be blacklisted while the parser is running, so no further attempts to access the same URL will be made. Enabled by default to improve speed when fetching resources of multiple pages, especially when link following is used.
downloadBlacklistingEnabled - flag indicating whether bad URLs should be blacklisted
public final long getRemoteInterval()
Returns the minimum interval of time to wait between each download request. This is required to prevent submitting multiple requests to the same server at the same time.
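The minimum-interval behavior can be sketched as a small limiter that makes each request wait until the configured interval has elapsed since the previous one, treating values <= 0 as disabled, as documented for this option. MinIntervalLimiter is hypothetical and is not the library's internal RateLimiter.

```java
// Hypothetical minimum-interval limiter; not the library's internal RateLimiter.
class MinIntervalLimiter {

    private final long intervalNanos;
    private long nextAllowed;    // earliest System.nanoTime() for the next request
    private boolean hasPrevious; // false until the first request goes through

    MinIntervalLimiter(long intervalMillis) {
        // Any value <= 0 disables limiting, mirroring the documented behavior.
        this.intervalNanos = intervalMillis > 0 ? intervalMillis * 1_000_000L : 0L;
    }

    // Blocks until at least the configured interval has passed since the previous request.
    synchronized void acquire() {
        if (intervalNanos == 0) {
            return;
        }
        long now = System.nanoTime();
        while (hasPrevious && nextAllowed - now > 0) { // loop guards against early wake-ups
            long waitNanos = nextAllowed - now;
            try {
                Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return;
            }
            now = System.nanoTime();
        }
        hasPrevious = true;
        nextAllowed = now + intervalNanos;
    }

    public static void main(String[] args) {
        MinIntervalLimiter limiter = new MinIntervalLimiter(5); // documented default: 5 ms
        for (int i = 0; i < 3; i++) {
            limiter.acquire();
            System.out.println("request " + i + " at " + System.nanoTime());
        }
    }
}
```

Sleeping inside the synchronized acquire method serializes concurrent callers, so requests leave one at a time; a production limiter might instead reserve time slots without holding the lock while it waits.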
Defaults to 5 ms. Values <= 0 mean the internal RateLimiter is disabled.
public final void setRemoteInterval(long remoteInterval)
Defines the minimum interval of time to wait between each download request. This is required to prevent submitting multiple requests to the same server at the same time.
Defaults to 5 ms.
remoteInterval - minimum time (in milliseconds) to wait between download requests. Any value <= 0 will disable the internal RateLimiter.
protected FetchOptions clone()
Copyright © 2018 uniVocity Software Pty Ltd. All rights reserved.