com.univocity.parsers.remote
Class RemoteParserSettings<S extends com.univocity.parsers.common.CommonParserSettings,L extends RemoteEntityList,C extends com.univocity.parsers.common.Context>

java.lang.Object
  extended by com.univocity.parsers.common.EntityParserSettings<S,L,C>
      extended by com.univocity.parsers.remote.RemoteParserSettings<S,L,C>
Type Parameters:
S - an internal configuration object that extends from CommonParserSettings, and is used to manage configuration of elements shared with univocity-parsers
L - the RemoteEntityList implementation supported by an EntityParserInterface.
C - the Context implementation which provides specific details about the parsing process performed by this parser.
All Implemented Interfaces:
Cloneable

public abstract class RemoteParserSettings<S extends com.univocity.parsers.common.CommonParserSettings,L extends RemoteEntityList,C extends com.univocity.parsers.common.Context>
extends EntityParserSettings<S,L,C>

Base configuration class of a parser that can connect to a remote location, obtain data to parse and produce records for one or more entities. The settings available in a RemoteParserSettings configure the remote content access and the parsing process, and provide default configuration options that individual implementations of RemoteEntitySettings can override. Entities are managed from a RemoteEntityList implementation.

Author:
uniVocity Software Pty Ltd - dev@univocity.com
See Also:
DataTransfer, RemoteEntitySettings, RemoteEntityList, Paginator

Field Summary
protected  Boolean downloadBeforeParsingEnabled
           
protected  Paginator paginator
           
 
Fields inherited from class com.univocity.parsers.common.EntityParserSettings
entitiesToRead, entitiesToSkip, globalSettings
 
Constructor Summary
RemoteParserSettings()
          Creates a new configuration object for an implementation of EntityParserInterface, which will process an input to produce records for entities defined by a RemoteEntityList.
 
Method Summary
 void clearFileNameParameters()
          Clears all values from the filename pattern defined in setFileNamePattern(String)
protected  RemoteParserSettings<S,L,C> clone()
           
 String getBatchId()
          Returns the custom batch ID to be used in the file name pattern specified by getFileNamePattern().
abstract  String getDefaultFileExtension()
          Returns the default file extension to use when saving files to the directory specified by getDownloadContentDirectory(), in case the file name pattern taken from getFileNamePattern() doesn't include a file extension.
 com.univocity.api.io.FileProvider getDownloadContentDirectory()
          Returns the directory where downloaded content should be stored
 com.univocity.api.statistics.DownloadListener getDownloadListener()
          Returns the DownloadListener associated with the parser and which will receive updates on the progress of downloads made by the parser.
 int getDownloadThreads()
          Returns the number of threads that will be used to download remote content (e.g. images) that is associated with the parsed input.
 String getEmptyValue()
          Returns the value to be used when the content parsed for a field of some record evaluates to an empty String. Defaults to null.
 ExecutorService getExecutorService()
          Returns the ExecutorService to be used by the parser for managing the multiple threads that can be started.
 Object getFileNameParameter(String parameterName)
          Gets the value of a parameter in the filename pattern defined in setFileNamePattern(String).
 Set<String> getFileNameParameters()
           
 String getFileNamePattern()
          Gets the pattern that names of downloaded files should follow.
 Nesting getNesting()
          Returns the nesting strategy to apply to rows associated to a "parent" row, such as results parsed from a link accessed by a RemoteFollower.
 Paginator getPaginator()
          Returns the Paginator that handles multiple pages of remote content that need to be parsed.
 String getParseDate()
          Returns the formatted parse date to associate with any downloaded files for future re-parsing.
 long getRemoteInterval()
          Returns the minimum interval of time to wait between remote requests.
 Charset getTextEncoding()
          Returns the character set to use when writing text/html files downloaded by the parser. Defaults to the system encoding if not provided.
 void ignoreFollowingErrors(boolean ignoreLinkFollowingErrors)
          Configures the parser to ignore (or not) invalid, malformed or unavailable links when following urls to collect additional data associated to a current result.
 boolean isColumnReorderingEnabled()
          Identifies whether fields should be reordered when field selection methods of an entity's EntitySettings (such as EntitySettings.selectFields(String...)) are used.
 boolean isDownloadBeforeParsingEnabled()
          Verifies whether the parser will download the remote content before parsing it.
 boolean isDownloadEnabled()
          Flags whether remote downloads are enabled.
 boolean isDownloadOverwritingEnabled()
          Returns a flag indicating whether the parser will overwrite content already downloaded.
 boolean isIgnoreFollowingErrors()
          Returns a flag indicating whether the parser will ignore invalid, malformed or unavailable links when following urls to collect additional data associated to a current result.
protected abstract  Paginator newPaginator(RemoteParserSettings parserSettings)
          Creates an instance of a concrete implementation of Paginator
 void setBatchId(String batchId)
          Defines a custom batch ID to be used in the file name pattern specified by getFileNamePattern().
 void setColumnReorderingEnabled(boolean columnReorderingEnabled)
          Defines whether fields should be reordered when field selection methods of an entity's EntitySettings (such as EntitySettings.selectFields(String...)) are used.
 void setDownloadBeforeParsingEnabled(boolean downloadBeforeParsingEnabled)
          Instructs the parser to download the remote content before parsing it.
 void setDownloadContentDirectory(File directory)
          Configures the parser to store a local copy of the remote content in the filesystem.
 void setDownloadContentDirectory(String path)
          Configures the parser to store a local copy of the remote content in the filesystem.
 void setDownloadEnabled(boolean downloadEnabled)
          Enables/disables any remote download operation.
 void setDownloadListener(com.univocity.api.statistics.DownloadListener downloadListener)
          Associates a DownloadListener with the parser, which will receive updates on the progress of downloads made by the parser.
 void setDownloadOverwritingEnabled(boolean downloadOverwritingEnabled)
          Configures the parser to overwrite content already downloaded.
 void setDownloadThreads(int downloadThreads)
          Sets the number of threads that will be used to download remote content (e.g. images) that is associated with the parsed input.
 void setEmptyValue(String emptyValue)
          Defines the value to be used when the content parsed for a field of some record evaluates to an empty String. Defaults to null.
 void setExecutorService(ExecutorService executorService)
          Assigns an ExecutorService to the parser, which will be used to manage the multiple threads that can be started.
 void setFileNameParameter(String parameterName, Object parameterValue)
          Sets the value of a parameter in the filename pattern defined in setFileNamePattern(String).
 void setFileNamePattern(String pattern)
          Sets the pattern that names of downloaded files should follow.
 void setNesting(Nesting nesting)
          Configures the nesting strategy to apply to rows associated to a "parent" row, such as results parsed from a link accessed by a RemoteFollower.
 void setPaginator(Paginator paginator)
          Configures a Paginator to handle multiple pages of remote content that need to be parsed.
 void setParseDate(Calendar parseDate)
          Defines a parse date to process historical files.
 void setParseDate(Date parseDate)
          Defines a parse Date to process historical files.
 void setParseDate(String parseDate)
          Defines a parse Date to process historical files.
 void setRemoteInterval(long remoteInterval)
          Defines the minimum interval of time to wait between remote requests.
 void setTextEncoding(Charset encoding)
          Defines the character set to use when writing text/html files downloaded by the parser. By default the system encoding is used.
 void setTextEncoding(String charsetName)
          Defines the character set to use when writing text/html files downloaded by the parser. By default the system encoding is used.
 
Methods inherited from class com.univocity.parsers.common.EntityParserSettings
addEntitiesToRead, addEntitiesToRead, addEntitiesToSkip, addEntitiesToSkip, createEmptyGlobalSettings, createGlobalSettings, getEntitiesToRead, getEntitiesToSkip, getErrorContentLength, getNullValue, getProcessorErrorHandler, getTrimLeadingWhitespaces, getTrimTrailingWhitespaces, setEntitiesToRead, setEntitiesToRead, setEntitiesToSkip, setEntitiesToSkip, setErrorContentLength, setNullValue, setProcessorErrorHandler, setTrimLeadingWhitespaces, setTrimTrailingWhitespaces, shouldRead, shouldSkip, trimValues
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

paginator

protected Paginator paginator

downloadBeforeParsingEnabled

protected Boolean downloadBeforeParsingEnabled
Constructor Detail

RemoteParserSettings

public RemoteParserSettings()
Creates a new configuration object for an implementation of EntityParserInterface, which will process an input to produce records for entities defined by a RemoteEntityList. The RemoteEntityList is used to manage RemoteEntitySettings for each entity whose records will be parsed.

Method Detail

setDownloadContentDirectory

public final void setDownloadContentDirectory(String path)
Configures the parser to store a local copy of the remote content in the filesystem. If the downloaded content is text, it will be stored using the system default encoding.

Parameters:
path - the path to the target directory. It can contain system variables enclosed within { and } (e.g. "{user.home}/Downloads"). Subdirectories that don't exist will be created if required.
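
To illustrate the placeholder syntax described above, the sketch below expands names enclosed in { and } by looking them up as Java system properties. The class PathVariableSketch and its expand method are hypothetical, and reading "system variables" as Java system properties is an assumption; the library's own resolution logic may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the library.
final class PathVariableSketch {

    // Matches a name enclosed in braces, e.g. {user.home}
    private static final Pattern VARIABLE = Pattern.compile("\\{([^{}]+)\\}");

    // Replaces each {name} with System.getProperty(name) when that property exists.
    // Assumption: unknown names are left untouched.
    static String expand(String path) {
        Matcher matcher = VARIABLE.matcher(path);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = System.getProperty(matcher.group(1));
            String replacement = value != null ? value : matcher.group();
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("{user.home}/Downloads"));
    }
}
```

A path expanded this way could then be passed to setDownloadContentDirectory(String), although the method is documented to accept the unexpanded form directly.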

setDownloadContentDirectory

public final void setDownloadContentDirectory(File directory)
Configures the parser to store a local copy of the remote content in the filesystem. If the downloaded content is text, it will be stored using the system default encoding.

Parameters:
directory - the target directory. Subdirectories that don't exist will be created if required.

getDownloadContentDirectory

public final com.univocity.api.io.FileProvider getDownloadContentDirectory()
Returns the directory where downloaded content should be stored

Returns:
a FileProvider pointing to the configured download content directory

setFileNamePattern

public final void setFileNamePattern(String pattern)
Sets the pattern that names of downloaded files should follow. For example, setting the pattern to "/search/file{page}" stores pages in the search folder with the names "file1.html", "file2.html", etc. The file extension is added automatically if it is known. Parameters recognized in the pattern include {page}, {date, format} (for example {date, yyyy-MMM-dd}) and {batch}. Defaults to file_{page}.

Parameters:
pattern - the pattern used to generate file names for downloaded content.
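
The following sketch shows one way a pattern such as "/search/file{page}" could be turned into a file name. FileNamePatternSketch and resolve are hypothetical names, and the rule used to decide whether an extension is missing (no dot in the last path segment) is an assumption made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the library.
final class FileNamePatternSketch {

    // Matches simple parameters such as {page}; formatted ones like {date, ...} are not handled here.
    private static final Pattern PARAMETER = Pattern.compile("\\{(\\w+)\\}");

    static String resolve(String pattern, Map<String, Object> parameters, String defaultExtension) {
        Matcher matcher = PARAMETER.matcher(pattern);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            Object value = parameters.get(matcher.group(1));
            // Assumption: parameters without a value are left as they are.
            String replacement = value != null ? String.valueOf(value) : matcher.group();
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        String name = result.toString();
        // Append the default extension when the last path segment has none.
        String lastSegment = name.substring(name.lastIndexOf('/') + 1);
        return lastSegment.contains(".") ? name : name + "." + defaultExtension;
    }

    public static void main(String[] args) {
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("page", 1);
        System.out.println(resolve("/search/file{page}", parameters, "html"));
    }
}
```

In this sketch the defaultExtension argument plays the role that getDefaultFileExtension() has in the real settings class.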

getFileNamePattern

public final String getFileNamePattern()
Gets the pattern that names of downloaded files should follow. For example, setting the pattern to "/search/file{page}" stores pages in the search folder with the names "file1.html", "file2.html", etc. The file extension is added automatically if it is known. Parameters recognized in the pattern include {page}, {date, format} (for example {date, yyyy-MMM-dd}) and {batch}. Defaults to file_{page}.

Returns:
the pattern used to generate file names for downloaded content.

getTextEncoding

public final Charset getTextEncoding()
Returns the character set to use when writing text/html files downloaded by the parser. Defaults to the system encoding if not provided.

Returns:
the current text encoding.

setTextEncoding

public final void setTextEncoding(Charset encoding)
Defines the character set to use when writing text/html files downloaded by the parser. By default the system encoding is used.

Parameters:
encoding - the encoding to use for writing downloaded files

setTextEncoding

public final void setTextEncoding(String charsetName)
Defines the character set to use when writing text/html files downloaded by the parser. By default the system encoding is used.

Parameters:
charsetName - the name of the charset to use for writing downloaded files
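
The two setTextEncoding overloads accept either a Charset or a charset name. The sketch below uses only the standard java.nio.charset API to show how a name maps to a Charset and how the system default could be used as a fallback. TextEncodingSketch and resolve are hypothetical names, and treating a null name as "use the system encoding" is an assumption based on the default described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the library.
final class TextEncodingSketch {

    // Returns the named charset, or the system default when no name is given (assumption).
    static Charset resolve(String charsetName) {
        return charsetName == null ? Charset.defaultCharset() : Charset.forName(charsetName);
    }

    public static void main(String[] args) {
        // Both lines refer to the same charset.
        System.out.println(resolve("UTF-8"));
        System.out.println(StandardCharsets.UTF_8);
        // The system default depends on the platform and JVM configuration.
        System.out.println(resolve(null));
    }
}
```

Setting the encoding explicitly avoids depending on the platform default when downloaded files are later re-parsed on a different machine.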

setFileNameParameter

public final void setFileNameParameter(String parameterName,
                                       Object parameterValue)
Sets the value of a parameter in the filename pattern defined in setFileNamePattern(String).

Parameters:
parameterName - the name of the parameter
parameterValue - the value of the parameter

getFileNameParameter

public final Object getFileNameParameter(String parameterName)
Gets the value of a parameter in the filename pattern defined in setFileNamePattern(String).

Parameters:
parameterName - the name of the parameter to get
Returns:
the value of the parameter

getFileNameParameters

public final Set<String> getFileNameParameters()
Returns:
the set of parameter names in the filename pattern defined in setFileNamePattern(String).

clearFileNameParameters

public final void clearFileNameParameters()
Clears all values from the filename pattern defined in setFileNamePattern(String)


setPaginator

public void setPaginator(Paginator paginator)
Configures a Paginator to handle multiple pages of remote content that need to be parsed.

Parameters:
paginator - a Paginator to be associated with the current RemoteParserSettings

getPaginator

public Paginator getPaginator()
Returns the Paginator that handles multiple pages of remote content that need to be parsed.

Returns:
a Paginator associated with the current RemoteParserSettings

newPaginator

protected abstract Paginator newPaginator(RemoteParserSettings parserSettings)
Creates an instance of a concrete implementation of Paginator

Parameters:
parserSettings - the parser settings that should be used for the new paginator
Returns:
a new Paginator instance

getEmptyValue

public final String getEmptyValue()
Returns the value to be used when the content parsed for a field of some record evaluates to an empty String. Defaults to null.

Returns:
the value to be used instead of empty String (i.e. "") when the content of a field is empty.

setEmptyValue

public final void setEmptyValue(String emptyValue)
Defines the value to be used when the content parsed for a field of some record evaluates to an empty String. Defaults to null.

Parameters:
emptyValue - the value to be used instead of empty String (i.e. "") when the content of a field is empty.
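
As a small illustration of the substitution described above, the sketch below replaces an empty parsed value with a configured empty value. EmptyValueSketch and applyEmptyValue are hypothetical names; this is not the library's implementation.

```java
// Hypothetical helper, not part of the library.
final class EmptyValueSketch {

    // Returns emptyValue when the parsed content is the empty String, otherwise the content itself.
    static String applyEmptyValue(String parsed, String emptyValue) {
        return parsed != null && parsed.isEmpty() ? emptyValue : parsed;
    }

    public static void main(String[] args) {
        System.out.println(applyEmptyValue("", "N/A"));
        System.out.println(applyEmptyValue("", null));
        System.out.println(applyEmptyValue("abc", "N/A"));
    }
}
```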

isColumnReorderingEnabled

public final boolean isColumnReorderingEnabled()
Identifies whether fields should be reordered when field selection methods of an entity's EntitySettings (such as EntitySettings.selectFields(String...)) are used.

When enabled, each parsed record will contain values only for the selected columns. The values will be ordered according to the selection.

When disabled, each parsed record will contain values for all columns, in their original sequence. Fields which were not selected will contain null values, as defined in EntitySettings.getNullValue(). Defaults to true.

Returns:
a flag indicating whether or not selected fields should be reordered

setColumnReorderingEnabled

public final void setColumnReorderingEnabled(boolean columnReorderingEnabled)
Defines whether fields should be reordered when field selection methods of an entity's EntitySettings (such as EntitySettings.selectFields(String...)) are used.

When enabled, each parsed record will contain values only for the selected columns. The values will be ordered according to the selection.

When disabled, each parsed record will contain values for all columns, in their original sequence. Fields which were not selected will contain null values, as defined in EntitySettings.getNullValue(). Defaults to true.

Parameters:
columnReorderingEnabled - the flag indicating whether or not selected fields should be reordered
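
The difference between the two modes can be sketched with plain arrays. ColumnReorderingSketch and select are hypothetical names, and a literal null stands in for whatever EntitySettings.getNullValue() would return. The sketch assumes each row has one value per header.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the library.
final class ColumnReorderingSketch {

    static String[] select(List<String> headers, String[] row, List<String> selection, boolean reorder) {
        if (reorder) {
            // Enabled: only the selected columns, in selection order.
            String[] out = new String[selection.size()];
            for (int i = 0; i < out.length; i++) {
                int index = headers.indexOf(selection.get(i));
                out[i] = index >= 0 ? row[index] : null;
            }
            return out;
        }
        // Disabled: all columns in their original order, unselected ones nulled out.
        String[] out = new String[row.length];
        for (int i = 0; i < row.length; i++) {
            out[i] = selection.contains(headers.get(i)) ? row[i] : null;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList("a", "b", "c");
        String[] row = {"1", "2", "3"};
        List<String> selection = Arrays.asList("c", "a");
        System.out.println(Arrays.toString(select(headers, row, selection, true)));  // [3, 1]
        System.out.println(Arrays.toString(select(headers, row, selection, false))); // [1, null, 3]
    }
}
```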

getDownloadListener

public com.univocity.api.statistics.DownloadListener getDownloadListener()
Returns the DownloadListener associated with the parser and which will receive updates on the progress of downloads made by the parser.

Returns:
the current listener that should receive notifications regarding the progress of downloads performed by the parser. If undefined, a NoopDataTransfer will be returned.

setDownloadListener

public void setDownloadListener(com.univocity.api.statistics.DownloadListener downloadListener)
Associates a DownloadListener with the parser, which will receive updates on the progress of downloads made by the parser.

Parameters:
downloadListener - the listener that should receive notifications regarding the progress of downloads performed by the parser.

getDefaultFileExtension

public abstract String getDefaultFileExtension()
Returns the default file extension to use when saving files to the directory specified by getDownloadContentDirectory(), in case the file name pattern taken from getFileNamePattern() doesn't include a file extension.

Returns:
the default file extension to use when saving files

isDownloadOverwritingEnabled

public boolean isDownloadOverwritingEnabled()
Returns a flag indicating whether the parser will overwrite content already downloaded. If disabled, the parser will skip the download of contents already available in the filesystem, and use the content available locally. Defaults to true. Has no effect if isDownloadEnabled() evaluates to false.

Returns:
flag to indicate overwriting of downloaded content is enabled.

setDownloadOverwritingEnabled

public void setDownloadOverwritingEnabled(boolean downloadOverwritingEnabled)
Configures the parser to overwrite content already downloaded. If disabled, the parser will skip the download of contents already available in the filesystem, and use the content available locally. Defaults to true.

Parameters:
downloadOverwritingEnabled - flag to enable or disable overwriting of downloaded content.

isDownloadBeforeParsingEnabled

public boolean isDownloadBeforeParsingEnabled()
Verifies whether the parser will download the remote content before parsing it. If a directory to download content has been set (with setDownloadContentDirectory(String)), this method will always return true and the parser will download the remote content into the given directory. If no directory has been defined, the contents will be downloaded into a temporary directory. Defaults to false.

Returns:
a flag indicating whether any remote content should be downloaded into a local file before being parsed.

setDownloadBeforeParsingEnabled

public void setDownloadBeforeParsingEnabled(boolean downloadBeforeParsingEnabled)
Instructs the parser to download the remote content before parsing it. If a directory to download content has been set (with setDownloadContentDirectory(String)), this method has no effect and the parser will download the remote content into the given directory. If this flag is set to true and no directory has been defined, the contents will be downloaded into a temporary directory. Defaults to false.

Parameters:
downloadBeforeParsingEnabled - flag to enable the parser to download remote content into a local file before parsing it.

setDownloadThreads

public final void setDownloadThreads(int downloadThreads)
Sets the number of threads that will be used to download remote content (e.g. images) that is associated with the parsed input. Defaults to 4.

Parameters:
downloadThreads - the maximum number of threads to be used for downloading content

getDownloadThreads

public final int getDownloadThreads()
Returns the number of threads that will be used to download remote content (e.g. images) that is associated with the parsed input. Defaults to 4.

Returns:
the maximum number of threads to be used for downloading content

getNesting

public final Nesting getNesting()
Returns the nesting strategy to apply to rows associated to a "parent" row, such as results parsed from a link accessed by a RemoteFollower. Defaults to the parent entity's RemoteEntitySettings.getNesting() or if undefined, the getNesting() setting.

Returns:
the nesting strategy to use when processing results associated with a parent row.

setNesting

public final void setNesting(Nesting nesting)
Configures the nesting strategy to apply to rows associated to a "parent" row, such as results parsed from a link accessed by a RemoteFollower. Defaults to the parent entity's RemoteEntitySettings.getNesting() or if undefined, the getNesting() setting.

Parameters:
nesting - the nesting strategy to use when processing results associated with a parent row.

setExecutorService

public final void setExecutorService(ExecutorService executorService)
Assigns an ExecutorService to the parser, which will be used to manage the multiple threads that can be started. These threads are used to parse/download data from a given input and any remote resources associated with it. Defaults to: Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

Parameters:
executorService - the executor service to be used by the parser for the creation of new threads.
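
The default mentioned above is a standard fixed thread pool. The sketch below creates the same kind of executor, runs a few stand-in tasks on it and shuts it down. ExecutorServiceSketch, newDefaultExecutor and runTasks are hypothetical names; the tasks only return a page number and do not represent actual parser work. Whether the parser shuts down a user-supplied executor is not stated here, so the sketch shuts it down itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of the library.
final class ExecutorServiceSketch {

    // Same expression as the documented default.
    static ExecutorService newDefaultExecutor() {
        return Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    }

    // Submits taskCount stand-in tasks (each returns its page number) and sums the results.
    static int runTasks(ExecutorService executor, int taskCount) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            final int page = i + 1;
            futures.add(executor.submit(() -> page));
        }
        int sum = 0;
        try {
            for (Future<Integer> future : futures) {
                sum += future.get();
            }
        } catch (Exception e) {
            throw new IllegalStateException("Task failed", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        ExecutorService executor = newDefaultExecutor();
        try {
            System.out.println(runTasks(executor, 4)); // 1 + 2 + 3 + 4 = 10
        } finally {
            executor.shutdown(); // pool threads are non-daemon, so release them explicitly
        }
    }
}
```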

getExecutorService

public final ExecutorService getExecutorService()
Returns the ExecutorService to be used by the parser for managing the multiple threads that can be started. These threads are used to parse/download data from a given input and any remote resources associated with it. Defaults to: Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

Returns:
the executor service to be used by the parser for the creation of new threads.

ignoreFollowingErrors

public void ignoreFollowingErrors(boolean ignoreLinkFollowingErrors)
Configures the parser to ignore (or not) invalid, malformed or unavailable links when following URLs to collect additional data associated to a current result. If set to false, the parser will throw an Exception when attempting to follow a link that is invalid, malformed or unavailable. If true, the parser will simply ignore the error and proceed. Defaults to true.

Parameters:
ignoreLinkFollowingErrors - true if the parser will ignore errors when accessing a linked page, false otherwise.
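
The two behaviours can be sketched with plain URI parsing. LinkFollowingSketch and follow are hypothetical names, only malformed links are covered (not unavailable ones), and the exception type thrown here is an assumption, since the text above only says that an Exception is thrown.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the library.
final class LinkFollowingSketch {

    // Parses each link; malformed links are skipped when ignoreErrors is true,
    // otherwise the first malformed link aborts processing.
    static List<URI> follow(List<String> links, boolean ignoreErrors) {
        List<URI> valid = new ArrayList<>();
        for (String link : links) {
            try {
                valid.add(new URI(link));
            } catch (URISyntaxException e) {
                if (!ignoreErrors) {
                    throw new IllegalStateException("Cannot follow link: " + link, e);
                }
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        // The second link contains a space, which is not allowed in a URI.
        List<String> links = Arrays.asList("http://example.com/a", "http://exa mple.com/b");
        System.out.println(follow(links, true).size()); // only the first link is kept
    }
}
```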

isIgnoreFollowingErrors

public boolean isIgnoreFollowingErrors()
Returns a flag indicating whether the parser will ignore invalid, malformed or unavailable links when following URLs to collect additional data associated to a current result. Defaults to true.

Returns:
true if the parser is set to ignore errors when accessing a linked page

clone

protected RemoteParserSettings<S,L,C> clone()
Overrides:
clone in class EntityParserSettings<S extends com.univocity.parsers.common.CommonParserSettings,L extends RemoteEntityList,C extends com.univocity.parsers.common.Context>

getRemoteInterval

public final long getRemoteInterval()
Returns the minimum interval of time to wait between remote requests. This is required to prevent submitting multiple requests to the same server at the same time, which can easily happen when RemoteFollowers are used. Defaults to 15 ms.

Returns:
the minimum time (in milliseconds) to wait between remote requests. Values <= 0 mean the internal RateLimiter is disabled.

setRemoteInterval

public final void setRemoteInterval(long remoteInterval)
Defines the minimum interval of time to wait between remote requests. This is required to prevent submitting multiple requests to the same server at the same time, which can easily happen when RemoteFollowers are used. Defaults to 15 ms.

Parameters:
remoteInterval - minimum time (in milliseconds) to wait between remote requests. Any value <= 0 will disable the internal RateLimiter.
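
A minimum interval between requests can be enforced with a small blocking limiter. The sketch below is one possible way to do it and is not the library's internal RateLimiter; RemoteIntervalSketch, acquire and isEnabled are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the library.
final class RemoteIntervalSketch {

    private final long intervalNanos;
    private long lastRequestNanos;
    private boolean hasPrevious;

    RemoteIntervalSketch(long intervalMillis) {
        this.intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
    }

    // Values <= 0 disable the limiter, mirroring the parameter description above.
    boolean isEnabled() {
        return intervalNanos > 0;
    }

    // Blocks until at least the configured interval has passed since the previous call.
    synchronized void acquire() {
        if (!isEnabled()) {
            return;
        }
        if (hasPrevious) {
            long remaining;
            while ((remaining = intervalNanos - (System.nanoTime() - lastRequestNanos)) > 0) {
                try {
                    TimeUnit.NANOSECONDS.sleep(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                    break;
                }
            }
        }
        lastRequestNanos = System.nanoTime();
        hasPrevious = true;
    }

    public static void main(String[] args) {
        RemoteIntervalSketch limiter = new RemoteIntervalSketch(15);
        for (int i = 0; i < 3; i++) {
            limiter.acquire(); // a remote request would be issued after this call
        }
    }
}
```

Because acquire is synchronized, concurrent callers are serialized, so two threads cannot issue requests at the same instant.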

setParseDate

public final void setParseDate(Calendar parseDate)
Defines a parse date to process historical files. It's expected that the pattern returned by getFileNamePattern() contains a date parameter, for example: "{date, yyyy-MMM-dd}/results_{page}.html". If the parse date is set to 2015-10-10, the parser will look for existing files under the directory named "2015-Oct-10" inside getDownloadContentDirectory(). If the parse date is not null, downloads will be disabled automatically unless explicitly enabled with setDownloadEnabled(true).

Parameters:
parseDate - the date to use for loading files downloaded in the past that will be re-parsed.

setParseDate

public final void setParseDate(Date parseDate)
Defines a parse date to process historical files. It's expected that the pattern returned by getFileNamePattern() contains a date parameter, for example: "{date, yyyy-MMM-dd}/results_{page}.html". If the parse date is set to 2015-10-10, the parser will look for existing files under the directory named "2015-Oct-10" inside getDownloadContentDirectory(). If the parse date is not null, downloads will be disabled automatically unless explicitly enabled with setDownloadEnabled(true).

Parameters:
parseDate - the date to use for loading files downloaded in the past that will be re-parsed.

setParseDate

public final void setParseDate(String parseDate)
Defines a parse date to process historical files. It's expected that the pattern returned by getFileNamePattern() contains a date parameter, for example: "{date, yyyy-MMM-dd}/results_{page}.html". If the parse date is set to "2015-Oct-10", the parser will look for existing files under the directory named "2015-Oct-10" inside getDownloadContentDirectory(). If the parse date is not null, downloads will be disabled automatically unless explicitly enabled with setDownloadEnabled(true).

Parameters:
parseDate - the formatted representation of the date to use for loading files downloaded in the past that will be re-parsed. Must match the date pattern used in getFileNamePattern()
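
To show how a date parameter in the file name pattern relates to a directory such as "2015-Oct-10", the sketch below extracts the format from a {date, ...} parameter and applies it with SimpleDateFormat. ParseDateSketch and resolveDate are hypothetical names, and the use of SimpleDateFormat with an English locale is an assumption; the library may format dates differently.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the library.
final class ParseDateSketch {

    // Matches {date, <format>} and captures the format part.
    private static final Pattern DATE_PARAMETER = Pattern.compile("\\{date\\s*,\\s*([^}]+)\\}");

    // Replaces the first date parameter in the pattern with the formatted parse date.
    static String resolveDate(String fileNamePattern, Date parseDate) {
        Matcher matcher = DATE_PARAMETER.matcher(fileNamePattern);
        if (!matcher.find()) {
            return fileNamePattern; // no date parameter present
        }
        SimpleDateFormat format = new SimpleDateFormat(matcher.group(1).trim(), Locale.ENGLISH);
        return matcher.replaceFirst(Matcher.quoteReplacement(format.format(parseDate)));
    }

    public static void main(String[] args) {
        Date parseDate = new GregorianCalendar(2015, Calendar.OCTOBER, 10).getTime();
        // The {page} parameter is left for a later step.
        System.out.println(resolveDate("{date, yyyy-MMM-dd}/results_{page}.html", parseDate));
    }
}
```

With the parse date 2015-10-10 the sketch yields "2015-Oct-10/results_{page}.html", matching the directory name used in the description above.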

getParseDate

public final String getParseDate()
Returns the formatted parse date to associate with any downloaded files for future re-parsing. If the pattern returned by getFileNamePattern() contains a date parameter such as "{date, yyyy-MMM-dd}/results_{page}.html", any downloaded files will be stored under the directory named after the date. If the parse date is set manually to "2015-Oct-10", the parser will look for existing files under the directory named "2015-Oct-10" inside getDownloadContentDirectory(). If no format is defined, a String representing the time in milliseconds will be returned. If no date has been set explicitly, the current date and time of the system will be used. If the given parse date is not null, downloads will be disabled automatically unless explicitly enabled with setDownloadEnabled(true).

Returns:
a formatted String representing the parse date.

getBatchId

public final String getBatchId()
Returns the custom batch ID to be used in the file name pattern specified by getFileNamePattern(). Used to process files stored locally. If a {batch} parameter is not present in the pattern, the given batch ID will simply be ignored. If the batch ID is not null, downloads will be disabled automatically unless explicitly enabled with setDownloadEnabled(true).

Returns:
the current batch ID

setBatchId

public final void setBatchId(String batchId)
Defines a custom batch ID to be used in the file name pattern specified by getFileNamePattern(). Used to process files stored locally. If a {batch} parameter is not present in the pattern, the given batch ID will simply be ignored. If the batch ID is not null, downloads will be disabled automatically unless explicitly enabled with setDownloadEnabled(true).

Parameters:
batchId - the user-specific batch ID

setDownloadEnabled

public final void setDownloadEnabled(boolean downloadEnabled)
Enables/disables any remote download operation. Enabled by default. It's recommended to disable downloads when processing historical files offline to ensure no accidental download will occur and overwrite old files. If enabled, when processing stored files any missing file that was not downloaded previously will be downloaded. Make sure that isDownloadOverwritingEnabled() is set to false to prevent downloading and overwriting existing files.

Parameters:
downloadEnabled - flag indicating whether downloads are enabled.
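
The interaction between the download and overwrite flags can be summarised as a small decision function. This is a reading of the descriptions of setDownloadEnabled and setDownloadOverwritingEnabled, not the library's code; DownloadDecisionSketch and shouldDownload are hypothetical names.

```java
// Hypothetical helper, not part of the library.
final class DownloadDecisionSketch {

    // Decides whether a remote file should be fetched, given the two flags
    // and whether a local copy already exists.
    static boolean shouldDownload(boolean downloadEnabled, boolean overwritingEnabled, boolean localCopyExists) {
        if (!downloadEnabled) {
            return false; // downloads are off; the overwrite flag has no effect
        }
        if (localCopyExists && !overwritingEnabled) {
            return false; // reuse the local copy
        }
        return true; // missing file, or overwriting allowed
    }

    public static void main(String[] args) {
        System.out.println(shouldDownload(true, false, true));  // false: local copy is reused
        System.out.println(shouldDownload(true, false, false)); // true: missing file is fetched
        System.out.println(shouldDownload(true, true, true));   // true: existing file is overwritten
        System.out.println(shouldDownload(false, true, false)); // false: downloads disabled
    }
}
```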

isDownloadEnabled

public final boolean isDownloadEnabled()
Flags whether remote downloads are enabled. true by default. It's recommended to disable downloads when processing historical files offline to ensure no accidental download will occur and overwrite old files. If enabled, when processing stored files any missing file that was not downloaded previously will be downloaded. Make sure that isDownloadOverwritingEnabled() is set to false to prevent downloading and overwriting existing files.

Returns:
flag indicating whether downloads are enabled.


Copyright © 2018 uniVocity Software Pty Ltd. All rights reserved.