Interface DataflowPipelineOptions
-
- All Superinterfaces:
org.apache.beam.sdk.options.ApplicationNameOptions, org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions, DataflowPipelineDebugOptions, DataflowPipelineWorkerPoolOptions, DataflowProfilingOptions, DataflowStreamingPipelineOptions, DataflowWorkerLoggingOptions, org.apache.beam.sdk.options.ExperimentalOptions, org.apache.beam.sdk.options.FileStagingOptions, org.apache.beam.sdk.extensions.gcp.options.GcpOptions, org.apache.beam.sdk.extensions.gcp.options.GcsOptions, org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions, org.apache.beam.sdk.transforms.display.HasDisplayData, org.apache.beam.sdk.options.MemoryMonitorOptions, org.apache.beam.sdk.options.PipelineOptions, org.apache.beam.sdk.io.gcp.pubsub.PubsubOptions, org.apache.beam.sdk.options.StreamingOptions
- All Known Subinterfaces:
DataflowWorkerHarnessOptions, TestDataflowPipelineOptions
public interface DataflowPipelineOptions extends org.apache.beam.sdk.options.PipelineOptions, org.apache.beam.sdk.extensions.gcp.options.GcpOptions, org.apache.beam.sdk.options.ApplicationNameOptions, DataflowPipelineDebugOptions, DataflowPipelineWorkerPoolOptions, org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions, org.apache.beam.sdk.extensions.gcp.options.GcsOptions, org.apache.beam.sdk.options.StreamingOptions, DataflowWorkerLoggingOptions, DataflowStreamingPipelineOptions, DataflowProfilingOptions, org.apache.beam.sdk.io.gcp.pubsub.PubsubOptions
Options that can be used to configure the DataflowRunner.
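For orientation, a minimal sketch of constructing these options with Beam's PipelineOptionsFactory (the project, region, and bucket names below are placeholders, not defaults):

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class OptionsExample {
  public static void main(String[] args) {
    // Parse --project, --region, etc. from the command line, then view the
    // result through this interface.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

    // Values can also be set programmatically; these names are placeholders.
    options.setProject("my-gcp-project");
    options.setRegion("us-central1");
    options.setStagingLocation("gs://my-bucket/staging");
  }
}
```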
-
-
Nested Class Summary
Nested Classes
- static class DataflowPipelineOptions.FlexResourceSchedulingGoal: Set of available Flexible Resource Scheduling goals.
- static class DataflowPipelineOptions.StagingLocationFactory: Returns a default staging location under GcpOptions.getGcpTempLocation().
-
Nested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions
DataflowPipelineDebugOptions.DataflowClientFactory, DataflowPipelineDebugOptions.StagerFactory, DataflowPipelineDebugOptions.UnboundedReaderMaxReadTimeFactory
-
Nested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions
DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType
-
Nested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowProfilingOptions
DataflowProfilingOptions.DataflowProfilingAgentConfiguration
-
Nested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowStreamingPipelineOptions
DataflowStreamingPipelineOptions.EnableWindmillServiceDirectPathFactory, DataflowStreamingPipelineOptions.GlobalConfigRefreshPeriodFactory, DataflowStreamingPipelineOptions.HarnessUpdateReportingPeriodFactory, DataflowStreamingPipelineOptions.LocalWindmillHostportFactory, DataflowStreamingPipelineOptions.MaxStackTraceDepthToReportFactory, DataflowStreamingPipelineOptions.PeriodicStatusPageDirectoryFactory, DataflowStreamingPipelineOptions.WindmillServiceStreamingRpcBatchLimitFactory
-
Nested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowWorkerLoggingOptions
DataflowWorkerLoggingOptions.Level, DataflowWorkerLoggingOptions.WorkerLogLevelOverrides
-
Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
org.apache.beam.sdk.extensions.gcp.options.GcpOptions.DefaultProjectFactory, org.apache.beam.sdk.extensions.gcp.options.GcpOptions.EnableStreamingEngineFactory, org.apache.beam.sdk.extensions.gcp.options.GcpOptions.GcpOAuthScopesFactory, org.apache.beam.sdk.extensions.gcp.options.GcpOptions.GcpTempLocationFactory, org.apache.beam.sdk.extensions.gcp.options.GcpOptions.GcpUserCredentialsFactory
-
Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcsOptions
org.apache.beam.sdk.extensions.gcp.options.GcsOptions.ExecutorServiceFactory, org.apache.beam.sdk.extensions.gcp.options.GcsOptions.GcsCustomAuditEntries, org.apache.beam.sdk.extensions.gcp.options.GcsOptions.PathValidatorFactory
-
Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions
org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions.GoogleApiTracer
-
Nested classes/interfaces inherited from interface org.apache.beam.sdk.options.PipelineOptions
org.apache.beam.sdk.options.PipelineOptions.AtomicLongFactory, org.apache.beam.sdk.options.PipelineOptions.CheckEnabled, org.apache.beam.sdk.options.PipelineOptions.DirectRunner, org.apache.beam.sdk.options.PipelineOptions.JobNameFactory, org.apache.beam.sdk.options.PipelineOptions.UserAgentFactory
-
-
Method Summary
All Methods, Instance Methods, Abstract Methods
- java.lang.String getCreateFromSnapshot(): If set, the snapshot from which the job should be created.
- java.lang.String getDataflowEndpoint(): Dataflow endpoint to use.
- java.util.List<java.lang.String> getDataflowServiceOptions(): Service options are set by the user and configure the service.
- java.lang.String getDataflowWorkerJar()
- DataflowPipelineOptions.FlexResourceSchedulingGoal getFlexRSGoal(): This option controls Flexible Resource Scheduling mode.
- java.util.List<java.lang.String> getJdkAddOpenModules(): Open modules needed for reflection that accesses JDK internals with Java 9+.
- java.util.Map<java.lang.String,java.lang.String> getLabels(): Labels that will be applied to the billing records for this job.
- java.lang.String getPipelineUrl(): The URL of the staged portable pipeline.
- java.lang.String getProject()
- java.lang.String getRegion(): The Google Compute Engine region for creating Dataflow jobs.
- java.lang.String getServiceAccount(): Run the job as a specific service account, instead of the default GCE robot.
- java.lang.String getStagingLocation(): GCS path for staging local files, e.g. gs://bucket/object.
- java.lang.String getTemplateLocation(): Where the runner should generate a template file.
- boolean isHotKeyLoggingEnabled(): If enabled, the literal key will be logged to Cloud Logging if a hot key is detected.
- boolean isUpdate(): Whether to update the currently running pipeline with the same name as this one.
- void setCreateFromSnapshot(java.lang.String value)
- void setDataflowEndpoint(java.lang.String value)
- void setDataflowServiceOptions(java.util.List<java.lang.String> options)
- void setDataflowWorkerJar(java.lang.String dataflowWorkerJar)
- void setFlexRSGoal(DataflowPipelineOptions.FlexResourceSchedulingGoal goal)
- void setHotKeyLoggingEnabled(boolean value)
- void setJdkAddOpenModules(java.util.List<java.lang.String> options)
- void setLabels(java.util.Map<java.lang.String,java.lang.String> labels)
- void setPipelineUrl(java.lang.String urlString)
- void setProject(java.lang.String value)
- void setRegion(java.lang.String region)
- void setServiceAccount(java.lang.String value)
- void setStagingLocation(java.lang.String value)
- void setTemplateLocation(java.lang.String value): Sets the Cloud Storage path where the Dataflow template will be stored.
- void setUpdate(boolean value)
-
Methods inherited from interface org.apache.beam.sdk.options.ApplicationNameOptions
getAppName, setAppName
-
Methods inherited from interface org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions
getBigQueryEndpoint, getBigQueryProject, getBqStreamingApiLoggingFrequencySec, getEnableStorageReadApiV2, getGroupFilesFileLoad, getHTTPReadTimeout, getHTTPWriteTimeout, getInsertBundleParallelism, getJobLabelsMap, getMaxBufferingDurationMilliSec, getMaxConnectionPoolConnections, getMaxStreamingBatchSize, getMaxStreamingRowsToBatch, getMinConnectionPoolConnections, getNumStorageWriteApiStreamAppendClients, getNumStorageWriteApiStreams, getNumStreamingKeys, getStorageApiAppendThresholdBytes, getStorageApiAppendThresholdRecordCount, getStorageWriteApiMaxRequestSize, getStorageWriteApiMaxRetries, getStorageWriteApiTriggeringFrequencySec, getStorageWriteMaxInflightBytes, getStorageWriteMaxInflightRequests, getTempDatasetId, getUseStorageApiConnectionPool, getUseStorageWriteApi, getUseStorageWriteApiAtLeastOnce, setBigQueryEndpoint, setBigQueryProject, setBqStreamingApiLoggingFrequencySec, setEnableStorageReadApiV2, setGroupFilesFileLoad, setHTTPReadTimeout, setHTTPWriteTimeout, setInsertBundleParallelism, setJobLabelsMap, setMaxBufferingDurationMilliSec, setMaxConnectionPoolConnections, setMaxStreamingBatchSize, setMaxStreamingRowsToBatch, setMinConnectionPoolConnections, setNumStorageWriteApiStreamAppendClients, setNumStorageWriteApiStreams, setNumStreamingKeys, setStorageApiAppendThresholdBytes, setStorageApiAppendThresholdRecordCount, setStorageWriteApiMaxRequestSize, setStorageWriteApiMaxRetries, setStorageWriteApiTriggeringFrequencySec, setStorageWriteMaxInflightBytes, setStorageWriteMaxInflightRequests, setTempDatasetId, setUseStorageApiConnectionPool, setUseStorageWriteApi, setUseStorageWriteApiAtLeastOnce
-
Methods inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions
getApiRootUrl, getDataflowClient, getDataflowJobFile, getDesiredNumUnboundedSourceSplits, getDumpHeapOnOOM, getJfrRecordingDurationSec, getNumberOfWorkerHarnessThreads, getReaderCacheTimeoutSec, getRecordJfrOnGcThrashing, getSaveHeapDumpsToGcsPath, getSdkHarnessContainerImageOverrides, getStager, getStagerClass, getTransformNameMapping, getUnboundedReaderMaxElements, getUnboundedReaderMaxReadTimeMs, getUnboundedReaderMaxReadTimeSec, getUnboundedReaderMaxWaitForElementsMs, getWorkerCacheMb, setApiRootUrl, setDataflowClient, setDataflowJobFile, setDesiredNumUnboundedSourceSplits, setDumpHeapOnOOM, setJfrRecordingDurationSec, setNumberOfWorkerHarnessThreads, setReaderCacheTimeoutSec, setRecordJfrOnGcThrashing, setSaveHeapDumpsToGcsPath, setSdkHarnessContainerImageOverrides, setStager, setStagerClass, setTransformNameMapping, setUnboundedReaderMaxElements, setUnboundedReaderMaxReadTimeMs, setUnboundedReaderMaxReadTimeSec, setUnboundedReaderMaxWaitForElementsMs, setWorkerCacheMb
-
Methods inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions
getAutoscalingAlgorithm, getDiskSizeGb, getMaxNumWorkers, getMinCpuPlatform, getNetwork, getNumWorkers, getSdkContainerImage, getSubnetwork, getUsePublicIps, getWorkerDiskType, getWorkerHarnessContainerImage, getWorkerMachineType, setAutoscalingAlgorithm, setDiskSizeGb, setMaxNumWorkers, setMinCpuPlatform, setNetwork, setNumWorkers, setSdkContainerImage, setSubnetwork, setUsePublicIps, setWorkerDiskType, setWorkerHarnessContainerImage, setWorkerMachineType
-
Methods inherited from interface org.apache.beam.runners.dataflow.options.DataflowProfilingOptions
getProfilingAgentConfiguration, getSaveProfilesToGcs, setProfilingAgentConfiguration, setSaveProfilesToGcs
-
Methods inherited from interface org.apache.beam.runners.dataflow.options.DataflowStreamingPipelineOptions
getActiveWorkRefreshPeriodMillis, getChannelzShowOnlyWindmillServiceChannels, getGlobalConfigRefreshPeriod, getIsWindmillServiceDirectPathEnabled, getLocalWindmillHostport, getMaxBundlesFromWindmillOutstanding, getMaxBytesFromWindmillOutstanding, getMaxStackTraceDepthToReport, getOverrideWindmillBinary, getPeriodicStatusPageOutputDirectory, getPerWorkerMetricsUpdateReportingPeriodMillis, getStreamingSideInputCacheExpirationMillis, getStreamingSideInputCacheMb, getStuckCommitDurationMillis, getUseSeparateWindmillHeartbeatStreams, getUseWindmillIsolatedChannels, getWindmillGetDataStreamCount, getWindmillHarnessUpdateReportingPeriod, getWindmillMessagesBetweenIsReadyChecks, getWindmillRequestBatchedGetWorkResponse, getWindmillServiceCommitThreads, getWindmillServiceEndpoint, getWindmillServicePort, getWindmillServiceRpcChannelAliveTimeoutSec, getWindmillServiceStreamingLogEveryNStreamFailures, getWindmillServiceStreamingRpcBatchLimit, getWindmillServiceStreamingRpcHealthCheckPeriodMs, getWindmillServiceStreamMaxBackoffMillis, setActiveWorkRefreshPeriodMillis, setChannelzShowOnlyWindmillServiceChannels, setGlobalConfigRefreshPeriod, setIsWindmillServiceDirectPathEnabled, setLocalWindmillHostport, setMaxBundlesFromWindmillOutstanding, setMaxBytesFromWindmillOutstanding, setMaxStackTraceDepthToReport, setOverrideWindmillBinary, setPeriodicStatusPageOutputDirectory, setPerWorkerMetricsUpdateReportingPeriodMillis, setStreamingSideInputCacheExpirationMillis, setStreamingSideInputCacheMb, setStuckCommitDurationMillis, setUseSeparateWindmillHeartbeatStreams, setUseWindmillIsolatedChannels, setWindmillGetDataStreamCount, setWindmillHarnessUpdateReportingPeriod, setWindmillMessagesBetweenIsReadyChecks, setWindmillRequestBatchedGetWorkResponse, setWindmillServiceCommitThreads, setWindmillServiceEndpoint, setWindmillServicePort, setWindmillServiceRpcChannelAliveTimeoutSec, setWindmillServiceStreamingLogEveryNStreamFailures, setWindmillServiceStreamingRpcBatchLimit, 
setWindmillServiceStreamingRpcHealthCheckPeriodMs, setWindmillServiceStreamMaxBackoffMillis
-
Methods inherited from interface org.apache.beam.runners.dataflow.options.DataflowWorkerLoggingOptions
getDefaultWorkerLogLevel, getWorkerLogLevelOverrides, getWorkerSystemErrMessageLevel, getWorkerSystemOutMessageLevel, setDefaultWorkerLogLevel, setWorkerLogLevelOverrides, setWorkerSystemErrMessageLevel, setWorkerSystemOutMessageLevel
-
Methods inherited from interface org.apache.beam.sdk.options.ExperimentalOptions
getExperiments, setExperiments
-
Methods inherited from interface org.apache.beam.sdk.options.FileStagingOptions
getFilesToStage, setFilesToStage
-
Methods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
getCredentialFactoryClass, getDataflowKmsKey, getGcpCredential, getGcpOauthScopes, getGcpTempLocation, getImpersonateServiceAccount, getWorkerRegion, getWorkerZone, getZone, isEnableStreamingEngine, setCredentialFactoryClass, setDataflowKmsKey, setEnableStreamingEngine, setGcpCredential, setGcpOauthScopes, setGcpTempLocation, setImpersonateServiceAccount, setWorkerRegion, setWorkerZone, setZone
-
Methods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcsOptions
getEnableBucketReadMetricCounter, getEnableBucketWriteMetricCounter, getExecutorService, getGcsCustomAuditEntries, getGcsEndpoint, getGcsHttpRequestReadTimeout, getGcsHttpRequestWriteTimeout, getGcsPerformanceMetrics, getGcsReadCounterPrefix, getGcsRewriteDataOpBatchLimit, getGcsUploadBufferSizeBytes, getGcsUtil, getGcsWriteCounterPrefix, getGoogleCloudStorageReadOptions, getPathValidator, getPathValidatorClass, setEnableBucketReadMetricCounter, setEnableBucketWriteMetricCounter, setExecutorService, setGcsCustomAuditEntries, setGcsEndpoint, setGcsHttpRequestReadTimeout, setGcsHttpRequestWriteTimeout, setGcsPerformanceMetrics, setGcsReadCounterPrefix, setGcsRewriteDataOpBatchLimit, setGcsUploadBufferSizeBytes, setGcsUtil, setGcsWriteCounterPrefix, setGoogleCloudStorageReadOptions, setPathValidator, setPathValidatorClass
-
Methods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions
getGoogleApiTrace, setGoogleApiTrace
-
Methods inherited from interface org.apache.beam.sdk.transforms.display.HasDisplayData
populateDisplayData
-
Methods inherited from interface org.apache.beam.sdk.options.MemoryMonitorOptions
getGCThrashingPercentagePerPeriod, getGzipCompressHeapDumps, getRemoteHeapDumpLocation, setGCThrashingPercentagePerPeriod, setGzipCompressHeapDumps, setRemoteHeapDumpLocation
-
Methods inherited from interface org.apache.beam.sdk.options.PipelineOptions
as, getJobName, getOptionsId, getRunner, getStableUniqueNames, getTempLocation, getUserAgent, outputRuntimeOptions, revision, setJobName, setOptionsId, setRunner, setStableUniqueNames, setTempLocation, setUserAgent
-
-
-
-
Method Detail
-
getProject
@Validation.Required @Default.InstanceFactory(org.apache.beam.sdk.extensions.gcp.options.GcpOptions.DefaultProjectFactory.class) java.lang.String getProject()
- Specified by:
getProject in interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
-
setProject
void setProject(java.lang.String value)
- Specified by:
setProject in interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
-
getStagingLocation
@Default.InstanceFactory(StagingLocationFactory.class) java.lang.String getStagingLocation()
GCS path for staging local files, e.g. gs://bucket/object. Must be a valid Cloud Storage URL, beginning with the prefix "gs://".
If getStagingLocation() is not set, it will default to GcpOptions.getGcpTempLocation(). GcpOptions.getGcpTempLocation() must be a valid GCS path.
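The "gs://" constraint above can be illustrated with a plain-Java check (no Beam dependency; the bucket name is a placeholder and this helper is illustrative, not part of the SDK):

```java
public class StagingLocationCheck {
    // Mirrors the documented constraint: a staging location must be a
    // Cloud Storage URL beginning with the "gs://" prefix.
    public static boolean isGcsPath(String path) {
        return path != null
            && path.startsWith("gs://")
            && path.length() > "gs://".length();
    }

    public static void main(String[] args) {
        System.out.println(isGcsPath("gs://my-bucket/staging")); // true
        System.out.println(isGcsPath("/tmp/staging"));           // false
    }
}
```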
-
setStagingLocation
void setStagingLocation(java.lang.String value)
-
isUpdate
boolean isUpdate()
Whether to update the currently running pipeline with the same name as this one.
-
setUpdate
void setUpdate(boolean value)
-
getCreateFromSnapshot
java.lang.String getCreateFromSnapshot()
If set, the snapshot from which the job should be created.
-
setCreateFromSnapshot
void setCreateFromSnapshot(java.lang.String value)
-
getTemplateLocation
java.lang.String getTemplateLocation()
Where the runner should generate a template file. Must either be local or Cloud Storage.
-
setTemplateLocation
void setTemplateLocation(java.lang.String value)
Sets the Cloud Storage path where the Dataflow template will be stored. Required for creating Flex Templates or Classic Templates.
Example:
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
options.setTemplateLocation("gs://your-bucket/templates/my-template");
- Parameters:
value - Cloud Storage path for storing the Dataflow template.
-
getDataflowServiceOptions
java.util.List<java.lang.String> getDataflowServiceOptions()
Service options are set by the user and configure the service. This decouples service side feature availability from the Apache Beam release cycle.
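A brief sketch of setting service options programmatically (the option name "enable_prime" is only an example; valid names are defined by the Dataflow service, not by this SDK):

```java
import java.util.Arrays;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class ServiceOptionsExample {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    // Equivalent to --dataflowServiceOptions=enable_prime on the command line.
    // "enable_prime" is an example name; consult the service documentation
    // for the options available in your region and SDK version.
    options.setDataflowServiceOptions(Arrays.asList("enable_prime"));
  }
}
```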
-
setDataflowServiceOptions
void setDataflowServiceOptions(java.util.List<java.lang.String> options)
-
getServiceAccount
java.lang.String getServiceAccount()
Run the job as a specific service account, instead of the default GCE robot.
-
setServiceAccount
void setServiceAccount(java.lang.String value)
-
getRegion
@Default.InstanceFactory(DefaultGcpRegionFactory.class) java.lang.String getRegion()
The Google Compute Engine region for creating Dataflow jobs.
-
setRegion
void setRegion(java.lang.String region)
-
getDataflowEndpoint
@Default.String("") java.lang.String getDataflowEndpoint()
Dataflow endpoint to use. Defaults to the current version of the Google Cloud Dataflow API, at the time the current SDK version was released.
If the string contains "://", then this is treated as a URL; otherwise DataflowPipelineDebugOptions.getApiRootUrl() is used as the root URL.
- Specified by:
getDataflowEndpoint in interface DataflowPipelineDebugOptions
-
setDataflowEndpoint
void setDataflowEndpoint(java.lang.String value)
- Specified by:
setDataflowEndpoint in interface DataflowPipelineDebugOptions
-
getLabels
java.util.Map<java.lang.String,java.lang.String> getLabels()
Labels that will be applied to the billing records for this job.
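For example (a sketch; the label keys and values are placeholders, and labels must follow Google Cloud's labeling rules):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class LabelsExample {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    // These labels are attached to the job's billing records; the keys and
    // values here are placeholders for illustration.
    Map<String, String> labels = new HashMap<>();
    labels.put("team", "data-eng");
    labels.put("env", "prod");
    options.setLabels(labels);
  }
}
```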
-
setLabels
void setLabels(java.util.Map<java.lang.String,java.lang.String> labels)
-
getPipelineUrl
java.lang.String getPipelineUrl()
The URL of the staged portable pipeline.
-
setPipelineUrl
void setPipelineUrl(java.lang.String urlString)
-
getDataflowWorkerJar
java.lang.String getDataflowWorkerJar()
-
setDataflowWorkerJar
void setDataflowWorkerJar(java.lang.String dataflowWorkerJar)
-
getFlexRSGoal
@Default.Enum("UNSPECIFIED") DataflowPipelineOptions.FlexResourceSchedulingGoal getFlexRSGoal()
This option controls Flexible Resource Scheduling mode.
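A short sketch of opting a batch job into Flexible Resource Scheduling (COST_OPTIMIZED is assumed here to be one of the FlexResourceSchedulingGoal constants; see that nested enum for the authoritative list):

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class FlexRsExample {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    // Request cost-optimized (FlexRS) scheduling for this batch job.
    // Equivalent to --flexRSGoal=COST_OPTIMIZED on the command line.
    options.setFlexRSGoal(
        DataflowPipelineOptions.FlexResourceSchedulingGoal.COST_OPTIMIZED);
  }
}
```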
-
setFlexRSGoal
void setFlexRSGoal(DataflowPipelineOptions.FlexResourceSchedulingGoal goal)
-
isHotKeyLoggingEnabled
boolean isHotKeyLoggingEnabled()
If enabled, the literal key will be logged to Cloud Logging if a hot key is detected.
-
setHotKeyLoggingEnabled
void setHotKeyLoggingEnabled(boolean value)
-
getJdkAddOpenModules
java.util.List<java.lang.String> getJdkAddOpenModules()
Open modules needed for reflection that accesses JDK internals with Java 9+.
With JDK 16+, JDK internals are strongly encapsulated, and a tool or library that uses reflection to access them can trigger an InaccessibleObjectException. If you see these errors in your worker logs, you can pass in modules to open using the format module/package=target-module(,target-module)* to allow access to the library. E.g. java.base/java.lang=jamm
You may see warnings that jamm, a library used to more accurately size objects, is unable to make a private field accessible. To resolve the warning, open the specified module/package to jamm.
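The entry format above can be illustrated in plain Java (no Beam dependency; the module and package names below are just examples, and this validator is illustrative, not part of the SDK):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class AddOpensFormat {
    // One list entry per module/package to open, in the documented format:
    // module/package=target-module(,target-module)*
    static final Pattern ENTRY =
        Pattern.compile("[\\w.]+/[\\w.]+=[\\w.\\-]+(,[\\w.\\-]+)*");

    public static boolean isValid(String entry) {
        return ENTRY.matcher(entry).matches();
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
            "java.base/java.lang=jamm",          // open java.lang to the jamm module
            "java.base/java.util=ALL-UNNAMED");  // open java.util to the unnamed module
        for (String e : entries) {
            System.out.println(e + " valid=" + isValid(e));
        }
    }
}
```

Entries in this shape would then be passed to setJdkAddOpenModules as a list.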
-
setJdkAddOpenModules
void setJdkAddOpenModules(java.util.List<java.lang.String> options)
-
-