Interface DataflowPipelineWorkerPoolOptions

  • All Superinterfaces:
    org.apache.beam.sdk.options.FileStagingOptions, org.apache.beam.sdk.extensions.gcp.options.GcpOptions, org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions, org.apache.beam.sdk.transforms.display.HasDisplayData, org.apache.beam.sdk.options.PipelineOptions
    All Known Subinterfaces:
    DataflowPipelineOptions, DataflowWorkerHarnessOptions, TestDataflowPipelineOptions

    public interface DataflowPipelineWorkerPoolOptions
    extends org.apache.beam.sdk.extensions.gcp.options.GcpOptions, org.apache.beam.sdk.options.FileStagingOptions
    Options that are used to configure the Dataflow pipeline worker pool.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Interface Description
      static class  DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType
      Type of autoscaling algorithm to use.
      • Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions

        org.apache.beam.sdk.extensions.gcp.options.GcpOptions.DefaultProjectFactory, org.apache.beam.sdk.extensions.gcp.options.GcpOptions.EnableStreamingEngineFactory, org.apache.beam.sdk.extensions.gcp.options.GcpOptions.GcpOAuthScopesFactory, org.apache.beam.sdk.extensions.gcp.options.GcpOptions.GcpTempLocationFactory, org.apache.beam.sdk.extensions.gcp.options.GcpOptions.GcpUserCredentialsFactory
      • Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions

        org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions.GoogleApiTracer
      • Nested classes/interfaces inherited from interface org.apache.beam.sdk.options.PipelineOptions

        org.apache.beam.sdk.options.PipelineOptions.AtomicLongFactory, org.apache.beam.sdk.options.PipelineOptions.CheckEnabled, org.apache.beam.sdk.options.PipelineOptions.DirectRunner, org.apache.beam.sdk.options.PipelineOptions.JobNameFactory, org.apache.beam.sdk.options.PipelineOptions.UserAgentFactory
    • Field Summary

      • Fields inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions

        STREAMING_ENGINE_EXPERIMENT, WINDMILL_SERVICE_EXPERIMENT
    • Method Detail

      • getNumWorkers

        int getNumWorkers()
        Number of workers to use when executing the Dataflow job. Note that selecting an autoscaling algorithm other than NONE will affect the size of the worker pool. If left unspecified, the Dataflow service will determine the number of workers.
      • setNumWorkers

        void setNumWorkers(int value)
      • getAutoscalingAlgorithm

        DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType getAutoscalingAlgorithm()
        The autoscaling algorithm to use for the worker pool.
        • NONE: does not change the size of the worker pool.
        • BASIC: autoscales the worker pool size up to maxNumWorkers until the job completes.
        • THROUGHPUT_BASED: autoscales the worker pool based on throughput (up to maxNumWorkers).
      • getMaxNumWorkers

        int getMaxNumWorkers()
        The maximum number of workers to use for the worker pool. This option limits the size of the worker pool for the lifetime of the job, including pipeline updates. If left unspecified, the Dataflow service will compute a ceiling.
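        The sizing options above (numWorkers, autoscalingAlgorithm, maxNumWorkers) are commonly supplied as command-line flags. The following is a minimal sketch of assembling such flags, assuming Beam's usual convention that a getter such as getNumWorkers() maps to a --numWorkers flag. The WorkerPoolFlags class and its build method are hypothetical names used for illustration only, and treating 0 as "unspecified" is an assumption of this sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Beam): assembles worker pool sizing flags. */
final class WorkerPoolFlags {

    /**
     * Builds "--name=value" arguments for the worker pool sizing options.
     * A value of 0 (or a null algorithm) is treated here as "unspecified" and
     * the flag is omitted, leaving the choice to the Dataflow service. That
     * sentinel is an assumption of this sketch.
     */
    static List<String> build(int numWorkers, int maxNumWorkers, String autoscalingAlgorithm) {
        if (numWorkers < 0 || maxNumWorkers < 0) {
            throw new IllegalArgumentException("worker counts must not be negative");
        }
        List<String> args = new ArrayList<>();
        if (numWorkers > 0) {
            args.add("--numWorkers=" + numWorkers);
        }
        if (maxNumWorkers > 0) {
            args.add("--maxNumWorkers=" + maxNumWorkers);
        }
        if (autoscalingAlgorithm != null) {
            // Expected to be one of NONE, BASIC, THROUGHPUT_BASED.
            args.add("--autoscalingAlgorithm=" + autoscalingAlgorithm);
        }
        return args;
    }

    public static void main(String[] unused) {
        // With Beam on the classpath, the resulting array would typically be passed to
        // PipelineOptionsFactory.fromArgs(...).as(DataflowPipelineWorkerPoolOptions.class).
        List<String> args = build(3, 10, "THROUGHPUT_BASED");
        System.out.println(args);
    }
}
```

        Omitting a flag rather than passing a placeholder value keeps the "left unspecified" behavior described above, where the service picks the worker count or ceiling.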
      • setMaxNumWorkers

        void setMaxNumWorkers(int value)
      • getDiskSizeGb

        int getDiskSizeGb()
        Remote worker disk size, in gigabytes, or 0 to use the default size.
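        Because 0 is a sentinel for "use the default size", calling code may want to distinguish it from an explicit size. A minimal sketch follows; DiskSize and explicitSizeGb are hypothetical names, and rejecting negative values is an assumption of this sketch rather than documented behavior.

```java
import java.util.OptionalInt;

/** Hypothetical helper (not part of Beam): interprets the diskSizeGb sentinel. */
final class DiskSize {

    /**
     * Returns the explicitly requested disk size in gigabytes, or an empty
     * OptionalInt when the value is 0, meaning the default size applies.
     */
    static OptionalInt explicitSizeGb(int diskSizeGb) {
        if (diskSizeGb < 0) {
            // Assumption: negative sizes are treated as a caller error.
            throw new IllegalArgumentException("diskSizeGb must be >= 0");
        }
        return diskSizeGb == 0 ? OptionalInt.empty() : OptionalInt.of(diskSizeGb);
    }

    public static void main(String[] unused) {
        System.out.println(explicitSizeGb(0).isPresent());
        System.out.println(explicitSizeGb(50).getAsInt());
    }
}
```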
      • setDiskSizeGb

        void setDiskSizeGb(int value)
      • getWorkerHarnessContainerImage

        @Deprecated
        @Hidden
        java.lang.String getWorkerHarnessContainerImage()
        Deprecated.
      • getSdkContainerImage

        java.lang.String getSdkContainerImage()
        Container image used to configure the SDK execution environment on workers. Used for custom containers on portable pipelines only.
      • setSdkContainerImage

        void setSdkContainerImage(java.lang.String value)
      • getNetwork

        java.lang.String getNetwork()
        GCE network for launching workers.

        Default is up to the Dataflow service.

      • setNetwork

        void setNetwork(java.lang.String value)
      • getSubnetwork

        java.lang.String getSubnetwork()
        GCE subnetwork for launching workers.

        Default is up to the Dataflow service. Expected format is regions/REGION/subnetworks/SUBNETWORK or the fully qualified subnetwork name, beginning with https://..., e.g. https://www.googleapis.com/compute/alpha/projects/PROJECT/regions/REGION/subnetworks/SUBNETWORK
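
        The two accepted forms can be checked before a job is submitted. The sketch below is a rough client-side shape check only; SubnetworkFormat and looksValid are hypothetical names, the regular expressions are an assumption derived from the format description above, and the Dataflow service remains the authority on what it accepts.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Beam): shape check for subnetwork values. */
final class SubnetworkFormat {

    // Short form: regions/REGION/subnetworks/SUBNETWORK
    private static final Pattern SHORT_FORM =
            Pattern.compile("regions/[^/\\s]+/subnetworks/[^/\\s]+");

    // Fully qualified form: an https:// URL ending in the short form.
    private static final Pattern FULL_FORM =
            Pattern.compile("https://\\S+/regions/[^/\\s]+/subnetworks/[^/\\s]+");

    /** Returns true if the value matches either documented form. */
    static boolean looksValid(String subnetwork) {
        if (subnetwork == null || subnetwork.isEmpty()) {
            return false;
        }
        return SHORT_FORM.matcher(subnetwork).matches()
                || FULL_FORM.matcher(subnetwork).matches();
    }

    public static void main(String[] unused) {
        // "us-central1" and "my-subnet" are placeholder values for illustration.
        System.out.println(looksValid("regions/us-central1/subnetworks/my-subnet"));
        System.out.println(looksValid("my-subnet"));
    }
}
```

        A check like this catches a bare subnetwork name or a URL containing stray whitespace early, before the value reaches setSubnetwork.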

      • setSubnetwork

        void setSubnetwork(java.lang.String value)
      • getWorkerMachineType

        java.lang.String getWorkerMachineType()
        Machine type to use when creating Dataflow worker VMs.

        See GCE machine types for a list of valid options.

        If unset, the Dataflow service will choose a reasonable default.

      • setWorkerMachineType

        void setWorkerMachineType(java.lang.String value)
      • getWorkerDiskType

        java.lang.String getWorkerDiskType()
        Specifies what type of persistent disk is used. The value is a full disk type resource, e.g., compute.googleapis.com/projects//zones//diskTypes/pd-ssd. For more information, see the API reference documentation for DiskTypes.
      • setWorkerDiskType

        void setWorkerDiskType(java.lang.String value)
      • getUsePublicIps

        @Nullable java.lang.Boolean getUsePublicIps()
        Specifies whether worker pools should be started with public IP addresses.

        WARNING: This feature is available only through allowlist.

      • setUsePublicIps

        void setUsePublicIps(@Nullable java.lang.Boolean value)
      • setMinCpuPlatform

        void setMinCpuPlatform(java.lang.String minCpuPlatform)