Package org.apache.beam.runners.dataflow
Class DataflowRunner
- java.lang.Object
  - org.apache.beam.sdk.PipelineRunner<DataflowPipelineJob>
    - org.apache.beam.runners.dataflow.DataflowRunner
public class DataflowRunner extends org.apache.beam.sdk.PipelineRunner<DataflowPipelineJob>
A PipelineRunner that executes the operations in the pipeline by first translating them to the Dataflow representation using the DataflowPipelineTranslator and then submitting them to a Dataflow service for execution.
Permissions
When reading from a Dataflow source or writing to a Dataflow sink using DataflowRunner, the Google cloudservices account and the Google compute engine service account of the GCP project running the Dataflow Job will need access to the corresponding source/sink.
Please see Google Cloud Dataflow Security and Permissions for more details.
DataflowRunner now supports creating job templates using the --templateLocation option. If this option is set, the runner will generate a template instead of running the pipeline immediately.
Example:
--runner=DataflowRunner --templateLocation=gs://your-bucket/templates/my-template
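The effect of the option can be pictured with a small standalone sketch. It is not DataflowRunner's code and does not use the Beam API; the class `TemplateFlagSketch` and its methods are hypothetical names introduced only to show the two outcomes that the presence or absence of --templateLocation selects.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Apache Beam. */
final class TemplateFlagSketch {
  private static final String FLAG = "--templateLocation=";

  /** Returns the value of --templateLocation if it is present and non-empty. */
  static Optional<String> templateLocation(String[] args) {
    for (String arg : args) {
      if (arg.startsWith(FLAG) && arg.length() > FLAG.length()) {
        return Optional.of(arg.substring(FLAG.length()));
      }
    }
    return Optional.empty();
  }

  /** Describes which of the two behaviours the arguments would select. */
  static String describeMode(String[] args) {
    return templateLocation(args)
        .map(location -> "generate template at " + location)
        .orElse("run pipeline immediately");
  }

  public static void main(String[] args) {
    System.out.println(describeMode(args));
  }
}
```

In a real pipeline the same choice is made by passing the flag shown in the example above on the command line; the sketch only mirrors the documented behaviour.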
-
-
Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
|---|---|---|
| static class | DataflowRunner.DataflowTransformTranslator | |
| static class | DataflowRunner.StreamingPCollectionViewWriterFn<T> | A marker DoFn for writing the contents of a PCollection to a streaming PCollectionView backend implementation. |
-
Field Summary
Fields
| Modifier and Type | Field | Description |
|---|---|---|
| static java.lang.String | PROJECT_ID_REGEXP | Project IDs must contain lowercase letters, digits, or dashes. |
| static java.lang.String | UNSAFELY_ATTEMPT_TO_PROCESS_UNBOUNDED_DATA_IN_BATCH_MODE | Experiment to "unsafely attempt to process unbounded data in batch mode". |
-
Constructor Summary
Constructors
| Modifier | Constructor | Description |
|---|---|---|
| protected | DataflowRunner(DataflowPipelineOptions options) | |
-
Method Summary
Methods
| Modifier and Type | Method | Description |
|---|---|---|
| protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline | applySdkEnvironmentOverrides(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline, DataflowPipelineOptions options) | |
| static DataflowRunner | fromOptions(org.apache.beam.sdk.options.PipelineOptions options) | Construct a runner from the provided options. |
| DataflowPipelineTranslator | getTranslator() | Returns the DataflowPipelineTranslator associated with this object. |
| static boolean | hasExperiment(DataflowPipelineDebugOptions options, java.lang.String experiment) | Returns true if the specified experiment is enabled, handling null experiments. |
| static java.util.List<java.lang.String> | replaceGcsFilesWithLocalFiles(java.util.List<java.lang.String> filesToStage) | Replaces GCS file paths with local file paths by downloading the GCS files locally. |
| protected void | replaceV1Transforms(org.apache.beam.sdk.Pipeline pipeline) | |
| protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline | resolveArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline) | |
| DataflowPipelineJob | run(org.apache.beam.sdk.Pipeline pipeline) | |
| void | setHooks(DataflowRunnerHooks hooks) | Sets callbacks to invoke during execution; see DataflowRunnerHooks. |
| protected java.util.List<com.google.api.services.dataflow.model.DataflowPackage> | stageArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline) | |
| java.lang.String | toString() | |
-
-
-
Field Detail
-
UNSAFELY_ATTEMPT_TO_PROCESS_UNBOUNDED_DATA_IN_BATCH_MODE
public static final java.lang.String UNSAFELY_ATTEMPT_TO_PROCESS_UNBOUNDED_DATA_IN_BATCH_MODE
Experiment to "unsafely attempt to process unbounded data in batch mode".
- See Also:
Constant Field Values
-
PROJECT_ID_REGEXP
public static final java.lang.String PROJECT_ID_REGEXP
Project IDs must contain lowercase letters, digits, or dashes. IDs must start with a letter and may not end with a dash. This regex isn't exact - this allows for patterns that would be rejected by the service, but this is sufficient for basic validation of project IDs.
- See Also:
Constant Field Values
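The rules stated above can be sketched as a basic check. The pattern below is an assumption written directly from the prose (lowercase letters, digits, or dashes; starts with a letter; does not end with a dash). It is not the value of PROJECT_ID_REGEXP, which is listed under Constant Field Values, and the class `ProjectIdSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical validator; the pattern is an assumption, not the constant's value. */
final class ProjectIdSketch {
  // A lowercase letter, optionally followed by lowercase letters, digits,
  // or dashes, where the last character is a letter or digit (never a dash).
  private static final Pattern ASSUMED_PROJECT_ID =
      Pattern.compile("[a-z]([-a-z0-9]*[a-z0-9])?");

  /** Returns true if the candidate passes the basic, inexact check. */
  static boolean looksLikeProjectId(String candidate) {
    return candidate != null && ASSUMED_PROJECT_ID.matcher(candidate).matches();
  }

  public static void main(String[] args) {
    for (String id : new String[] {"my-project-123", "My-Project", "project-", "1project"}) {
      System.out.println(id + " -> " + looksLikeProjectId(id));
    }
  }
}
```

As with the real constant, a check of this kind only filters out obviously malformed IDs; the service remains the authority on whether a project ID is accepted.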
-
-
Constructor Detail
-
DataflowRunner
protected DataflowRunner(DataflowPipelineOptions options)
-
-
Method Detail
-
replaceGcsFilesWithLocalFiles
public static java.util.List<java.lang.String> replaceGcsFilesWithLocalFiles(java.util.List<java.lang.String> filesToStage)
Replaces GCS file paths with local file paths by downloading the GCS files locally. This is useful when files need to be accessed locally before being staged to Dataflow.
- Parameters:
filesToStage - List of file paths that may contain GCS paths (gs://) and local paths
- Returns:
List of local file paths where any GCS paths have been downloaded locally
- Throws:
java.lang.RuntimeException - if there are errors copying GCS files locally
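The contract described above (gs:// entries replaced by local paths, other entries passed through, failures surfaced as RuntimeException) can be sketched without any Beam or GCS dependency. This is an illustration under stated assumptions, not the method's implementation: `GcsToLocalSketch` and the `downloader` parameter are hypothetical, and the download itself is delegated to whatever function the caller supplies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for illustration; performs no real GCS access. */
final class GcsToLocalSketch {
  /**
   * Returns a new list in which every gs:// path is replaced by the local
   * path produced by the supplied downloader; other paths are kept as-is.
   */
  static List<String> replaceGcsPaths(
      List<String> filesToStage, Function<String, String> downloader) {
    List<String> result = new ArrayList<>(filesToStage.size());
    for (String file : filesToStage) {
      if (file.startsWith("gs://")) {
        try {
          result.add(downloader.apply(file));
        } catch (RuntimeException e) {
          // Mirror the documented behaviour: copy errors become RuntimeException.
          throw new RuntimeException("Error copying " + file + " locally", e);
        }
      } else {
        result.add(file);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Fake downloader (assumption): maps a GCS object to a path under /tmp.
    Function<String, String> fake =
        gcs -> "/tmp/" + gcs.substring(gcs.lastIndexOf('/') + 1);
    System.out.println(
        replaceGcsPaths(Arrays.asList("gs://bucket/a.jar", "/local/b.jar"), fake));
  }
}
```

Passing the downloader in as a function keeps the sketch testable; the order of the input list is preserved in the result.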
-
fromOptions
public static DataflowRunner fromOptions(org.apache.beam.sdk.options.PipelineOptions options)
Construct a runner from the provided options.
- Parameters:
options - Properties that configure the runner.
- Returns:
The newly created runner.
-
applySdkEnvironmentOverrides
protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline applySdkEnvironmentOverrides(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline, DataflowPipelineOptions options)
-
resolveArtifacts
protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline resolveArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)
-
stageArtifacts
protected java.util.List<com.google.api.services.dataflow.model.DataflowPackage> stageArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)
-
run
public DataflowPipelineJob run(org.apache.beam.sdk.Pipeline pipeline)
- Specified by:
run in class org.apache.beam.sdk.PipelineRunner<DataflowPipelineJob>
-
hasExperiment
public static boolean hasExperiment(DataflowPipelineDebugOptions options, java.lang.String experiment)
Returns true if the specified experiment is enabled, handling null experiments.
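A null-tolerant check of this kind can be sketched as follows. The sketch operates on a plain list of experiment names rather than on DataflowPipelineDebugOptions, so it runs without Beam on the classpath; `ExperimentCheckSketch` and the experiment name used are hypothetical, and this is not presented as the method's actual implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in that takes the experiment list directly. */
final class ExperimentCheckSketch {
  /** Returns true only if the list is non-null and contains the experiment. */
  static boolean hasExperiment(List<String> experiments, String experiment) {
    return experiments != null && experiments.contains(experiment);
  }

  public static void main(String[] args) {
    List<String> enabled = Arrays.asList("example_experiment");
    System.out.println(hasExperiment(enabled, "example_experiment"));
    System.out.println(hasExperiment(null, "example_experiment"));
  }
}
```

Treating a null list as "no experiments enabled" lets callers skip their own null checks before asking about a specific experiment.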
-
replaceV1Transforms
protected void replaceV1Transforms(org.apache.beam.sdk.Pipeline pipeline)
-
getTranslator
public DataflowPipelineTranslator getTranslator()
Returns the DataflowPipelineTranslator associated with this object.
-
setHooks
public void setHooks(DataflowRunnerHooks hooks)
Sets callbacks to invoke during execution; see DataflowRunnerHooks.
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
-