Package org.apache.beam.runners.dataflow
Provides a Beam runner that executes pipelines on the Google Cloud Dataflow service.

Interface Summary

TestDataflowPipelineOptions: A set of options used to configure the TestPipeline.
TransformTranslator<TransformT extends org.apache.beam.sdk.transforms.PTransform>: A TransformTranslator knows how to translate a particular subclass of PTransform for the Cloud Dataflow service.
TransformTranslator.StepTranslationContext: The interface for a TransformTranslator to build a Dataflow step.
TransformTranslator.TranslationContext: The interface provided to registered callbacks for interacting with the DataflowRunner, including reading and writing the values of PCollections and side inputs.
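
The idea that each TransformTranslator handles one particular subclass of PTransform can be pictured as a lookup keyed by transform class. The following is a minimal plain-Java sketch of that dispatch pattern only. It does not use Beam, and every name in it (TranslatorRegistry, Transform, ReadTransform, MapTransform, Translator, and the "read:" step strings) is hypothetical and chosen for illustration; the real interfaces have different signatures.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of per-class translator dispatch; not Beam code. */
final class TranslatorRegistry {

  /** Stand-in for a transform base type (assumption, not PTransform). */
  interface Transform {}

  static final class ReadTransform implements Transform {
    final String source;

    ReadTransform(String source) {
      this.source = source;
    }
  }

  static final class MapTransform implements Transform {
    final String fnName;

    MapTransform(String fnName) {
      this.fnName = fnName;
    }
  }

  /** Translates one concrete transform type into a made-up step description. */
  interface Translator<T extends Transform> {
    String translate(T transform);
  }

  private final Map<Class<? extends Transform>, Translator<? extends Transform>> translators =
      new HashMap<>();

  /** Registers the translator responsible for exactly one transform class. */
  <T extends Transform> void register(Class<T> type, Translator<T> translator) {
    translators.put(type, translator);
  }

  /** Looks up the translator by the transform's runtime class and applies it. */
  <T extends Transform> String translate(T transform) {
    @SuppressWarnings("unchecked") // Safe: register() pairs each class with its own translator.
    Translator<T> translator = (Translator<T>) translators.get(transform.getClass());
    if (translator == null) {
      throw new IllegalStateException(
          "No translator registered for " + transform.getClass().getSimpleName());
    }
    return translator.translate(transform);
  }

  public static void main(String[] args) {
    TranslatorRegistry registry = new TranslatorRegistry();
    registry.register(ReadTransform.class, t -> "read:" + t.source);
    registry.register(MapTransform.class, t -> "map:" + t.fnName);
    System.out.println(registry.translate(new ReadTransform("input.txt")));
    System.out.println(registry.translate(new MapTransform("toUpperCase")));
  }
}
```

In this sketch an unregistered transform class fails fast with an exception rather than being silently skipped, which is one reasonable design choice for a translator table.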

Class Summary

BatchStatefulParDoOverrides: PTransformOverrideFactories that expand to correctly implement stateful ParDo using window-unaware BatchViewOverrides.GroupByKeyAndSortValuesOnly to linearize processing per key.
BatchStatefulParDoOverrides.BatchStatefulDoFn<K,V,OutputT>: A key-preserving DoFn that explodes an iterable that has been grouped by key and window.
CreateDataflowView<ElemT,ViewT>: A DataflowRunner marker class for creating a PCollectionView.
DataflowClient: Wrapper around the generated Dataflow client to provide common functionality.
DataflowPipelineJob: A DataflowPipelineJob represents a job submitted to Dataflow using DataflowRunner.
DataflowPipelineRegistrar
DataflowPipelineRegistrar.Options: Registers the DataflowPipelineOptions.
DataflowPipelineRegistrar.Runner: Registers the DataflowRunner.
DataflowPipelineTranslator: DataflowPipelineTranslator knows how to translate Pipeline objects into Cloud Dataflow Service API Jobs.
DataflowPipelineTranslator.JobSpecification: The result of a job translation.
DataflowRunner: A PipelineRunner that executes the operations in the pipeline by first translating them to the Dataflow representation using the DataflowPipelineTranslator and then submitting them to a Dataflow service for execution.
DataflowRunner.DataflowTransformTranslator
DataflowRunner.StreamingPCollectionViewWriterFn<T>: A marker DoFn for writing the contents of a PCollection to a streaming PCollectionView backend implementation.
DataflowRunnerHooks: An instance of this class can be passed to the DataflowRunner to add user-defined hooks to be invoked at various times during pipeline execution.
DataflowRunnerInfo: Populates versioning and other information for DataflowRunner.
GroupIntoBatchesOverride
PrimitiveParDoSingleFactory<InputT,OutputT>: A PTransformOverrideFactory that produces PrimitiveParDoSingleFactory.ParDoSingle instances from ParDo.SingleOutput instances.
PrimitiveParDoSingleFactory.ParDoSingle<InputT,OutputT>: A single-output primitive ParDo.
PrimitiveParDoSingleFactory.PayloadTranslator: A translator for PrimitiveParDoSingleFactory.ParDoSingle.
PrimitiveParDoSingleFactory.Registrar
TestDataflowRunner: TestDataflowRunner is a pipeline runner that wraps a DataflowRunner when running tests against the TestPipeline.
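
The per-key linearization mentioned for BatchStatefulParDoOverrides, followed by the "explode" step described for BatchStatefulDoFn, can be sketched without Beam. The following plain-Java example is a conceptual illustration, not the runner's implementation: it groups values by key, sorts each key's values, and then expands each grouped list back into individual key/value elements so that downstream per-key logic sees one key's elements in a single ordered run. The class PerKeyLinearizer, the Timestamped type, and the choice of a millisecond timestamp as the sort key are all assumptions made for this sketch, and windows are ignored entirely.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of group, sort, and explode per key; not Beam code. */
final class PerKeyLinearizer {

  /** A value tagged with an assumed sort key (timestamp in milliseconds). */
  static final class Timestamped {
    final long timestampMillis;
    final String value;

    Timestamped(long timestampMillis, String value) {
      this.timestampMillis = timestampMillis;
      this.value = value;
    }
  }

  /** Convenience factory for one keyed, timestamped input element. */
  static Map.Entry<String, Timestamped> entry(String key, long timestampMillis, String value) {
    return new SimpleImmutableEntry<>(key, new Timestamped(timestampMillis, value));
  }

  /** Groups elements by key, then sorts each key's values by timestamp. */
  static Map<String, List<Timestamped>> groupAndSort(List<Map.Entry<String, Timestamped>> input) {
    // A TreeMap keeps keys in a deterministic order for this demo.
    Map<String, List<Timestamped>> grouped = new TreeMap<>();
    for (Map.Entry<String, Timestamped> e : input) {
      grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
    }
    for (List<Timestamped> values : grouped.values()) {
      values.sort(Comparator.comparingLong((Timestamped t) -> t.timestampMillis));
    }
    return grouped;
  }

  /** Explodes each grouped list back into single elements, preserving the key. */
  static List<Map.Entry<String, String>> explode(Map<String, List<Timestamped>> grouped) {
    List<Map.Entry<String, String>> out = new ArrayList<>();
    for (Map.Entry<String, List<Timestamped>> e : grouped.entrySet()) {
      for (Timestamped t : e.getValue()) {
        out.add(new SimpleImmutableEntry<>(e.getKey(), t.value));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Timestamped>> input = new ArrayList<>();
    input.add(entry("b", 20L, "b-second"));
    input.add(entry("a", 5L, "a-first"));
    input.add(entry("b", 10L, "b-first"));
    input.add(entry("a", 7L, "a-second"));
    // Prints [a=a-first, a=a-second, b=b-first, b=b-second]
    System.out.println(explode(groupAndSort(input)));
  }
}
```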

Exception Summary

DataflowJobAlreadyExistsException: An exception that is thrown if the unique job name constraint of the Dataflow service is broken because an existing job with the same job name is currently active.
DataflowJobAlreadyUpdatedException: An exception that is thrown if the existing job has already been updated within the Dataflow service and is no longer able to be updated.
DataflowJobException: A RuntimeException that contains information about a DataflowPipelineJob.
DataflowServiceException: Signals there was an error retrieving information about a job from the Cloud Dataflow Service.
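
The unique job name constraint behind DataflowJobAlreadyExistsException can be illustrated with a small in-memory model. This is a plain-Java sketch under stated assumptions, not the Dataflow service or the runner's code: JobNameRegistry and JobAlreadyActiveException are hypothetical names, and the rule that a name becomes reusable once its job is no longer active is inferred from the description above rather than taken from the service's documentation.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of "one active job per name"; not the Dataflow service. */
final class JobNameRegistry {

  /** Stand-in for the "job already exists" failure (assumed name). */
  static final class JobAlreadyActiveException extends RuntimeException {
    JobAlreadyActiveException(String jobName) {
      super("An active job named '" + jobName + "' already exists");
    }
  }

  private final Set<String> activeJobNames = new HashSet<>();

  /** Accepts the job only if no active job currently holds the same name. */
  void submit(String jobName) {
    // Set.add returns false when the name is already present.
    if (!activeJobNames.add(jobName)) {
      throw new JobAlreadyActiveException(jobName);
    }
  }

  /** Marks the job as no longer active, which frees its name in this model. */
  void finish(String jobName) {
    activeJobNames.remove(jobName);
  }

  public static void main(String[] args) {
    JobNameRegistry registry = new JobNameRegistry();
    registry.submit("wordcount");
    try {
      registry.submit("wordcount"); // Same name while the first job is active.
    } catch (JobAlreadyActiveException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
    registry.finish("wordcount");
    registry.submit("wordcount"); // Accepted again once the earlier job has finished.
    System.out.println("Resubmitted after the first job finished");
  }
}
```

A caller facing this kind of failure typically either picks a different job name or waits for the active job to end; which is appropriate depends on the pipeline.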