Class BatchStatefulParDoOverrides


  • public class BatchStatefulParDoOverrides
    extends java.lang.Object
    PTransformOverrideFactory implementations whose expansions correctly implement stateful ParDo, using the window-unaware BatchViewOverrides.GroupByKeyAndSortValuesOnly to linearize processing per key.

    For the Fn API, the PTransformOverrideFactory is only required to perform per-key grouping and expansion.

    This implementation relies on implementation details of the Dataflow runner, specifically standard fusion behavior of ParDo transforms following a GroupByKey.
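
    The idea of linearizing processing per key can be illustrated with a minimal, JDK-only sketch. It is not the Dataflow implementation and uses no Beam classes. The class LinearizePerKeySketch, the Element type, and the running-sum "state" are all hypothetical names introduced here for illustration. Sorting each key's values by timestamp is also an assumption about the sort order. The sketch groups elements by key without regard to windows, sorts each key's values, and then feeds them one at a time to a function that keeps per-key state.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: hypothetical names, no Beam or Dataflow APIs involved. */
public class LinearizePerKeySketch {

  /** Hypothetical input element: a keyed value with a timestamp in milliseconds. */
  static final class Element {
    final String key;
    final long timestampMillis;
    final int value;

    Element(String key, long timestampMillis, int value) {
      this.key = key;
      this.timestampMillis = timestampMillis;
      this.value = value;
    }
  }

  /** Groups elements by key (ignoring windows), then sorts each group by timestamp. */
  static Map<String, List<Element>> groupByKeyAndSortValues(List<Element> input) {
    Map<String, List<Element>> grouped = new LinkedHashMap<>();
    for (Element e : input) {
      grouped.computeIfAbsent(e.key, k -> new ArrayList<>()).add(e);
    }
    for (List<Element> values : grouped.values()) {
      // Assumption: timestamp order is the order in which state must observe elements.
      values.sort(Comparator.comparingLong((Element e) -> e.timestampMillis));
    }
    return grouped;
  }

  /** Processes each key's sorted values sequentially, keeping a running sum as per-key state. */
  static Map<String, List<Integer>> runningSumPerKey(List<Element> input) {
    Map<String, List<Integer>> output = new LinkedHashMap<>();
    for (Map.Entry<String, List<Element>> entry : groupByKeyAndSortValues(input).entrySet()) {
      int state = 0; // state is scoped to a single key and never shared across keys
      List<Integer> emitted = new ArrayList<>();
      for (Element e : entry.getValue()) {
        state += e.value;
        emitted.add(state);
      }
      output.put(entry.getKey(), emitted);
    }
    return output;
  }

  public static void main(String[] args) {
    List<Element> input = List.of(
        new Element("a", 30L, 3),
        new Element("b", 10L, 5),
        new Element("a", 10L, 1),
        new Element("a", 20L, 2));
    System.out.println(runningSumPerKey(input)); // prints {a=[1, 3, 6], b=[5]}
  }
}
```

    Because every value for a key is visited by a single sequential loop, the state update never races with another element of the same key. This is the property the class description above attributes to grouping followed by a fused ParDo.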

    • Method Summary

      Modifier and Type Method Description
      static <K,​InputT,​OutputT>
      org.apache.beam.sdk.runners.PTransformOverrideFactory<org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,​InputT>>,​org.apache.beam.sdk.values.PCollectionTuple,​org.apache.beam.sdk.transforms.ParDo.MultiOutput<org.apache.beam.sdk.values.KV<K,​InputT>,​OutputT>>
      multiOutputOverrideFactory​(DataflowPipelineOptions options)
      Returns a PTransformOverrideFactory that replaces a multi-output ParDo with a composite transform specialized for the DataflowRunner.
      static <K,​InputT,​OutputT>
      org.apache.beam.sdk.runners.PTransformOverrideFactory<org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,​InputT>>,​org.apache.beam.sdk.values.PCollection<OutputT>,​org.apache.beam.sdk.transforms.ParDo.SingleOutput<org.apache.beam.sdk.values.KV<K,​InputT>,​OutputT>>
      singleOutputOverrideFactory()
      Returns a PTransformOverrideFactory that replaces a single-output ParDo with a composite transform specialized for the DataflowRunner.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • BatchStatefulParDoOverrides

        public BatchStatefulParDoOverrides()
    • Method Detail

      • singleOutputOverrideFactory

        public static <K,​InputT,​OutputT> org.apache.beam.sdk.runners.PTransformOverrideFactory<org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,​InputT>>,​org.apache.beam.sdk.values.PCollection<OutputT>,​org.apache.beam.sdk.transforms.ParDo.SingleOutput<org.apache.beam.sdk.values.KV<K,​InputT>,​OutputT>> singleOutputOverrideFactory()
        Returns a PTransformOverrideFactory that replaces a single-output ParDo with a composite transform specialized for the DataflowRunner.
      • multiOutputOverrideFactory

        public static <K,​InputT,​OutputT> org.apache.beam.sdk.runners.PTransformOverrideFactory<org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,​InputT>>,​org.apache.beam.sdk.values.PCollectionTuple,​org.apache.beam.sdk.transforms.ParDo.MultiOutput<org.apache.beam.sdk.values.KV<K,​InputT>,​OutputT>> multiOutputOverrideFactory​(DataflowPipelineOptions options)
        Returns a PTransformOverrideFactory that replaces a multi-output ParDo with a composite transform specialized for the DataflowRunner.