Class ReadChangeStreamPartitionDoFn

  • All Implemented Interfaces:
    java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData

    @UnboundedPerElement
    public class ReadChangeStreamPartitionDoFn
    extends org.apache.beam.sdk.transforms.DoFn<PartitionMetadata, DataChangeRecord>
    implements java.io.Serializable
    An SDF (Splittable DoFn) class that is responsible for performing a change stream query for a given partition. A different action is taken depending on the type of record received from the query. This component also reflects the partition state in the partition metadata tables.

    The processing of a partition is delegated to the QueryChangeStreamAction.
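The record-type dispatch described above can be pictured with a minimal, self-contained sketch. It is not Beam's implementation: the record classes are hypothetical, simplified stand-ins for the change stream record kinds (data change, heartbeat and child partitions records), and the `handle` method only approximates the role of the actions that QueryChangeStreamAction delegates to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: the record classes below are hypothetical, simplified stand-ins,
// not Beam's change stream model classes.
class RecordDispatchSketch {
  interface ChangeStreamRecord {}

  static final class DataChange implements ChangeStreamRecord {
    final String payload;

    DataChange(String payload) {
      this.payload = payload;
    }
  }

  static final class Heartbeat implements ChangeStreamRecord {}

  static final class ChildPartitions implements ChangeStreamRecord {
    final List<String> childTokens;

    ChildPartitions(List<String> childTokens) {
      this.childTokens = childTokens;
    }
  }

  // Observable side effects of each branch.
  final List<String> emitted = new ArrayList<>();
  final List<String> childPartitions = new ArrayList<>();
  int heartbeats = 0;

  // Picks an action based on the concrete type of the record.
  void handle(ChangeStreamRecord record) {
    if (record instanceof DataChange) {
      // Data changes are emitted downstream.
      emitted.add(((DataChange) record).payload);
    } else if (record instanceof Heartbeat) {
      // Heartbeats carry no data changes; this sketch only counts them.
      heartbeats++;
    } else if (record instanceof ChildPartitions) {
      // This sketch only records the child partition tokens for later scheduling.
      childPartitions.addAll(((ChildPartitions) record).childTokens);
    } else {
      throw new IllegalArgumentException("Unknown record type: " + record);
    }
  }

  public static void main(String[] args) {
    RecordDispatchSketch sketch = new RecordDispatchSketch();
    sketch.handle(new DataChange("insert row 1"));
    sketch.handle(new Heartbeat());
    sketch.handle(new ChildPartitions(Arrays.asList("token-a", "token-b")));
    // Prints: [insert row 1] 1 [token-a, token-b]
    System.out.println(sketch.emitted + " " + sketch.heartbeats + " " + sketch.childPartitions);
  }
}
```

Using one type check per record kind keeps each branch small, which mirrors the idea of handing each record type to its own action.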

    See Also:
    Serialized Form
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.beam.sdk.transforms.DoFn

        org.apache.beam.sdk.transforms.DoFn.AlwaysFetched, org.apache.beam.sdk.transforms.DoFn.BoundedPerElement, org.apache.beam.sdk.transforms.DoFn.BundleFinalizer, org.apache.beam.sdk.transforms.DoFn.Element, org.apache.beam.sdk.transforms.DoFn.FieldAccess, org.apache.beam.sdk.transforms.DoFn.FinishBundle, org.apache.beam.sdk.transforms.DoFn.FinishBundleContext, org.apache.beam.sdk.transforms.DoFn.GetInitialRestriction, org.apache.beam.sdk.transforms.DoFn.GetInitialWatermarkEstimatorState, org.apache.beam.sdk.transforms.DoFn.GetRestrictionCoder, org.apache.beam.sdk.transforms.DoFn.GetSize, org.apache.beam.sdk.transforms.DoFn.GetWatermarkEstimatorStateCoder, org.apache.beam.sdk.transforms.DoFn.Key, org.apache.beam.sdk.transforms.DoFn.MultiOutputReceiver, org.apache.beam.sdk.transforms.DoFn.NewTracker, org.apache.beam.sdk.transforms.DoFn.NewWatermarkEstimator, org.apache.beam.sdk.transforms.DoFn.OnTimer, org.apache.beam.sdk.transforms.DoFn.OnTimerContext, org.apache.beam.sdk.transforms.DoFn.OnTimerFamily, org.apache.beam.sdk.transforms.DoFn.OnWindowExpiration, org.apache.beam.sdk.transforms.DoFn.OnWindowExpirationContext, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<T extends java.lang.Object>, org.apache.beam.sdk.transforms.DoFn.ProcessContext, org.apache.beam.sdk.transforms.DoFn.ProcessContinuation, org.apache.beam.sdk.transforms.DoFn.ProcessElement, org.apache.beam.sdk.transforms.DoFn.RequiresStableInput, org.apache.beam.sdk.transforms.DoFn.RequiresTimeSortedInput, org.apache.beam.sdk.transforms.DoFn.Restriction, org.apache.beam.sdk.transforms.DoFn.Setup, org.apache.beam.sdk.transforms.DoFn.SideInput, org.apache.beam.sdk.transforms.DoFn.SplitRestriction, org.apache.beam.sdk.transforms.DoFn.StartBundle, org.apache.beam.sdk.transforms.DoFn.StartBundleContext, org.apache.beam.sdk.transforms.DoFn.StateId, org.apache.beam.sdk.transforms.DoFn.Teardown, org.apache.beam.sdk.transforms.DoFn.TimerFamily, org.apache.beam.sdk.transforms.DoFn.TimerId, org.apache.beam.sdk.transforms.DoFn.Timestamp, org.apache.beam.sdk.transforms.DoFn.TruncateRestriction, org.apache.beam.sdk.transforms.DoFn.UnboundedPerElement, org.apache.beam.sdk.transforms.DoFn.WatermarkEstimatorState, org.apache.beam.sdk.transforms.DoFn.WindowedContext
    • Method Detail

      • getInitialWatermarkEstimatorState

        @GetInitialWatermarkEstimatorState
        public org.joda.time.Instant getInitialWatermarkEstimatorState(@Element
                                                                       PartitionMetadata partition)
      • newWatermarkEstimator

        @NewWatermarkEstimator
        public org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> newWatermarkEstimator(@WatermarkEstimatorState
                                                                                                                                   org.joda.time.Instant watermarkEstimatorState)
      • initialRestriction

        @GetInitialRestriction
        public TimestampRange initialRestriction(@Element
                                                 PartitionMetadata partition)
        The restriction for a partition is defined by the start and end timestamps to query the partition for. The TimestampRange restriction represents a closed-open interval [start, end), while the partition start/end timestamps represent a closed-closed interval [start, end], so 1 nanosecond is added to the end timestamp to convert it to closed-open.

        In this function we also update the partition state to PartitionMetadata.State.RUNNING.

        Parameters:
        partition - the partition to be queried
        Returns:
        the timestamp range from the partition start timestamp to the partition end timestamp + 1 nanosecond
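The closed-closed to closed-open conversion can be sketched as follows. This assumes `java.time.Instant` as a stand-in for `com.google.cloud.Timestamp` (both have nanosecond precision) and uses a hypothetical `ClosedOpenRange` class in place of TimestampRange; it only illustrates why adding 1 nanosecond keeps the original end timestamp inside the range.

```java
import java.time.Instant;

// Sketch: java.time.Instant stands in for com.google.cloud.Timestamp, and this
// hypothetical ClosedOpenRange stands in for Beam's TimestampRange.
final class ClosedOpenRange {
  final Instant from; // inclusive
  final Instant to; // exclusive

  private ClosedOpenRange(Instant from, Instant to) {
    this.from = from;
    this.to = to;
  }

  // Converts the closed interval [start, end] into the closed-open interval [start, end + 1ns).
  static ClosedOpenRange fromClosed(Instant start, Instant end) {
    if (end.isBefore(start)) {
      throw new IllegalArgumentException("end must not be before start");
    }
    return new ClosedOpenRange(start, end.plusNanos(1));
  }

  boolean contains(Instant t) {
    return !t.isBefore(from) && t.isBefore(to);
  }

  public static void main(String[] args) {
    Instant start = Instant.parse("2024-01-01T00:00:00Z");
    Instant end = Instant.parse("2024-01-01T00:00:10Z");
    ClosedOpenRange range = fromClosed(start, end);
    System.out.println(range.contains(end)); // true: the original end is still covered
    System.out.println(range.contains(end.plusNanos(1))); // false: the new bound is exclusive
  }
}
```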
      • getSize

        @GetSize
        public double getSize(@Element
                              PartitionMetadata partition,
                              @Restriction
                              TimestampRange range)
                       throws java.lang.Exception
        Throws:
        java.lang.Exception
      • processElement

        @ProcessElement
        public org.apache.beam.sdk.transforms.DoFn.ProcessContinuation processElement(@Element
                                                                                      PartitionMetadata partition,
                                                                                      org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<TimestampRange, com.google.cloud.Timestamp> tracker,
                                                                                      org.apache.beam.sdk.transforms.DoFn.OutputReceiver<DataChangeRecord> receiver,
                                                                                      org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> watermarkEstimator,
                                                                                      org.apache.beam.sdk.transforms.DoFn.BundleFinalizer bundleFinalizer)
        Performs a change stream query for a given partition. A different action is taken depending on the type of record received from the query. This component also reflects the partition state in the partition metadata tables.

        The processing of a partition is delegated to the QueryChangeStreamAction.

        Parameters:
        partition - the partition to be queried
        tracker - an instance of ReadChangeStreamPartitionRangeTracker
        receiver - a DataChangeRecord DoFn.OutputReceiver
        watermarkEstimator - a ManualWatermarkEstimator of Instant
        bundleFinalizer - the bundle finalizer
        Returns:
        a DoFn.ProcessContinuation.stop() if a record timestamp could not be claimed or if the partition processing has finished
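The claim-then-output pattern behind this return value can be sketched as below. `RangeTracker` and `Outcome` are hypothetical simplifications of RestrictionTracker and DoFn.ProcessContinuation; the sketch assumes that claims are non-decreasing and that several records may share a timestamp. Both outcomes correspond to returning DoFn.ProcessContinuation.stop(), as described above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: RangeTracker and Outcome are hypothetical simplifications of
// Beam's RestrictionTracker and DoFn.ProcessContinuation.
class ClaimLoopSketch {
  // Both outcomes correspond to returning ProcessContinuation.stop().
  enum Outcome { CLAIM_FAILED, FINISHED }

  // Tracks the closed-open range [from, to) of timestamps (e.g. epoch nanos).
  static final class RangeTracker {
    private final long from;
    private final long to;
    private long lastAttempted = Long.MIN_VALUE;

    RangeTracker(long from, long to) {
      this.from = from;
      this.to = to;
    }

    // Claims must be non-decreasing; several records may share one timestamp.
    boolean tryClaim(long position) {
      if (position < lastAttempted || position < from) {
        throw new IllegalArgumentException("invalid claim: " + position);
      }
      lastAttempted = position;
      return position < to;
    }
  }

  // Outputs each record whose timestamp can be claimed and stops at the first one that cannot.
  static Outcome process(long[] recordTimestamps, RangeTracker tracker, List<Long> out) {
    for (long ts : recordTimestamps) {
      if (!tracker.tryClaim(ts)) {
        return Outcome.CLAIM_FAILED; // outside the restriction: output nothing more
      }
      out.add(ts);
    }
    return Outcome.FINISHED; // no more records for this partition
  }

  public static void main(String[] args) {
    List<Long> out = new ArrayList<>();
    Outcome outcome = process(new long[] {1, 5, 5, 12, 15}, new RangeTracker(0, 10), out);
    System.out.println(outcome + " " + out); // CLAIM_FAILED [1, 5, 5]
  }
}
```

A record is output only after its timestamp has been claimed, so a failed claim never emits data for positions outside the restriction.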
      • setThroughputEstimator

        public void setThroughputEstimator(BytesThroughputEstimator<DataChangeRecord> throughputEstimator)
        Sets the estimator used to calculate the backlog of this DoFn. Must be called after this DoFn has been initialized.

        Parameters:
        throughputEstimator - an estimator to calculate local throughput.
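One plausible way a throughput estimator can feed a backlog figure is sketched below: the backlog in bytes is the estimated throughput in bytes per second multiplied by the number of seconds of the change stream still to be read. The sliding-window `ThroughputEstimator` and the `backlogBytes` helper are hypothetical; they are not Beam's BytesThroughputEstimator or the actual getSize implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: ThroughputEstimator is a hypothetical sliding-window estimator,
// not Beam's BytesThroughputEstimator.
class BacklogSketch {
  static final class ThroughputEstimator {
    private final long windowSeconds;
    private final Deque<long[]> samples = new ArrayDeque<>(); // each entry: {epochSecond, bytes}
    private long bytesInWindow = 0;

    ThroughputEstimator(long windowSeconds) {
      if (windowSeconds <= 0) {
        throw new IllegalArgumentException("window must be positive");
      }
      this.windowSeconds = windowSeconds;
    }

    // Records bytes observed at epochSecond (assumed non-decreasing) and evicts old samples.
    void update(long epochSecond, long bytes) {
      samples.addLast(new long[] {epochSecond, bytes});
      bytesInWindow += bytes;
      while (!samples.isEmpty() && samples.peekFirst()[0] <= epochSecond - windowSeconds) {
        bytesInWindow -= samples.removeFirst()[1];
      }
    }

    double bytesPerSecond() {
      return (double) bytesInWindow / windowSeconds;
    }
  }

  // Backlog (bytes) = throughput (bytes/s) * seconds of the change stream still to be read.
  static double backlogBytes(ThroughputEstimator estimator, double secondsRemaining) {
    double backlog = estimator.bytesPerSecond() * Math.max(0.0, secondsRemaining);
    // Cap at Double.MAX_VALUE in case the product overflows.
    return Double.isInfinite(backlog) ? Double.MAX_VALUE : backlog;
  }

  public static void main(String[] args) {
    ThroughputEstimator estimator = new ThroughputEstimator(10);
    estimator.update(0, 100);
    estimator.update(5, 200);
    estimator.update(9, 200);
    // 500 bytes over a 10 s window = 50 bytes/s; 20 s remaining gives 1000 bytes.
    System.out.println(backlogBytes(estimator, 20)); // 1000.0
  }
}
```

This also shows why the estimator has to be set before the DoFn reports its size: without a throughput figure there is nothing to scale the remaining time by.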