Class DetectNewPartitionsDoFn
- java.lang.Object
-
- org.apache.beam.sdk.transforms.DoFn<PartitionMetadata,PartitionMetadata>
-
- org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.DetectNewPartitionsDoFn
-
- All Implemented Interfaces:
java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
@UnboundedPerElement public class DetectNewPartitionsDoFn extends org.apache.beam.sdk.transforms.DoFn<PartitionMetadata,PartitionMetadata>
A SplittableDoFn (SDF) that is responsible for scheduling partitions to be queried. This component periodically scans the partition metadata table for partitions in the PartitionMetadata.State.CREATED state, updates their state to PartitionMetadata.State.SCHEDULED, and outputs them to the next stage in the pipeline.
- See Also:
- Serialized Form
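The scan-and-schedule cycle described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the class `PartitionSchedulingSketch`, its in-memory map, and the partition tokens are hypothetical stand-ins, and none of it uses the Beam or Spanner APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model: NOT the Beam or Spanner API.
final class PartitionSchedulingSketch {

  // Mirrors only the two states named in the class description.
  enum State { CREATED, SCHEDULED }

  // Stand-in for the partition metadata table: partition token -> state.
  private final Map<String, State> metadataTable = new LinkedHashMap<>();

  void insert(String token, State state) {
    metadataTable.put(token, state);
  }

  State stateOf(String token) {
    return metadataTable.get(token);
  }

  // One scan: find CREATED partitions, mark them SCHEDULED, and return them
  // as the "output" that would go to the next pipeline stage.
  List<String> scanOnce() {
    List<String> scheduled = new ArrayList<>();
    for (Map.Entry<String, State> entry : metadataTable.entrySet()) {
      if (entry.getValue() == State.CREATED) {
        entry.setValue(State.SCHEDULED);
        scheduled.add(entry.getKey());
      }
    }
    return scheduled;
  }

  public static void main(String[] args) {
    PartitionSchedulingSketch sketch = new PartitionSchedulingSketch();
    sketch.insert("p1", State.CREATED);
    sketch.insert("p2", State.SCHEDULED);
    sketch.insert("p3", State.CREATED);
    System.out.println(sketch.scanOnce()); // p1 and p3 are scheduled
    System.out.println(sketch.scanOnce()); // nothing left in CREATED
  }
}
```

Because a partition leaves the CREATED state as soon as it is emitted, a repeated scan in this model does not output the same partition twice.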
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.beam.sdk.transforms.DoFn
org.apache.beam.sdk.transforms.DoFn.AlwaysFetched, org.apache.beam.sdk.transforms.DoFn.BoundedPerElement, org.apache.beam.sdk.transforms.DoFn.BundleFinalizer, org.apache.beam.sdk.transforms.DoFn.Element, org.apache.beam.sdk.transforms.DoFn.FieldAccess, org.apache.beam.sdk.transforms.DoFn.FinishBundle, org.apache.beam.sdk.transforms.DoFn.FinishBundleContext, org.apache.beam.sdk.transforms.DoFn.GetInitialRestriction, org.apache.beam.sdk.transforms.DoFn.GetInitialWatermarkEstimatorState, org.apache.beam.sdk.transforms.DoFn.GetRestrictionCoder, org.apache.beam.sdk.transforms.DoFn.GetSize, org.apache.beam.sdk.transforms.DoFn.GetWatermarkEstimatorStateCoder, org.apache.beam.sdk.transforms.DoFn.Key, org.apache.beam.sdk.transforms.DoFn.MultiOutputReceiver, org.apache.beam.sdk.transforms.DoFn.NewTracker, org.apache.beam.sdk.transforms.DoFn.NewWatermarkEstimator, org.apache.beam.sdk.transforms.DoFn.OnTimer, org.apache.beam.sdk.transforms.DoFn.OnTimerContext, org.apache.beam.sdk.transforms.DoFn.OnTimerFamily, org.apache.beam.sdk.transforms.DoFn.OnWindowExpiration, org.apache.beam.sdk.transforms.DoFn.OnWindowExpirationContext, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<T extends java.lang.Object>, org.apache.beam.sdk.transforms.DoFn.ProcessContext, org.apache.beam.sdk.transforms.DoFn.ProcessContinuation, org.apache.beam.sdk.transforms.DoFn.ProcessElement, org.apache.beam.sdk.transforms.DoFn.RequiresStableInput, org.apache.beam.sdk.transforms.DoFn.RequiresTimeSortedInput, org.apache.beam.sdk.transforms.DoFn.Restriction, org.apache.beam.sdk.transforms.DoFn.Setup, org.apache.beam.sdk.transforms.DoFn.SideInput, org.apache.beam.sdk.transforms.DoFn.SplitRestriction, org.apache.beam.sdk.transforms.DoFn.StartBundle, org.apache.beam.sdk.transforms.DoFn.StartBundleContext, org.apache.beam.sdk.transforms.DoFn.StateId, org.apache.beam.sdk.transforms.DoFn.Teardown, org.apache.beam.sdk.transforms.DoFn.TimerFamily, org.apache.beam.sdk.transforms.DoFn.TimerId, 
org.apache.beam.sdk.transforms.DoFn.Timestamp, org.apache.beam.sdk.transforms.DoFn.TruncateRestriction, org.apache.beam.sdk.transforms.DoFn.UnboundedPerElement, org.apache.beam.sdk.transforms.DoFn.WatermarkEstimatorState, org.apache.beam.sdk.transforms.DoFn.WindowedContext
-
-
Constructor Summary
Constructors
- DetectNewPartitionsDoFn(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, CacheFactory cacheFactory, ChangeStreamMetrics metrics): This class needs a DaoFactory to build DAOs to access the partition metadata tables.
-
Method Summary
All Methods (Instance Methods, Concrete Methods)
- org.joda.time.Instant getInitialWatermarkEstimatorState(PartitionMetadata partition)
- double getSize(TimestampRange restriction)
- TimestampRange initialRestriction(PartitionMetadata partition): Uses a TimestampRange with a max range.
- DetectNewPartitionsRangeTracker newTracker(TimestampRange restriction)
- org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> newWatermarkEstimator(org.joda.time.Instant watermarkEstimatorState)
- org.apache.beam.sdk.transforms.DoFn.ProcessContinuation processElement(org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<TimestampRange,com.google.cloud.Timestamp> tracker, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<PartitionMetadata> receiver, org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> watermarkEstimator): Main processing function for the DetectNewPartitionsDoFn function.
- void setAveragePartitionBytesSize(long averagePartitionBytesSize): Sets the average partition bytes size to estimate the backlog of this DoFn.
- void setup(): Obtains the instance of DetectNewPartitionsAction.
-
-
-
Constructor Detail
-
DetectNewPartitionsDoFn
public DetectNewPartitionsDoFn(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, CacheFactory cacheFactory, ChangeStreamMetrics metrics)
This class needs a DaoFactory to build DAOs to access the partition metadata tables. It uses mappers to transform database rows into the PartitionMetadata model. It builds the delegating action class using the ActionFactory. It emits metrics for the partitions read using the ChangeStreamMetrics. It reschedules the process element function to run according to the default resume interval defined in DEFAULT_RESUME_DURATION (best effort).
- Parameters:
daoFactory - the DaoFactory to construct PartitionMetadataDaos
mapperFactory - the MapperFactory to construct PartitionMetadataMappers
actionFactory - the ActionFactory to construct actions
metrics - the ChangeStreamMetrics to emit partition related metrics
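The rescheduling behaviour mentioned in the constructor description can be pictured with a toy model in which each invocation either asks to be resumed after a delay or stops. Everything here is an assumption made for illustration: `ResumeSketch`, `Continuation`, and the 100 ms value of `ASSUMED_RESUME_DURATION` are hypothetical, the real value of DEFAULT_RESUME_DURATION is not stated on this page, and the model stops after a fixed number of scans only so that it terminates.

```java
import java.time.Duration;

// Hypothetical model of "process, then ask to be resumed later"; this is not
// Beam's ProcessContinuation class, only an illustration of the idea.
final class ResumeSketch {

  // Assumed value for illustration; the real DEFAULT_RESUME_DURATION is not
  // stated in this page.
  static final Duration ASSUMED_RESUME_DURATION = Duration.ofMillis(100);

  static final class Continuation {
    final boolean shouldResume;
    final Duration resumeDelay;

    private Continuation(boolean shouldResume, Duration resumeDelay) {
      this.shouldResume = shouldResume;
      this.resumeDelay = resumeDelay;
    }

    static Continuation stop() {
      return new Continuation(false, Duration.ZERO);
    }

    static Continuation resumeAfter(Duration delay) {
      return new Continuation(true, delay);
    }
  }

  // One invocation: while scans remain, request another invocation after the
  // assumed resume interval; otherwise stop.
  static Continuation processOnce(int remainingScans) {
    return remainingScans > 0
        ? Continuation.resumeAfter(ASSUMED_RESUME_DURATION)
        : Continuation.stop();
  }

  // Minimal "runner": sums the delays requested across all invocations
  // instead of sleeping, so the sketch runs instantly.
  static Duration totalRequestedDelay(int scans) {
    Duration total = Duration.ZERO;
    int remaining = scans;
    Continuation continuation = processOnce(remaining);
    while (continuation.shouldResume) {
      total = total.plus(continuation.resumeDelay);
      remaining--;
      continuation = processOnce(remaining);
    }
    return total;
  }

  public static void main(String[] args) {
    // Three scans each request one resume delay before the final stop.
    System.out.println(totalRequestedDelay(3).toMillis());
  }
}
```

In this model the delay is only a request from the function to whatever runs it, which is one way to read the "best effort" qualifier: the caller decides when the next invocation actually happens.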
-
-
Method Detail
-
getInitialWatermarkEstimatorState
@GetInitialWatermarkEstimatorState public org.joda.time.Instant getInitialWatermarkEstimatorState(@Element PartitionMetadata partition)
-
newWatermarkEstimator
@NewWatermarkEstimator public org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> newWatermarkEstimator(@WatermarkEstimatorState org.joda.time.Instant watermarkEstimatorState)
-
initialRestriction
@GetInitialRestriction public TimestampRange initialRestriction(@Element PartitionMetadata partition)
Uses a TimestampRange with a max range, because it does not know beforehand how many partitions it will schedule.
- Returns:
- the timestamp range for the component
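The idea of a max range can be sketched with a simplified range whose end is the largest representable value, so that any position reached later is still inside it. This is an illustration only: `MaxRangeSketch` and its nested `Range` are hypothetical and use plain long values, whereas the real TimestampRange works with com.google.cloud.Timestamp.

```java
// Hypothetical stand-in for a timestamp range; the real TimestampRange is
// built on com.google.cloud.Timestamp, not on long values.
final class MaxRangeSketch {

  // Half-open range [from, to) over an abstract, long-valued timeline.
  static final class Range {
    final long from;
    final long to;

    Range(long from, long to) {
      this.from = from;
      this.to = to;
    }

    boolean contains(long position) {
      return position >= from && position < to;
    }
  }

  // An effectively unbounded restriction: the end is the largest
  // representable value, so the amount of work need not be known up front.
  static Range maxRange(long start) {
    return new Range(start, Long.MAX_VALUE);
  }

  public static void main(String[] args) {
    Range range = maxRange(0L);
    System.out.println(range.contains(1_000_000L)); // inside the range
    System.out.println(range.contains(-1L));        // before the start
  }
}
```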
-
getSize
@GetSize public double getSize(@Restriction TimestampRange restriction)
-
newTracker
@NewTracker public DetectNewPartitionsRangeTracker newTracker(@Restriction TimestampRange restriction)
-
setup
@Setup public void setup()
Obtains the instance of DetectNewPartitionsAction.
-
processElement
@ProcessElement public org.apache.beam.sdk.transforms.DoFn.ProcessContinuation processElement(org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<TimestampRange,com.google.cloud.Timestamp> tracker, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<PartitionMetadata> receiver, org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> watermarkEstimator)
Main processing function for the DetectNewPartitionsDoFn function. It delegates to the DetectNewPartitionsAction class.
-
setAveragePartitionBytesSize
public void setAveragePartitionBytesSize(long averagePartitionBytesSize)
Sets the average partition bytes size to estimate the backlog of this DoFn. Must be called after the initialization of this DoFn.
- Parameters:
averagePartitionBytesSize - the estimated average size of a partition record used in the backlog bytes calculation (DoFn.GetSize)
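As a rough illustration of how an average record size could feed a backlog estimate, the sketch below multiplies a count of pending partition records by the configured average. Both the formula and the names (`BacklogEstimateSketch`, `estimateBacklogBytes`, `pendingPartitions`) are assumptions for illustration; this page does not describe the actual calculation performed in getSize.

```java
// Hypothetical sketch: the names and the formula are assumptions made for
// illustration, not the connector's actual getSize implementation.
final class BacklogEstimateSketch {

  private long averagePartitionBytesSize;

  // Mirrors the documented setter: called after the object is constructed.
  void setAveragePartitionBytesSize(long averagePartitionBytesSize) {
    this.averagePartitionBytesSize = averagePartitionBytesSize;
  }

  // Assumed formula: pending partition records times average record size.
  double estimateBacklogBytes(long pendingPartitions) {
    return (double) pendingPartitions * averagePartitionBytesSize;
  }

  public static void main(String[] args) {
    BacklogEstimateSketch sketch = new BacklogEstimateSketch();
    sketch.setAveragePartitionBytesSize(512L);
    System.out.println(sketch.estimateBacklogBytes(4L)); // 4 records * 512 bytes
  }
}
```

In this model, if the setter is never called the estimate is zero, which suggests why the value has to be supplied after initialization and before the size is queried.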
-
-