Class PubsubUnboundedSink

  • All Implemented Interfaces:
    java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData

    public class PubsubUnboundedSink
    extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<PubsubMessage>,org.apache.beam.sdk.values.PDone>
    A PTransform which streams messages to Pubsub.
    • The underlying implementation is a GroupByKey followed by a ParDo that publishes as a side effect. (In the future we want to design and switch to a custom UnboundedSink implementation to gain access to the system watermark and end-of-pipeline cleanup.)
    • We try to send messages in batches while also limiting send latency.
    • No stats are logged. Instead, counters keep track of elements and batches.
    • Although the underlying netty system uses some background threads, all actual Pubsub calls are blocking. We rely on the underlying runner to execute multiple DoFn instances concurrently and so hide latency.
    • A failed bundle causes messages to be resent, so we rely on the Pubsub consumer to deduplicate messages.
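The batching described above can be illustrated with a minimal sketch. It is not the code used by this class: `BatchSketch` and its `batch` method are hypothetical names, and the sketch assumes batches are closed when either a message-count limit or a payload-byte limit would be exceeded (compare the `publishBatchSize` and `publishBatchBytes` constructor parameters). The latency bound is not modeled here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Beam. */
public class BatchSketch {

  /** Splits payloads into batches bounded by message count and total payload bytes. */
  public static List<List<byte[]>> batch(List<byte[]> payloads, int maxCount, int maxBytes) {
    List<List<byte[]>> batches = new ArrayList<>();
    List<byte[]> current = new ArrayList<>();
    int currentBytes = 0;
    for (byte[] payload : payloads) {
      // Close the current batch if adding this payload would exceed either limit.
      boolean full = current.size() >= maxCount || currentBytes + payload.length > maxBytes;
      if (full && !current.isEmpty()) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      // A payload larger than maxBytes still goes out, alone in its own batch.
      current.add(payload);
      currentBytes += payload.length;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    List<byte[]> payloads = new ArrayList<>();
    for (String s : new String[] {"aaaa", "bbbb", "cccc", "dd", "e"}) {
      payloads.add(s.getBytes(StandardCharsets.UTF_8));
    }
    // At most 3 messages and 10 payload bytes per batch.
    for (List<byte[]> b : batch(payloads, 3, 10)) {
      System.out.println(b.size() + " message(s)");
    }
  }
}
```

With the limits shown in `main`, the third payload would push the first batch past 10 bytes, so the five payloads are split into a batch of two and a batch of three.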
    See Also:
    Serialized Form
    • Field Summary

      • Fields inherited from class org.apache.beam.sdk.transforms.PTransform

        annotations, displayData, name, resourceHints
    • Method Summary

      Modifier and Type Method Description
      org.apache.beam.sdk.values.PDone expand(org.apache.beam.sdk.values.PCollection<PubsubMessage> input)
      @Nullable java.lang.String getIdAttribute()
      Get the id attribute.
      boolean getPublishBatchWithOrderingKey()  
      @Nullable java.lang.String getTimestampAttribute()
      Get the timestamp attribute.
      @Nullable PubsubClient.TopicPath getTopic()
      Get the topic being written to.
      @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> getTopicProvider()
      Get the ValueProvider for the topic being written to.
      • Methods inherited from class org.apache.beam.sdk.transforms.PTransform

        addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • PubsubUnboundedSink

        public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory,
                                   org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic,
                                   java.lang.String timestampAttribute,
                                   java.lang.String idAttribute,
                                   int numShards,
                                   boolean publishBatchWithOrderingKey)
      • PubsubUnboundedSink

        public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory,
                                   org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic,
                                   java.lang.String timestampAttribute,
                                   java.lang.String idAttribute,
                                   int numShards,
                                   boolean publishBatchWithOrderingKey,
                                   java.lang.String pubsubRootUrl)
      • PubsubUnboundedSink

        public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory,
                                   org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic,
                                   java.lang.String timestampAttribute,
                                   java.lang.String idAttribute,
                                   int numShards,
                                   boolean publishBatchWithOrderingKey,
                                   int publishBatchSize,
                                   int publishBatchBytes)
      • PubsubUnboundedSink

        public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory,
                                   org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic,
                                   java.lang.String timestampAttribute,
                                   java.lang.String idAttribute,
                                   int numShards,
                                   boolean publishBatchWithOrderingKey,
                                   int publishBatchSize,
                                   int publishBatchBytes,
                                   java.lang.String pubsubRootUrl)
    • Method Detail

      • getTopicProvider

        public @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> getTopicProvider()
        Get the ValueProvider for the topic being written to.
      • getTimestampAttribute

        public @Nullable java.lang.String getTimestampAttribute()
        Get the timestamp attribute.
      • getIdAttribute

        public @Nullable java.lang.String getIdAttribute()
        Get the id attribute.
      • getPublishBatchWithOrderingKey

        public boolean getPublishBatchWithOrderingKey()
      • expand

        public org.apache.beam.sdk.values.PDone expand(org.apache.beam.sdk.values.PCollection<PubsubMessage> input)
        Specified by:
        expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<PubsubMessage>,org.apache.beam.sdk.values.PDone>
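
Because a failed bundle causes messages to be resent, a consumer of the topic has to tolerate duplicates. The following is a minimal sketch of consumer-side deduplication, assuming each message carries a unique id under the attribute name configured as the id attribute. `DedupSketch` and `firstDelivery` are hypothetical names, not part of Beam or the Pubsub client, and the in-memory set grows without bound; a real consumer would expire old ids.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not part of Beam. */
public class DedupSketch {
  private final String idAttribute;
  // Ids seen so far. Unbounded here for brevity; a real consumer would expire entries.
  private final Set<String> seen = new HashSet<>();

  public DedupSketch(String idAttribute) {
    this.idAttribute = idAttribute;
  }

  /**
   * Returns true if the message should be processed: either its id has not been
   * seen before, or it carries no id and therefore cannot be deduplicated.
   */
  public boolean firstDelivery(Map<String, String> attributes) {
    String id = attributes.get(idAttribute);
    if (id == null) {
      return true;
    }
    // Set.add returns false when the id was already present.
    return seen.add(id);
  }
}
```

A consumer would call `firstDelivery` with each message's attribute map and skip the message when it returns false.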