Class PubsubUnboundedSource

  • All Implemented Interfaces:
    java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData

    public class PubsubUnboundedSource
    extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin, org.apache.beam.sdk.values.PCollection<PubsubMessage>>
    Users should use PubsubIO#read instead.

    A PTransform which streams messages from Pubsub.

    • The underlying implementation is an UnboundedSource which receives messages in batches and hands them out one at a time.
    • The watermark (either in Pubsub processing time or custom timestamp time) is estimated by keeping track of the minimum timestamp over the last minute's worth of messages. This assumes Pubsub delivers the oldest (in Pubsub processing time) available message at least once a minute, and that custom timestamps are 'mostly' monotonic with Pubsub processing time. Unfortunately both of those assumptions are fragile. Thus the estimated watermark may get ahead of the 'true' watermark and cause some messages to be late.
    • Checkpoints are used both to ACK received messages back to Pubsub (so that they may be retired on the Pubsub end), and to NACK already consumed messages should a checkpoint need to be restored (so that Pubsub will resend those messages promptly).
    • The backlog is determined by each reader using the messages which have been pulled from Pubsub but not yet consumed downstream. The backlog does not take account of any messages queued by Pubsub for the subscription. Unfortunately there is currently no API to determine the size of the Pubsub queue's backlog.
    • The subscription must already exist.
    • The subscription timeout is read whenever a reader is started. However, it is not checked thereafter, even though the timeout is user-changeable on the fly.
    • We log vital stats every 30 seconds.
    • Though some background threads may be used by the underlying transport, all Pubsub calls are blocking. We rely on the underlying runner to allow multiple UnboundedSource.UnboundedReader instances to execute concurrently and thus hide latency.
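
    The watermark estimate described in the notes above can be illustrated with a small sliding-window minimum. This is only a sketch under stated assumptions, not the code this class uses: SlidingMinWatermark, its methods, and the fallback value are hypothetical names, arrival times are assumed to be non-decreasing, and the one-minute window is passed in as a parameter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper (not part of Beam): minimum timestamp over a sliding window. */
class SlidingMinWatermark {
  private final long windowMs;
  // Each entry is {arrivalMs, timestampMs}. Timestamps strictly increase from
  // head to tail, so the head always holds the minimum of the live window.
  private final Deque<long[]> samples = new ArrayDeque<>();

  SlidingMinWatermark(long windowMs) {
    this.windowMs = windowMs;
  }

  /** Records a message that arrived at arrivalMs carrying timestampMs. */
  void add(long arrivalMs, long timestampMs) {
    // An older sample with a larger-or-equal timestamp expires first and can
    // never be the minimum again, so it is dropped.
    while (!samples.isEmpty() && samples.peekLast()[1] >= timestampMs) {
      samples.pollLast();
    }
    samples.addLast(new long[] {arrivalMs, timestampMs});
  }

  /** Returns the minimum timestamp seen in the window ending at nowMs, or fallback if empty. */
  long estimate(long nowMs, long fallback) {
    while (!samples.isEmpty() && samples.peekFirst()[0] <= nowMs - windowMs) {
      samples.pollFirst();
    }
    return samples.isEmpty() ? fallback : samples.peekFirst()[1];
  }

  public static void main(String[] args) {
    SlidingMinWatermark w = new SlidingMinWatermark(60_000L);
    w.add(0L, 5_000L);
    w.add(10_000L, 3_000L);
    w.add(20_000L, 8_000L);
    System.out.println(w.estimate(30_000L, -1L)); // prints 3000
    // Once the sample with timestamp 3000 ages out, the estimate jumps to 8000.
    System.out.println(w.estimate(70_000L, -1L)); // prints 8000
  }
}
```

    In the second call the estimate has moved to 8000, so a message with timestamp 4000 delivered after that point would be behind the estimate. That is the sense in which the estimated watermark can get ahead of the 'true' one.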
    See Also:
    Serialized Form
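
    The checkpoint behaviour described in the class notes (ACK once a checkpoint is finalized, NACK when a checkpoint has to be restored) might be sketched as follows. All names here are hypothetical and the client only records calls. It does not reproduce this class's internals or the PubsubClient API. In Pubsub itself, a NACK is expressed by setting a message's ack deadline to zero.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Pubsub client; it records calls instead of making them. */
class RecordingAckClient {
  final List<String> acked = new ArrayList<>();
  final List<String> nacked = new ArrayList<>();

  /** ACK: the messages may be retired on the Pubsub side. */
  void acknowledge(List<String> ackIds) {
    acked.addAll(ackIds);
  }

  /** NACK: ask Pubsub to redeliver promptly (an ack deadline of zero in the real service). */
  void nack(List<String> ackIds) {
    nacked.addAll(ackIds);
  }
}

/** Hypothetical checkpoint holding the ack ids of messages already handed downstream. */
class CheckpointSketch {
  private final RecordingAckClient client;
  private final List<String> consumedAckIds;

  CheckpointSketch(RecordingAckClient client, List<String> consumedAckIds) {
    this.client = client;
    this.consumedAckIds = new ArrayList<>(consumedAckIds);
  }

  /** Called once the runner has durably committed the consumed messages. */
  void finalizeCheckpoint() {
    client.acknowledge(consumedAckIds);
  }

  /** Called when a reader is restored from this checkpoint and the messages must be resent. */
  void nackAll() {
    client.nack(consumedAckIds);
  }
}
```

    Keeping the ack ids inside the checkpoint means a message is only ACKed after the runner has committed it. A restored checkpoint returns its messages to Pubsub instead of waiting for the ack deadline to expire.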
    • Constructor Detail

      • PubsubUnboundedSource

        public PubsubUnboundedSource(PubsubClient.PubsubClientFactory pubsubFactory,
                                     @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.ProjectPath> project,
                                     @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic,
                                     @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.SubscriptionPath> subscription,
                                     @Nullable java.lang.String timestampAttribute,
                                     @Nullable java.lang.String idAttribute,
                                     boolean needsAttributes)
        Construct an unbounded source to consume from the Pubsub subscription.
      • PubsubUnboundedSource

        public PubsubUnboundedSource(com.google.api.client.util.Clock clock,
                                     PubsubClient.PubsubClientFactory pubsubFactory,
                                     @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.ProjectPath> project,
                                     @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic,
                                     @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.SubscriptionPath> subscription,
                                     @Nullable java.lang.String timestampAttribute,
                                     @Nullable java.lang.String idAttribute,
                                     boolean needsAttributes)
        Construct an unbounded source to consume from the Pubsub subscription.
      • PubsubUnboundedSource

        public PubsubUnboundedSource(PubsubClient.PubsubClientFactory pubsubFactory,
                                     @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.ProjectPath> project,
                                     @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic,
                                     @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.SubscriptionPath> subscription,
                                     @Nullable java.lang.String timestampAttribute,
                                     @Nullable java.lang.String idAttribute,
                                     boolean needsAttributes,
                                     boolean needsMessageId)
        Construct an unbounded source to consume from the Pubsub subscription.
    • Method Detail

      • getTopicProvider

        public @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> getTopicProvider()
        Get the ValueProvider for the topic being read from.
      • getSubscriptionProvider

        public @Nullable org.apache.beam.sdk.options.ValueProvider<PubsubClient.SubscriptionPath> getSubscriptionProvider()
        Get the ValueProvider for the subscription being read from.
      • getTimestampAttribute

        public @Nullable java.lang.String getTimestampAttribute()
        Get the timestamp attribute.
      • getIdAttribute

        public @Nullable java.lang.String getIdAttribute()
        Get the id attribute.
      • getNeedsAttributes

        public boolean getNeedsAttributes()
      • getNeedsMessageId

        public boolean getNeedsMessageId()
      • getNeedsOrderingKey

        public boolean getNeedsOrderingKey()
      • expand

        public org.apache.beam.sdk.values.PCollection<PubsubMessage> expand(org.apache.beam.sdk.values.PBegin input)
        Specified by:
        expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin, org.apache.beam.sdk.values.PCollection<PubsubMessage>>