Class PubsubIO


  • public class PubsubIO
    extends java.lang.Object
    Read and Write PTransforms for Cloud Pub/Sub streams. These transforms create and consume unbounded PCollections.

    Using local emulator

    To use a local emulator for Pub/Sub, call the PubsubOptions#setPubsubRootUrl(String) method to set the host and port of your local emulator.
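    The helper below is a minimal sketch, not part of Beam. It makes three assumptions:

    - The emulator was started with the gcloud CLI.
    - The emulator's address is exported as host:port in the PUBSUB_EMULATOR_HOST environment variable, typically localhost:8085.
    - The emulator serves plain HTTP.

    The class EmulatorRootUrl is hypothetical. The Beam calls appear only in comments. You can pass the same value on the command line as --pubsubRootUrl=http://localhost:8085.

```java
import java.util.Optional;

// Hypothetical helper (not part of Beam): converts the emulator's host:port,
// e.g. "localhost:8085", into a root URL for PubsubOptions#setPubsubRootUrl(String).
final class EmulatorRootUrl {
  private EmulatorRootUrl() {}

  /** Returns "http://" + hostPort; the local emulator is assumed to serve plain HTTP. */
  static String fromHostPort(String hostPort) {
    if (hostPort == null || hostPort.trim().isEmpty()) {
      throw new IllegalArgumentException("emulator host:port must not be empty");
    }
    return "http://" + hostPort.trim();
  }

  /** Derives the root URL from PUBSUB_EMULATOR_HOST, if that variable is set and non-empty. */
  static Optional<String> fromEnvironment() {
    return Optional.ofNullable(System.getenv("PUBSUB_EMULATOR_HOST"))
        .filter(value -> !value.trim().isEmpty())
        .map(EmulatorRootUrl::fromHostPort);
  }

  public static void main(String[] args) {
    // With Beam on the classpath, the result would be applied roughly as:
    //   PubsubOptions options = PipelineOptionsFactory.fromArgs(args).as(PubsubOptions.class);
    //   EmulatorRootUrl.fromEnvironment().ifPresent(options::setPubsubRootUrl);
    System.out.println(fromEnvironment().orElse("PUBSUB_EMULATOR_HOST is not set"));
  }
}
```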

    Permissions

    Permission requirements depend on the PipelineRunner that is used to execute the Beam pipeline. Please refer to the documentation of the corresponding PipelineRunner for more details.

    Updates to the I/O connector code

    For any significant updates to this I/O connector, please consider involving the corresponding code reviewers mentioned here.

    Example PubsubIO read usage

    
     // Assumes an existing Pipeline `pipeline` and topic/subscription path strings.
     // Read from a specific topic; a subscription will be created at pipeline start time.
     PCollection<PubsubMessage> messages = pipeline.apply(PubsubIO.readMessages().fromTopic(topic));
    
     // Read from a subscription.
     PCollection<PubsubMessage> fromSubscription =
         pipeline.apply(PubsubIO.readMessages().fromSubscription(subscription));
    
     // Read messages including attributes. All Pub/Sub attributes will be included in the PubsubMessage.
     PCollection<PubsubMessage> withAttributes =
         pipeline.apply(PubsubIO.readMessagesWithAttributes().fromTopic(topic));
    
     // Examples of reading different types from Pub/Sub.
     PCollection<String> strings = pipeline.apply(PubsubIO.readStrings().fromTopic(topic));
     PCollection<MyProto> protos = pipeline.apply(PubsubIO.readProtos(MyProto.class).fromTopic(topic));
     PCollection<MyType> avros = pipeline.apply(PubsubIO.readAvros(MyType.class).fromTopic(topic));
    
     

    Example PubsubIO write usage

    Data can be written to a single topic or to a dynamic set of topics. To write to a single topic, use the PubsubIO.Write.to(String) method. For example:
    
     avros.apply(PubsubIO.writeAvros(MyType.class).to(topic));
     protos.apply(PubsubIO.writeProtos(MyProto.class).to(topic));
     strings.apply(PubsubIO.writeStrings().to(topic));
     
    Dynamic topic destinations can be accomplished by specifying a function to extract the topic from the record using the PubsubIO.Write.to(SerializableFunction) method. For example:
    
     avros.apply(PubsubIO.writeAvros(MyType.class)
          .to((ValueInSingleWindow<MyType> element) -> {
                   // ValueInSingleWindow wraps the element together with its window and timestamp.
                   String country = element.getValue().getCountry();
                   return "projects/myproject/topics/events_" + country;
                  }));
     
    Dynamic topics can also be specified by converting each element into a PubsubMessage that carries its destination topic, then writing the result with the writeMessagesDynamic() method. For example:
    
     // Assumes Event is a protobuf message, so toByteArray() returns the payload bytes.
     events.apply(MapElements.into(new TypeDescriptor<PubsubMessage>() {})
                             .via((Event e) -> new PubsubMessage(
                                 e.toByteArray(), Collections.emptyMap()).withTopic(e.getCountry())))
     .apply(PubsubIO.writeMessagesDynamic());
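    Both dynamic-destination styles compute the topic from data at runtime, so it can help to validate the generated name before the pipeline publishes to it. Below is a minimal sketch. The class DynamicTopics and its methods are hypothetical.

    The rule it checks reflects our understanding of Pub/Sub's resource-name constraints, not anything PubsubIO is documented to enforce:

    - A topic ID starts with a letter.
    - It is 3 to 255 characters long.
    - It uses only letters, digits, '-', '_', '.', '~', '+' and '%'.
    - It does not start with "goog".

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Beam) for the dynamic-destination examples:
// builds a full topic path from a runtime value and rejects IDs Pub/Sub is
// assumed not to accept.
final class DynamicTopics {
  private DynamicTopics() {}

  // Assumed Pub/Sub topic-ID rule: a letter, then 2-254 more characters from
  // letters, digits, '.', '_', '~', '+', '%', '-' (3-255 characters in total).
  private static final Pattern TOPIC_ID = Pattern.compile("[A-Za-z][A-Za-z0-9._~+%-]{2,254}");

  static boolean isValidTopicId(String topicId) {
    return topicId != null
        && TOPIC_ID.matcher(topicId).matches()
        && !topicId.startsWith("goog");
  }

  /** Returns "projects/{project}/topics/events_{country}" or throws if the ID is invalid. */
  static String eventsTopicFor(String project, String country) {
    if (project == null || project.isEmpty() || country == null) {
      throw new IllegalArgumentException("project and country must be non-null");
    }
    String topicId = "events_" + country;
    if (!isValidTopicId(topicId)) {
      throw new IllegalArgumentException("invalid Pub/Sub topic ID: " + topicId);
    }
    return "projects/" + project + "/topics/" + topicId;
  }

  public static void main(String[] args) {
    System.out.println(eventsTopicFor("myproject", "US"));
  }
}
```

    In the to(SerializableFunction) example, the lambda body could return DynamicTopics.eventsTopicFor("myproject", element.getValue().getCountry()). An exception thrown there fails the bundle, so you may prefer to route invalid values to a fallback topic instead.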
     

    Custom timestamps

    All messages read from Pub/Sub have a stable publish timestamp that is independent of when the message is read from the Pub/Sub topic. By default, the publish time is used as the timestamp for every message read, and the watermark is based on it. If a different logical timestamp should be used, that timestamp must be published in a Pub/Sub attribute and specified using PubsubIO.Read.withTimestampAttribute(java.lang.String). See the Javadoc for that method for the timestamp format.
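    The class below sketches the publishing side. It assumes the attribute value may be either milliseconds since the Unix epoch or an RFC 3339 string. Check the withTimestampAttribute Javadoc for the authoritative format.

    The attribute name event_ts and the class TimestampAttribute are hypothetical. decode(...) only approximates how a reader might interpret the value; it is not Beam's parser.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not part of Beam) of two assumed encodings for a
// custom timestamp attribute that readers select with withTimestampAttribute(...).
final class TimestampAttribute {
  static final String NAME = "event_ts"; // hypothetical attribute name

  private TimestampAttribute() {}

  /** Encodes the event time as milliseconds since the Unix epoch, e.g. "1700000000000". */
  static Map<String, String> asEpochMillis(Instant eventTime) {
    Map<String, String> attrs = new HashMap<>();
    attrs.put(NAME, Long.toString(eventTime.toEpochMilli()));
    return attrs;
  }

  /** Encodes the event time as an RFC 3339 UTC string, e.g. "2023-11-14T22:13:20Z". */
  static Map<String, String> asRfc3339(Instant eventTime) {
    Map<String, String> attrs = new HashMap<>();
    // Truncate to milliseconds, the precision of Beam's Java SDK timestamps.
    attrs.put(NAME, eventTime.truncatedTo(ChronoUnit.MILLIS).toString());
    return attrs;
  }

  /** Approximates how a reader might decode either form (not Beam's actual parser). */
  static Instant decode(String value) {
    try {
      return Instant.ofEpochMilli(Long.parseLong(value));
    } catch (NumberFormatException notMillis) {
      // OffsetDateTime accepts both "Z" and numeric offsets such as "+02:00".
      return OffsetDateTime.parse(value).toInstant();
    }
  }

  public static void main(String[] args) {
    System.out.println(asRfc3339(Instant.now()));
  }
}
```

    On the reading side, the attribute would then be selected with something like PubsubIO.readStrings().fromSubscription(subscription).withTimestampAttribute("event_ts").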
    • Method Summary

      Modifier and Type Method Description
      static PubsubIO.Read<org.apache.avro.generic.GenericRecord> readAvroGenericRecords(org.apache.avro.Schema avroSchema)
      Returns a PTransform that continuously reads binary encoded Avro messages into the Avro GenericRecord type.
      static <T> PubsubIO.Read<T> readAvros(java.lang.Class<T> clazz)
      Returns a PTransform that continuously reads binary encoded Avro messages of the given type from a Google Cloud Pub/Sub stream.
      static <T> PubsubIO.Read<T> readAvrosWithBeamSchema(java.lang.Class<T> clazz)
      Returns a PTransform that continuously reads binary encoded Avro messages of the given type.
      static PubsubIO.Read<PubsubMessage> readMessages()
      Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream. The messages contain only a payload, without attributes.
      static PubsubIO.Read<PubsubMessage> readMessagesWithAttributes()
      Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream. The messages contain both a payload and attributes.
      static PubsubIO.Read<PubsubMessage> readMessagesWithAttributesAndMessageId()
      Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream. The messages contain a payload, attributes, and the messageId.
      static PubsubIO.Read<PubsubMessage> readMessagesWithAttributesAndMessageIdAndOrderingKey()
      Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream. The messages contain a payload, attributes, the messageId, and the orderingKey.
      static <T> PubsubIO.Read<T> readMessagesWithAttributesWithCoderAndParseFn(org.apache.beam.sdk.coders.Coder<T> coder, org.apache.beam.sdk.transforms.SimpleFunction<PubsubMessage,T> parseFn)
      Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream, mapping each PubsubMessage, with attributes, into type T using the supplied parse function and coder.
      static <T> PubsubIO.Read<T> readMessagesWithCoderAndParseFn(org.apache.beam.sdk.coders.Coder<T> coder, org.apache.beam.sdk.transforms.SimpleFunction<PubsubMessage,T> parseFn)
      Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream, mapping each PubsubMessage into type T using the supplied parse function and coder.
      static PubsubIO.Read<PubsubMessage> readMessagesWithMessageId()
      Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream. The messages contain a payload and the messageId, but no attributes.
      static PubsubIO.Read<com.google.protobuf.DynamicMessage> readProtoDynamicMessages(com.google.protobuf.Descriptors.Descriptor descriptor)
      Similar to readProtoDynamicMessages(ProtoDomain, String) but for when the Descriptors.Descriptor is already known.
      static PubsubIO.Read<com.google.protobuf.DynamicMessage> readProtoDynamicMessages(org.apache.beam.sdk.extensions.protobuf.ProtoDomain domain, java.lang.String fullMessageName)
      Returns a PTransform that continuously reads binary encoded protobuf messages for the type specified by fullMessageName.
      static <T extends com.google.protobuf.Message> PubsubIO.Read<T> readProtos(java.lang.Class<T> messageClass)
      Returns a PTransform that continuously reads binary encoded protobuf messages of the given type from a Google Cloud Pub/Sub stream.
      static PubsubIO.Read<java.lang.String> readStrings()
      Returns a PTransform that continuously reads UTF-8 encoded strings from a Google Cloud Pub/Sub stream.
      static <T> PubsubIO.Write<T> writeAvros(java.lang.Class<T> clazz)
      Returns a PTransform that writes binary encoded Avro messages of a given type to a Google Cloud Pub/Sub stream.
      static <T> PubsubIO.Write<T> writeAvros(java.lang.Class<T> clazz, org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.ValueInSingleWindow<T>,java.util.Map<java.lang.String,java.lang.String>> attributeFn)
      Returns a PTransform that writes binary encoded Avro messages of a given type to a Google Cloud Pub/Sub stream, using attributeFn to compute each message's attributes from the element in its window.
      static PubsubIO.Write<PubsubMessage> writeMessages()
      Returns a PTransform that writes to a Google Cloud Pub/Sub stream.
      static PubsubIO.Write<PubsubMessage> writeMessagesDynamic()
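      Returns a PTransform that writes PubsubMessages to Google Cloud Pub/Sub, using the topic set on each message. Enables dynamic destination topics.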
      static <T extends com.google.protobuf.Message> PubsubIO.Write<T> writeProtos(java.lang.Class<T> messageClass)
      Returns a PTransform that writes binary encoded protobuf messages of a given type to a Google Cloud Pub/Sub stream.
      static <T extends com.google.protobuf.Message> PubsubIO.Write<T> writeProtos(java.lang.Class<T> messageClass, org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.ValueInSingleWindow<T>,java.util.Map<java.lang.String,java.lang.String>> attributeFn)
      Returns a PTransform that writes binary encoded protobuf messages of a given type to a Google Cloud Pub/Sub stream, using attributeFn to compute each message's attributes from the element in its window.
      static PubsubIO.Write<java.lang.String> writeStrings()
      Returns a PTransform that writes UTF-8 encoded strings to a Google Cloud Pub/Sub stream.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • ENABLE_CUSTOM_PUBSUB_SINK

        public static final java.lang.String ENABLE_CUSTOM_PUBSUB_SINK
        See Also:
        Constant Field Values
      • ENABLE_CUSTOM_PUBSUB_SOURCE

        public static final java.lang.String ENABLE_CUSTOM_PUBSUB_SOURCE
        See Also:
        Constant Field Values
    • Method Detail

      • readMessages

        public static PubsubIO.Read<PubsubMessage> readMessages()
        Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream. The messages contain only a payload, with no attributes.
      • readMessagesWithMessageId

        public static PubsubIO.Read<PubsubMessage> readMessagesWithMessageId()
        Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream. The messages contain a payload and the messageId assigned by Pub/Sub, but no attributes.
      • readMessagesWithAttributes

        public static PubsubIO.Read<PubsubMessage> readMessagesWithAttributes()
        Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream. The messages contain both a payload and attributes.
      • readMessagesWithAttributesAndMessageId

        public static PubsubIO.Read<PubsubMessage> readMessagesWithAttributesAndMessageId()
        Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream. The messages contain both a payload and attributes, along with the messageId assigned by Pub/Sub.
      • readMessagesWithAttributesAndMessageIdAndOrderingKey

        public static PubsubIO.Read<PubsubMessage> readMessagesWithAttributesAndMessageIdAndOrderingKey()
        Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream. The messages contain a payload and attributes, along with the messageId and orderingKey (see PubsubMessage#getOrderingKey()) from Pub/Sub.
      • readStrings

        public static PubsubIO.Read<java.lang.String> readStrings()
        Returns a PTransform that continuously reads UTF-8 encoded strings from a Google Cloud Pub/Sub stream.
      • readProtos

        public static <T extends com.google.protobuf.Message> PubsubIO.Read<T> readProtos(java.lang.Class<T> messageClass)
        Returns a PTransform that continuously reads binary encoded protobuf messages of the given type from a Google Cloud Pub/Sub stream.
      • readProtoDynamicMessages

        public static PubsubIO.Read<com.google.protobuf.DynamicMessage> readProtoDynamicMessages(org.apache.beam.sdk.extensions.protobuf.ProtoDomain domain,
                                                                                                 java.lang.String fullMessageName)
        Returns a PTransform that continuously reads binary encoded protobuf messages for the type specified by fullMessageName.

        This is primarily here for cases where the message type cannot be known at compile time. If it can be known, prefer readProtos(Class), as DynamicMessage tends to perform worse than concrete types.

        Beam infers a Beam schema from the protobuf message type. Note that some protobuf schema features are not supported by all sinks.

        Parameters:
        domain - The ProtoDomain that contains the target message and its dependencies.
        fullMessageName - The full name of the message for lookup in domain.
      • readProtoDynamicMessages

        public static PubsubIO.Read<com.google.protobuf.DynamicMessage> readProtoDynamicMessages(com.google.protobuf.Descriptors.Descriptor descriptor)
        Similar to readProtoDynamicMessages(ProtoDomain, String) but for when the Descriptors.Descriptor is already known.
      • readAvros

        public static <T> PubsubIO.Read<T> readAvros(java.lang.Class<T> clazz)
        Returns a PTransform that continuously reads binary encoded Avro messages of the given type from a Google Cloud Pub/Sub stream.
      • readMessagesWithCoderAndParseFn

        public static <T> PubsubIO.Read<T> readMessagesWithCoderAndParseFn(org.apache.beam.sdk.coders.Coder<T> coder,
                                                                           org.apache.beam.sdk.transforms.SimpleFunction<PubsubMessage,T> parseFn)
        Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream, mapping each PubsubMessage into type T using the supplied parse function and coder.
      • readMessagesWithAttributesWithCoderAndParseFn

        public static <T> PubsubIO.Read<T> readMessagesWithAttributesWithCoderAndParseFn(org.apache.beam.sdk.coders.Coder<T> coder,
                                                                                         org.apache.beam.sdk.transforms.SimpleFunction<PubsubMessage,T> parseFn)
        Returns a PTransform that continuously reads from a Google Cloud Pub/Sub stream, mapping each PubsubMessage, with attributes, into type T using the supplied parse function and coder. Similar to readMessagesWithCoderAndParseFn(Coder, SimpleFunction), but with the addition of making the message attributes available to the parse function.
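        The logic behind such a parse function can be written and tested as plain Java before you wrap it in a SimpleFunction. In this sketch, the SensorReading class and the sensor_id attribute are hypothetical. The payload is assumed to be a UTF-8 decimal number.

        In a pipeline, apply(PubsubMessage message) would delegate to SensorReading.parse(message.getPayload(), message.getAttributeMap()). Because the class implements Serializable, SerializableCoder.of(SensorReading.class) is one possible choice for the coder argument.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

// Hypothetical element type for a parse function passed to
// readMessagesWithAttributesWithCoderAndParseFn (not part of Beam).
final class SensorReading implements Serializable {
  private static final long serialVersionUID = 1L;

  final String sensorId;
  final double value;

  SensorReading(String sensorId, double value) {
    this.sensorId = sensorId;
    this.value = value;
  }

  /**
   * Combines the payload (a UTF-8 number such as "21.5") with the hypothetical
   * "sensor_id" attribute; a missing attribute map or key falls back to "unknown".
   */
  static SensorReading parse(byte[] payload, Map<String, String> attributes) {
    Map<String, String> attrs =
        attributes == null ? Collections.<String, String>emptyMap() : attributes;
    String sensorId = attrs.getOrDefault("sensor_id", "unknown");
    double value = Double.parseDouble(new String(payload, StandardCharsets.UTF_8).trim());
    return new SensorReading(sensorId, value);
  }

  public static void main(String[] args) {
    SensorReading r = parse("21.5".getBytes(StandardCharsets.UTF_8),
        Collections.singletonMap("sensor_id", "s-42"));
    System.out.println(r.sensorId + " " + r.value);
  }
}
```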
      • readAvroGenericRecords

        public static PubsubIO.Read<org.apache.avro.generic.GenericRecord> readAvroGenericRecords(org.apache.avro.Schema avroSchema)
        Returns a PTransform that continuously reads binary encoded Avro messages into the Avro GenericRecord type.

        Beam infers a Beam schema from the Avro schema. This allows the output to be used by SQL and by the schema-transform library.

      • readAvrosWithBeamSchema

        public static <T> PubsubIO.Read<T> readAvrosWithBeamSchema(java.lang.Class<T> clazz)
        Returns a PTransform that continuously reads binary encoded Avro messages of the given type.

        Beam infers a Beam schema from the Avro schema. This allows the output to be used by SQL and by the schema-transform library.

      • writeMessages

        public static PubsubIO.Write<PubsubMessage> writeMessages()
        Returns a PTransform that writes to a Google Cloud Pub/Sub stream.
      • writeStrings

        public static PubsubIO.Write<java.lang.String> writeStrings()
        Returns a PTransform that writes UTF-8 encoded strings to a Google Cloud Pub/Sub stream.
      • writeProtos

        public static <T extends com.google.protobuf.Message> PubsubIO.Write<T> writeProtos(java.lang.Class<T> messageClass)
        Returns a PTransform that writes binary encoded protobuf messages of a given type to a Google Cloud Pub/Sub stream.
      • writeProtos

        public static <T extends com.google.protobuf.Message> PubsubIO.Write<T> writeProtos(java.lang.Class<T> messageClass,
                                                                                            org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.ValueInSingleWindow<T>,java.util.Map<java.lang.String,java.lang.String>> attributeFn)
        Returns a PTransform that writes binary encoded protobuf messages of a given type to a Google Cloud Pub/Sub stream. The attributeFn computes each message's attributes from the element in its window.
      • writeAvros

        public static <T> PubsubIO.Write<T> writeAvros(java.lang.Class<T> clazz)
        Returns a PTransform that writes binary encoded Avro messages of a given type to a Google Cloud Pub/Sub stream.
      • writeAvros

        public static <T> PubsubIO.Write<T> writeAvros(java.lang.Class<T> clazz,
                                                       org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.ValueInSingleWindow<T>,java.util.Map<java.lang.String,java.lang.String>> attributeFn)
        Returns a PTransform that writes binary encoded Avro messages of a given type to a Google Cloud Pub/Sub stream. The attributeFn computes each message's attributes from the element in its window.
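        The attributeFn overloads of writeAvros and writeProtos receive each element as a ValueInSingleWindow<T>, together with its window and timestamp. Below is a minimal sketch of the attribute-building logic, kept free of Beam types so it can be tested on its own. The names EventAttributes, country and event_ts are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical attribute builder (not part of Beam) for the attributeFn
// overloads: copies a routing field and the element timestamp into attributes.
final class EventAttributes {
  private EventAttributes() {}

  /**
   * Returns {"country": country, "event_ts": millis}. The country attribute is
   * omitted when null, since a null attribute value is assumed to be invalid.
   */
  static Map<String, String> of(String country, long elementTimestampMillis) {
    Map<String, String> attrs = new HashMap<>();
    if (country != null) {
      attrs.put("country", country);
    }
    // Milliseconds since the Unix epoch, one assumed format for withTimestampAttribute.
    attrs.put("event_ts", Long.toString(elementTimestampMillis));
    return attrs;
  }

  public static void main(String[] args) {
    System.out.println(of("DE", 1700000000000L));
  }
}
```

        In a pipeline, assuming a hypothetical MyType with a getCountry() accessor, the function could be passed as writeAvros(MyType.class, v -> EventAttributes.of(v.getValue().getCountry(), v.getTimestamp().getMillis())). Readers could then pick up the timestamp with withTimestampAttribute("event_ts").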