Class PubsubSchemaIOProvider

  • All Implemented Interfaces:
    org.apache.beam.sdk.schemas.io.SchemaIOProvider

    @Internal
    @AutoService(org.apache.beam.sdk.schemas.io.SchemaIOProvider.class)
    public class PubsubSchemaIOProvider
    extends java.lang.Object
    implements org.apache.beam.sdk.schemas.io.SchemaIOProvider
    An implementation of SchemaIOProvider for reading and writing JSON/AVRO payloads with PubsubIO.

    Schema

    The data schema passed to from(String, Row, Schema) must be of either the nested or the flat style.

    Nested style

    If the nested structure is used, the required fields, which mirror the Pubsub message model, are 'event_timestamp', 'attributes', and 'payload'.

    Flat style

    If the flat structure is used, the only required field is 'event_timestamp'. Every other field is assumed to be part of the payload. See PubsubMessageToRow for details.
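
    The two styles can be told apart by the field names they require. The following is a minimal sketch of that rule using only the JDK; SchemaStyleSketch and classify are hypothetical names, and the check looks at field names only, which is an assumption of this sketch. The actual provider may also inspect field types (see PubsubMessageToRow).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Beam: classifies a data schema by field names only.
final class SchemaStyleSketch {
  enum Style { NESTED, FLAT }

  static Style classify(List<String> fieldNames) {
    Set<String> names = new HashSet<>(fieldNames);
    // Both styles require 'event_timestamp'.
    if (!names.contains("event_timestamp")) {
      throw new IllegalArgumentException("schema must contain 'event_timestamp'");
    }
    // Nested style additionally requires 'attributes' and 'payload'.
    if (names.contains("attributes") && names.contains("payload")) {
      return Style.NESTED;
    }
    // Otherwise every remaining field is treated as part of the payload.
    return Style.FLAT;
  }

  public static void main(String[] args) {
    System.out.println(classify(Arrays.asList("event_timestamp", "attributes", "payload")));
    System.out.println(classify(Arrays.asList("event_timestamp", "user_id", "score")));
  }
}
```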

    Configuration

    configurationSchema() consists of two attributes, timestampAttributeKey and deadLetterQueue.

    timestampAttributeKey

    An optional attribute key of the Pubsub message from which to extract the event timestamp. If it is not specified, the message publish time is used as the event timestamp.

    The attribute value must conform to the same requirements as in PubsubIO.Read.withTimestampAttribute(String). In short, it must be either milliseconds since the Unix epoch or a string in RFC 3339 format.
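
    To make the two accepted formats concrete, here is a minimal JDK-only sketch of how a timestamp attribute value could be interpreted. TimestampAttributeSketch and its methods are hypothetical and are not Beam code; the sketch accepts only the common RFC 3339 form that java.time parses directly, and throwing when the configured key is missing from the message is an assumption of the sketch, not documented behavior.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.Collections;
import java.util.Map;

// Hypothetical helper, not part of Beam.
final class TimestampAttributeSketch {

  // Interprets a value as millis since epoch first, then as an RFC 3339 string.
  static Instant parse(String value) {
    try {
      return Instant.ofEpochMilli(Long.parseLong(value));
    } catch (NumberFormatException notMillis) {
      // Not a plain number; try the RFC 3339 form next.
    }
    try {
      return OffsetDateTime.parse(value).toInstant();
    } catch (DateTimeParseException e) {
      throw new IllegalArgumentException("not millis or RFC 3339: " + value, e);
    }
  }

  // Falls back to the publish time when no timestampAttributeKey is configured.
  static Instant eventTimestamp(
      Map<String, String> attributes, String timestampAttributeKey, Instant publishTime) {
    if (timestampAttributeKey == null) {
      return publishTime;
    }
    String raw = attributes.get(timestampAttributeKey);
    if (raw == null) {
      // Assumption of this sketch: a configured but absent attribute is an error.
      throw new IllegalArgumentException("missing attribute: " + timestampAttributeKey);
    }
    return parse(raw);
  }

  public static void main(String[] args) {
    Map<String, String> attrs = Collections.singletonMap("ts", "2015-10-29T23:41:41.123Z");
    System.out.println(eventTimestamp(attrs, "ts", Instant.now()));
    System.out.println(eventTimestamp(attrs, null, Instant.EPOCH));
  }
}
```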

    deadLetterQueue

    deadLetterQueue is an optional topic path which will be used as a dead letter queue.

    Messages that cannot be processed are sent to this topic. If it is not specified, an exception is thrown for errors during processing, causing the pipeline to crash.
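
    The difference between the two modes can be illustrated without any Pubsub dependency. In this sketch, DeadLetterSketch and process are hypothetical names, and an in-memory list stands in for the dead letter topic; it shows only the control flow described above, not the provider's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration, not part of Beam.
final class DeadLetterSketch {

  // Parses each message. Failures go to deadLetters when it is non-null;
  // with no dead letter destination, the failure propagates to the caller.
  static <T> List<T> process(
      List<String> messages, Function<String, T> parser, List<String> deadLetters) {
    List<T> parsed = new ArrayList<>();
    for (String message : messages) {
      try {
        parsed.add(parser.apply(message));
      } catch (RuntimeException e) {
        if (deadLetters == null) {
          throw e; // no dead letter queue configured: processing fails
        }
        deadLetters.add(message);
      }
    }
    return parsed;
  }

  public static void main(String[] args) {
    List<String> deadLetters = new ArrayList<>();
    List<Integer> ok =
        DeadLetterSketch.<Integer>process(
            Arrays.asList("1", "oops", "3"), s -> Integer.parseInt(s), deadLetters);
    System.out.println(ok + " dead letters: " + deadLetters);
  }
}
```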

    • Method Summary

      Modifier and Type Method Description
      org.apache.beam.sdk.schemas.Schema configurationSchema()
      Returns the expected schema of the configuration object.
      org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIO from(java.lang.String location, org.apache.beam.sdk.values.Row configuration, org.apache.beam.sdk.schemas.Schema dataSchema)
      Produce a SchemaIO given a String representing the data's location, the schema of the data that resides there, and some IO-specific configuration object.
      java.lang.String identifier()
      Returns an id that uniquely represents this IO.
      org.apache.beam.sdk.values.PCollection.IsBounded isBounded()  
      boolean requiresDataSchema()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • ATTRIBUTE_MAP_FIELD_TYPE

        public static final org.apache.beam.sdk.schemas.Schema.FieldType ATTRIBUTE_MAP_FIELD_TYPE
      • ATTRIBUTE_ARRAY_ENTRY_SCHEMA

        public static final org.apache.beam.sdk.schemas.Schema ATTRIBUTE_ARRAY_ENTRY_SCHEMA
      • ATTRIBUTE_ARRAY_FIELD_TYPE

        public static final org.apache.beam.sdk.schemas.Schema.FieldType ATTRIBUTE_ARRAY_FIELD_TYPE
    • Constructor Detail

      • PubsubSchemaIOProvider

        public PubsubSchemaIOProvider()
    • Method Detail

      • identifier

        public java.lang.String identifier()
        Returns an id that uniquely represents this IO.
        Specified by:
        identifier in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
      • configurationSchema

        public org.apache.beam.sdk.schemas.Schema configurationSchema()
        Returns the expected schema of the configuration object. Note this is distinct from the schema of the data source itself.
        Specified by:
        configurationSchema in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
      • from

        public org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIO from(
            java.lang.String location,
            org.apache.beam.sdk.values.Row configuration,
            org.apache.beam.sdk.schemas.Schema dataSchema)
        Produce a SchemaIO given a String representing the data's location, the schema of the data that resides there, and some IO-specific configuration object.
        Specified by:
        from in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
      • requiresDataSchema

        public boolean requiresDataSchema()
        Specified by:
        requiresDataSchema in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
      • isBounded

        public org.apache.beam.sdk.values.PCollection.IsBounded isBounded()
        Specified by:
        isBounded in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider