Class PubsubSchemaIOProvider
java.lang.Object
    org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider
All Implemented Interfaces:
org.apache.beam.sdk.schemas.io.SchemaIOProvider
@Internal @AutoService(org.apache.beam.sdk.schemas.io.SchemaIOProvider.class) public class PubsubSchemaIOProvider extends java.lang.Object implements org.apache.beam.sdk.schemas.io.SchemaIOProvider
An implementation of SchemaIOProvider for reading and writing JSON/AVRO payloads with PubsubIO.
Schema
The data schema passed to from(String, Row, Schema) must either be of the nested or flat style.
Nested style
If the nested structure is used, the required fields of the Pubsub message model are 'event_timestamp', 'attributes', and 'payload'.
Flat style
If the flat structure is used, the only required field is 'event_timestamp'. Every other field is assumed to be part of the payload. See PubsubMessageToRow for details.
Configuration
configurationSchema() consists of two attributes, timestampAttributeKey and deadLetterQueue.
timestampAttributeKey
An optional attribute key of the Pubsub message from which to extract the event timestamp. If not specified, the message publish time is used as the event timestamp.
This attribute has to conform to the same requirements as in PubsubIO.Read.withTimestampAttribute(String). Short version: the value has to be either milliseconds since the epoch or a string in RFC 3339 format.
If the attribute is specified, event timestamps are extracted from it. If it is not specified, the message publish timestamp is used.
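The two accepted value formats can be illustrated with a minimal sketch that uses only the Java standard library. The class TimestampAttributeSketch and the method parseTimestampAttribute are hypothetical names introduced here for illustration; they are not part of Beam, and the ISO-8601 parser used below only approximates RFC 3339.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

class TimestampAttributeSketch {
    // Hypothetical helper, not a Beam API: reads an attribute value as
    // milliseconds since the epoch, falling back to an RFC 3339 style string.
    static Instant parseTimestampAttribute(String value) {
        try {
            return Instant.ofEpochMilli(Long.parseLong(value));
        } catch (NumberFormatException notMillis) {
            // OffsetDateTime.parse accepts ISO-8601 offsets such as "Z" or "+02:00".
            return OffsetDateTime.parse(value).toInstant();
        }
    }

    public static void main(String[] args) {
        // Both values below denote the same instant.
        System.out.println(parseTimestampAttribute("1500000000000"));
        System.out.println(parseTimestampAttribute("2017-07-14T04:40:00+02:00"));
    }
}
```

A value that matches neither format makes OffsetDateTime.parse throw a DateTimeParseException in this sketch; how PubsubIO itself reports such a value is not covered here.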
deadLetterQueue
deadLetterQueue is an optional topic path which will be used as a dead letter queue.
Messages that cannot be processed are sent to this topic. If it is not specified, an exception is thrown for errors during processing, causing the pipeline to crash.
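The difference between the two behaviors can be sketched in plain Java. This is an illustration of the routing idea only, under stated assumptions: DeadLetterSketch, process, and processAll are hypothetical names, an in-memory list stands in for the dead letter topic, and integer parsing stands in for real message processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DeadLetterSketch {
    // Hypothetical stand-in for per-message processing: parse an integer payload.
    static int process(String payload) {
        return Integer.parseInt(payload.trim());
    }

    // deadLetterQueue may be null, meaning no dead letter queue is configured.
    static List<Integer> processAll(List<String> payloads, List<String> deadLetterQueue) {
        List<Integer> results = new ArrayList<>();
        for (String payload : payloads) {
            try {
                results.add(process(payload));
            } catch (RuntimeException e) {
                if (deadLetterQueue == null) {
                    throw e; // not configured: the error propagates to the caller
                }
                deadLetterQueue.add(payload); // configured: keep going, set the message aside
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> deadLetters = new ArrayList<>();
        List<Integer> ok = processAll(Arrays.asList("1", "oops", "3"), deadLetters);
        System.out.println(ok + " dead-lettered: " + deadLetters);
    }
}
```

Passing null as the second argument makes the first unparseable payload abort the whole call, which mirrors the pipeline crash described above.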
-
-
Field Summary
Fields
Modifier and Type    Field    Description
static org.apache.beam.sdk.schemas.Schema    ATTRIBUTE_ARRAY_ENTRY_SCHEMA
static org.apache.beam.sdk.schemas.Schema.FieldType    ATTRIBUTE_ARRAY_FIELD_TYPE
static org.apache.beam.sdk.schemas.Schema.FieldType    ATTRIBUTE_MAP_FIELD_TYPE
-
Constructor Summary
Constructors
Constructor    Description
PubsubSchemaIOProvider()
-
Method Summary
Modifier and Type    Method    Description
org.apache.beam.sdk.schemas.Schema    configurationSchema()    Returns the expected schema of the configuration object.
org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIO    from(java.lang.String location, org.apache.beam.sdk.values.Row configuration, org.apache.beam.sdk.schemas.Schema dataSchema)    Produce a SchemaIO given a String representing the data's location, the schema of the data that resides there, and some IO-specific configuration object.
java.lang.String    identifier()    Returns an id that uniquely represents this IO.
org.apache.beam.sdk.values.PCollection.IsBounded    isBounded()
boolean    requiresDataSchema()
-
-
-
Field Detail
-
ATTRIBUTE_MAP_FIELD_TYPE
public static final org.apache.beam.sdk.schemas.Schema.FieldType ATTRIBUTE_MAP_FIELD_TYPE
-
ATTRIBUTE_ARRAY_ENTRY_SCHEMA
public static final org.apache.beam.sdk.schemas.Schema ATTRIBUTE_ARRAY_ENTRY_SCHEMA
-
ATTRIBUTE_ARRAY_FIELD_TYPE
public static final org.apache.beam.sdk.schemas.Schema.FieldType ATTRIBUTE_ARRAY_FIELD_TYPE
-
-
Method Detail
-
identifier
public java.lang.String identifier()
Returns an id that uniquely represents this IO.
Specified by:
identifier in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
-
configurationSchema
public org.apache.beam.sdk.schemas.Schema configurationSchema()
Returns the expected schema of the configuration object. Note this is distinct from the schema of the data source itself.
Specified by:
configurationSchema in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
-
from
public org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIO from(java.lang.String location, org.apache.beam.sdk.values.Row configuration, org.apache.beam.sdk.schemas.Schema dataSchema)
Produce a SchemaIO given a String representing the data's location, the schema of the data that resides there, and some IO-specific configuration object.
Specified by:
from in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
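Because the dataSchema passed to from must follow either the nested or the flat style described in the class overview, a caller may want to check a field list before constructing the schema. The sketch below is a simplified illustration in plain Java: SchemaStyleSketch and detectStyle are hypothetical names, the check looks only at field names and ignores field types, and the validation performed inside Beam may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SchemaStyleSketch {
    enum Style { NESTED, FLAT }

    // Hypothetical check by field name only; field types are not inspected.
    static Style detectStyle(Set<String> fieldNames) {
        if (!fieldNames.contains("event_timestamp")) {
            throw new IllegalArgumentException("data schema must contain 'event_timestamp'");
        }
        // Nested style additionally carries 'attributes' and 'payload';
        // otherwise every remaining field is treated as payload (flat style).
        boolean nested = fieldNames.contains("attributes") && fieldNames.contains("payload");
        return nested ? Style.NESTED : Style.FLAT;
    }

    public static void main(String[] args) {
        Set<String> nestedFields = new HashSet<>(Arrays.asList("event_timestamp", "attributes", "payload"));
        Set<String> flatFields = new HashSet<>(Arrays.asList("event_timestamp", "id", "name"));
        System.out.println(detectStyle(nestedFields));
        System.out.println(detectStyle(flatFields));
    }
}
```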
-
requiresDataSchema
public boolean requiresDataSchema()
Specified by:
requiresDataSchema in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
-
isBounded
public org.apache.beam.sdk.values.PCollection.IsBounded isBounded()
Specified by:
isBounded in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
-
-