@Internal
@AutoService(value=org.apache.beam.sdk.schemas.io.SchemaIOProvider.class)
public class PubsubSchemaIOProvider
extends java.lang.Object
implements org.apache.beam.sdk.schemas.io.SchemaIOProvider
SchemaIOProvider for reading and writing JSON/AVRO payloads with
PubsubIO.
The data schema passed to from(String, Row, Schema) must use either the nested or the
flat style.
If nested structure is used, the required fields included in the Pubsub message model are 'event_timestamp', 'attributes', and 'payload'.
If flat structure is used, the required fields include just 'event_timestamp'. Every other
field is assumed part of the payload. See PubsubMessageToRow for details.
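The distinction between the two layouts can be illustrated with a small, name-based check. This is a minimal sketch only: PubsubSchemaLayoutSketch and classify are hypothetical names that are not part of Beam, and the actual validation may also take field types into account (see PubsubMessageToRow).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper, NOT part of Beam: classifies a data schema by its top-level field names. */
final class PubsubSchemaLayoutSketch {

  enum Layout { NESTED, FLAT }

  /**
   * Assumption: a schema is treated as nested when it names both 'attributes' and 'payload'.
   * Both layouts require 'event_timestamp'; in the flat layout every other field is payload.
   */
  static Layout classify(Set<String> topLevelFields) {
    if (!topLevelFields.contains("event_timestamp")) {
      throw new IllegalArgumentException("data schema must contain 'event_timestamp'");
    }
    boolean nested =
        topLevelFields.contains("attributes") && topLevelFields.contains("payload");
    return nested ? Layout.NESTED : Layout.FLAT;
  }

  public static void main(String[] args) {
    Set<String> nested =
        new LinkedHashSet<>(Arrays.asList("event_timestamp", "attributes", "payload"));
    Set<String> flat =
        new LinkedHashSet<>(Arrays.asList("event_timestamp", "user_id", "score"));
    System.out.println(classify(nested));
    System.out.println(classify(flat));
  }
}
```

In the flat example, user_id and score are illustrative payload field names, not fields the provider expects.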
configurationSchema() consists of two attributes, timestampAttributeKey and
deadLetterQueue.
timestampAttributeKey is an optional attribute key of the Pubsub message from which to extract the event timestamp. If it is not specified, the message publish time is used as the event timestamp.
This attribute has to conform to the same requirements as in PubsubIO.Read.withTimestampAttribute(String).
In short, its value has to be either milliseconds since the Unix epoch or a string in RFC 3339 format.
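To make the two accepted timestamp encodings concrete, the following sketch parses an attribute value first as milliseconds since the epoch and then as an RFC 3339 string. It uses only java.time and is not Beam's own parser; TimestampAttributeSketch and parse are hypothetical names, and the fallback to OffsetDateTime.parse is an assumption about how an RFC 3339 value could be read.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

/** Hypothetical helper, NOT Beam's implementation: reads a timestamp attribute value. */
final class TimestampAttributeSketch {

  /**
   * Interprets the value as milliseconds since the Unix epoch if it is a plain integer,
   * otherwise as a date-time with offset such as "2015-10-29T23:41:41.123Z".
   * Throws java.time.format.DateTimeParseException if neither form matches.
   */
  static Instant parse(String value) {
    try {
      return Instant.ofEpochMilli(Long.parseLong(value));
    } catch (NumberFormatException notMillis) {
      // Assumption: ISO-8601 offset date-time parsing is close enough to RFC 3339 for a sketch.
      return OffsetDateTime.parse(value).toInstant();
    }
  }

  public static void main(String[] args) {
    System.out.println(parse("1446162101123"));
    System.out.println(parse("2015-10-29T23:41:41.123Z"));
  }
}
```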
deadLetterQueue is an optional topic path that is used as a dead letter queue.
Messages that cannot be processed are sent to this topic. If it is not specified, an exception is thrown for errors during processing, causing the pipeline to crash.
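The dead letter behavior described above can be sketched in plain Java, independent of Beam. DeadLetterSketch, process, and the in-memory list standing in for the dead letter topic are all hypothetical; the sketch only shows the routing decision, not how the provider publishes to Pubsub.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration, NOT Beam code: routes unparseable messages to a dead letter sink. */
final class DeadLetterSketch {

  /**
   * Parses each message. A message that fails to parse is added to deadLetterSink when one
   * is configured; when deadLetterSink is null (no dead letter queue), the failure propagates.
   */
  static <T> List<T> process(
      List<String> messages, Function<String, T> parser, List<String> deadLetterSink) {
    List<T> parsed = new ArrayList<>();
    for (String message : messages) {
      try {
        parsed.add(parser.apply(message));
      } catch (RuntimeException e) {
        if (deadLetterSink == null) {
          throw e; // no dead letter queue configured: the error stops processing
        }
        deadLetterSink.add(message);
      }
    }
    return parsed;
  }

  public static void main(String[] args) {
    List<String> deadLetters = new ArrayList<>();
    Function<String, Integer> parser = Integer::parseInt;
    List<String> input = new ArrayList<>();
    input.add("1");
    input.add("oops");
    input.add("3");
    System.out.println(process(input, parser, deadLetters));
    System.out.println(deadLetters);
  }
}
```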
| Modifier and Type | Field and Description |
|---|---|
| static org.apache.beam.sdk.schemas.Schema | ATTRIBUTE_ARRAY_ENTRY_SCHEMA |
| static org.apache.beam.sdk.schemas.Schema.FieldType | ATTRIBUTE_ARRAY_FIELD_TYPE |
| static org.apache.beam.sdk.schemas.Schema.FieldType | ATTRIBUTE_MAP_FIELD_TYPE |
| Constructor and Description |
|---|
| PubsubSchemaIOProvider() |
| Modifier and Type | Method and Description |
|---|---|
| org.apache.beam.sdk.schemas.Schema | configurationSchema(): Returns the expected schema of the configuration object. |
| org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIO | from(java.lang.String location, org.apache.beam.sdk.values.Row configuration, org.apache.beam.sdk.schemas.Schema dataSchema): Produce a SchemaIO given a String representing the data's location, the schema of the data that resides there, and some IO-specific configuration object. |
| java.lang.String | identifier(): Returns an id that uniquely represents this IO. |
| org.apache.beam.sdk.values.PCollection.IsBounded | isBounded() |
| boolean | requiresDataSchema() |
public static final org.apache.beam.sdk.schemas.Schema.FieldType ATTRIBUTE_MAP_FIELD_TYPE
public static final org.apache.beam.sdk.schemas.Schema ATTRIBUTE_ARRAY_ENTRY_SCHEMA
public static final org.apache.beam.sdk.schemas.Schema.FieldType ATTRIBUTE_ARRAY_FIELD_TYPE
public java.lang.String identifier()
identifier in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
public org.apache.beam.sdk.schemas.Schema configurationSchema()
configurationSchema in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
public org.apache.beam.sdk.io.gcp.pubsub.PubsubSchemaIOProvider.PubsubSchemaIO from(java.lang.String location,
org.apache.beam.sdk.values.Row configuration,
org.apache.beam.sdk.schemas.Schema dataSchema)
from in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
public boolean requiresDataSchema()
requiresDataSchema in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider
public org.apache.beam.sdk.values.PCollection.IsBounded isBounded()
isBounded in interface org.apache.beam.sdk.schemas.io.SchemaIOProvider