T - The type of the returned elements.@PublicEvolving
public class CsvReaderFormat<T>
extends org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>
StreamFormat for reading CSV files.
The following example shows how to create a CsvReaderFormat where the schema for CSV
parsing is automatically derived based on the fields of a POJO class.
CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
FileSource<SomePojo> source =
FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(filesPath)).build();
Note: you might need to add @JsonPropertyOrder({field1, field2, ...}) annotation from
the Jackson library to your class definition with the fields order exactly matching those
of the CSV file columns).
If you need more fine-grained control over the CSV schema or the parsing options, use the more
low-level forSchema static factory method based on the Jackson library utilities:
Function<CsvMapper, CsvSchema> schemaGenerator =
mapper -> mapper.schemaFor(SomePojo.class)
.withColumnSeparator('|');
CsvReaderFormat<SomePojo> csvFormat =
CsvReaderFormat.forSchema(() -> new CsvMapper(), schemaGenerator, TypeInformation.of(SomePojo.class));
FileSource<SomePojo> source =
FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(filesPath)).build();
| Modifier and Type | Method and Description |
|---|---|
org.apache.flink.connector.file.src.reader.StreamFormat.Reader<T> |
createReader(org.apache.flink.configuration.Configuration config,
org.apache.flink.core.fs.FSDataInputStream stream) |
static <T> CsvReaderFormat<T> |
forPojo(Class<T> pojoType)
Builds a new
CsvReaderFormat for reading CSV files mapped to the provided POJO class
definition. |
static <T> CsvReaderFormat<T> |
forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper mapper,
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema,
org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation)
Deprecated.
This method is limited to serializable
CsvMappers, preventing
the usage of certain Jackson modules (like the Java 8 Date/Time Serializers). Use #forSchema(Supplier, Function,
TypeInformation) instead. |
static <T> CsvReaderFormat<T> |
forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema,
org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation)
Builds a new
CsvReaderFormat using a CsvSchema. |
static <T> CsvReaderFormat<T> |
forSchema(org.apache.flink.util.function.SerializableSupplier<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper> mapperFactory,
org.apache.flink.util.function.SerializableFunction<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper,org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema> schemaGenerator,
org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation)
Builds a new
CsvReaderFormat using a CsvSchema generator and CsvMapper factory. |
org.apache.flink.api.common.typeinfo.TypeInformation<T> |
getProducedType() |
CsvReaderFormat<T> |
withIgnoreParseErrors()
Returns a new
CsvReaderFormat configured to ignore all parsing errors. |
public static <T> CsvReaderFormat<T> forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema, org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation)
CsvReaderFormat using a CsvSchema.T - The type of the returned elements.schema - The Jackson CSV schema configured for parsing specific CSV files.typeInformation - The Flink type descriptor of the returned elements.@Deprecated public static <T> CsvReaderFormat<T> forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper mapper, org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema, org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation)
CsvMappers, preventing
the usage of certain Jackson modules (like the Java 8 Date/Time Serializers). Use #forSchema(Supplier, Function,
TypeInformation) instead.public static <T> CsvReaderFormat<T> forSchema(org.apache.flink.util.function.SerializableSupplier<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper> mapperFactory, org.apache.flink.util.function.SerializableFunction<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper,org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema> schemaGenerator, org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation)
CsvReaderFormat using a CsvSchema generator and CsvMapper factory.T - The type of the returned elements.mapperFactory - The factory creating the CsvMapper.schemaGenerator - A generator that creates and configures the Jackson CSV schema for
parsing specific CSV files, from a mapper created by the mapper factory.typeInformation - The Flink type descriptor of the returned elements.public static <T> CsvReaderFormat<T> forPojo(Class<T> pojoType)
CsvReaderFormat for reading CSV files mapped to the provided POJO class
definition. Produced reader uses default mapper and schema settings, use forSchema if
you need customizations.T - The type of the returned elements.pojoType - The type class of the POJO.public CsvReaderFormat<T> withIgnoreParseErrors()
CsvReaderFormat configured to ignore all parsing errors. All the other
options directly carried over from the subject of the method call.public org.apache.flink.connector.file.src.reader.StreamFormat.Reader<T> createReader(org.apache.flink.configuration.Configuration config, org.apache.flink.core.fs.FSDataInputStream stream) throws IOException
createReader in class org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>IOExceptionpublic org.apache.flink.api.common.typeinfo.TypeInformation<T> getProducedType()
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>getProducedType in interface org.apache.flink.connector.file.src.reader.StreamFormat<T>getProducedType in class org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.