Packages

package orc

Type Members

  1. class OrcArrayColumnVector extends OrcColumnVector

    A column vector implementation for Spark's ArrayType.

  2. class OrcAtomicColumnVector extends OrcColumnVector

    A column vector implementation for Spark's AtomicType.

  3. class OrcColumnStatistics extends AnyRef

    A column statistics interface wrapping ORC ColumnStatistics.

    Because ORC ColumnStatistics are stored as a flattened array in the ORC file footer, this class converts them from that array into a nested tree structure that follows the data types. The flattened array stores all data types (including nested types) in tree pre-order. This is used for aggregate push down in ORC.

    For nested data types (array, map and struct), the sub-field statistics are stored recursively inside parent column's children field. Here is an example of OrcColumnStatistics:

    Data schema:
      c1: int
      c2: struct<f1: int, f2: float>
      c3: map<key: int, value: string>
      c4: array<int>

                           OrcColumnStatistics
                                   | (children)
                ---------------------------------------------
               /         |                 \                 \
              c1        c2                 c3                c4
          (integer)  (struct)            (map)            (array)
          (min:1,        | (children)       | (children)      | (children)
           max:10)     -----              -----            element
                      /     \            /     \          (integer)
                   c2.f1   c2.f2       key    value
                 (integer) (float)  (integer) (string)
                          (min:0.1,           (min:"a",
                           max:100.5)          max:"zzz")
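
    The array-to-tree conversion described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual OrcColumnStatistics code: DataType, StatsNode, FlatStatsToTree and the string labels standing in for statistics are all hypothetical, and the sketch assumes the first entry of the flattened array belongs to the root struct.

```scala
// Hypothetical, simplified stand-ins for schema types (not Spark or ORC classes).
sealed trait DataType
case object IntType extends DataType
case object FloatType extends DataType
case object StringType extends DataType
final case class ArrayType(element: DataType) extends DataType
final case class MapType(key: DataType, value: DataType) extends DataType
final case class StructType(fields: List[(String, DataType)]) extends DataType

// One node of the nested statistics tree; `stats` is a placeholder label.
final case class StatsNode(stats: String, children: List[StatsNode])

object FlatStatsToTree {
  // Child types of a node, in the order a pre-order walk visits them.
  private def childTypes(dt: DataType): List[DataType] = dt match {
    case StructType(fields) => fields.map(_._2)
    case MapType(k, v)      => List(k, v)
    case ArrayType(e)       => List(e)
    case _                  => Nil
  }

  // Reads the entry at `pos` for this node, then recursively builds each child.
  // Returns the node and the index of the next unconsumed entry.
  private def build(dt: DataType, flat: Vector[String], pos: Int): (StatsNode, Int) = {
    val own = flat(pos)
    val (children, next) =
      childTypes(dt).foldLeft((List.empty[StatsNode], pos + 1)) {
        case ((acc, p), childType) =>
          val (child, after) = build(childType, flat, p)
          (child :: acc, after)
      }
    (StatsNode(own, children.reverse), next)
  }

  // Assumption: flat(0) holds the statistics of the root struct.
  def toTree(schema: StructType, flat: Vector[String]): StatsNode = {
    val (root, consumed) = build(schema, flat, 0)
    require(consumed == flat.length, s"expected ${flat.length} entries, consumed $consumed")
    root
  }
}

object FlatStatsExample {
  // The example schema: c1 int, c2 struct, c3 map, c4 array.
  val schema: StructType = StructType(List(
    "c1" -> IntType,
    "c2" -> StructType(List("f1" -> IntType, "f2" -> FloatType)),
    "c3" -> MapType(IntType, StringType),
    "c4" -> ArrayType(IntType)))

  // Pre-order flattening of the tree, one label per column.
  val flat: Vector[String] =
    Vector("root", "c1", "c2", "c2.f1", "c2.f2", "c3", "key", "value", "c4", "element")

  val tree: StatsNode = FlatStatsToTree.toTree(schema, flat)
}
```

    In this sketch the schema drives the recursion: each node takes one entry from the array, and its children take the entries that immediately follow, which is what pre-order storage implies.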

  4. abstract class OrcColumnVector extends ColumnVector

    A column vector interface wrapping Hive's ColumnVector.

    Because Spark's ColumnarBatch only accepts Spark's vectorized.ColumnVector, this class adapts Hive's ColumnVector to Spark's ColumnVector.
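
    The adapter idea can be illustrated with a small self-contained sketch. None of these types are the real Spark or Hive classes: HiveStyleLongVector, SparkStyleColumnVector and LongVectorAdapter are hypothetical stand-ins, modeled on the assumption that the wrapped vector carries `vector`, `isNull`, `noNulls` and `isRepeating` fields in the way Hive's long column vector does.

```scala
// Hypothetical stand-in for a Hive-style long column vector.
final class HiveStyleLongVector(
    val vector: Array[Long],
    val isNull: Array[Boolean],
    val noNulls: Boolean,
    val isRepeating: Boolean)

// Hypothetical stand-in for the read-only interface a columnar batch expects.
trait SparkStyleColumnVector {
  def isNullAt(rowId: Int): Boolean
  def getLong(rowId: Int): Long
}

// Adapter: exposes the Hive-style vector through the Spark-style interface.
final class LongVectorAdapter(base: HiveStyleLongVector) extends SparkStyleColumnVector {
  // When isRepeating is set, slot 0 holds the value for every row.
  private def index(rowId: Int): Int = if (base.isRepeating) 0 else rowId

  override def isNullAt(rowId: Int): Boolean =
    !base.noNulls && base.isNull(index(rowId))

  override def getLong(rowId: Int): Long = base.vector(index(rowId))
}

object AdapterExample {
  // A repeating vector: one stored value answers every row lookup.
  val repeated = new LongVectorAdapter(
    new HiveStyleLongVector(Array(7L), Array(false), noNulls = true, isRepeating = true))

  // A plain vector with a null in row 1.
  val plain = new LongVectorAdapter(
    new HiveStyleLongVector(
      Array(1L, 2L, 3L), Array(false, true, false), noNulls = false, isRepeating = false))
}
```

    The wrapper copies no data; it only translates row lookups, so the batch can read the underlying vector in place.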

  5. class OrcColumnarBatchReader extends RecordReader[Void, ColumnarBatch]

    To support vectorization in WholeStageCodeGen, this reader returns ColumnarBatch. After the reader is created, initialize and initBatch should be called in that order.

  6. class OrcDeserializer extends AnyRef

    A deserializer to deserialize ORC structs to Spark rows.

  7. class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable

    New ORC File Format based on Apache ORC.

  8. trait OrcFiltersBase extends AnyRef

    Methods that can be shared when upgrading the built-in Hive.

  9. class OrcFooterReader extends AnyRef

    A utility class that encapsulates helper methods for reading the ORC file footer.

  10. class OrcMapColumnVector extends OrcColumnVector

    A column vector implementation for Spark's MapType.

  11. class OrcOptions extends FileSourceOptions

    Options for the ORC data source.

  12. class OrcSerializer extends AnyRef

    A serializer to serialize Spark rows to ORC structs.

  13. class OrcStructColumnVector extends OrcColumnVector

    A column vector implementation for Spark's StructType.

Value Members

  1. object OrcOptions extends DataSourceOptions with Serializable
  2. object OrcUtils extends Logging
