package org.apache.spark.sql.vectorized

Type Members

  1. final class ArrowColumnVector extends ColumnVector

    A column vector backed by Apache Arrow. Currently calendar interval type and map type are not supported.

  2. abstract class ColumnVector extends AutoCloseable

    An interface representing in-memory columnar data in Spark. This interface defines the main APIs to access the data, as well as their batched versions. The batched versions are considered to be faster and preferable whenever possible.

    Most of the APIs take the rowId as a parameter. This is the batch-local, 0-based row ID of a value in this ColumnVector.

    Spark only calls the get method that matches the data type of this ColumnVector. For example, if the vector's type is int, Spark is guaranteed to call only #getInt(int) or #getInts(int, int).

    ColumnVector supports all data types, including nested types. To handle nested types, a ColumnVector can have child vectors, forming a tree structure. Refer to #getStruct(int), #getArray(int) and #getMap(int) for details on how to implement nested types.

    A ColumnVector is expected to be reused throughout the entire data loading process, to avoid allocating memory repeatedly.

    ColumnVector is meant to maximize CPU efficiency rather than to minimize storage footprint. Implementations should prefer computing efficiency over storage efficiency when designing the format. Since the ColumnVector instance is expected to be reused while loading data, the storage footprint is negligible.

  3. final class ColumnarArray extends ArrayData

    Array abstraction in ColumnVector.

  4. final class ColumnarBatch extends AutoCloseable

    This class wraps multiple ColumnVectors as a row-wise table. It provides a row view of this batch so that Spark can access the data row by row. An instance of it is meant to be reused during the entire data loading process.

  5. final class ColumnarMap extends MapData

    Map abstraction in ColumnVector.

  6. final class ColumnarRow extends InternalRow

    Row abstraction in ColumnVector.
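ColumnVector distinguishes single-value getters such as #getInt(int) from their batched versions, both addressed by a batch-local, 0-based rowId. The Java sketch below illustrates that split. SimpleColumnVector and IntArrayColumnVector are hypothetical names invented for illustration, not Spark classes; the getInt/getInts pair is assumed to loosely mirror the shape described above. The array-backed subclass overrides the batched getter with a single bulk copy, which is one plausible reason batched access is the faster path.

```java
import java.util.Arrays;

// Hypothetical, simplified int-typed column vector (NOT Spark's ColumnVector).
abstract class SimpleColumnVector {
    // Single-value accessor; rowId is batch-local and 0-based.
    abstract int getInt(int rowId);

    // Batched accessor: returns `count` values starting at rowId.
    // The default loops over getInt; subclasses may override with a bulk copy.
    int[] getInts(int rowId, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = getInt(rowId + i);
        }
        return out;
    }
}

// Hypothetical vector backed by a plain int array.
final class IntArrayColumnVector extends SimpleColumnVector {
    private final int[] data;

    IntArrayColumnVector(int[] data) {
        this.data = data;
    }

    @Override
    int getInt(int rowId) {
        return data[rowId];
    }

    @Override
    int[] getInts(int rowId, int count) {
        // Bounds check: Arrays.copyOfRange would otherwise pad with zeros past the end.
        if (rowId < 0 || count < 0 || rowId + count > data.length) {
            throw new IndexOutOfBoundsException("rowId " + rowId + ", count " + count);
        }
        // One bulk copy instead of `count` separate getInt calls.
        return Arrays.copyOfRange(data, rowId, rowId + count);
    }
}
```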
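ColumnVector handles nested types by having children and forming a tree, and ColumnarRow is described as the row abstraction over such data. A minimal sketch of that idea, assuming a hypothetical Vec interface with boxed values for brevity: a struct column stores no values of its own, only one child vector per field, and a row view reads field i from child i at the same rowId without copying. Vec, LeafVec, StructVec and RowView are illustrative names, not Spark classes.

```java
// Hypothetical minimal vector interface (NOT Spark's ColumnVector).
// Values are boxed for brevity; a CPU-oriented design would use typed getters.
interface Vec {
    Object get(int rowId);
}

// Leaf vector: holds the actual values of one field.
final class LeafVec implements Vec {
    private final Object[] values;

    LeafVec(Object... values) {
        this.values = values;
    }

    @Override
    public Object get(int rowId) {
        return values[rowId];
    }
}

// Struct vector: stores no values itself, only one child vector per field.
// Because StructVec is itself a Vec, structs can nest, giving a tree.
final class StructVec implements Vec {
    private final Vec[] children;

    StructVec(Vec... children) {
        this.children = children;
    }

    Vec getChild(int ordinal) {
        return children[ordinal];
    }

    // Returns a row view over the children at rowId; nothing is copied.
    @Override
    public RowView get(int rowId) {
        return new RowView(this, rowId);
    }
}

// Hypothetical row view over a struct vector, similar in spirit to ColumnarRow.
final class RowView {
    private final StructVec struct;
    private final int rowId;

    RowView(StructVec struct, int rowId) {
        this.struct = struct;
        this.rowId = rowId;
    }

    // Field `ordinal` of this row is child vector `ordinal` at the same rowId.
    Object get(int ordinal) {
        return struct.getChild(ordinal).get(rowId);
    }
}
```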
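ColumnarArray and ColumnarMap are described only as array and map abstractions in ColumnVector. One common way to realize such an abstraction, assumed here rather than stated on this page, is a flat child buffer holding the elements of every row plus a per-row offset and length; an array value is then just a view over that range. IntArrayColumn and ArraySlice are hypothetical names used for illustration.

```java
// Hypothetical array<int> column: a flat child buffer with the elements of all rows,
// plus per-row offsets and lengths into it. Illustrative only, not Spark's layout.
final class IntArrayColumn {
    private final int[] elements; // child data: every row's elements, concatenated
    private final int[] offsets;  // offsets[r]: index of row r's first element
    private final int[] lengths;  // lengths[r]: number of elements in row r

    IntArrayColumn(int[] elements, int[] offsets, int[] lengths) {
        this.elements = elements;
        this.offsets = offsets;
        this.lengths = lengths;
    }

    // Returns a view of row rowId's array; no elements are copied.
    ArraySlice getArray(int rowId) {
        return new ArraySlice(elements, offsets[rowId], lengths[rowId]);
    }
}

// Hypothetical view over a contiguous range of a child buffer,
// similar in spirit to ColumnarArray.
final class ArraySlice {
    private final int[] data;
    private final int offset;
    private final int length;

    ArraySlice(int[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    int numElements() {
        return length;
    }

    // Element `ordinal` of this array lives at data[offset + ordinal].
    int getInt(int ordinal) {
        if (ordinal < 0 || ordinal >= length) {
            throw new IndexOutOfBoundsException("ordinal " + ordinal + ", length " + length);
        }
        return data[offset + ordinal];
    }
}
```

A map column could follow the same pattern with two child buffers, one for keys and one for values, sharing a single offset and length per row.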
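ColumnarBatch wraps several column vectors, exposes them row by row, and is meant to be reused across the loading process. The sketch below shows both ideas under simplifying assumptions: IntBatch is a hypothetical class (not Spark's ColumnarBatch) whose columns are plain int arrays. load overwrites the existing buffers instead of allocating new ones, and rowIterator hands out a single row object whose rowId is advanced on each call.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical batch of int columns exposed row by row (NOT Spark's ColumnarBatch).
final class IntBatch {
    private final int[][] columns; // columns[c][r]: value of column c at row r
    private int numRows;

    IntBatch(int numColumns, int capacity) {
        columns = new int[numColumns][capacity];
    }

    // Reuse: copy the next chunk into the existing buffers instead of reallocating.
    // Throws IndexOutOfBoundsException if rows exceeds the capacity.
    void load(int[][] chunk, int rows) {
        for (int c = 0; c < columns.length; c++) {
            System.arraycopy(chunk[c], 0, columns[c], 0, rows);
        }
        numRows = rows;
    }

    int numRows() {
        return numRows;
    }

    // Row view of the batch: the same Row object is returned for every row.
    Iterator<Row> rowIterator() {
        final Row row = new Row();
        final int maxRows = numRows;
        return new Iterator<Row>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < maxRows;
            }

            @Override
            public Row next() {
                if (next >= maxRows) {
                    throw new NoSuchElementException();
                }
                row.rowId = next++;
                return row;
            }
        };
    }

    // Row view: reads values directly from the column buffers at rowId.
    final class Row {
        int rowId;

        int getInt(int ordinal) {
            return columns[ordinal][rowId];
        }
    }
}
```

Because the same row object is returned on every call, a caller that wants to keep a row after advancing the iterator has to copy its values first; the trade-off avoids one allocation per row.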