package vectorized
Type Members
-
final class ArrowColumnVector extends ColumnVector
A column vector backed by Apache Arrow. Currently calendar interval type and map type are not supported.
-
abstract class ColumnVector extends AutoCloseable
An interface representing in-memory columnar data in Spark. This interface defines the main APIs to access the data, as well as their batched versions. The batched versions are considered to be faster and preferable whenever possible.
Most of the APIs take the rowId as a parameter. This is the batch local 0-based row id for values in this ColumnVector.
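To make the difference between per-row and batched access concrete, here is a minimal sketch. SimpleIntVector and BatchedAccessDemo are hypothetical stand-ins written for illustration only; they are not Spark classes, and the method names are chosen to mirror the accessor style described here.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an int-typed column vector; not Spark's class. */
final class SimpleIntVector {
    private final int[] values;

    SimpleIntVector(int[] values) {
        this.values = values;
    }

    /** Per-row accessor: rowId is 0-based and local to this batch. */
    int getInt(int rowId) {
        return values[rowId];
    }

    /** Batched accessor: copies count values starting at rowId in one call. */
    int[] getInts(int rowId, int count) {
        return Arrays.copyOfRange(values, rowId, rowId + count);
    }
}

final class BatchedAccessDemo {
    /** Sums the column with one accessor call per row. */
    static long sumPerRow(SimpleIntVector vector, int numRows) {
        long sum = 0;
        for (int rowId = 0; rowId < numRows; rowId++) {
            sum += vector.getInt(rowId);
        }
        return sum;
    }

    /** Sums the column with a single batched call followed by a tight loop. */
    static long sumBatched(SimpleIntVector vector, int numRows) {
        long sum = 0;
        for (int value : vector.getInts(0, numRows)) {
            sum += value;
        }
        return sum;
    }
}
```

Both methods return the same result. The batched form replaces one accessor call per row with a single call, which is the kind of saving the batched APIs are meant to offer.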
Spark only calls the specific get method that matches the data type of this ColumnVector, e.g. if it's int type, Spark is guaranteed to only call getInt(int) or getInts(int, int).
ColumnVector supports all the data types including nested types. To handle nested types, ColumnVector can have children and is a tree structure. Please refer to getStruct(int), getArray(int) and getMap(int) for the details about how to implement nested types.
ColumnVector is expected to be reused during the entire data loading process, to avoid allocating memory again and again.
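One way to picture a nested column as a tree is an array column whose parent stores, per row, an offset and a length into a single flattened child. The sketch below assumes that layout purely for illustration; SimpleIntArrayVector is a hypothetical class, not Spark's implementation, and the real contract is the one documented on getArray(int).

```java
import java.util.Arrays;

/**
 * Hypothetical array-of-int column: the parent keeps per-row offsets and
 * lengths, and the child holds the elements of all rows back to back.
 */
final class SimpleIntArrayVector {
    private final int[] offsets; // start of each row's elements in child
    private final int[] lengths; // number of elements in each row
    private final int[] child;   // flattened element values

    SimpleIntArrayVector(int[] offsets, int[] lengths, int[] child) {
        this.offsets = offsets;
        this.lengths = lengths;
        this.child = child;
    }

    /** Returns a copy of the elements that belong to the given batch-local row. */
    int[] getArray(int rowId) {
        int start = offsets[rowId];
        return Arrays.copyOfRange(child, start, start + lengths[rowId]);
    }
}
```

For the rows [[1, 2], [], [3, 4, 5]], the child would hold 1, 2, 3, 4, 5, with offsets 0, 2, 2 and lengths 2, 0, 3. A production implementation would more likely return a view over the child than a copy.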
ColumnVector is meant to maximize CPU efficiency but not to minimize storage footprint. Implementations should prefer computing efficiency over storage efficiency when designing the format. Since the ColumnVector instance is expected to be reused while loading data, the storage footprint is negligible.
-
final class ColumnarArray extends ArrayData
Array abstraction in ColumnVector.
-
final class ColumnarBatch extends AutoCloseable
This class wraps multiple ColumnVectors as a row-wise table. It provides a row view of this batch so that Spark can access the data row by row. An instance of this class is meant to be reused during the entire data loading process.
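The row view idea can be sketched as follows. SimpleBatch and RowView are hypothetical names, the columns are restricted to int for brevity, and reusing a single row object is a choice made in this sketch rather than a statement about Spark's behavior.

```java
/** Hypothetical batch of int columns that exposes a row-wise view; not Spark's class. */
final class SimpleBatch {
    private final int[][] columns; // columns[ordinal][rowId]
    private final int numRows;
    private final RowView row = new RowView(); // one view object, repositioned per call

    SimpleBatch(int[][] columns, int numRows) {
        this.columns = columns;
        this.numRows = numRows;
    }

    int numRows() {
        return numRows;
    }

    /** Points the shared view at rowId; no column data is copied. */
    RowView getRow(int rowId) {
        row.rowId = rowId;
        return row;
    }

    /** Reads values for the current row straight out of the column arrays. */
    final class RowView {
        private int rowId;

        int getInt(int ordinal) {
            return columns[ordinal][rowId];
        }
    }
}
```

Because the view only stores a row index, walking the batch row by row allocates nothing per row, while the data itself stays in columnar form.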
-
final class ColumnarMap extends MapData
Map abstraction in ColumnVector.
-
final class ColumnarRow extends InternalRow
Row abstraction in ColumnVector.