abstract class RowBasedKeyValueBatch extends MemoryConsumer with Closeable
RowBasedKeyValueBatch stores key-value pairs in a contiguous memory region.
Each key or value is stored as a single UnsafeRow. Each record contains one key, one value,
and some auxiliary data, which differs between the two implementations:
FixedLengthRowBasedKeyValueBatch and VariableLengthRowBasedKeyValueBatch.
We use FixedLengthRowBasedKeyValueBatch if all fields in the key and the value are fixed-length
data types. Otherwise we use VariableLengthRowBasedKeyValueBatch.
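The choice between the two implementations can be pictured with a small, self-contained sketch. The FieldType enum and the chooseBatchKind helper are hypothetical stand-ins introduced for illustration only; they are not Spark API, and Spark makes this decision against the real StructType schemas.

```java
import java.util.Arrays;
import java.util.List;

final class BatchKindSketch {
    // Hypothetical stand-in for Spark data types; only the
    // fixed-length vs. variable-length distinction matters here.
    enum FieldType {
        INT(true), LONG(true), DOUBLE(true), BOOLEAN(true), STRING(false), BINARY(false);

        final boolean fixedLength;

        FieldType(boolean fixedLength) {
            this.fixedLength = fixedLength;
        }
    }

    enum BatchKind { FIXED_LENGTH, VARIABLE_LENGTH }

    // Picks the fixed-length layout only if every key field and every
    // value field is a fixed-length type; otherwise the variable-length one.
    static BatchKind chooseBatchKind(List<FieldType> keyFields, List<FieldType> valueFields) {
        boolean allFixed = keyFields.stream().allMatch(f -> f.fixedLength)
                && valueFields.stream().allMatch(f -> f.fixedLength);
        return allFixed ? BatchKind.FIXED_LENGTH : BatchKind.VARIABLE_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(chooseBatchKind(
                Arrays.asList(FieldType.INT), Arrays.asList(FieldType.LONG)));
        System.out.println(chooseBatchKind(
                Arrays.asList(FieldType.STRING), Arrays.asList(FieldType.LONG)));
    }
}
```

A single variable-length field on either side, such as a string key, is enough to select the variable-length layout in this sketch.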
RowBasedKeyValueBatch is backed by a single page / MemoryBlock (ranging from 1 to 64MB depending on the system configuration). If the page is full, the aggregate logic should fall back to a second-level, larger hash map. We intentionally use the single-page design because it simplifies memory address encoding and decoding for each key-value pair. Because the maximum capacity of RowBasedKeyValueBatch is only 2^16 rows, a second page is unlikely to be needed anyway: filling a 64MB page would require the average key-value pair to be larger than 1024 bytes.
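The 1024-byte figure follows from dividing the page size by the row capacity. A minimal sketch of that arithmetic, where the class and method names are illustrative and not part of Spark:

```java
final class PageCapacitySketch {
    // Maximum number of rows in one batch, as stated above: 2^16.
    static final int MAX_ROWS = 1 << 16;

    // Average record size (in bytes) at which MAX_ROWS records
    // exactly fill a page of the given size.
    static long avgBytesPerPairToFillPage(long pageSizeBytes) {
        return pageSizeBytes / MAX_ROWS;
    }

    public static void main(String[] args) {
        long oneMb = 1L << 20;
        long sixtyFourMb = 64L << 20;
        System.out.println(avgBytesPerPairToFillPage(oneMb));       // 16
        System.out.println(avgBytesPerPairToFillPage(sixtyFourMb)); // 1024
    }
}
```

With the smallest page size mentioned above (1MB) the page fills once pairs average more than 16 bytes, so the "unlikely to need a second page" argument applies mainly to the larger page sizes.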
Instance Constructors
-
new
RowBasedKeyValueBatch(keySchema: StructType, valueSchema: StructType, maxRows: Int, manager: TaskMemoryManager)
- Attributes
- protected[expressions]
Abstract Value Members
-
abstract
def
appendRow(kbase: Any, koff: Long, klen: Int, vbase: Any, voff: Long, vlen: Int): UnsafeRow
Append a key-value pair. It copies the data into the backing MemoryBlock. Returns an UnsafeRow pointing to the value if it succeeds; otherwise returns null.
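The copy-then-return-a-pointer contract of appendRow can be sketched without Spark. In the sketch below a ByteBuffer stands in for the backing MemoryBlock and a ByteBuffer view stands in for the returned UnsafeRow; the record layout (two int length prefixes followed by key and value bytes) is an assumption made for illustration, not the layout Spark uses.

```java
import java.nio.ByteBuffer;

final class AppendSketch {
    private final ByteBuffer page; // stands in for the single backing page
    private final int maxRows;
    private int numRows = 0;

    AppendSketch(int pageSizeBytes, int maxRows) {
        this.page = ByteBuffer.allocate(pageSizeBytes);
        this.maxRows = maxRows;
    }

    // Copies key and value into the page. Returns a view over the stored
    // value bytes, or null if the row limit is reached or the page is full.
    ByteBuffer appendRow(byte[] key, byte[] value) {
        int recordLength = 4 + 4 + key.length + value.length;
        if (numRows >= maxRows || page.remaining() < recordLength) {
            return null; // caller is expected to fall back to another structure
        }
        page.putInt(key.length);
        page.putInt(value.length);
        page.put(key);
        int valueOffset = page.position();
        page.put(value);
        numRows++;

        // Build a view that points at the value inside the page (no second copy).
        ByteBuffer view = page.duplicate();
        view.position(valueOffset);
        view.limit(valueOffset + value.length);
        return view.slice();
    }

    int numRows() {
        return numRows;
    }
}
```

Because the returned view aliases the page, writes through it update the stored value in place, which is the property an aggregation buffer needs. Callers must check for null before using the result.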
-
abstract
def
getKeyRow(rowId: Int): UnsafeRow
Returns the key row in this batch at rowId. The returned key row is reused across calls.
-
abstract
def
getValueFromKey(rowId: Int): UnsafeRow
Returns the value row in two steps: 1) look up the key row with the same id (skipped if the key row is cached); 2) retrieve the value row by reusing the metadata from step 1. In most cases, step 1 is skipped because getKeyRow(id) is often called before getValueRow(id).
- Attributes
- protected[expressions]
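The two-step lookup with a cached key can be sketched as follows. This is a simplified model, not Spark's implementation: each record is assumed to be one long key followed by one long value in a flat array, and the keyLookups counter exists only to make the skipped step observable.

```java
final class CachedLookupSketch {
    private final long[] data;     // records laid out as [key0, value0, key1, value1, ...]
    private int cachedRowId = -1;  // row the key cursor currently points at
    private int cachedOffset = -1; // offset of that row's key in data
    int keyLookups = 0;            // how many times step 1 actually ran

    CachedLookupSketch(long[] data) {
        this.data = data;
    }

    // Step 1: locate the key for rowId, caching its position.
    long getKey(int rowId) {
        if (cachedRowId != rowId) {
            cachedOffset = rowId * 2;
            cachedRowId = rowId;
            keyLookups++;
        }
        return data[cachedOffset];
    }

    // Step 2: derive the value position from the cached key position.
    // Step 1 is only repeated when the cache points at a different row.
    long getValueFromKey(int rowId) {
        if (cachedRowId != rowId) {
            getKey(rowId);
        }
        return data[cachedOffset + 1];
    }

    public static void main(String[] args) {
        CachedLookupSketch batch = new CachedLookupSketch(new long[]{10, 100, 20, 200});
        System.out.println(batch.getKey(1));          // 20
        System.out.println(batch.getValueFromKey(1)); // 200
        System.out.println(batch.keyLookups);         // 1
    }
}
```

When the key for a row has just been fetched, fetching its value costs only an offset addition, which is why the common getKeyRow-then-getValueRow call order is cheap.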
-
abstract
def
rowIterator(): KVIterator[UnsafeRow, UnsafeRow]
Returns an iterator over all rows in this batch.
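Since key and value rows are reused across calls, code that walks all rows and keeps any of them needs to copy what it retains. The sketch below illustrates that pattern with a hypothetical cursor-style SimpleKVIterator defined locally; it is an assumption of this example and makes no claim about the exact shape of KVIterator.

```java
import java.util.ArrayList;
import java.util.List;

final class RowIteratorSketch {
    // Hypothetical cursor-style iterator used only in this sketch.
    interface SimpleKVIterator {
        boolean next();
        long[] getKey();
        long[] getValue();
    }

    // Iterates over parallel key/value arrays, reusing one holder per side,
    // to mimic rows that are reused across calls.
    static SimpleKVIterator iterate(final long[] keys, final long[] values) {
        return new SimpleKVIterator() {
            private int row = -1;
            private final long[] keyHolder = new long[1];
            private final long[] valueHolder = new long[1];

            public boolean next() {
                if (row + 1 >= keys.length) {
                    return false;
                }
                row++;
                keyHolder[0] = keys[row];
                valueHolder[0] = values[row];
                return true;
            }

            public long[] getKey() { return keyHolder; }

            public long[] getValue() { return valueHolder; }
        };
    }

    // Collects every key; clone() is required because the holder is reused.
    static List<long[]> collectKeys(SimpleKVIterator it) {
        List<long[]> out = new ArrayList<>();
        while (it.next()) {
            out.add(it.getKey().clone());
        }
        return out;
    }
}
```

Dropping the clone() call would leave every list element pointing at the same holder, so all entries would show the last key visited.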
Concrete Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
acquireMemory(arg0: Long): Long
- Definition Classes
- MemoryConsumer
-
def
allocateArray(arg0: Long): LongArray
- Definition Classes
- MemoryConsumer
-
def
allocatePage(arg0: Long): MemoryBlock
- Attributes
- protected[memory]
- Definition Classes
- MemoryConsumer
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
close(): Unit
- Definition Classes
- RowBasedKeyValueBatch → Closeable → AutoCloseable
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
freeArray(arg0: LongArray): Unit
- Definition Classes
- MemoryConsumer
-
def
freeMemory(arg0: Long): Unit
- Definition Classes
- MemoryConsumer
-
def
freePage(arg0: MemoryBlock): Unit
- Attributes
- protected[memory]
- Definition Classes
- MemoryConsumer
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getMode(): MemoryMode
- Definition Classes
- MemoryConsumer
-
def
getUsed(): Long
- Definition Classes
- MemoryConsumer
-
final
def
getValueRow(rowId: Int): UnsafeRow
Returns the value row in this batch at rowId. The returned value row is reused across calls. Because getValueRow(id) is always called after getKeyRow(id) with the same id, we use getValueFromKey(id) to retrieve the value row, which reuses metadata from the cached key.
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
numRows(): Int
-
final
def
spill(size: Long, trigger: MemoryConsumer): Long
Sometimes the TaskMemoryManager may call spill() on its associated MemoryConsumers to make space for new consumers. For RowBasedKeyValueBatch, we do not actually spill and return 0. We should not throw an OutOfMemory exception here because other associated consumers might spill.
- Definition Classes
- RowBasedKeyValueBatch → MemoryConsumer
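The reason for returning 0 rather than throwing can be seen in a toy model of a memory manager asking several consumers to free memory. Everything here (the Consumer interface, the two consumer classes and the reclaim loop) is hypothetical and far simpler than TaskMemoryManager; it only illustrates the contract.

```java
import java.util.Arrays;
import java.util.List;

final class SpillSketch {
    // Hypothetical minimal consumer contract: report how many bytes were freed.
    interface Consumer {
        long spill(long bytesRequested);
    }

    // Models a batch that cannot spill: it frees nothing and reports 0.
    static final class NonSpillingBatch implements Consumer {
        public long spill(long bytesRequested) {
            return 0L;
        }
    }

    // Models a consumer that can release up to the amount it holds.
    static final class SpillableBuffer implements Consumer {
        private long held;

        SpillableBuffer(long held) {
            this.held = held;
        }

        public long spill(long bytesRequested) {
            long freed = Math.min(held, bytesRequested);
            held -= freed;
            return freed;
        }
    }

    // Asks each consumer in turn until enough memory has been freed.
    static long reclaim(List<Consumer> consumers, long bytesNeeded) {
        long freed = 0;
        for (Consumer c : consumers) {
            if (freed >= bytesNeeded) {
                break;
            }
            freed += c.spill(bytesNeeded - freed);
        }
        return freed;
    }

    public static void main(String[] args) {
        List<Consumer> consumers =
                Arrays.<Consumer>asList(new NonSpillingBatch(), new SpillableBuffer(100));
        System.out.println(reclaim(consumers, 64)); // 64
    }
}
```

If NonSpillingBatch threw instead of returning 0, the loop would never reach the consumer that can actually free memory.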
-
def
spill(): Unit
- Definition Classes
- MemoryConsumer
- Annotations
- @throws( classOf[java.io.IOException] )
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()