package partitioning
Type Members
-
class
ClusteredDistribution extends Distribution
A concrete implementation of Distribution. Represents a distribution where records that share the same values for the #clusteredColumns will be produced by the same PartitionReader.
- Since
3.0.0
-
trait
Distribution extends AnyRef
An interface to represent a data distribution requirement, which specifies how the records should be distributed among the data partitions (one PartitionReader outputs data for one partition). Note that this interface has nothing to do with the data ordering inside one partition (the output records of a single PartitionReader).

Instances of this interface are created and provided by Spark, then consumed by Partitioning#satisfy(Distribution). This means data source developers don't need to implement this interface, but should handle as many of its concrete implementations as possible in Partitioning#satisfy(Distribution).

Concrete implementations so far: ClusteredDistribution.
- Since
3.0.0
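The following is a minimal sketch of how a data source might handle a concrete Distribution inside Partitioning#satisfy(Distribution). It does not depend on Spark: the nested `Distribution`, `ClusteredDistribution`, and `Partitioning` types are stand-ins that only mirror the shapes described on this page, and the `numPartitions()` method name, the `SatisfySketch` wrapper, and the `KeyClusteredPartitioning` class are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Stand-in types so the sketch compiles without Spark on the classpath.
final class SatisfySketch {
    interface Distribution {}

    static final class ClusteredDistribution implements Distribution {
        final String[] clusteredColumns;

        ClusteredDistribution(String... clusteredColumns) {
            this.clusteredColumns = clusteredColumns.clone();
        }
    }

    interface Partitioning {
        int numPartitions(); // assumed method name
        boolean satisfy(Distribution distribution);
    }

    // Hypothetical source whose partitions are already clustered by a fixed set of columns.
    static final class KeyClusteredPartitioning implements Partitioning {
        private final int numPartitions;
        private final Set<String> partitionColumns;

        KeyClusteredPartitioning(int numPartitions, String... partitionColumns) {
            this.numPartitions = numPartitions;
            this.partitionColumns = new HashSet<>(Arrays.asList(partitionColumns));
        }

        @Override
        public int numPartitions() {
            return numPartitions;
        }

        @Override
        public boolean satisfy(Distribution distribution) {
            if (distribution instanceof ClusteredDistribution) {
                ClusteredDistribution clustered = (ClusteredDistribution) distribution;
                Set<String> requested = new HashSet<>(Arrays.asList(clustered.clusteredColumns));
                // Records equal on all requested columns are also equal on any subset of them,
                // so clustering by a non-empty subset of the requested columns is sufficient.
                return !partitionColumns.isEmpty() && requested.containsAll(partitionColumns);
            }
            // Unrecognized distribution: conservatively report "not satisfied".
            return false;
        }
    }
}
```

In this sketch, a source clustered by `user_id` reports that it satisfies a ClusteredDistribution over `user_id` and `day`, but not one over `day` alone. Any distribution type it does not recognize is treated as unsatisfied.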
-
trait
Partitioning extends AnyRef
An interface to represent the output data partitioning for a data source, which is returned by SupportsReportPartitioning#outputPartitioning(). Note that this should work like a snapshot. Once created, it should be deterministic and always report the same number of partitions and the same "satisfy" result for a certain distribution.
- Since
3.0.0
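One way to read the "snapshot" requirement is that a Partitioning should copy whatever state it reports at creation time instead of reading it lazily from a mutable source. The sketch below illustrates that idea without Spark: the nested `Distribution` and `Partitioning` types are stand-ins, and `SnapshotSketch`, `FileCatalog`, and the `numPartitions()` method name are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types so the sketch compiles without Spark on the classpath.
final class SnapshotSketch {
    interface Distribution {}

    interface Partitioning {
        int numPartitions(); // assumed method name
        boolean satisfy(Distribution distribution);
    }

    // Hypothetical mutable source: one partition per file, and files may be added later.
    static final class FileCatalog {
        private final List<String> files = new ArrayList<>();

        void add(String file) {
            files.add(file);
        }

        Partitioning outputPartitioning() {
            // Copy the count now, so later changes to the catalog cannot leak
            // into a Partitioning that has already been handed out.
            final int frozenCount = files.size();
            return new Partitioning() {
                @Override
                public int numPartitions() {
                    return frozenCount;
                }

                @Override
                public boolean satisfy(Distribution distribution) {
                    // This sketch makes no clustering guarantee; the answer is
                    // constant, so it is trivially deterministic.
                    return false;
                }
            };
        }
    }
}
```

With this design, a Partitioning obtained while the catalog holds two files keeps reporting two partitions even after a third file is added. Calling `outputPartitioning()` again produces a new snapshot that reflects the new state.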