Packages

package partitioning

Type Members

  1. class ClusteredDistribution extends Distribution

    A concrete implementation of Distribution. Represents a distribution in which records that share the same values for the clusteredColumns are produced by the same PartitionReader.

    Since

    3.0.0
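
To make the clustering guarantee concrete, the following is a minimal sketch of one way records could be assigned to partitions so that equal values in the clustered columns always land together. It is not Spark code: the class ClusteringSketch, the helpers row and partitionFor, and the column names are hypothetical, and hashing is only one possible strategy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ClusteringSketch {
    // Hypothetical helper: builds one record as a column-name -> value map.
    static Map<String, Object> row(String userId, String day, int amount) {
        Map<String, Object> record = new HashMap<>();
        record.put("user_id", userId);
        record.put("day", day);
        record.put("amount", amount);
        return record;
    }

    // Chooses a partition from the clustered column values only, so two records
    // that agree on those columns are always assigned the same partition.
    static int partitionFor(Map<String, Object> record, String[] clusteredColumns, int numPartitions) {
        List<Object> key = new ArrayList<>();
        for (String column : clusteredColumns) {
            key.add(record.get(column));
        }
        // floorMod keeps the result in [0, numPartitions) even for negative hashes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String[] clusteredColumns = {"user_id"};
        int p1 = partitionFor(row("u1", "2024-01-01", 10), clusteredColumns, 4);
        int p2 = partitionFor(row("u1", "2024-01-02", 99), clusteredColumns, 4);
        System.out.println("same partition: " + (p1 == p2));
    }
}
```

Records that differ in the clustered columns may still share a partition; the requirement only forbids splitting records with equal values across partitions.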

  2. trait Distribution extends AnyRef

    An interface that represents a data distribution requirement: it specifies how records should be distributed among the data partitions (one PartitionReader outputs the data for one partition). Note that this interface has nothing to do with the ordering of data inside a partition (the output records of a single PartitionReader).

    Instances of this interface are created and provided by Spark, then consumed by Partitioning#satisfy(Distribution). Data source developers therefore do not need to implement this interface, but their Partitioning#satisfy(Distribution) should recognize as many of its concrete implementations as possible.

    Concrete implementations so far: ClusteredDistribution.

    Since

    3.0.0

  3. trait Partitioning extends AnyRef

    An interface that represents the output data partitioning of a data source, as returned by SupportsReportPartitioning#outputPartitioning(). An implementation should behave like a snapshot: once created, it should be deterministic and always report the same number of partitions and the same satisfy result for a given distribution.

    Since

    3.0.0
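
As an illustration of how satisfy might inspect the concrete distribution it receives, here is a minimal sketch. The three types Distribution, ClusteredDistribution and Partitioning are simplified local stand-ins for the documented ones, not the Spark classes; the accessor name numPartitions is an assumption, and KeyGroupedPartitioning and the column names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Local stand-ins that mirror the documented types (assumed shapes).
interface Distribution {}

final class ClusteredDistribution implements Distribution {
    final String[] clusteredColumns;

    ClusteredDistribution(String... clusteredColumns) {
        this.clusteredColumns = clusteredColumns.clone();
    }
}

interface Partitioning {
    int numPartitions(); // assumed accessor name
    boolean satisfy(Distribution distribution);
}

// Hypothetical source whose data is already grouped by a fixed set of key columns.
// All state is final, so every call reports the same answers (snapshot behaviour).
final class KeyGroupedPartitioning implements Partitioning {
    private final int numPartitions;
    private final Set<String> keyColumns;

    KeyGroupedPartitioning(int numPartitions, String... keyColumns) {
        this.numPartitions = numPartitions;
        this.keyColumns = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(keyColumns)));
    }

    @Override
    public int numPartitions() {
        return numPartitions;
    }

    @Override
    public boolean satisfy(Distribution distribution) {
        if (distribution instanceof ClusteredDistribution) {
            String[] required = ((ClusteredDistribution) distribution).clusteredColumns;
            // Records equal on all required columns are also equal on any subset of
            // them, so grouping by a non-empty subset keeps those records together.
            return !keyColumns.isEmpty() && Arrays.asList(required).containsAll(keyColumns);
        }
        // Unknown requirement: report "not satisfied" rather than guess.
        return false;
    }
}

final class PartitioningSketch {
    public static void main(String[] args) {
        Partitioning p = new KeyGroupedPartitioning(8, "user_id");
        System.out.println(p.satisfy(new ClusteredDistribution("user_id", "day"))); // true
        System.out.println(p.satisfy(new ClusteredDistribution("day")));            // false
    }
}
```

Returning false for a distribution type the source does not recognize is the conservative choice in this sketch, since a false positive would claim a guarantee the data does not have.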
