Class DistributedShufflerConfig<T>


  • public class DistributedShufflerConfig<T>
    extends Object
    A configuration object for DistributedShufflerPipe. Uses the builder design pattern.
    Author:
    Eyal Schneider
    • Method Detail

      • getWorkers

        public List<HostPort> getWorkers()
        Returns:
        The list of workers (each specified by address and port) taking part in the distributed shuffling.
        1. Must include the current worker, with the same port as specified by the getPort(..) method.
        2. When used in distributed shuffling, all workers must use the same set of workers.
        3. The order of the returned list determines the shard-id to worker mapping; the worker at index i works exclusively on shard #i.
      • getPort

        public int getPort()
        Returns:
        The server port used by the current worker.
      • getShardFunc

        public Function<T, Integer> getShardFunc()
        Returns:
        The explicit item sharding function to use. For each item, it returns the id of the shard responsible for handling it (0 .. getWorkers().size() - 1). The mapping between shard ids and workers is done internally, but the caller can obtain it using getWorkerShardId(..). By default, sharding is based on hashing the item itself.
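The contract described for the sharding function (every item maps to an id between 0 and the number of workers minus one) can be illustrated with a minimal sketch. The class ShardFuncSketch and the helper hashShardFunc are hypothetical names and are not part of this API. Using hashCode() with Math.floorMod is only one plausible way to hash an item; the library's actual default is not documented here.

```java
import java.util.function.Function;

public class ShardFuncSketch {

    // Hypothetical helper (not part of the documented API): builds a shard function
    // that maps each item to a shard id in the range 0 .. numShards - 1.
    public static <T> Function<T, Integer> hashShardFunc(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // Math.floorMod keeps the result non-negative even when hashCode() is negative.
        return item -> Math.floorMod(item.hashCode(), numShards);
    }

    public static void main(String[] args) {
        // Assumes three workers, hence three shards.
        Function<String, Integer> shardFunc = hashShardFunc(3);
        for (String item : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(item + " -> shard " + shardFunc.apply(item));
        }
    }
}
```

Because equal items always produce the same shard id, every occurrence of an item is routed to the same worker, provided all workers use the same function and the same worker list.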
      • getCodec

        public ByteArrayCodec<T> getCodec()
        Returns:
        The encoder/decoder to use for sending/receiving items to/from other workers
      • getWorkerShardId

        public int getWorkerShardId(HostPort worker)
        Utility function that looks up the shard id assigned to a given worker.
        Parameters:
        worker - A worker. Expected to exist in the set of workers.
        Returns:
        The shard-id this worker is responsible for (0 .. getWorkers().size() - 1), or -1 if not found.
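Since getWorkers() states that the worker at index i handles shard #i, the lookup described above behaves like a positional search in the worker list. The sketch below illustrates that reading. HostPortStub is a hypothetical stand-in for the library's HostPort type (whose constructor and equality semantics are not shown here), and workerShardId is a hypothetical helper, not the library's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class WorkerShardIdSketch {

    // Hypothetical stand-in for HostPort; assumes value-based equality on host and port.
    public static final class HostPortStub {
        final String host;
        final int port;

        public HostPortStub(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof HostPortStub)) {
                return false;
            }
            HostPortStub other = (HostPortStub) o;
            return port == other.port && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port);
        }
    }

    // Returns the worker's position in the list (its shard id), or -1 if it is absent.
    public static int workerShardId(List<HostPortStub> workers, HostPortStub worker) {
        return workers.indexOf(worker);
    }

    public static void main(String[] args) {
        List<HostPortStub> workers = Arrays.asList(
                new HostPortStub("host-a", 9000),
                new HostPortStub("host-b", 9000),
                new HostPortStub("host-c", 9001));

        System.out.println(workerShardId(workers, new HostPortStub("host-b", 9000))); // prints 1
        System.out.println(workerShardId(workers, new HostPortStub("host-z", 9000))); // prints -1
    }
}
```

Under this reading, reordering the worker list on one machine would silently change which shard each worker owns, which is why the list must be identical on all workers.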