Class LocalMultiFileReaderConfig.Builder<T>

    • Method Detail

      • andFilter

        public LocalMultiFileReaderConfig.Builder<T> andFilter(Predicate<File> fileFilter)
        Parameters:
        fileFilter - A file predicate to AND with any filters already set. Given a file object, it determines whether the file should be read. By default, all files are accepted.
        Returns:
        this builder
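A minimal sketch of the documented AND semantics, using only `java.util.function.Predicate`. The class `FilterChainSketch` is hypothetical and is not the library's implementation. It starts from an accept-all predicate (the documented default) and combines each new filter with `Predicate.and`.

```java
import java.io.File;
import java.util.function.Predicate;

// Hypothetical sketch (not the library's code): models how repeated andFilter(...)
// calls could accumulate into a single predicate, starting from accept-all.
final class FilterChainSketch {
    // Default filter: accepts every file, matching the documented default.
    private Predicate<File> filter = f -> true;

    // The new predicate is ANDed with the ones already registered.
    FilterChainSketch andFilter(Predicate<File> fileFilter) {
        filter = filter.and(fileFilter);
        return this; // returns itself for chaining, like a builder method
    }

    // Only evaluates the file name; nothing is read from disk.
    boolean accepts(File file) {
        return filter.test(file);
    }

    public static void main(String[] args) {
        FilterChainSketch chain = new FilterChainSketch()
                .andFilter(f -> f.getName().endsWith(".csv"))
                .andFilter(f -> !f.getName().startsWith("."));
        System.out.println(chain.accepts(new File("/data/part-0001.csv")));  // true
        System.out.println(chain.accepts(new File("/data/.part-0001.csv"))); // false
        System.out.println(chain.accepts(new File("/data/part-0001.json"))); // false
    }
}
```

Because `Predicate.and` short-circuits, a later filter is evaluated only for files that passed the earlier ones.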
      • shard

        public LocalMultiFileReaderConfig.Builder<T> shard(ShardSpecifier shardSpecifier,
                                                           boolean isBalanced)
        Parameters:
        shardSpecifier - Identifies the shard to read. When this method is used, every file that passes the filters is automatically assigned to a shard. With balancing turned off, sharding is based on a hash of each file's path; with balancing turned on, it is based on file sizes.
        isBalanced - Indicates whether sharding should be based on file sizes, in order to achieve a roughly balanced partition of the data across shards. This option has two caveats:
        1. It uses more memory, since it holds references to all files in memory. When working with millions of files, memory settings may need careful tuning.
        2. In a distributed system, the user must guarantee that no file is added or changed once the workers start. Otherwise, severe problems can occur silently, such as a file being processed by multiple instances, or not being processed at all.
        Returns:
        this builder
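The balancing algorithm itself is not described above, so the following is only a sketch under an assumption: a greedy heuristic that sorts files by size (largest first) and gives each one to the currently least-loaded shard. `BalancedShardingSketch` and `FileEntry` are hypothetical names. The sketch illustrates both caveats: it needs the whole file list in memory, and a file's shard depends on every other file in that list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: size-balanced shard assignment using a greedy
// "largest file to least-loaded shard" heuristic. This is an assumed
// algorithm for illustration, not the library's documented one.
final class BalancedShardingSketch {
    record FileEntry(String path, long size) {}

    static Map<String, Integer> assign(List<FileEntry> files, int shardCount) {
        // The complete file list is held in memory (the memory caveat).
        List<FileEntry> sorted = new ArrayList<>(files);
        // Largest first; ties broken by path so every worker sorts identically.
        sorted.sort(Comparator.comparingLong(FileEntry::size).reversed()
                .thenComparing(FileEntry::path));
        // Min-heap of {load, shardIndex}: least-loaded shard first, lowest index on ties.
        PriorityQueue<long[]> loads = new PriorityQueue<>(
                Comparator.<long[]>comparingLong(a -> a[0]).thenComparingLong(a -> a[1]));
        for (int s = 0; s < shardCount; s++) {
            loads.add(new long[] {0L, s});
        }
        Map<String, Integer> shardByPath = new HashMap<>();
        for (FileEntry f : sorted) {
            long[] least = loads.poll();          // remove the least-loaded shard
            shardByPath.put(f.path(), (int) least[1]);
            least[0] += f.size();                 // update its load while it is out of the heap
            loads.add(least);                     // and put it back
        }
        return shardByPath;
    }
}
```

With files `a` (100 bytes), `b` (60), `c` (50), and `d` (40) split into two shards, `c` lands on shard 1. A worker that also sees an extra 90-byte file `e` places `c` on shard 0 instead. Depending on which worker saw which listing, `c` may then be processed by both shards or by neither, which is the silent failure the second caveat warns about.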
      • shard

        public LocalMultiFileReaderConfig.Builder<T> shard(ShardSpecifier shardSpecifier)
        Parameters:
        shardSpecifier - Requests automatic data sharding and identifies the shard to read. Every file that passes the filters is automatically assigned to a shard based on a hash of its path.
        Returns:
        this builder
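A sketch of path-hash sharding, assuming `String.hashCode()` as the hash and 0-based shard indices; the hash function the library actually uses is not documented here, and `HashShardingSketch` is a hypothetical name. The key property is that a file's shard depends only on its own path.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: path-hash sharding. String.hashCode() is an assumed
// hash function chosen for illustration only.
final class HashShardingSketch {
    // A file's shard depends only on its own path, so workers agree on the
    // assignment without sharing a file list.
    static int shardOf(String fullPath, int shardCount) {
        // floorMod keeps the result in [0, shardCount) even for negative hash codes.
        return Math.floorMod(fullPath.hashCode(), shardCount);
    }

    // Keeps only the paths that belong to the given shard (0-based index, an assumption).
    static List<String> pathsForShard(List<String> paths, int shardIndex, int shardCount) {
        return paths.stream()
                .filter(p -> shardOf(p, shardCount) == shardIndex)
                .collect(Collectors.toList());
    }
}
```

In this scheme, adding a file never moves existing files to a different shard, and no global file list has to be kept in memory. The trade-off is that hashing spreads file counts, not bytes, so shards can differ noticeably in total data size.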
      • paths

        public LocalMultiFileReaderConfig.Builder<T> paths(Collection<String> paths,
                                                           boolean isRecursive)
        Parameters:
        paths - The folders (as full local paths) to read files from.
        isRecursive - Indicates whether files should be fetched from the paths recursively or not.
        Returns:
        this builder
      • paths

        public LocalMultiFileReaderConfig.Builder<T> paths(String path,
                                                           boolean isRecursive)
        Parameters:
        path - The folder (as a full local path) to read files from.
        isRecursive - Indicates whether files should be fetched from the path recursively or not.
        Returns:
        this builder
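A sketch of what `isRecursive` controls, written with `java.nio.file.Files.walk` rather than the library's own code. `FileListingSketch` is hypothetical, and treating "non-recursive" as "regular files directly inside the folder" is an assumption. In this sketch, the single-path overload would amount to passing a one-element collection.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: lists regular files directly under each folder,
// or under each folder's whole subtree when isRecursive is true.
final class FileListingSketch {
    static List<Path> listFiles(Collection<String> folders, boolean isRecursive) throws IOException {
        // maxDepth 1 visits only direct children; MAX_VALUE walks the whole tree.
        int maxDepth = isRecursive ? Integer.MAX_VALUE : 1;
        List<Path> result = new ArrayList<>();
        for (String folder : folders) {
            // Files.walk holds open directory handles, so close the stream.
            try (Stream<Path> walk = Files.walk(Paths.get(folder), maxDepth)) {
                result.addAll(walk.filter(Files::isRegularFile).collect(Collectors.toList()));
            }
        }
        return result;
    }
}
```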
      • threadNum

        public LocalMultiFileReaderConfig.Builder<T> threadNum(int threadNum)
        Parameters:
        threadNum - The number of threads used to read files when the reader is used by an async pipe. This setting has no effect on sync pipes. By default, the number of machine cores is used.
        Returns:
        this builder
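A sketch of how a thread count typically bounds concurrent reads, using a fixed-size `ExecutorService`. `ThreadedReadSketch` and its `reader` function are hypothetical; only the default of one thread per core (approximated here with `Runtime.availableProcessors()`) comes from the description above. It models the async case only, since the setting has no effect on sync pipes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: at most threadNum files are read concurrently.
final class ThreadedReadSketch {
    // Approximates the documented default: one thread per available processor.
    static int defaultThreadNum() {
        return Runtime.getRuntime().availableProcessors();
    }

    static <R> List<R> readAll(List<String> paths, Function<String, R> reader, int threadNum)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threadNum);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String p : paths) {
                futures.add(pool.submit(() -> reader.apply(p))); // each read runs on a pool thread
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // waits for each read; results keep input order
            }
            return results;
        } finally {
            pool.shutdown(); // let worker threads exit once submitted tasks finish
        }
    }
}
```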
      • fileOrder

        public LocalMultiFileReaderConfig.Builder<T> fileOrder(Comparator<File> fileOrder)
        Parameters:
        fileOrder - Forces the order in which files are read, defined as a comparator on file objects. This applies only to sync reading. By default, files are ordered lexicographically by full path.
        Returns:
        this builder
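Two comparators for illustration: `BY_FULL_PATH` approximates the documented default, assuming "lexicographic" means `String.compareTo` on the absolute path, and `BY_NAME_THEN_PATH` is a hypothetical custom order. `FileOrderSketch` is not part of the library.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of file-ordering comparators.
final class FileOrderSketch {
    // Assumed equivalent of the default: lexicographic order on the full path.
    static final Comparator<File> BY_FULL_PATH = Comparator.comparing(File::getAbsolutePath);

    // Example custom order: by file name, falling back to the full path on ties.
    static final Comparator<File> BY_NAME_THEN_PATH =
            Comparator.comparing(File::getName).thenComparing(BY_FULL_PATH);

    // Returns a sorted copy, leaving the input list untouched.
    static List<File> sorted(List<File> files, Comparator<File> order) {
        List<File> copy = new ArrayList<>(files);
        copy.sort(order);
        return copy;
    }
}
```

`String.compareTo` compares UTF-16 code units, so uppercase letters sort before lowercase ones (`"B.csv"` precedes `"a.csv"`).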