Class GSMultiTxtFileReaderPipe

  • All Implemented Interfaces:
    Closeable, AutoCloseable, org.pipecraft.pipes.BasePipe, org.pipecraft.pipes.sync.Pipe<String>

    public class GSMultiTxtFileReaderPipe
    extends org.pipecraft.pipes.sync.source.StorageMultiTxtFileReaderPipe<com.google.cloud.storage.Blob>
    Reads data from multiple files under a folder in Google Cloud Storage, as if they were concatenated in a predefined order. Files in supported formats are automatically decompressed according to their extensions (see Compression).
    Author:
    Eyal Schneider
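    The reading behavior described above can be pictured with a plain-Java sketch over in-memory files. It is not the library's implementation: the class ConcatenatedReadSketch and its helpers are hypothetical, and it handles only the .gz extension, whereas the formats actually supported are those defined by Compression.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatenatedReadSketch {
    // Hypothetical helper: picks a decompressor from the file extension (only .gz here).
    public static InputStream open(String name, byte[] content) throws IOException {
        InputStream raw = new ByteArrayInputStream(content);
        return name.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    }

    // Reads the files in the map's iteration order and returns their lines as one sequence.
    public static List<String> readAll(Map<String, byte[]> files) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    open(file.getKey(), file.getValue()), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    // Test-data helper: gzip-compresses a UTF-8 string.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.txt", "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        files.put("b.txt.gz", gzip("third\n"));
        System.out.println(readAll(files)); // prints [first, second, third]
    }
}
```

    The caller sees a single stream of lines and never needs to know where one file ends and the next begins.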
    • Constructor Detail

      • GSMultiTxtFileReaderPipe

        public GSMultiTxtFileReaderPipe(GoogleStorage storage,
                                        String bucket,
                                        String folderPath,
                                        Charset charset,
                                        int chunkSize,
                                        String fileRegex,
                                        Comparator<com.google.cloud.storage.Blob> comparator)
        Constructor
        Parameters:
        storage - The cloud storage connector
        bucket - The bucket to read the files from
        folderPath - The full path of the folder to read the files from
        charset - The charset used
        chunkSize - The size (in bytes) of each chunk read from storage at once, or 0 to use the default size
        fileRegex - A regular expression applied to file names to determine which files to read
        comparator - A comparator defining the order in which files are read
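        How fileRegex and comparator combine can be pictured with a plain-Java sketch that works on file names only. It is not the library's implementation: the class FileSelectionSketch, the helpers selectAndOrder and index, and the sample names are hypothetical. It also assumes the regex must match the whole file name, which the parameter description does not specify.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

public class FileSelectionSketch {
    // Hypothetical helper: keeps the names matching the regex, then sorts them with the comparator.
    public static List<String> selectAndOrder(List<String> names, String fileRegex,
                                              Comparator<String> comparator) {
        Pattern pattern = Pattern.compile(fileRegex);
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            // Assumption: the entire name must match, not just a substring of it.
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        selected.sort(comparator);
        return selected;
    }

    // Hypothetical helper: extracts the numeric part of a name such as "part-10.txt.gz".
    // Only safe for names that contain at least one digit.
    public static int index(String name) {
        return Integer.parseInt(name.replaceAll("\\D", ""));
    }

    public static void main(String[] args) {
        List<String> names = List.of("part-2.txt.gz", "part-10.txt.gz", "_SUCCESS", "part-1.txt.gz");
        List<String> ordered = selectAndOrder(
                names, "part-\\d+\\.txt\\.gz", Comparator.comparingInt(FileSelectionSketch::index));
        System.out.println(ordered); // prints [part-1.txt.gz, part-2.txt.gz, part-10.txt.gz]
    }
}
```

        With the real class the comparator receives com.google.cloud.storage.Blob objects rather than strings, so an equivalent ordering would typically compare on blob.getName().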
      • GSMultiTxtFileReaderPipe

        public GSMultiTxtFileReaderPipe(GoogleStorage storage,
                                        String bucket,
                                        String folderPath,
                                        Comparator<com.google.cloud.storage.Blob> comparator)
        Constructor. Assumes UTF-8 encoding for all files and applies no filter to the files read.
        Parameters:
        storage - The cloud storage connector
        bucket - The bucket to read the files from
        folderPath - The full path of the folder to read the files from
        comparator - A comparator defining the order in which files are read
      • GSMultiTxtFileReaderPipe

        public GSMultiTxtFileReaderPipe(GoogleStorage storage,
                                        String bucket,
                                        String folderPath)
        Constructor. Scans the remote files in lexicographic name order. Performs no filtering and assumes UTF-8 encoding.
        Parameters:
        storage - The cloud storage connector
        bucket - The bucket to read the files from
        folderPath - The full path of the folder to read the files from
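        Lexicographic name order can differ from numeric order when file names carry unpadded counters. The plain-Java sketch below illustrates this with ordinary string comparison. The class LexicographicOrderSketch and the sample names are hypothetical, and it assumes the library compares names the way String.compareTo does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LexicographicOrderSketch {
    // Returns a copy of the names sorted by plain string comparison.
    public static List<String> sortByName(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // Unpadded counters: "part-10" sorts before "part-2".
        System.out.println(sortByName(List.of("part-2.txt", "part-10.txt", "part-1.txt")));
        // prints [part-1.txt, part-10.txt, part-2.txt]

        // Zero-padded counters: string order and numeric order agree.
        System.out.println(sortByName(List.of("part-02.txt", "part-10.txt", "part-01.txt")));
        // prints [part-01.txt, part-02.txt, part-10.txt]
    }
}
```

        If the files must be read in numeric order and their names are not zero-padded, one of the constructors that accepts a comparator is likely the better fit.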