Class ParquetStreamFactory

  • All Implemented Interfaces:
    IAvroStreamFactory

    public class ParquetStreamFactory
    extends AvroStreamFactory
    Enables converting a Parquet file to a Stream of Map.
    Author:
    Benoit Lacelle
    • Constructor Detail

      • ParquetStreamFactory

        public ParquetStreamFactory()
      • ParquetStreamFactory

        public ParquetStreamFactory(org.apache.hadoop.conf.Configuration configuration)
    • Method Detail

      • cloneDefaultConfiguration

        public static org.apache.hadoop.conf.Configuration cloneDefaultConfiguration()
      • getConfiguration

        public org.apache.hadoop.conf.Configuration getConfiguration()
      • stream

        public Stream<org.apache.avro.generic.GenericRecord> stream(InputStream rawInputStream)
                                                             throws IOException
        Parameters:
        rawInputStream - a stream of bytes associated with a Parquet file. This is sub-optimal, as Parquet requires a SeekableInputStream (i.e. an InputStream with random access). The stream is therefore first copied to a file on the local file system.
        Returns:
        a Stream over the GenericRecord instances read from the Parquet file
        Throws:
        IOException
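The copy-to-local-file step described for rawInputStream can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not this class's actual implementation: SeekableCopyExample, copyToLocalFile and readTail are hypothetical names, and the sample bytes only mimic the 4-byte PAR1 magic that opens and closes a Parquet file. It shows why a plain InputStream falls short: a reader has to jump to the end of the file to locate the footer, which requires random access.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

final class SeekableCopyExample {

    // Hypothetical helper: persists a forward-only stream to a temporary local file,
    // so that later reads can seek freely.
    static Path copyToLocalFile(InputStream rawInputStream) throws IOException {
        Path tmp = Files.createTempFile("parquet-stream-", ".parquet");
        // createTempFile already created the file, hence REPLACE_EXISTING.
        Files.copy(rawInputStream, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Hypothetical helper: reads up to the last `length` bytes of the file by seeking,
    // which is not possible on a plain InputStream.
    static byte[] readTail(Path file, int length) throws IOException {
        try (SeekableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.READ)) {
            long start = Math.max(0, channel.size() - length);
            channel.position(start);
            ByteBuffer buffer = ByteBuffer.allocate((int) (channel.size() - start));
            while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
                // keep reading until the buffer is full or the end of the file is reached
            }
            return buffer.array();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes, not a real Parquet file: only the leading and trailing magic is mimicked.
        byte[] content = "PAR1...payload...PAR1".getBytes(StandardCharsets.US_ASCII);
        Path local = copyToLocalFile(new ByteArrayInputStream(content));
        try {
            System.out.println(new String(readTail(local, 4), StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(local);
        }
    }
}
```

The copy costs one extra pass over the data plus temporary disk space, so when the file is already reachable through a path, the toStream overload taking a Path is likely the better choice.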
      • toStream

        public Stream<org.apache.avro.generic.GenericRecord> toStream(org.apache.hadoop.fs.Path hadoopPath)
                                                               throws IOException
        Throws:
        IOException
      • makeFilter

        protected org.apache.parquet.filter2.compat.FilterCompat.Filter makeFilter()
      • toStream

        protected Stream<org.apache.avro.generic.GenericRecord> toStream(org.apache.parquet.hadoop.ParquetReader<org.apache.avro.generic.GenericRecord> reader)
      • toHadoopPath

        protected org.apache.hadoop.fs.Path toHadoopPath(URI uri)