Package cormoran.pepper.parquet
Class ParquetStreamFactory
- java.lang.Object
-
- cormoran.pepper.avro.AvroStreamFactory
-
- cormoran.pepper.parquet.ParquetStreamFactory
-
- All Implemented Interfaces:
cormoran.pepper.avro.IAvroStreamFactory
public class ParquetStreamFactory extends cormoran.pepper.avro.AvroStreamFactory
Enables converting a Parquet file to a Stream of Map.
- Author:
- Benoit Lacelle
-
-
Constructor Summary
Constructors
Constructor / Description
ParquetStreamFactory()
ParquetStreamFactory(org.apache.hadoop.conf.Configuration configuration)
-
Method Summary
Modifier and Type / Method
static org.apache.hadoop.conf.Configuration cloneDefaultConfiguration()
org.apache.hadoop.conf.Configuration getConfiguration()
protected org.apache.parquet.filter2.compat.FilterCompat.Filter makeFilter()
protected cormoran.pepper.avro.IGenericRecordConsumer prepareRecordConsumer(org.apache.avro.Schema schema, URI uri)
static Stream<Map<String,?>> readParquetAsStream(URI uriToParquet, Map<String,?> exampleTypes)
Stream<org.apache.avro.generic.GenericRecord> stream(InputStream rawInputStream)
Stream<org.apache.avro.generic.GenericRecord> stream(URI uri)
protected org.apache.hadoop.fs.Path toHadoopPath(URI uri)
Stream<org.apache.avro.generic.GenericRecord> toStream(org.apache.hadoop.fs.Path hadoopPath)
protected Stream<org.apache.avro.generic.GenericRecord> toStream(org.apache.parquet.hadoop.ParquetReader<org.apache.avro.generic.GenericRecord> reader)
-
Methods inherited from class cormoran.pepper.avro.AvroStreamFactory
outputStream, prepareRecordConsumer, serialize, serialize, transcode
-
Method Detail
-
cloneDefaultConfiguration
public static org.apache.hadoop.conf.Configuration cloneDefaultConfiguration()
-
getConfiguration
public org.apache.hadoop.conf.Configuration getConfiguration()
-
stream
public Stream<org.apache.avro.generic.GenericRecord> stream(URI uri) throws IOException
- Specified by:
stream in interface cormoran.pepper.avro.IAvroStreamFactory
- Overrides:
stream in class cormoran.pepper.avro.AvroStreamFactory
- Throws:
IOException
-
stream
public Stream<org.apache.avro.generic.GenericRecord> stream(InputStream rawInputStream) throws IOException
- Parameters:
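rawInputStream - a stream of bytes holding the content of a Parquet file. This is sub-optimal, as Parquet requires a SeekableInputStream (i.e. an InputStream with random access); the content is therefore first copied to the local file system.
- Returns:
a Stream of the GenericRecord entries read from the Parquet content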
- Throws:
IOException
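
The copy step described for rawInputStream can be illustrated with the JDK alone. The sketch below is not the implementation of this class: the class name SeekableCopySketch and the helpers copyToTempFile and readTail are hypothetical, and the input is a fake byte array rather than a real Parquet file. It only shows why a forward-only InputStream is first written to a local file: a Parquet file ends with the 4-byte magic "PAR1" preceded by its footer, so a reader has to jump to the end of the data, which a plain InputStream cannot do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not part of cormoran.pepper.
public class SeekableCopySketch {

    /** Copies a forward-only stream into a temporary local file. */
    static Path copyToTempFile(InputStream rawInputStream) {
        try {
            Path tmp = Files.createTempFile("parquet-sketch-", ".parquet");
            tmp.toFile().deleteOnExit();
            // createTempFile already created the file, so overwrite it.
            Files.copy(rawInputStream, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads up to the last 'length' bytes of the file by seeking to them. */
    static byte[] readTail(Path file, int length) {
        try (SeekableByteChannel channel =
                Files.newByteChannel(file, StandardOpenOption.READ)) {
            long size = channel.size();
            int n = (int) Math.min(length, size);
            channel.position(size - n); // random access: jump near the end
            ByteBuffer buffer = ByteBuffer.allocate(n);
            while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
                // keep reading until the buffer is full or EOF is reached
            }
            return buffer.array();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Fake content that only mimics the leading and trailing magic bytes.
        byte[] fake = "PAR1....payload....PAR1".getBytes(StandardCharsets.US_ASCII);
        Path local = copyToTempFile(new ByteArrayInputStream(fake));
        String tail = new String(readTail(local, 4), StandardCharsets.US_ASCII);
        System.out.println(tail); // prints "PAR1"
    }
}
```

When the data already lives at a location addressable by URI, stream(URI uri) avoids this extra copy.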
-
toStream
public Stream<org.apache.avro.generic.GenericRecord> toStream(org.apache.hadoop.fs.Path hadoopPath) throws IOException
- Throws:
IOException
-
makeFilter
protected org.apache.parquet.filter2.compat.FilterCompat.Filter makeFilter()
-
toStream
protected Stream<org.apache.avro.generic.GenericRecord> toStream(org.apache.parquet.hadoop.ParquetReader<org.apache.avro.generic.GenericRecord> reader)
-
readParquetAsStream
public static Stream<Map<String,?>> readParquetAsStream(URI uriToParquet, Map<String,?> exampleTypes) throws FileNotFoundException, IOException
- Throws:
FileNotFoundException
IOException
-
prepareRecordConsumer
protected cormoran.pepper.avro.IGenericRecordConsumer prepareRecordConsumer(org.apache.avro.Schema schema, URI uri) throws IOException
- Overrides:
prepareRecordConsumer in class cormoran.pepper.avro.AvroStreamFactory
- Throws:
IOException
-
toHadoopPath
protected org.apache.hadoop.fs.Path toHadoopPath(URI uri)
-
-