org.bdgenomics.adam.parquet_reimpl

ParquetLister

class ParquetLister[T <: IndexedRecord] extends Logging

ParquetLister materializes the records within a Parquet row group as an Iterator[T].

This is currently used for two purposes:

  1. The 'PrintParquet' command (which, at the moment, works only for ADAMFlatGenotype Parquet files) prints a number of entries from a Parquet file for debugging purposes. PrintParquet uses ParquetLister to materialize those entries.
  2. The index generators (for the Range and IDRange indices) need to read through a Parquet file's entries in order to index it, unless the index is built at the time the file is written. They use ParquetLister to materialize the records in the file.

Materializing records from a Parquet file turns out to be a somewhat-complicated operation, with a few parameters floating around. The original version was written as part of the materialize methods in the two Parquet RDDs, but we factored it out into this class when we realized we needed it for the two additional purposes (listed above).

ParquetLister doesn't yet support UnboundRecordFilter (see the comment below) but it should be adapted to do so. When it does, we should replace the original materialize methods in the ParquetRDDs with use of this class instead, so that there's only one implementation of the materialize code floating around.

T

The type of the record to be read.

Linear Supertypes
Logging, AnyRef, Any

Instance Constructors

  1. new ParquetLister(indexableSchema: Option[Schema] = scala.None)(implicit classTag: ClassTag[T])

    indexableSchema

    An (optional) Schema indicating which fields to read from the Parquet file.

    classTag

    An implicit ClassTag for the record type T, so that the type is available at runtime.
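
The constructor shape (a defaulted optional first parameter, followed by an implicit ClassTag in a second parameter list) can be illustrated with a minimal sketch. `RecordReader` and its `projection` parameter are hypothetical stand-ins, not part of ADAM; a plain String takes the place of an Avro Schema so that the example needs no dependencies.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-in for ParquetLister; only the constructor shape is mirrored.
// `projection` is a String here, standing in for an Avro Schema.
class RecordReader[T](val projection: Option[String] = None)(implicit classTag: ClassTag[T]) {
  // The ClassTag carries the otherwise-erased type T through to runtime.
  val recordClass: Class[_] = classTag.runtimeClass
}

object RecordReaderDemo {
  def main(args: Array[String]): Unit = {
    // No projection given: the default (None) is used.
    val all = new RecordReader[String]()
    // An explicit projection restricts which fields would be read.
    val projected = new RecordReader[String](Some("name,position"))
    println(all.recordClass.getName)
    println(projected.projection)
  }
}
```

The compiler supplies the ClassTag automatically whenever T is a concrete type at the call site, so callers normally pass only the first argument list.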

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. val avroSchema: Schema

  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. val filter: UnboundRecordFilter

  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  17. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  18. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  19. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  20. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  21. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  22. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  23. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  24. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. def materialize(rootLocator: FileLocator, relativePath: String): Iterator[T]

    Materializes _all_ the row groups for a particular Parquet file, given as a root locator (the parent directory, for instance) and a relative path (the location of the file).

    The particular form of this method and its arguments is driven by two concerns:

      1. Recursive listing of Parquet files in the local filesystem; see the materialize(fullPath: String) method.
      2. When reading files from S3, we often store the locations of our files in terms of a root path (bucket + root key) and relative offsets (which are the names read from the index itself).

    rootLocator

    A base locator, relative to which relativePath points to a valid Parquet file

    relativePath

    A relative path to a Parquet file

    returns

    An iterator over the T values from the named Parquet file
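
The root-plus-relative-path idea can be sketched as follows. The `Locator` trait and both case classes are hypothetical stand-ins written for this illustration; the real FileLocator may expose different members. The bucket and key names are likewise made up.

```scala
import java.io.File

// Hypothetical stand-in for FileLocator; the real trait's members may differ.
sealed trait Locator {
  // Resolves a path relative to this locator.
  def relative(path: String): Locator
}

// A location in an S3-style store: a bucket plus a key.
final case class BucketLocator(bucket: String, key: String) extends Locator {
  def relative(path: String): Locator = {
    val prefix = key.stripSuffix("/")
    BucketLocator(bucket, if (prefix.isEmpty) path else s"$prefix/$path")
  }
}

// A location on the local filesystem.
final case class LocalLocator(file: File) extends Locator {
  def relative(path: String): Locator = LocalLocator(new File(file, path))
}

object LocatorDemo {
  def main(args: Array[String]): Unit = {
    val root = BucketLocator("my-bucket", "genotypes/")
    // Relative names, as they might be read from an index.
    val names = Seq("part-0.parquet", "part-1.parquet")
    names.map(root.relative).foreach(println)
  }
}
```

Keeping the root separate from the relative names means an index only has to store the names, and the same index can be resolved against a local directory or a bucket.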

  29. def materialize(fullPath: String): Iterator[T]

    Given a full path on the local filesystem, checks whether the path is a file or a directory. If it is a file, materializes the row groups from that file. If it is a directory, materializes the row groups of all the files within the directory.

    fullPath

    The path on the local filesystem, corresponding either to a Parquet file or to a directory filled with Parquet files.

    returns

    An iterator over the values requested from the file (or, if fullPath is a directory, from _all_ the files, in arbitrary order)
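
The file-or-directory dispatch described here can be sketched in a few lines. This is not the ADAM implementation: `MaterializeSketch` and `readRecords` are hypothetical, plain text files stand in for Parquet files (one line per record), and recursing into subdirectories is an assumption based on the mention of recursive listing.

```scala
import java.io.File
import scala.io.Source

object MaterializeSketch {
  // Hypothetical stand-in for reading one Parquet file:
  // each line of a text file is treated as one record.
  def readRecords(file: File): Iterator[String] = {
    val source = Source.fromFile(file, "UTF-8")
    try source.getLines().toList.iterator
    finally source.close()
  }

  def materialize(fullPath: String): Iterator[String] = {
    val path = new File(fullPath)
    if (path.isFile) {
      readRecords(path)
    } else if (path.isDirectory) {
      // listFiles returns null on an I/O error, so wrap it in Option.
      val children = Option(path.listFiles()).map(_.toList).getOrElse(Nil)
      // Iterator.flatMap is lazy: each child is opened only when it is reached.
      // The order of children is whatever the filesystem returns.
      children.iterator.flatMap(child => materialize(child.getPath))
    } else {
      throw new IllegalArgumentException(s"Not a file or directory: $fullPath")
    }
  }
}
```

Because `listFiles` makes no ordering guarantee, records from a directory arrive in arbitrary file order, which matches the caveat in the return description.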

  30. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  31. final def notify(): Unit

    Definition Classes
    AnyRef
  32. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  33. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  34. def toString(): String

    Definition Classes
    AnyRef → Any
  35. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
