org.bdgenomics.adam.parquet_reimpl

AvroIndexedParquetRDD

class AvroIndexedParquetRDD[T <: IndexedRecord] extends RDD[T]

AvroIndexedParquetRDD is an RDD that reads an index file and, for a given CombinedFilter (which contains a predicate on the index entries), has one partition for each unique row group, within the indexed Parquet files, whose index entries satisfy that predicate.

For example, suppose there are two Parquet files with two row groups each, and the CombinedFilter's index-entry predicate matches index entries for both row groups of the first file but for only one row group of the second. The resulting RDD has three partitions in total, and it reads from both Parquet files to materialize its records.
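
The partition-count rule above can be sketched with plain Scala collections. This is a minimal model, not ADAM's implementation: `ToyIndexEntry`, `ToyPartition`, and `planPartitions` are hypothetical names, and the `tag` field stands in for whatever the real index predicate inspects. The sketch assumes that each partition corresponds to one distinct (file, row group) pair among the matching index entries.

```scala
// Hypothetical stand-in for an index entry (NOT the real IDRangeIndexEntry):
// it records which Parquet file and row group some records live in.
final case class ToyIndexEntry(path: String, rowGroup: Int, tag: String)

// Hypothetical partition: one (file, row group) pair to be read by one task.
final case class ToyPartition(index: Int, path: String, rowGroup: Int)

object PartitionPlanning {
  // One partition per distinct (path, rowGroup) among the entries that
  // satisfy the predicate; sorting just makes partition order deterministic.
  def planPartitions(
      entries: Seq[ToyIndexEntry],
      predicate: ToyIndexEntry => Boolean): Seq[ToyPartition] =
    entries
      .filter(predicate)
      .map(e => (e.path, e.rowGroup))
      .distinct
      .sorted
      .zipWithIndex
      .map { case ((path, rg), i) => ToyPartition(i, path, rg) }
}

// The scenario described above: two files with two row groups each; the
// predicate matches both row groups of a.parquet but only one of b.parquet.
val entries = Seq(
  ToyIndexEntry("a.parquet", 0, "match"),
  ToyIndexEntry("a.parquet", 0, "match"), // several entries may share a row group
  ToyIndexEntry("a.parquet", 1, "match"),
  ToyIndexEntry("b.parquet", 0, "match"),
  ToyIndexEntry("b.parquet", 1, "skip")
)
val partitions = PartitionPlanning.planPartitions(entries, _.tag == "match")
```

Deduplicating on (file, row group) is what keeps the count at three even when several index entries point into the same row group.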

T

the type of the records in this RDD.

Linear Supertypes
RDD[T], Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new AvroIndexedParquetRDD(sc: SparkContext, filter: CombinedFilter[T, IDRangeIndexEntry], indexLocator: FileLocator, dataRootLocator: FileLocator, requestedSchema: Option[Schema])(implicit arg0: ClassTag[T])

    sc

    The SparkContext for this RDD

    filter

    The CombinedFilter for the records in this RDD. Note that it determines _both_ which records are in the RDD and how many partitions the RDD has; see the notes on CombinedFilter semantics in the comments for that class.

    indexLocator

    The FileLocator that locates the index file itself

    dataRootLocator

    The index file contains paths to Parquet files, and these paths are resolved relative to this locator. (This makes it easier to write tests for indexing and for this RDD.)

    requestedSchema

    The user-defined projection for the records of this RDD
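
The two roles described for `filter` and `dataRootLocator` can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the real API: `ToyCombinedFilter`, `ToyEntry`, and `DataRootResolution` are hypothetical, `java.nio.file.Path` stands in for `FileLocator`, and the split into an index predicate (choosing partitions) and a record predicate (dropping individual records) is an assumed reading of the CombinedFilter notes above.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical stand-in for CombinedFilter: one predicate over index entries
// (decides which row groups become partitions) and one over records
// (decides which records survive inside a partition).
final case class ToyCombinedFilter[T, E](
    indexPredicate: E => Boolean,
    recordPredicate: T => Boolean)

// Hypothetical index entry whose path is stored relative to the data root.
final case class ToyEntry(relativePath: String, rowGroup: Int, tag: String)

object DataRootResolution {
  // Index paths are relative, so the same index can be pointed at any root
  // (for example a temporary directory in a test).
  def resolve(dataRoot: Path, entry: ToyEntry): Path =
    dataRoot.resolve(entry.relativePath).normalize()

  // Stage 1: the index predicate picks the distinct (file, row group) pairs.
  def partitionsFor[T](
      dataRoot: Path,
      index: Seq[ToyEntry],
      filter: ToyCombinedFilter[T, ToyEntry]): Seq[(Path, Int)] =
    index
      .filter(filter.indexPredicate)
      .map(e => (resolve(dataRoot, e), e.rowGroup))
      .distinct

  // Stage 2: within one partition, the record predicate filters records.
  def readPartition[T, E](records: Iterator[T], filter: ToyCombinedFilter[T, E]): Iterator[T] =
    records.filter(filter.recordPredicate)
}

val dataRoot = Paths.get("/data/root")
val index = Seq(
  ToyEntry("a.parquet", 0, "x"),
  ToyEntry("a.parquet", 1, "y"),
  ToyEntry("sub/b.parquet", 0, "x")
)
val combined = ToyCombinedFilter[Int, ToyEntry](e => e.tag == "x", r => r % 2 == 0)
val planned = DataRootResolution.partitionsFor(dataRoot, index, combined)
val kept = DataRootResolution.readPartition(Iterator(1, 2, 3, 4), combined).toList
```

Changing only `dataRoot` relocates every planned partition without rewriting the index, which is the testing convenience the `dataRootLocator` description mentions.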

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. def ++(other: RDD[T]): RDD[T]

    Definition Classes
    RDD
  5. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  6. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  7. def aggregate[U](zeroValue: U)(seqOp: (U, T) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): U

    Definition Classes
    RDD
  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. def cache(): AvroIndexedParquetRDD.this.type

    Definition Classes
    RDD
  10. def cartesian[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(T, U)]

    Definition Classes
    RDD
  11. def checkpoint(): Unit

    Definition Classes
    RDD
  12. def clearDependencies(): Unit

    Attributes
    protected
    Definition Classes
    RDD
  13. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  14. def coalesce(numPartitions: Int, shuffle: Boolean)(implicit ord: Ordering[T]): RDD[T]

    Definition Classes
    RDD
  15. def collect[U](f: PartialFunction[T, U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  16. def collect(): Array[T]

    Definition Classes
    RDD
  17. def compute(split: Partition, context: TaskContext): Iterator[T]

    Definition Classes
    AvroIndexedParquetRDD → RDD
  18. def context: SparkContext

    Definition Classes
    RDD
  19. def count(): Long

    Definition Classes
    RDD
  20. def countApprox(timeout: Long, confidence: Double): PartialResult[BoundedDouble]

    Definition Classes
    RDD
    Annotations
    @Experimental()
  21. def countApproxDistinct(relativeSD: Double): Long

    Definition Classes
    RDD
    Annotations
    @Experimental()
  22. def countByValue()(implicit ord: Ordering[T]): Map[T, Long]

    Definition Classes
    RDD
  23. def countByValueApprox(timeout: Long, confidence: Double)(implicit ord: Ordering[T]): PartialResult[Map[T, BoundedDouble]]

    Definition Classes
    RDD
    Annotations
    @Experimental()
  24. final def dependencies: Seq[Dependency[_]]

    Definition Classes
    RDD
  25. def distinct(): RDD[T]

    Definition Classes
    RDD
  26. def distinct(numPartitions: Int)(implicit ord: Ordering[T]): RDD[T]

    Definition Classes
    RDD
  27. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  28. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  29. def filter(f: (T) ⇒ Boolean): RDD[T]

    Definition Classes
    RDD
  30. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  31. def first(): T

    Definition Classes
    RDD
  32. def firstParent[U](implicit arg0: ClassTag[U]): RDD[U]

    Attributes
    protected[org.apache.spark]
    Definition Classes
    RDD
  33. def flatMap[U](f: (T) ⇒ TraversableOnce[U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  34. def fold(zeroValue: T)(op: (T, T) ⇒ T): T

    Definition Classes
    RDD
  35. def foreach(f: (T) ⇒ Unit): Unit

    Definition Classes
    RDD
  36. def foreachPartition(f: (Iterator[T]) ⇒ Unit): Unit

    Definition Classes
    RDD
  37. def getCheckpointFile: Option[String]

    Definition Classes
    RDD
  38. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  39. def getDependencies: Seq[Dependency[_]]

    Attributes
    protected
    Definition Classes
    RDD
  40. def getPartitions: Array[Partition]

    Attributes
    protected
    Definition Classes
    AvroIndexedParquetRDD → RDD
  41. def getPreferredLocations(split: Partition): Seq[String]

    Attributes
    protected
    Definition Classes
    RDD
  42. def getStorageLevel: StorageLevel

    Definition Classes
    RDD
  43. def glom(): RDD[Array[T]]

    Definition Classes
    RDD
  44. def groupBy[K](f: (T) ⇒ K, p: Partitioner)(implicit kt: ClassTag[K], ord: Ordering[K]): RDD[(K, Iterable[T])]

    Definition Classes
    RDD
  45. def groupBy[K](f: (T) ⇒ K, numPartitions: Int)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]

    Definition Classes
    RDD
  46. def groupBy[K](f: (T) ⇒ K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]

    Definition Classes
    RDD
  47. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  48. val id: Int

    Definition Classes
    RDD
  49. def intersection(other: RDD[T], numPartitions: Int): RDD[T]

    Definition Classes
    RDD
  50. def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T]): RDD[T]

    Definition Classes
    RDD
  51. def intersection(other: RDD[T]): RDD[T]

    Definition Classes
    RDD
  52. def isCheckpointed: Boolean

    Definition Classes
    RDD
  53. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  54. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  55. final def iterator(split: Partition, context: TaskContext): Iterator[T]

    Definition Classes
    RDD
  56. def keyBy[K](f: (T) ⇒ K): RDD[(K, T)]

    Definition Classes
    RDD
  57. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  58. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  59. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  60. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  61. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  62. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  63. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  64. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  65. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  66. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  67. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  68. def map[U](f: (T) ⇒ U)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  69. def mapPartitions[U](f: (Iterator[T]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  70. def mapPartitionsWithContext[U](f: (TaskContext, Iterator[T]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @DeveloperApi()
  71. def mapPartitionsWithIndex[U](f: (Int, Iterator[T]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  72. def max()(implicit ord: Ordering[T]): T

    Definition Classes
    RDD
  73. def min()(implicit ord: Ordering[T]): T

    Definition Classes
    RDD
  74. var name: String

    Definition Classes
    RDD
  75. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  76. final def notify(): Unit

    Definition Classes
    AnyRef
  77. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  78. val partitioner: Option[Partitioner]

    Definition Classes
    RDD
  79. final def partitions: Array[Partition]

    Definition Classes
    RDD
  80. def persist(): AvroIndexedParquetRDD.this.type

    Definition Classes
    RDD
  81. def persist(newLevel: StorageLevel): AvroIndexedParquetRDD.this.type

    Definition Classes
    RDD
  82. def pipe(command: Seq[String], env: Map[String, String], printPipeContext: ((String) ⇒ Unit) ⇒ Unit, printRDDElement: (T, (String) ⇒ Unit) ⇒ Unit, separateWorkingDir: Boolean): RDD[String]

    Definition Classes
    RDD
  83. def pipe(command: String, env: Map[String, String]): RDD[String]

    Definition Classes
    RDD
  84. def pipe(command: String): RDD[String]

    Definition Classes
    RDD
  85. final def preferredLocations(split: Partition): Seq[String]

    Definition Classes
    RDD
  86. def randomSplit(weights: Array[Double], seed: Long): Array[RDD[T]]

    Definition Classes
    RDD
  87. def reduce(f: (T, T) ⇒ T): T

    Definition Classes
    RDD
  88. def repartition(numPartitions: Int)(implicit ord: Ordering[T]): RDD[T]

    Definition Classes
    RDD
  89. def sample(withReplacement: Boolean, fraction: Double, seed: Long): RDD[T]

    Definition Classes
    RDD
  90. def saveAsObjectFile(path: String): Unit

    Definition Classes
    RDD
  91. def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit

    Definition Classes
    RDD
  92. def saveAsTextFile(path: String): Unit

    Definition Classes
    RDD
  93. def setName(_name: String): AvroIndexedParquetRDD.this.type

    Definition Classes
    RDD
  94. def sparkContext: SparkContext

    Definition Classes
    RDD
  95. def subtract(other: RDD[T], p: Partitioner)(implicit ord: Ordering[T]): RDD[T]

    Definition Classes
    RDD
  96. def subtract(other: RDD[T], numPartitions: Int): RDD[T]

    Definition Classes
    RDD
  97. def subtract(other: RDD[T]): RDD[T]

    Definition Classes
    RDD
  98. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  99. def take(num: Int): Array[T]

    Definition Classes
    RDD
  100. def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T]

    Definition Classes
    RDD
  101. def takeSample(withReplacement: Boolean, num: Int, seed: Long): Array[T]

    Definition Classes
    RDD
  102. def toDebugString: String

    Definition Classes
    RDD
  103. def toJavaRDD(): JavaRDD[T]

    Definition Classes
    RDD
  104. def toLocalIterator: Iterator[T]

    Definition Classes
    RDD
  105. def toString(): String

    Definition Classes
    RDD → AnyRef → Any
  106. def top(num: Int)(implicit ord: Ordering[T]): Array[T]

    Definition Classes
    RDD
  107. def union(other: RDD[T]): RDD[T]

    Definition Classes
    RDD
  108. def unpersist(blocking: Boolean): AvroIndexedParquetRDD.this.type

    Definition Classes
    RDD
  109. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  110. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  111. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  112. def zip[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(T, U)]

    Definition Classes
    RDD
  113. def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D])(f: (Iterator[T], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  114. def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D], preservesPartitioning: Boolean)(f: (Iterator[T], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  115. def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C])(f: (Iterator[T], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  116. def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C], preservesPartitioning: Boolean)(f: (Iterator[T], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  117. def zipPartitions[B, V](rdd2: RDD[B])(f: (Iterator[T], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  118. def zipPartitions[B, V](rdd2: RDD[B], preservesPartitioning: Boolean)(f: (Iterator[T], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  119. def zipWithIndex(): RDD[(T, Long)]

    Definition Classes
    RDD
  120. def zipWithUniqueId(): RDD[(T, Long)]

    Definition Classes
    RDD

Deprecated Value Members

  1. def filterWith[A](constructA: (Int) ⇒ A)(p: (T, A) ⇒ Boolean): RDD[T]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex and filter

  2. def flatMapWith[A, U](constructA: (Int) ⇒ A, preservesPartitioning: Boolean)(f: (T, A) ⇒ Seq[U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex and flatMap

  3. def foreachWith[A](constructA: (Int) ⇒ A)(f: (T, A) ⇒ Unit): Unit

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex and foreach

  4. def mapPartitionsWithSplit[U](f: (Int, Iterator[T]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 0.7.0) use mapPartitionsWithIndex

  5. def mapWith[A, U](constructA: (Int) ⇒ A, preservesPartitioning: Boolean)(f: (T, A) ⇒ U)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex

  6. def toArray(): Array[T]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use collect
