The SparkContext for this RDD
The CombinedFilter for the records in this RDD. Note that this determines both which records are in the RDD and how many partitions the RDD has; see the notes on the semantics of CombinedFilter in the comments for that class.
The FileLocator that locates the index file itself
The index file contains paths to Parquet files; these paths are relative to this value, the dataRootLocator. (This makes it easier to write tests for indexing and for this RDD.)
The user-defined projection for the records of this RDD
(Since version 1.0.0) use mapPartitionsWithIndex and filter
(Since version 1.0.0) use mapPartitionsWithIndex and flatMap
(Since version 1.0.0) use mapPartitionsWithIndex and foreach
(Since version 0.7.0) use mapPartitionsWithIndex
(Since version 1.0.0) use mapPartitionsWithIndex
(Since version 1.0.0) use collect
AvroIndexedParquetRDD is an RDD that reads an index file. For a given CombinedFilter, which contains a predicate on the entries of the index, the RDD has as many partitions as there are unique row groups within the indexed Parquet files whose index entries satisfy the predicate.
For example, if you had two Parquet files with two row groups each, and the CombinedFilter's index entry predicate matched index entries corresponding to both row groups of the first file and only one row group of the second, you'd have an RDD with three partitions in total that reads from both Parquet files to materialize its records.
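The partition count in the example above can be sketched in plain Scala, without Spark or Parquet. This is only an illustration of the counting rule, not the class's actual implementation: IndexEntry, its fields, the tag-based predicate, and the PartitionSketch object are all hypothetical names, and the sketch assumes that several index entries may point at the same row group.

```scala
// Hypothetical stand-in for one entry of the index file; the real index
// format and the real CombinedFilter predicate are not shown in this document.
final case class IndexEntry(parquetPath: String, rowGroup: Int, tag: String)

object PartitionSketch {
  // One partition per unique (file, row group) pair that has at least one
  // index entry satisfying the predicate.
  def partitionKeys(
      entries: Seq[IndexEntry],
      predicate: IndexEntry => Boolean): Seq[(String, Int)] =
    entries.filter(predicate).map(e => (e.parquetPath, e.rowGroup)).distinct

  // Two Parquet files with two row groups each, as in the example above.
  val entries: Seq[IndexEntry] = Seq(
    IndexEntry("a.parquet", 0, "keep"),
    IndexEntry("a.parquet", 0, "keep"), // second entry for the same row group
    IndexEntry("a.parquet", 1, "keep"),
    IndexEntry("b.parquet", 0, "keep"),
    IndexEntry("b.parquet", 1, "skip")  // filtered out by the predicate
  )

  // Assumed predicate: keep entries tagged "keep".
  val keys: Seq[(String, Int)] = partitionKeys(entries, _.tag == "keep")
}
```

Here PartitionSketch.keys holds three (file, row group) pairs, one per partition, and those pairs span both files, which matches the three-partition case described above.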
The type of the records in this RDD.