ReferenceFolder is a trait for folding values of type T into a sequence of ReferenceRegion values.
It is currently used mainly when building the IDRangeIndex, where we find the contiguous regions
of the genome that correspond to each Parquet row group by accumulating variant positions that lie
within a given 'window' distance of each other.
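The window-based accumulation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the `fold` signature, the simplified `ReferenceRegion` case class, and the `Variant` and `WindowFolder` names are all hypothetical, and the input is assumed to be sorted by position within each contig.

```scala
// Simplified stand-in for ReferenceRegion (assumption): a half-open
// interval [start, end) on a single named contig.
final case class ReferenceRegion(refName: String, start: Long, end: Long)

// Hypothetical shape of the trait; the real method signature may differ.
trait ReferenceFolder[T] {
  def fold(regions: Seq[ReferenceRegion], value: T): Seq[ReferenceRegion]
}

// Hypothetical located value: a variant at a single genomic position.
final case class Variant(refName: String, position: Long)

// Extends the most recent region when the next variant is on the same contig
// and at most `window` bases past the region's last covered position;
// otherwise it starts a new single-base region.
// Assumes variants arrive sorted by position within each contig.
class WindowFolder(window: Long) extends ReferenceFolder[Variant] {
  def fold(regions: Seq[ReferenceRegion], v: Variant): Seq[ReferenceRegion] =
    regions.lastOption match {
      case Some(last)
          if last.refName == v.refName && v.position - (last.end - 1) <= window =>
        regions.init :+ last.copy(end = math.max(last.end, v.position + 1))
      case _ =>
        regions :+ ReferenceRegion(v.refName, v.position, v.position + 1)
    }
}

object WindowFoldExample {
  val variants: Seq[Variant] = Seq(
    Variant("chr1", 100L),
    Variant("chr1", 150L),
    Variant("chr1", 5000L),
    Variant("chr2", 10L)
  )

  private val folder = new WindowFolder(window = 1000L)
  private val empty: Seq[ReferenceRegion] = Vector.empty

  // One pass over the values; the folder reads fields directly instead of
  // building an intermediate region per value.
  val regions: Seq[ReferenceRegion] = variants.foldLeft(empty)(folder.fold)
}
```

With a window of 1000, the first two chr1 variants merge into one region, while the variant at position 5000 and the chr2 variant each start a new one.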
We _could_ have avoided writing ReferenceFolder altogether and used the existing ReferenceMapping
trait and its implicit conversions for ADAMRecord, ADAMFlatGenotype, etc. However, that approach
creates a few additional values per comparison, and the extra object creation was increasing
our GC times. ReferenceFolder is therefore, ultimately, a performance optimization.
T
the type of the values to fold -- should be something that is located along the genome.