Calculates the set of unique attribute values that occur for the given tag, and the number of times each value occurs.
The name of the optional field whose values are to be counted.
A Map whose keys are the values of the tag, and whose values are the number of times each tag value occurs.
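The counting described above can be sketched with plain Scala collections instead of an RDD. This is a minimal illustration, not the library's implementation: `TaggedRead`, `TagValueCounts`, `parseTags`, and `countTagValues` are hypothetical names, and the sketch assumes that attributes are stored as tab-separated `TAG:TYPE:VALUE` fields.

```scala
// Hypothetical stand-in for a read record; only the attribute string matters here.
// Assumption: attributes are tab-separated "TAG:TYPE:VALUE" fields.
final case class TaggedRead(name: String, attributes: String)

object TagValueCounts {
  /** Parses an attribute string into (tag, value) pairs, skipping malformed fields. */
  def parseTags(attributes: String): Seq[(String, String)] =
    attributes.split('\t').toSeq.flatMap { field =>
      // Limit the split to 3 parts so that values containing ':' stay intact.
      field.split(":", 3) match {
        case Array(tag, _, value) => Some(tag -> value)
        case _                    => None
      }
    }

  /** Counts how often each value of `tag` occurs across the reads. */
  def countTagValues(reads: Seq[TaggedRead], tag: String): Map[String, Long] =
    reads
      .flatMap(r => parseTags(r.attributes))
      .collect { case (t, value) if t == tag => value }
      .groupBy(identity)
      .map { case (value, occurrences) => value -> occurrences.size.toLong }
}
```

On an RDD the same shape would typically be a map to the tag value followed by a count by value; the local version only shows the logic.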
Converts a set of records into an RDD of pairs, each containing a unique tagString found within the records and the count (number of records) that have that particular attribute.
An RDD of attribute name / count pairs.
Converts an RDD of ADAM read records into SAM records.
Returns a SAM/BAM formatted RDD of reads, as well as the file header.
Returns the subset of the ADAMRecords which have an attribute with the given name.
The name of the attribute to filter on (should be two characters long).
An RDD[ADAMRecord] containing the subset of records with a tag that matches the given name.
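The filter above can be illustrated on a local collection. This is a sketch under stated assumptions rather than the library's code: `AttrRead`, `TagFilter`, `hasTag`, and `filterWithTag` are hypothetical names, attributes are assumed to be tab-separated `TAG:TYPE:VALUE` fields, and the sketch chooses to reject tag names that are not two characters long, which is stricter than the "should" in the parameter description.

```scala
// Hypothetical stand-in for a read record with a raw attribute string.
final case class AttrRead(name: String, attributes: String)

object TagFilter {
  /** True if any attribute field of the read starts with "<tag>:". */
  def hasTag(read: AttrRead, tag: String): Boolean =
    read.attributes.split('\t').exists(_.startsWith(tag + ":"))

  /** Keeps only the reads that carry an attribute with the given tag name. */
  def filterWithTag(reads: Seq[AttrRead], tag: String): Seq[AttrRead] = {
    // Design choice of this sketch: enforce the two-character tag length.
    require(tag.length == 2, s"Tag names are two characters long, got '$tag'")
    reads.filter(hasTag(_, tag))
  }
}
```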
Collects a dictionary summarizing the read groups in an RDD of ADAMRecords.
A dictionary describing the read groups in this RDD.
Aggregates together a sequence dictionary from the different individual reference sequences used in this dataset.
A sequence dictionary describing the reference contigs in this dataset.
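One way to picture the aggregation is as a fold that merges per-read contig records into a single dictionary keyed by contig name. The sketch below is illustrative only: `ContigRecord` and `ContigDictionary` are hypothetical names, and the consistency check on contig lengths is an assumption of this example, not documented behavior.

```scala
// Hypothetical, minimal description of a reference contig.
final case class ContigRecord(name: String, length: Long)

object ContigDictionary {
  /** Folds individual contig records into one dictionary keyed by contig name. */
  def aggregate(records: Seq[ContigRecord]): Map[String, ContigRecord] =
    records.foldLeft(Map.empty[String, ContigRecord]) { (dict, rec) =>
      dict.get(rec.name) match {
        case Some(existing) =>
          // Assumption: the same contig name must always report the same length.
          require(existing.length == rec.length,
            s"Conflicting lengths for contig ${rec.name}")
          dict
        case None =>
          dict + (rec.name -> rec)
      }
    }
}
```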
Groups all reads by reference position and returns a non-aggregated pileup RDD.
Creates pileups for non-primary aligned reads. Default is false.
An RDD of ADAMPileups without aggregation.
Groups all reads by reference position, with all reference position bases grouped into a rod.
Bucket size in base pairs. Larger buckets take more time per bucket to convert, but have lower skew. Default is 1000.
Creates rods for non-primary aligned reads. Default is false.
RDD of ADAMRods.
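The grouping into rods can be sketched locally as two steps: expand each read into one base per covered reference position, then group those bases first by a coarse bucket and then by exact position. Everything here is hypothetical (`AlignedRead`, `PileupBase`, `RodSketch`, and the parameter names), and the sketch assumes gap-free alignments so that base offsets map directly onto reference positions.

```scala
// Hypothetical, simplified read and pileup types (gap-free alignments assumed).
final case class AlignedRead(name: String, contig: String, start: Long,
                             sequence: String, primary: Boolean = true)
final case class PileupBase(contig: String, position: Long, base: Char, readName: String)

object RodSketch {
  /** Expands a read into one pileup base per covered reference position. */
  def toPileups(read: AlignedRead): Seq[PileupBase] =
    read.sequence.zipWithIndex.map { case (base, offset) =>
      PileupBase(read.contig, read.start + offset, base, read.name)
    }

  /** Groups pileup bases into rods keyed by (contig, position). */
  def toRods(reads: Seq[AlignedRead],
             bucketSize: Int = 1000,
             secondaryAlignments: Boolean = false): Map[(String, Long), Seq[PileupBase]] = {
    require(bucketSize > 0, "bucketSize must be positive")
    val kept = if (secondaryAlignments) reads else reads.filter(_.primary)
    kept
      .flatMap(toPileups)
      // Coarse bucket: in a distributed setting this would be the shuffle key.
      .groupBy(p => (p.contig, p.position / bucketSize))
      .values
      // Within each bucket, collect all bases at the same position into a rod.
      .flatMap(_.groupBy(p => (p.contig, p.position)))
      .toMap
  }
}
```

In this local version the bucket step does not change the result; it is kept only to show where the bucket size would enter the computation.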
Saves an RDD of ADAM read data into the SAM/BAM format.
Path to save files to.
Selects whether to save as SAM or BAM. The default value is true (save in SAM format).
Groups all reads by record group and read name
SingleReadBuckets with primary, secondary and unmapped reads
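The bucketing can be illustrated with a group-by on the (record group, read name) pair followed by two partitions. The types below (`FragmentRead`, `ReadBucket`, `ReadBuckets`) are hypothetical simplifications for this sketch and do not reproduce the library's classes.

```scala
// Hypothetical, simplified read and bucket types.
final case class FragmentRead(recordGroup: String, name: String,
                              mapped: Boolean, primary: Boolean)
final case class ReadBucket(primaryMapped: Seq[FragmentRead],
                            secondaryMapped: Seq[FragmentRead],
                            unmapped: Seq[FragmentRead])

object ReadBuckets {
  /** Groups reads by (record group, read name) and splits each group by alignment status. */
  def bucket(reads: Seq[FragmentRead]): Map[(String, String), ReadBucket] =
    reads.groupBy(r => (r.recordGroup, r.name)).map { case (key, group) =>
      val (mapped, unmapped) = group.partition(_.mapped)
      val (primary, secondary) = mapped.partition(_.primary)
      key -> ReadBucket(primary, secondary, unmapped)
    }
}
```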
Trims low-quality read prefixes and suffixes. The average prefix/suffix quality is calculated from the Phred-scaled qualities of the read bases; prefixes and suffixes whose quality falls below a user-provided threshold are trimmed.
Phred score for trimming. Default value is 20.
Returns an RDD of trimmed reads.
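The description leaves the exact averaging scheme open. The sketch below shows one possible reading, stated as an assumption: compute the mean Phred score at each read position across a set of equal-length reads, then count how many leading and trailing positions fall below the threshold. `QualityTrim`, `meanQualities`, and `trimLengths` are hypothetical names, and quality scores are taken as already-decoded integers.

```scala
object QualityTrim {
  /** Mean Phred score at each position; all quality vectors must have equal length. */
  def meanQualities(quals: Seq[Seq[Int]]): Seq[Double] =
    if (quals.isEmpty) Seq.empty
    else quals.transpose.map(col => col.sum.toDouble / col.size)

  /** Number of leading and trailing positions whose mean quality is below the threshold. */
  def trimLengths(means: Seq[Double], phredThreshold: Int = 20): (Int, Int) = {
    val start = means.takeWhile(_ < phredThreshold).size
    // Drop the already-trimmed prefix so the two counts never overlap.
    val end = means.drop(start).reverse.takeWhile(_ < phredThreshold).size
    (start, end)
  }
}
```

The resulting pair of lengths could then be handed to a fixed-length trimming step.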
Trims bases from the start and end of all reads in an RDD.
Number of bases to trim from the start of the read.
Number of bases to trim from the end of the read.
Optional parameter specifying which read group to trim. If omitted, all reads are trimmed.
Returns an RDD of trimmed reads.
Trimming parameters must be >= 0.
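Fixed-length trimming can be sketched as a slice over the sequence and quality strings, guarded by the non-negativity requirement stated above. `SeqRead`, `FixedTrim`, and the parameter names are hypothetical, and the handling of reads shorter than the requested trim (they become empty) is a choice made for this example only.

```scala
// Hypothetical, simplified read with parallel sequence and quality strings.
final case class SeqRead(name: String, recordGroup: String,
                         sequence: String, qualities: String)

object FixedTrim {
  /** Trims `trimStart` bases from the start and `trimEnd` bases from the end of each read. */
  def trim(reads: Seq[SeqRead], trimStart: Int, trimEnd: Int,
           readGroup: Option[String] = None): Seq[SeqRead] = {
    require(trimStart >= 0 && trimEnd >= 0, "Trimming parameters must be >= 0.")
    reads.map { r =>
      // With no read group given, every read is trimmed.
      if (readGroup.forall(_ == r.recordGroup)) {
        // Clamp so that over-long trims yield an empty read instead of failing.
        val end = math.max(trimStart, r.sequence.length - trimEnd)
        r.copy(sequence = r.sequence.slice(trimStart, end),
               qualities = r.qualities.slice(trimStart, end))
      } else r
    }
  }
}
```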
For a single RDD element, returns zero or more sequence record elements.
Element from which to extract sequence records.
A seq of sequence records.