Saves AlignmentRecords as a directory of Parquet files or as SAM/BAM.
This method infers the output format from the file extension. Filenames ending in .sam/.bam are saved as SAM/BAM, and all other files are saved as Parquet.
Save configuration arguments.
Sequence dictionary describing the contigs these reads are aligned to.
Record group dictionary describing the record groups these reads are from.
See also: saveAsParquet, adamSAMSave, adamSave.
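The extension rule can be sketched as a small dispatch function. This is a hypothetical illustration, not ADAM's code: the name `inferFormat` and the `Format` type are invented here, and the suffix match is assumed to be case-sensitive.

```scala
object SaveFormat {
  // Hypothetical output formats; not ADAM types.
  sealed trait Format
  case object Sam extends Format
  case object Bam extends Format
  case object Parquet extends Format

  // Mirrors the documented rule: .sam/.bam suffixes select SAM/BAM,
  // and every other path falls through to Parquet.
  def inferFormat(path: String): Format =
    if (path.endsWith(".sam")) Sam
    else if (path.endsWith(".bam")) Bam
    else Parquet
}
```

For example, `inferFormat("out/reads.adam")` selects Parquet because the path ends in neither `.sam` nor `.bam`.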
Runs base quality score recalibration on a set of reads. Uses a table of known SNPs to mask true variation during the recalibration process.
A table of known SNPs to mask valid variants.
An optional local path to dump recalibration observations to.
Returns an RDD of recalibrated reads.
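To show why known SNPs are masked, here is a heavily simplified, hypothetical sketch of one recalibration step: mismatches are tallied per reported quality score, positions listed as known SNPs are skipped (so true variation is not counted as sequencing error), and the error rate is converted to an empirical Phred quality. Real BQSR also conditions on covariates such as machine cycle and dinucleotide context; the `Observation` type and the +1/+2 pseudocounts are assumptions made for this example.

```scala
object BqsrSketch {
  // Hypothetical per-base observation: aligned position, reported Phred
  // quality, and whether the base disagreed with the reference.
  final case class Observation(contig: String, pos: Long, reportedQual: Int, mismatch: Boolean)

  // Empirical quality per reported-quality bin, ignoring known SNP sites.
  def empiricalQualities(obs: Seq[Observation], knownSnps: Set[(String, Long)]): Map[Int, Double] =
    obs
      .filterNot(o => knownSnps.contains((o.contig, o.pos)))
      .groupBy(_.reportedQual)
      .map { case (q, group) =>
        val errors = group.count(_.mismatch)
        // Assumed pseudocounts keep the estimate finite when a bin has no errors.
        val errorRate = (errors + 1.0) / (group.size + 2.0)
        q -> (-10.0 * math.log10(errorRate))
      }
}
```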
Calculates the set of unique attribute values that occur for the given tag, and the number of times each value occurs.
The name of the optional field whose values are to be counted.
A Map whose keys are the values of the tag, and whose values are the number of times each tag value occurs.
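A minimal in-memory sketch of the counting logic, assuming a hypothetical `Read` type that stores its optional fields as a tag-to-value map (the documented method operates on an RDD instead of a local `Seq`).

```scala
object TagValueCounts {
  // Hypothetical stand-in for a read: only its optional fields (tag -> value).
  final case class Read(attributes: Map[String, String])

  def countTagValues(reads: Seq[Read], tag: String): Map[String, Long] =
    reads
      .flatMap(_.attributes.get(tag)) // reads without the tag contribute nothing
      .groupBy(identity)
      .map { case (value, occurrences) => value -> occurrences.size.toLong }
}
```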
Converts a set of records into an RDD containing the pairs of all unique tagStrings within the records, along with the count (number of records) which have that particular attribute.
An RDD of attribute name / count pairs.
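The same idea applied to attribute names rather than values, again as a local sketch over a hypothetical `Read` type: each record contributes at most one count per distinct tag it carries, so the result is the number of records that have each attribute.

```scala
object AttributeNameCounts {
  // Hypothetical stand-in for a read: only its optional fields (tag -> value).
  final case class Read(attributes: Map[String, String])

  // keySet is already distinct, so each read counts once per tag name.
  def countAttributeNames(reads: Seq[Read]): Map[String, Long] =
    reads
      .flatMap(_.attributes.keySet)
      .groupBy(identity)
      .map { case (name, hits) => name -> hits.size.toLong }
}
```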
Converts an RDD of ADAM read records into SAM records.
Returns a SAM/BAM formatted RDD of reads, as well as the file header.
Cuts reads into _k_-mers, and then counts the number of occurrences of each _k_-mer.
The value of _k_ to use for cutting _k_-mers.
Returns an RDD containing k-mer/count pairs.
See also: adamCountQmers.
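A local sketch of _k_-mer counting over plain sequence strings (the function name `countKmers` is hypothetical). Each read is cut into overlapping windows of length _k_; windows shorter than _k_ are dropped, because Scala's `sliding` returns a single short window when the input is shorter than the window size.

```scala
object KmerCount {
  // Cut each read into overlapping k-length windows and count occurrences.
  def countKmers(reads: Seq[String], k: Int): Map[String, Long] = {
    require(k > 0, "k must be positive")
    reads
      .flatMap(_.sliding(k).filter(_.length == k)) // discard short windows from short reads
      .groupBy(identity)
      .map { case (kmer, hits) => kmer -> hits.size.toLong }
  }
}
```

For example, with k = 2 the reads `ACGT` and `CGTA` yield `AC`, `CG`, `GT` and `CG`, `GT`, `TA`, so `CG` and `GT` are each counted twice.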
Returns the subset of the ADAMRecords which have an attribute with the given name.
The name of the attribute to filter on (should be two characters long).
An RDD[Read] containing the subset of records with a tag that matches the given name.
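A minimal sketch of the filter over a local collection, assuming a hypothetical `Read` type with an attribute map. The length check reflects that SAM optional-field tags are two characters long; whether the documented method enforces this is not stated, so the `require` is an assumption.

```scala
object FilterByTag {
  // Hypothetical read type: a name plus optional fields (tag -> value).
  final case class Read(name: String, attributes: Map[String, String])

  def filterByAttribute(reads: Seq[Read], tagName: String): Seq[Read] = {
    // Assumed validation: SAM optional-field tags are two characters.
    require(tagName.length == 2, s"expected a two-character tag, got '$tagName'")
    reads.filter(_.attributes.contains(tagName))
  }
}
```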
Aggregates together a sequence dictionary from the different individual reference sequences used in this dataset.
A sequence dictionary describing the reference contigs in this dataset.
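A sketch of the aggregation, assuming a hypothetical `Read` type that carries its contig name and length as options (unaligned reads have neither). The distinct (contig, length) pairs form the dictionary; rejecting a contig that reports two different lengths is an assumption added for illustration.

```scala
object SeqDictSketch {
  // Hypothetical read: contig name and length are absent for unaligned reads.
  final case class Read(contig: Option[String], contigLength: Option[Long])

  // Collect the distinct (contig, length) pairs seen across all reads.
  def sequenceDictionary(reads: Seq[Read]): Map[String, Long] = {
    val pairs = reads.flatMap(r => for (c <- r.contig; l <- r.contigLength) yield c -> l).distinct
    // Assumption: one contig must always report the same length.
    val conflicts = pairs.groupBy(_._1).collect { case (c, ls) if ls.size > 1 => c }
    require(conflicts.isEmpty, s"conflicting lengths for: ${conflicts.mkString(", ")}")
    pairs.toMap
  }
}
```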
Marks reads as possible fragment duplicates.
A dictionary mapping read groups to sequencing libraries. This is used when marking duplicates, because reads are only compared against reads from the same original library.
A new RDD where reads have the duplicate read flag set. Duplicate reads are NOT filtered out.
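A simplified, single-end sketch of duplicate marking. It assumes a Picard-style heuristic (not confirmed as ADAM's exact rule): reads are grouped by library, contig, 5' position and strand; within each group the read with the highest summed base quality is kept and the rest are flagged. Note that, as documented, duplicates are flagged rather than removed. The `Read` type and its fields are hypothetical.

```scala
object MarkDupSketch {
  // Hypothetical single-end read with just the fields the heuristic needs.
  final case class Read(name: String, library: String, contig: String, fivePrime: Long,
                        reverse: Boolean, quals: Seq[Int], duplicate: Boolean = false)

  def markDuplicates(reads: Seq[Read]): Seq[Read] =
    reads
      .groupBy(r => (r.library, r.contig, r.fivePrime, r.reverse))
      .values
      .flatMap { group =>
        // Keep the read with the highest summed base quality; flag, do not drop, the rest.
        val best = group.maxBy(_.quals.sum)
        group.map(r => if (r eq best) r else r.copy(duplicate = true))
      }
      .toSeq
}
```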
Reassembles read pairs from two sets of unpaired reads. The assumption is that the two sets were _originally_ paired together.
The RDD containing the second read from each pair.
How stringently to validate the reads.
Returns an RDD with the pair information recomputed.
The RDD that this is called on should be the RDD with the first read from the pair.
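A local sketch of re-pairing by read name. The documented method takes a validation stringency; here it is simplified to a hypothetical `strict` flag, where strict mode throws on a missing mate and lenient mode (an assumption) silently drops the unmatched read. The `Read` and `Pair` types are invented for the example.

```scala
object PairSketch {
  // Hypothetical read and pair types.
  final case class Read(name: String, seq: String)
  final case class Pair(first: Read, second: Read)

  // Join first-of-pair reads to second-of-pair reads on read name.
  def reassemble(firsts: Seq[Read], seconds: Seq[Read], strict: Boolean): Seq[Pair] = {
    val byName = seconds.map(r => r.name -> r).toMap
    firsts.flatMap { f =>
      byName.get(f.name) match {
        case Some(s) => Some(Pair(f, s))
        case None if strict => throw new IllegalArgumentException(s"no mate for ${f.name}")
        case None => None // assumed lenient behavior: drop the unmatched read
      }
    }
  }
}
```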
Realigns indels using a consensus-based heuristic.
If the input data is sorted, setting this parameter to true avoids a second sort.
The size of the largest indel to use for realignment.
The maximum number of consensus sequences to realign against per target region.
Log-odds threshold to use when realigning; realignments are only finalized if the log-odds threshold is exceeded.
The maximum width of a single target region for realignment.
Returns an RDD of mapped reads which have been realigned.
See also: RealignIndels.
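A hypothetical sketch of how a log-odds threshold can gate realignment. It assumes, as in GATK-style realigners, that an alignment's cost is the sum of base qualities at mismatching positions (a Phred-scaled quantity), so the cost difference divided by 10 is a log10-odds score. This is an illustration of the idea, not ADAM's confirmed formula; the function names are invented.

```scala
object RealignLodSketch {
  // Phred-scaled cost: sum of qualities at bases that mismatch the reference.
  // Assumption: read and reference segment are the same length (no indel in this window).
  def mismatchQualitySum(read: String, ref: String, quals: Seq[Int]): Int =
    read.indices.collect { case i if read(i) != ref(i) => quals(i) }.sum

  // Accept the consensus only if it improves the log-odds by more than the threshold.
  def shouldRealign(originalCost: Int, consensusCost: Int, lodThreshold: Double): Boolean =
    (originalCost - consensusCost) / 10.0 > lodThreshold
}
```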
Saves an RDD of ADAM read data into the SAM/BAM format.
Path to save files to.
A dictionary describing the contigs this file is aligned against.
A dictionary describing the read groups in this file.
Selects whether to save as SAM or BAM. The default value is true (save in SAM format).
If true, saves output as a single file.
Set this if the output is sorted; the header is then modified to record the sort order.
Converts an RDD into the SAM spec string it represents.
This method converts an RDD of AlignmentRecords back to an RDD of SAMRecordWritables and a SAMFileHeader, and then maps this RDD into a string on the driver that represents this file in SAM.
Sequence dictionary describing the contigs these reads are aligned to.
Record group dictionary describing the record groups these reads are from.
A string on the driver representing this RDD of reads in SAM format.
See also: adamConvertToSAM.
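A sketch of what each output line looks like. A SAM alignment line has 11 mandatory tab-separated columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), with POS 1-based; here the mate columns are filled with the spec's "unavailable" values (`*`, `0`, `0`). The `Read` type and function names are hypothetical, and real conversion goes through SAMRecord rather than string concatenation.

```scala
object SamLineSketch {
  // Hypothetical read with the fields needed for the mandatory SAM columns.
  final case class Read(name: String, flag: Int, contig: String, pos1: Int, mapq: Int,
                        cigar: String, seq: String, qual: String)

  // The 11 mandatory SAM columns, tab-separated; mate fields are left unset.
  def toSamLine(r: Read): String =
    Seq(r.name, r.flag, r.contig, r.pos1, r.mapq, r.cigar, "*", 0, 0, r.seq, r.qual).mkString("\t")

  // Header lines first, then one line per read.
  def toSamString(header: Seq[String], reads: Seq[Read]): String =
    (header ++ reads.map(toSamLine)).mkString("\n")
}
```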
Saves AlignmentRecords as a directory of Parquet files or as SAM/BAM.
This method infers the output format from the file extension. Filenames ending in .sam/.bam are saved as SAM/BAM, and all other files are saved as Parquet.
Save configuration arguments.
Sequence dictionary describing the contigs these reads are aligned to.
Record group dictionary describing the record groups these reads are from.
See also: adamSaveAsFastq, saveAsParquet, adamSAMSave, adamAlignedRecordSave.
Saves reads in FASTQ format.
Path to save files at.
Output the original base qualities (OQ), if available, instead of the qualities produced by BQSR.
Whether to sort the FASTQ files by read name or not. Defaults to false. Sorting the output will recover pair order, if desired.
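A sketch of FASTQ formatting with both options, assuming a hypothetical `Read` type that holds the original qualities as an `Option`. Each record is the standard four lines (`@name`, sequence, `+`, qualities); when original qualities are requested but absent, this sketch falls back to the current qualities, which is an assumption about the documented "if available" behavior.

```scala
object FastqSketch {
  // Hypothetical read: current qualities plus optional original (OQ) qualities.
  final case class Read(name: String, seq: String, qual: String, origQual: Option[String])

  def toFastq(r: Read, outputOriginalBaseQualities: Boolean): String = {
    // Fall back to the current qualities when no OQ value is stored.
    val q = if (outputOriginalBaseQualities) r.origQual.getOrElse(r.qual) else r.qual
    s"@${r.name}\n${r.seq}\n+\n$q"
  }

  // Optionally sort by read name so that mates end up in matching order.
  def toFastqFile(reads: Seq[Read], outputOriginal: Boolean, sort: Boolean): String = {
    val ordered = if (sort) reads.sortBy(_.name) else reads
    ordered.map(toFastq(_, outputOriginal)).mkString("\n")
  }
}
```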
Saves these AlignmentRecords to two FASTQ files: one for the first mate in each pair, and the other for the second.
Path at which to save a FASTQ file containing the first mate of each pair.
Path at which to save a FASTQ file containing the second mate of each pair.
Iff strict, throw an exception if any read in this RDD is not accompanied by its mate.
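A local sketch of splitting reads into first-mate and second-mate outputs. The `Read` type and `splitMates` are hypothetical; strict mode throws when a read name appears in only one of the two sets, and non-strict mode (an assumption) keeps unmatched reads. Both outputs are sorted by name so that line N of each file refers to the same fragment.

```scala
object PairedFastqSketch {
  // Hypothetical read type: only the fields needed to route a read to a file.
  final case class Read(name: String, firstOfPair: Boolean, seq: String, qual: String)

  // Returns (first mates, second mates), each sorted by read name.
  def splitMates(reads: Seq[Read], strict: Boolean): (Seq[Read], Seq[Read]) = {
    val (firsts, seconds) = reads.partition(_.firstOfPair)
    val firstNames = firsts.map(_.name).toSet
    val secondNames = seconds.map(_.name).toSet
    // Names present in only one of the two sets have no mate.
    val unmatched = (firstNames diff secondNames) ++ (secondNames diff firstNames)
    if (strict && unmatched.nonEmpty)
      throw new IllegalStateException(s"reads without mates: ${unmatched.mkString(", ")}")
    (firsts.sortBy(_.name), seconds.sortBy(_.name))
  }
}
```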
Groups all reads by record group and read name.
SingleReadBuckets with primary, secondary and unmapped reads
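A sketch of the grouping, using a hypothetical, simplified stand-in for SingleReadBucket (the field names here are assumptions). Reads sharing a (record group, read name) key form one bucket, which is then split into primary, secondary, and unmapped alignments.

```scala
object BucketSketch {
  // Hypothetical read and a simplified stand-in for SingleReadBucket.
  final case class Read(recordGroup: String, name: String, mapped: Boolean, primary: Boolean)
  final case class SingleReadBucket(primary: Seq[Read], secondary: Seq[Read], unmapped: Seq[Read])

  def bucket(reads: Seq[Read]): Map[(String, String), SingleReadBucket] =
    reads.groupBy(r => (r.recordGroup, r.name)).map { case (key, rs) =>
      val (mapped, unmapped) = rs.partition(_.mapped)
      val (primary, secondary) = mapped.partition(_.primary)
      key -> SingleReadBucket(primary, secondary, unmapped)
    }
}
```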
Calculates the subset of the RDD whose AlignmentRecords overlap the corresponding query ReferenceRegion. Equality of the reference sequence (to which these are aligned) is tested by string equality of the names. AlignmentRecords whose getReadMapped method returns false are ignored.
The end of the record against the reference sequence is calculated from the CIGAR string using the ADAMContext.referenceLengthFromCigar method.
The query region; only records that overlap this region are returned.
The subset of AlignmentRecords (corresponding to either primary or secondary alignments) that overlap the query region.
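A sketch of the overlap test. Per the SAM specification, the CIGAR operations M, D, N, = and X consume reference bases while I, S, H and P do not, so summing the former gives the aligned span on the reference. The `Region` and `Read` types are hypothetical, and coordinates are assumed to be 0-based half-open.

```scala
object OverlapSketch {
  // Hypothetical types; coordinates assumed 0-based, half-open [start, end).
  final case class Region(contig: String, start: Long, end: Long)
  final case class Read(contig: String, start: Long, cigar: String, mapped: Boolean)

  private val CigarOp = """(\d+)([MIDNSHP=X])""".r

  // M, D, N, = and X consume reference bases; I, S, H and P do not.
  def referenceLength(cigar: String): Long =
    CigarOp.findAllMatchIn(cigar).collect {
      case m if "MDN=X".contains(m.group(2)) => m.group(1).toLong
    }.sum

  // Unmapped reads are skipped; contigs are compared by name.
  def overlapping(reads: Seq[Read], query: Region): Seq[Read] =
    reads.filter { r =>
      val end = r.start + referenceLength(r.cigar)
      r.mapped && r.contig == query.contig && r.start < query.end && end > query.start
    }
}
```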
For a single RDD element, returns zero or more sequence record elements.
Element from which to extract sequence records.
A seq of sequence records.
Saves AlignmentRecords as a directory of Parquet files.
The RDD is written as a directory of Parquet files, with the Parquet configuration described by the input parameter args. The provided sequence dictionary is written at args.outputPath.seqdict, while the provided record group dictionary is written at args.outputPath.rgdict. These two files are written as Avro binary.
Save configuration arguments.
Sequence dictionary describing the contigs these reads are aligned to.
Record group dictionary describing the record groups these reads are from.
See also: adamAlignedRecordSave, adamSave.