org.bdgenomics.adam.rdd.read

AlignmentRecordRDDFunctions

class AlignmentRecordRDDFunctions extends ADAMSequenceDictionaryRDDAggregator[AlignmentRecord]

Linear Supertypes
ADAMSequenceDictionaryRDDAggregator[AlignmentRecord], Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new AlignmentRecordRDDFunctions(rdd: RDD[AlignmentRecord])

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def adamAlignedRecordSave(args: ADAMSaveArgs): Boolean

  7. def adamBQSR(knownSnps: Broadcast[SnpTable], observationDumpFile: Option[String] = None): RDD[AlignmentRecord]

    Runs base quality score recalibration on a set of reads. Uses a table of known SNPs to mask true variation during the recalibration process.

    knownSnps

    A table of known SNPs to mask valid variants.

    observationDumpFile

    An optional local path to dump recalibration observations to.

    returns

    Returns an RDD of recalibrated reads.

  8. def adamCharacterizeTagValues(tag: String): Map[Any, Long]

    Calculates the set of unique attribute values that occur for the given tag, and the number of times each value occurs.

    tag

    The name of the optional field whose values are to be counted.

    returns

    A Map whose keys are the values of the tag, and whose values are the number of times each tag value occurs.
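    The counting described above can be sketched on plain Scala collections. This is an illustration only, not ADAM's implementation: the `Read` case class and its `tags` map are hypothetical stand-ins for an AlignmentRecord and its optional attributes, and a `Seq` stands in for the RDD.

    ```scala
    object TagValueSketch {
      // Hypothetical stand-in for a read: optional tags keyed by tag name.
      final case class Read(name: String, tags: Map[String, Any])

      // Count how often each value of `tag` occurs; reads lacking the tag are skipped.
      def characterizeTagValues(reads: Seq[Read], tag: String): Map[Any, Long] =
        reads
          .flatMap(_.tags.get(tag))
          .groupBy(identity)
          .map { case (value, hits) => value -> hits.size.toLong }
    }
    ```

    Reads that do not carry the tag contribute nothing to the result, so the counts may sum to less than the number of reads.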

  9. def adamCharacterizeTags(): RDD[(String, Long)]

    Builds an RDD pairing each unique tag string found in the records with the number of records that carry that attribute.

    returns

    An RDD of attribute name / count pairs.

  10. def adamConvertToSAM(): (RDD[SAMRecordWritable], SAMFileHeader)

    Converts an RDD of ADAM read records into SAM records.

    returns

    Returns a SAM/BAM formatted RDD of reads, as well as the file header.

  11. def adamCountKmers(kmerLength: Int): RDD[(String, Long)]

    Cuts reads into _k_-mers, and then counts the number of occurrences of each _k_-mer.

    kmerLength

    The value of _k_ to use for cutting _k_-mers.

    returns

    Returns an RDD containing k-mer/count pairs.

    See also

    adamCountQmers
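    A minimal sketch of k-mer cutting and counting, assuming reads are plain strings held in a `Seq` rather than AlignmentRecords in an RDD. `KmerSketch` and `countKmers` are hypothetical names used for illustration, not part of ADAM.

    ```scala
    object KmerSketch {
      // Cut every read into overlapping k-mers and count occurrences across all reads.
      def countKmers(reads: Seq[String], kmerLength: Int): Map[String, Long] = {
        require(kmerLength > 0, "kmerLength must be positive")
        reads
          .filter(_.length >= kmerLength) // reads shorter than k yield no k-mers
          .flatMap(_.sliding(kmerLength))
          .groupBy(identity)
          .map { case (kmer, hits) => kmer -> hits.size.toLong }
      }
    }
    ```

    For example, `KmerSketch.countKmers(Seq("ACGTA", "CGT"), 3)` gives ACG -> 1, CGT -> 2, GTA -> 1. In a distributed setting the same shape would typically be a `flatMap` followed by a keyed reduction.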

  12. def adamCountQmers(qmerLength: Int): RDD[(String, Double)]

    Cuts reads into _q_-mers, and then finds the _q_-mer weight. Q-mers are described in:

    Kelley, David R., Michael C. Schatz, and Steven L. Salzberg. "Quake: quality-aware detection and correction of sequencing errors." Genome Biol 11.11 (2010): R116.

    _Q_-mers are _k_-mers weighted by the quality score of the bases in the _k_-mer.

    qmerLength

    The value of _q_ to use for cutting _q_-mers.

    returns

    Returns an RDD containing q-mer/weight pairs.

    See also

    adamCountKmers
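    The sketch below illustrates one way to weight q-mers by base quality. It assumes, following the Quake paper cited above, that each occurrence of a q-mer is weighted by the product of the probabilities that its bases were called correctly, and that weights are summed over occurrences. Whether ADAM computes the weight exactly this way is not stated here, so treat the formula and the names `QmerSketch`, `successProbability` and `countQmers` as assumptions.

    ```scala
    object QmerSketch {
      // Probability that a base with the given Phred score was called correctly.
      def successProbability(phred: Int): Double = 1.0 - math.pow(10.0, -phred / 10.0)

      // reads: (bases, per-base Phred scores); reads with mismatched lengths are skipped.
      def countQmers(reads: Seq[(String, Seq[Int])], qmerLength: Int): Map[String, Double] = {
        require(qmerLength > 0, "qmerLength must be positive")
        val weighted = for {
          (bases, quals) <- reads
          if bases.length == quals.length && bases.length >= qmerLength
          start <- 0 to bases.length - qmerLength
        } yield {
          // Assumed weight: product of per-base success probabilities in the window.
          val weight = quals.slice(start, start + qmerLength).map(successProbability).product
          bases.substring(start, start + qmerLength) -> weight
        }
        // Sum the weights of all occurrences of the same q-mer.
        weighted.groupBy(_._1).map { case (qmer, hits) => qmer -> hits.map(_._2).sum }
      }
    }
    ```

    With perfect qualities every weight approaches 1 and the result approaches the plain k-mer count, which is why the two methods reference each other.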

  13. def adamFilterRecordsWithTag(tagName: String): RDD[AlignmentRecord]

    Returns the subset of the AlignmentRecords that have an attribute with the given name.

    tagName

    The name of the attribute to filter on (should be two characters long).

    returns

    An RDD[AlignmentRecord] containing the subset of records with a tag that matches the given name.

  14. def adamFlagStat(): (FlagStatMetrics, FlagStatMetrics)

  15. def adamGetReadGroupDictionary(): RecordGroupDictionary

    Collects a dictionary summarizing the read groups in an RDD of AlignmentRecords.

    returns

    A dictionary describing the read groups in this RDD.

  16. def adamGetSequenceDictionary(): SequenceDictionary

    Aggregates together a sequence dictionary from the different individual reference sequences used in this dataset.

    returns

    A sequence dictionary describing the reference contigs in this dataset.

    Definition Classes
    ADAMSequenceDictionaryRDDAggregator
  17. def adamMarkDuplicates(): RDD[AlignmentRecord]

  18. def adamRePairReads(secondPairRdd: RDD[AlignmentRecord], validationStringency: ValidationStringency = ValidationStringency.LENIENT): RDD[AlignmentRecord]

    Reassembles read pairs from two sets of unpaired reads. The assumption is that the two sets were _originally_ paired together.

    secondPairRdd

    The rdd containing the second read from the pairs.

    validationStringency

    How stringently to validate the reads.

    returns

    Returns an RDD with the pair information recomputed.

    Note

    The RDD that this is called on should be the RDD with the first read from the pair.

  19. def adamRealignIndels(consensusModel: ConsensusGenerator = new ConsensusGeneratorFromReads, isSorted: Boolean = false, maxIndelSize: Int = 500, maxConsensusNumber: Int = 30, lodThreshold: Double = 5.0, maxTargetSize: Int = 3000): RDD[AlignmentRecord]

    Realigns indels using a consensus-based heuristic.

    isSorted

    If the input data is sorted, setting this parameter to true avoids a second sort.

    maxIndelSize

    The size of the largest indel to use for realignment.

    maxConsensusNumber

    The maximum number of consensus sequences to realign against per target region.

    lodThreshold

    Log-odds threshold to use when realigning; realignments are only finalized if the log-odds threshold is exceeded.

    maxTargetSize

    The maximum width of a single target region for realignment.

    returns

    Returns an RDD of mapped reads which have been realigned.

    See also

    RealignIndels

  20. def adamRecords2Pileup(secondaryAlignments: Boolean = false): RDD[Pileup]

    Groups all reads by reference position and returns a non-aggregated pileup RDD.

    secondaryAlignments

    Creates pileups for non-primary aligned reads. Default is false.

    returns

    Pileup without aggregation

  21. def adamRecords2Rods(bucketSize: Int = 1000, secondaryAlignments: Boolean = false): RDD[Rod]

    Groups all reads by reference position, with all reference position bases grouped into a rod.

    bucketSize

    Size in basepairs of buckets. Larger buckets take more time per bucket to convert, but have lower skew. Default is 1000.

    secondaryAlignments

    Creates rods for non-primary aligned reads. Default is false.

    returns

    RDD of Rods.

  22. def adamSAMSave(filePath: String, asSam: Boolean = true): Unit

    Saves an RDD of ADAM read data into the SAM/BAM format.

    filePath

    Path to save files to.

    asSam

    Selects whether to save as SAM or BAM. The default value is true (save in SAM format).

  23. def adamSAMString: String

  24. def adamSave(args: ADAMSaveAnyArgs): Boolean

  25. def adamSaveAsFastq(fileName: String, fileName2Opt: Option[String] = None, sort: Boolean = false, validationStringency: ValidationStringency = ValidationStringency.LENIENT, persistLevel: Option[StorageLevel] = None): Unit

    Saves reads in FASTQ format.

    fileName

    Path to save files at.

    sort

    Whether to sort the FASTQ files by read name or not. Defaults to false. Sorting the output will recover pair order, if desired.

  26. def adamSaveAsPairedFastq(fileName1: String, fileName2: String, validationStringency: ValidationStringency = ValidationStringency.LENIENT, persistLevel: Option[StorageLevel] = None): Unit

    Saves these AlignmentRecords to two FASTQ files: one for the first mate in each pair, and the other for the second.

    fileName1

    Path at which to save a FASTQ file containing the first mate of each pair.

    fileName2

    Path at which to save a FASTQ file containing the second mate of each pair.

    validationStringency

    Iff strict, throw an exception if any read in this RDD is not accompanied by its mate.

  27. def adamSingleReadBuckets(): RDD[SingleReadBucket]

    Groups all reads by record group and read name.

    returns

    SingleReadBuckets with primary, secondary and unmapped reads
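    The grouping can be pictured with the following sketch on plain collections. The `Read` and `Bucket` case classes are hypothetical simplifications introduced for illustration; they are not the AlignmentRecord or SingleReadBucket types.

    ```scala
    object ReadBucketSketch {
      // Hypothetical, simplified read; not the AlignmentRecord schema.
      final case class Read(recordGroup: String, readName: String, mapped: Boolean, primary: Boolean)

      // Hypothetical bucket mirroring the three categories named above.
      final case class Bucket(primaryMapped: Seq[Read], secondaryMapped: Seq[Read], unmapped: Seq[Read])

      // Key each read by (record group, read name), then split each group by mapping status.
      def bucketReads(reads: Seq[Read]): Map[(String, String), Bucket] =
        reads.groupBy(r => (r.recordGroup, r.readName)).map { case (key, group) =>
          val (mapped, unmapped) = group.partition(_.mapped)
          val (primary, secondary) = mapped.partition(_.primary)
          key -> Bucket(primary, secondary, unmapped)
        }
    }
    ```

    Keying on both fields keeps reads that share a name but come from different record groups in separate buckets.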

  28. def adamSortReadsByReferencePosition(): RDD[AlignmentRecord]

  29. def adamTrimLowQualityReadGroups(phredThreshold: Int = 20): RDD[AlignmentRecord]

    Trims low-quality read prefixes and suffixes. The average prefix/suffix quality is calculated from the Phred-scaled qualities of the read bases, and prefixes/suffixes whose quality falls below a user-provided threshold are trimmed.

    phredThreshold

    Phred score for trimming. Default value is 20.

    returns

    Returns an RDD of trimmed reads.
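    One possible reading of this procedure is sketched below. It assumes that qualities are averaged per read position across the equal-length reads of a single read group, and that leading and trailing positions whose mean falls below the threshold are trimmed. Both the averaging scheme and the names `QualityTrimSketch`, `meanQualities` and `trimBounds` are assumptions made for illustration; the description above does not spell out the exact procedure.

    ```scala
    object QualityTrimSketch {
      // Mean Phred score at each position over equal-length reads.
      def meanQualities(quals: Seq[Seq[Int]]): Seq[Double] =
        quals.transpose.map(column => column.sum.toDouble / column.size)

      // Returns (trimStart, trimEnd): counts of leading and trailing positions
      // whose mean quality is below the threshold.
      def trimBounds(quals: Seq[Seq[Int]], phredThreshold: Int = 20): (Int, Int) = {
        val means = meanQualities(quals)
        val trimStart = means.takeWhile(_ < phredThreshold).size
        // Drop the trimmed prefix first so no position is counted twice.
        val trimEnd = means.drop(trimStart).reverse.takeWhile(_ < phredThreshold).size
        (trimStart, trimEnd)
      }
    }
    ```

    The resulting pair has the same shape as the `trimStart` and `trimEnd` parameters of adamTrimReads.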

  30. def adamTrimReads(trimStart: Int, trimEnd: Int, readGroup: String = null): RDD[AlignmentRecord]

    Trims bases from the start and end of all reads in an RDD.

    trimStart

    Number of bases to trim from the start of the read.

    trimEnd

    Number of bases to trim from the end of the read.

    readGroup

    Optional parameter specifying which read group to trim. If omitted, all reads are trimmed.

    returns

    Returns an RDD of trimmed reads.

    Note

    Trimming parameters must be >= 0.
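    A minimal sketch of fixed-length trimming on a simplified read, assuming one quality character per base. The `Read` case class is hypothetical, the `Option[String]` read group replaces the nullable parameter in the signature above, and the check that the trim does not exceed the read length is an added assumption. A real implementation would also need to update alignment fields such as the start position and CIGAR, which this sketch ignores.

    ```scala
    object TrimSketch {
      // Hypothetical, simplified read: bases plus one quality character per base.
      final case class Read(sequence: String, qualities: String, readGroup: String)

      def trimRead(read: Read, trimStart: Int, trimEnd: Int): Read = {
        require(trimStart >= 0 && trimEnd >= 0, "Trimming parameters must be >= 0.")
        require(trimStart + trimEnd <= read.sequence.length, "Cannot trim more bases than the read has.")
        val end = read.sequence.length - trimEnd
        read.copy(
          sequence = read.sequence.substring(trimStart, end),
          qualities = read.qualities.substring(trimStart, end))
      }

      // readGroup = None trims every read; Some(id) trims only reads in that group.
      def trimReads(reads: Seq[Read], trimStart: Int, trimEnd: Int,
                    readGroup: Option[String] = None): Seq[Read] =
        reads.map { r =>
          if (readGroup.forall(_ == r.readGroup)) trimRead(r, trimStart, trimEnd) else r
        }
    }
    ```

    Bases and qualities are cut with the same indices so that they stay aligned after trimming.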

  31. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  32. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  33. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  34. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  35. def filterByOverlappingRegion(query: ReferenceRegion): RDD[AlignmentRecord]

    Calculates the subset of the RDD whose AlignmentRecords overlap the query ReferenceRegion. Equality of the reference sequence (to which the records are aligned) is tested by string equality of the names. AlignmentRecords whose 'getReadMapped' method returns 'false' are ignored.

    The end of the record against the reference sequence is calculated from the cigar string using the ADAMContext.referenceLengthFromCigar method.

    query

    The query region; only records that overlap this region are returned.

    returns

    The subset of AlignmentRecords (corresponding to either primary or secondary alignments) that overlap the query region.
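    The overlap test can be sketched as follows. This is not the ADAMContext.referenceLengthFromCigar code: `Region` and `Read` are hypothetical simplified types, coordinates are assumed to be 0-based with an exclusive end, and the reference length is derived from the SAM convention that only the M, D, N, = and X operators consume reference bases.

    ```scala
    object OverlapSketch {
      // Hypothetical, simplified types; coordinates are 0-based with an exclusive end.
      final case class Region(referenceName: String, start: Long, end: Long)
      final case class Read(referenceName: String, start: Long, cigar: String, mapped: Boolean)

      private val CigarElement = """(\d+)([MIDNSHP=X])""".r

      // Reference bases spanned by an alignment: only M, D, N, = and X consume the reference.
      def referenceLength(cigar: String): Long =
        CigarElement.findAllMatchIn(cigar).collect {
          case m if "MDN=X".contains(m.group(2)) => m.group(1).toLong
        }.sum

      // Unmapped reads never overlap; reference names are compared as strings.
      def overlaps(read: Read, query: Region): Boolean =
        read.mapped &&
          read.referenceName == query.referenceName &&
          read.start < query.end &&
          query.start < read.start + referenceLength(read.cigar)

      def filterByOverlappingRegion(reads: Seq[Read], query: Region): Seq[Read] =
        reads.filter(overlaps(_, query))
    }
    ```

    Soft clips, hard clips, insertions and padding do not extend the alignment on the reference, so a CIGAR such as 5S10M2I3D4M spans 17 reference bases in this sketch.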

  36. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  37. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  38. def getSequenceRecordsFromElement(elem: AlignmentRecord): Set[SequenceRecord]

    For a single RDD element, returns 0+ sequence record elements.

    elem

    Element from which to extract sequence records.

    returns

    A set of sequence records.

    Definition Classes
    AlignmentRecordRDDFunctions → ADAMSequenceDictionaryRDDAggregator
  39. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  40. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  41. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  42. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  43. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  44. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  45. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  46. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  47. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  48. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  49. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  50. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  51. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  52. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  53. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  54. def maybeSaveBam(args: ADAMSaveArgs): Boolean

  55. def maybeSaveFastq(args: ADAMSaveAnyArgs): Boolean

  56. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  57. final def notify(): Unit

    Definition Classes
    AnyRef
  58. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  59. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  60. def toString(): String

    Definition Classes
    AnyRef → Any
  61. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  62. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  63. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
