CompareADAM is a tool for pairwise comparison of ADAM files (or merged sets of ADAM files; see the
note on the -recurse{1,2} optional parameters, below).
The canonical use case for CompareADAM involves a single input file run through (for example) two
different implementations of the same pipeline, producing two comparable ADAM files at the end.
CompareADAM loads these ADAM files and performs a read-name-based equi-join. It then computes
one or more metrics (embodied as BucketComparisons values) across the joined records, as specified
on the command line, and aggregates each metric into a histogram. The resulting histograms are
written to a specified directory as text files. (The aggregation step can be modified if other
aggregations are required in the future.)
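
The join-then-aggregate flow described above can be illustrated with a minimal Python sketch. This is
not CompareADAM's implementation: the record layout (dicts with "name" and "start" keys), the helper
names, and the position_offset metric are all hypothetical, and read names are assumed to be unique
within each input.

```python
from collections import Counter

def join_by_read_name(reads_a, reads_b):
    """Inner equi-join of two read collections on the read name.

    Assumes each name appears at most once per collection (an
    illustrative simplification).
    """
    by_name_b = {read["name"]: read for read in reads_b}
    return [
        (read, by_name_b[read["name"]])
        for read in reads_a
        if read["name"] in by_name_b
    ]

def position_offset(read_a, read_b):
    """Hypothetical metric: difference in alignment start position."""
    return read_b["start"] - read_a["start"]

def histogram(pairs, metric):
    """Aggregate one metric over all joined pairs into value -> count."""
    return Counter(metric(a, b) for a, b in pairs)

# Toy outputs of two pipeline runs over the same input (made-up data).
run_a = [
    {"name": "r1", "start": 100},
    {"name": "r2", "start": 200},
    {"name": "r3", "start": 300},
]
run_b = [
    {"name": "r1", "start": 100},
    {"name": "r2", "start": 205},
    {"name": "r4", "start": 400},
]

pairs = join_by_read_name(run_a, run_b)
hist = histogram(pairs, position_offset)

# One "value<TAB>count" line per bin, as a stand-in for the text output.
for value, count in sorted(hist.items()):
    print(f"{value}\t{count}")
```

Reads present in only one run (r3 and r4 here) drop out of the inner join, so only r1 and r2
contribute to the histogram.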
There is an R script in the adam-scripts module to process those outputs into a figure.
The available metrics are defined, by name, in the DefaultComparisons object.
A subsequent tool like FindReads can be used to track down which reads give rise to particular aggregated
bins in the output histograms, if further diagnosis is needed.
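
As a rough illustration of that kind of follow-up, the sketch below recovers the read names behind a
single histogram bin. It is not FindReads itself: the pair layout, the mapq_pair metric, and the
reads_in_bin helper are hypothetical, and the data is made up.

```python
def mapq_pair(read_a, read_b):
    """Hypothetical metric: the pair of mapping qualities from the two runs."""
    return (read_a["mapq"], read_b["mapq"])

def reads_in_bin(pairs, metric, bin_value):
    """Names of joined reads whose metric value equals the given bin."""
    return [a["name"] for a, b in pairs if metric(a, b) == bin_value]

# Already-joined records: (read from run A, same-named read from run B).
joined = [
    ({"name": "r1", "mapq": 60}, {"name": "r1", "mapq": 60}),
    ({"name": "r2", "mapq": 60}, {"name": "r2", "mapq": 0}),
    ({"name": "r3", "mapq": 30}, {"name": "r3", "mapq": 0}),
]

# Which reads were confidently mapped in run A but unmapped-quality in run B?
suspicious = reads_in_bin(joined, mapq_pair, (60, 0))
print(suspicious)
```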