Squares off genotypes containing both called sites and reference models.
Genotypes containing both called sites and reference models.
A set of variant contexts for sites where at least one copy of the alternate allele was called in any sample, with genotype likelihood models for all samples that had data at the site.
Discovers variant sites from the reference model genotypes.
Genotypes containing both called sites and reference models.
Returns sites where a variant was seen in at least one sample.
Squares off a set of genotypes with reference models.
Many joint genotyping workflows use a "Genome VCF" (gVCF) based approach to incrementally compute genotype likelihoods across their dataset. In this methodology, we generate genotype likelihoods at all positions in all samples. For sites where we do not see evidence of a variant, we compute a "reference model", which is a set of genotype likelihoods assuming that we saw an unknown alternate allele. These likelihoods are then used in a joint genotyping step.
The alternative to this approach is to discover variants across all samples simultaneously, and to then score these variants. This approach is generally considered too computationally expensive for large cohorts.
This singleton object "squares off" the reference model by discovering all sites where we called a variant in at least one sample, joining these discovered variants back against the input genotypes, and then excising the genotype likelihoods from the reference models.
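The three steps above (discover, join, excise) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not this object's implementation: the `Site` and `Genotype` record types and the `discover_sites` and `square_off` functions are hypothetical, a reference model record is marked by `alt=None`, and each reference model record covers a single position rather than a block of positions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical, simplified record types used only for this illustration.
@dataclass(frozen=True)
class Site:
    contig: str
    pos: int
    ref: str
    alt: str


@dataclass(frozen=True)
class Genotype:
    sample: str
    contig: str
    pos: int
    ref: str
    alt: Optional[str]  # None marks a reference model record (unknown alt)
    alt_calls: int  # copies of the alternate allele called in this sample
    likelihoods: Tuple[float, ...]  # genotype likelihoods (e.g. log scale)


def discover_sites(genotypes: List[Genotype]) -> List[Site]:
    """Return sites where an alternate allele was called in >= 1 sample."""
    sites = {
        Site(g.contig, g.pos, g.ref, g.alt)
        for g in genotypes
        if g.alt is not None and g.alt_calls > 0
    }
    return sorted(sites, key=lambda s: (s.contig, s.pos, s.ref, s.alt))


def square_off(
    genotypes: List[Genotype],
) -> Dict[Site, Dict[str, Tuple[float, ...]]]:
    """Map each discovered site to per-sample genotype likelihoods."""
    # Step 1: discover the variant sites.
    sites = discover_sites(genotypes)

    # Index the input so it can be joined back against the sites.
    called = {}
    ref_models = {}
    for g in genotypes:
        if g.alt is None:
            ref_models[(g.sample, g.contig, g.pos)] = g
        else:
            called[(g.sample, g.contig, g.pos, g.ref, g.alt)] = g
    samples = sorted({g.sample for g in genotypes})

    squared = {}
    for site in sites:
        row = {}
        for sample in samples:
            # Step 2: join the site against this sample's records,
            # preferring a record that names the site's alternate allele.
            g = called.get((sample, site.contig, site.pos, site.ref, site.alt))
            if g is None:
                # Step 3: otherwise take the likelihoods from the
                # sample's reference model at this position, if any.
                g = ref_models.get((sample, site.contig, site.pos))
            if g is not None:
                row[sample] = g.likelihoods
        squared[site] = row
    return squared


# Made-up example input: s1 carries a variant at chr1:100, s2 has only a
# reference model there, and nobody has a variant at chr1:200.
example = [
    Genotype("s1", "chr1", 100, "A", "T", 1, (-5.0, -0.1, -6.0)),
    Genotype("s2", "chr1", 100, "A", None, 0, (-0.1, -4.0, -9.0)),
    Genotype("s1", "chr1", 200, "G", None, 0, (-0.2, -3.0, -8.0)),
    Genotype("s2", "chr1", 200, "G", None, 0, (-0.1, -5.0, -9.0)),
]
squared_example = square_off(example)
```

In this sketch, a sample with no record at a discovered site is simply left out of that site's row, which mirrors the "all samples that had data at the site" wording above. Positions where no sample called a variant, such as `chr1:200` in the example, are dropped from the output entirely.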