public class VCFInputFormat extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,VariantContextWritable>
InputFormat for VCF files. Values are the individual records; see VCFRecordReader for the meaning of the key.

| Modifier and Type | Field and Description |
|---|---|
| static String | INTERVALS_PROPERTY: filter by region, like -L in SAMtools. |
| static String | TRUST_EXTS_PROPERTY: whether file extensions are to be trusted; defaults to true. |
| Constructor and Description |
|---|
| VCFInputFormat(): creates a new input format, which will use the Configuration from the first public method called. |
| VCFInputFormat(org.apache.hadoop.conf.Configuration conf): creates a new input format, reading TRUST_EXTS_PROPERTY from the given Configuration. |
| VCFInputFormat(Map<org.apache.hadoop.fs.Path,VCFFormat> formatMap): creates a new input format, trusting the given Map to define the file-to-format associations. |
| Modifier and Type | Method and Description |
|---|---|
| org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,VariantContextWritable> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext ctx): returns a BCFRecordReader or VCFRecordReader as appropriate, initialized with the given parameters. |
| VCFFormat | getFormat(org.apache.hadoop.fs.Path path): returns the VCFFormat corresponding to the given path. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext job): defers to BCFSplitGuesser as appropriate for each individual path. |
| protected boolean | isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path filename) |
| static <T extends htsjdk.samtools.util.Locatable> void | setIntervals(org.apache.hadoop.conf.Configuration conf, List<T> intervals) |
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat: addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize

public static final String TRUST_EXTS_PROPERTY

Whether file extensions are to be trusted; defaults to true.
public static final String INTERVALS_PROPERTY

Filter by region, like -L in SAMtools. Takes a comma-separated list of intervals, e.g. chr1:1-20000,chr2:12000-20000. For programmatic use, setIntervals(Configuration, List) should be preferred.

public VCFInputFormat()

Creates a new input format, which will use the
Configuration from the first public method called. Thus this will behave as though constructed with a Configuration directly, but only after it has received it in createRecordReader (via the TaskAttemptContext) or isSplitable or getSplits (via the JobContext). Until then, other methods will throw an IllegalStateException.
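A driver-side sketch of this usage; note that the package name org.seqdoop.hadoop_bam is an assumption not stated on this page, and mapper/reducer setup is elided:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.seqdoop.hadoop_bam.VCFInputFormat;         // package name assumed
import org.seqdoop.hadoop_bam.VariantContextWritable; // package name assumed

public class VcfDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "vcf-example");
        // The no-arg constructor is invoked reflectively by the framework;
        // the input format picks up the job's Configuration from the first
        // public method the framework calls on it.
        job.setInputFormatClass(VCFInputFormat.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(VariantContextWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // ... set mapper/reducer classes and submit as usual.
    }
}
```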
This constructor exists mainly as a convenience, e.g. so that VCFInputFormat can be used directly in Job.setInputFormatClass.

public VCFInputFormat(org.apache.hadoop.conf.Configuration conf)

Creates a new input format, reading
TRUST_EXTS_PROPERTY from
the given Configuration.

public VCFInputFormat(Map<org.apache.hadoop.fs.Path,VCFFormat> formatMap)

Creates a new input format, trusting the given Map to define the file-to-format associations. Neither file paths nor their contents are looked at; only the Map is used. The Map is not copied, so it should not be modified while this input format is in use.
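A minimal sketch of using this constructor; the org.seqdoop.hadoop_bam package name and the VCFFormat constant names (VCF, BCF) are assumptions, not stated on this page:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.Path;
import org.seqdoop.hadoop_bam.VCFFormat;      // package name assumed
import org.seqdoop.hadoop_bam.VCFInputFormat; // package name assumed

public class FormatMapExample {
    public static void main(String[] args) {
        Map<Path, VCFFormat> formats = new HashMap<>();
        // File extensions and contents are irrelevant here:
        // only the map entries decide the format of each path.
        formats.put(new Path("hdfs:///data/a.data"), VCFFormat.VCF); // constant name assumed
        formats.put(new Path("hdfs:///data/b.data"), VCFFormat.BCF); // constant name assumed
        VCFInputFormat inputFormat = new VCFInputFormat(formats);
        // The map is not copied: do not modify 'formats'
        // while 'inputFormat' is in use.
    }
}
```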
public static <T extends htsjdk.samtools.util.Locatable> void setIntervals(org.apache.hadoop.conf.Configuration conf, List<T> intervals)

The programmatic counterpart of INTERVALS_PROPERTY.
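For example, the intervals from the INTERVALS_PROPERTY documentation could be set programmatically as sketched below; htsjdk's Interval class is used because it implements Locatable, and the org.seqdoop.hadoop_bam package name is an assumption:

```java
import java.util.Arrays;
import java.util.List;
import htsjdk.samtools.util.Interval;
import org.apache.hadoop.conf.Configuration;
import org.seqdoop.hadoop_bam.VCFInputFormat; // package name assumed

public class IntervalsExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Equivalent to setting INTERVALS_PROPERTY to
        // "chr1:1-20000,chr2:12000-20000", but type-safe.
        List<Interval> intervals = Arrays.asList(
            new Interval("chr1", 1, 20000),
            new Interval("chr2", 12000, 20000));
        VCFInputFormat.setIntervals(conf, intervals);
    }
}
```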
public VCFFormat getFormat(org.apache.hadoop.fs.Path path)

Returns the VCFFormat corresponding to the given path, or null if it cannot be determined even based on the file contents (unless future VCF/BCF formats are very different, this means that the path does not refer to a VCF or BCF file).

If this input format was constructed using a given Map<Path,VCFFormat> and the path is not contained within that map, throws an IllegalArgumentException.
protected boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path filename)

Overrides: isSplitable in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,VariantContextWritable>

public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,VariantContextWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext ctx) throws InterruptedException, IOException

Returns a
BCFRecordReader or VCFRecordReader as
appropriate, initialized with the given parameters.
Throws IllegalArgumentException if the given input split is
not a FileVirtualSplit or a FileSplit, or if the path
referred to is not recognized as a VCF or BCF file (see getFormat(org.apache.hadoop.fs.Path)).
Specified by: createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,VariantContextWritable>

Throws: InterruptedException, IOException

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job) throws IOException

Defers to
BCFSplitGuesser as appropriate for each individual path. VCF paths do not require special handling, so their splits are left unchanged.

Overrides: getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,VariantContextWritable>

Throws: IOException

Copyright © 2016. All rights reserved.