Package org.bdgenomics.adam.io
Class FastqRecordReader
java.lang.Object
  org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
    org.bdgenomics.adam.io.FastqRecordReader
All Implemented Interfaces:
Closeable, AutoCloseable
public abstract class FastqRecordReader extends org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
A record reader for the interleaved FASTQ format. Reads an input file and parses each interleaved FASTQ read pair into a single Text value. That value is then passed to the FastqConverter, which converts the single Text instance into two Alignments.
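To make "a read pair in a single Text" concrete, the sketch below joins two four-line FASTQ records into one eight-line string. It uses plain Java strings instead of Hadoop's Text, and the class and method names (InterleavedPairSketch, interleave) are hypothetical. The exact layout this reader emits is assumed here, not taken from this page.

```java
import java.util.List;

/** Illustrative only: joins two four-line FASTQ records into one interleaved pair. */
final class InterleavedPairSketch {

    /** Returns the eight lines of a read pair as one newline-terminated string. */
    static String interleave(List<String> first, List<String> second) {
        if (first.size() != 4 || second.size() != 4) {
            throw new IllegalArgumentException("each FASTQ record must have exactly 4 lines");
        }
        StringBuilder pair = new StringBuilder();
        // First read of the pair: name, bases, separator, qualities.
        for (String line : first) {
            pair.append(line).append('\n');
        }
        // Second read of the pair follows immediately.
        for (String line : second) {
            pair.append(line).append('\n');
        }
        return pair.toString();
    }
}
```

A downstream converter would split such a value back into two reads, four lines each.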
-
-
Field Summary
Fields (modifier and type, field, description):
static int DEFAULT_MAX_READ_LENGTH - Default maximum read length, 10,000 bp.
protected long end - First index value beyond the slice, i.e. slice is in range [start,end).
protected boolean isCompressed - True if the underlying data is compressed.
protected boolean isSplittable - True if the underlying data is splittable.
static String MAX_READ_LENGTH_PROPERTY - Maximum read length property name.
protected long pos - Current position in file.
-
Constructor Summary
Constructors (modifier, constructor, description):
protected FastqRecordReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapreduce.lib.input.FileSplit split) - Builds a new record reader given a Hadoop configuration and an input split.
-
Method Summary
Methods (modifier and type, method, description):
protected abstract boolean checkBuffer(int bufferLength, org.apache.hadoop.io.Text buffer) - Checks to see whether the buffer is positioned at a valid record.
void close() - Close this RecordReader to future operations.
Void getCurrentKey() - FASTQ has no keys, so we return null.
org.apache.hadoop.io.Text getCurrentValue() - Returns the last interleaved FASTQ record.
float getProgress() - How much of the input has the RecordReader consumed?
void initialize(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
protected boolean lowLevelFastqRead(org.apache.hadoop.io.Text readName, org.apache.hadoop.io.Text value) - Parses a read from an interleaved FASTQ file.
protected String makePositionMessage() - Produces a debugging message with the file position.
protected abstract boolean next(org.apache.hadoop.io.Text value) - Reads from the input split.
boolean nextKeyValue() - Seeks ahead in our split to the next key-value pair.
protected int positionAtFirstRecord(org.apache.hadoop.fs.FSDataInputStream stream, org.apache.hadoop.io.compress.CompressionCodec codec) - Position the input stream at the start of the first record.
static void setMaxReadLength(org.apache.hadoop.conf.Configuration conf, int maxReadLength) - Set the maximum read length property to maxReadLength.
-
-
-
Field Detail
-
DEFAULT_MAX_READ_LENGTH
public static final int DEFAULT_MAX_READ_LENGTH
Default maximum read length, 10,000 bp.
See Also:
Constant Field Values
-
MAX_READ_LENGTH_PROPERTY
public static final String MAX_READ_LENGTH_PROPERTY
Maximum read length property name.
See Also:
Constant Field Values
-
end
protected long end
First index value beyond the slice, i.e. slice is in range [start,end).
-
pos
protected long pos
Current position in file.
-
isSplittable
protected boolean isSplittable
True if the underlying data is splittable.
-
isCompressed
protected boolean isCompressed
True if the underlying data is compressed.
-
-
Constructor Detail
-
FastqRecordReader
protected FastqRecordReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapreduce.lib.input.FileSplit split) throws IOException
Builds a new record reader given a Hadoop configuration and an input split.
Parameters:
conf - The Hadoop configuration object. Used for gaining access to the underlying file system.
split - The file split to read.
Throws:
IOException
-
-
Method Detail
-
setMaxReadLength
public static void setMaxReadLength(org.apache.hadoop.conf.Configuration conf, int maxReadLength)
Set the maximum read length property to maxReadLength.
Parameters:
conf - configuration
maxReadLength - maximum read length, in base pairs (bp)
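The set-a-property, fall-back-to-a-default pattern behind this method can be sketched with java.util.Properties standing in for the Hadoop Configuration. The key string below is hypothetical and is not the real value of MAX_READ_LENGTH_PROPERTY, which this page does not show. The getter is likewise an assumption about how a reader might consume the setting.

```java
import java.util.Properties;

/** Illustrative stand-in: a property with a default, using java.util.Properties. */
final class MaxReadLengthSketch {

    // Hypothetical key; NOT the actual value of MAX_READ_LENGTH_PROPERTY.
    static final String HYPOTHETICAL_KEY = "example.fastq.max-read-length";

    // Matches the documented default of 10,000 bp.
    static final int DEFAULT_MAX_READ_LENGTH = 10_000;

    /** Stores the maximum read length, in base pairs, under the hypothetical key. */
    static void setMaxReadLength(Properties conf, int maxReadLength) {
        conf.setProperty(HYPOTHETICAL_KEY, Integer.toString(maxReadLength));
    }

    /** Reads the value back, falling back to the default when it was never set. */
    static int getMaxReadLength(Properties conf) {
        String raw = conf.getProperty(HYPOTHETICAL_KEY);
        return raw == null ? DEFAULT_MAX_READ_LENGTH : Integer.parseInt(raw);
    }
}
```

With the real class, the equivalent call is FastqRecordReader.setMaxReadLength(conf, maxReadLength) on a Hadoop Configuration before the job is submitted.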
-
checkBuffer
protected abstract boolean checkBuffer(int bufferLength, org.apache.hadoop.io.Text buffer)
Checks to see whether the buffer is positioned at a valid record.
Parameters:
bufferLength - The length of the line currently in the buffer.
buffer - A buffer containing a peek at the first line in the current stream.
Returns:
True if the buffer contains the first line of a properly formatted FASTQ record.
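A minimal sketch of one possible check, assuming the only test is that the peeked line starts with '@'. It takes a raw byte array instead of a Hadoop Text, and the name looksLikeRecordStart is hypothetical. Concrete subclasses may apply stricter rules than this.

```java
/** Illustrative only: a first-byte test for the start of a FASTQ record. */
final class CheckBufferSketch {

    /**
     * Returns true if the first bufferLength bytes of buffer form a non-empty
     * line that begins with '@'. The backing array may be longer than the
     * valid data, so bufferLength is checked separately.
     */
    static boolean looksLikeRecordStart(int bufferLength, byte[] buffer) {
        return bufferLength > 0
            && bufferLength <= buffer.length
            && buffer[0] == '@';
    }
}
```

Note that '@' is also a legal quality character, so a quality line can begin with it. A first-byte test alone can therefore misidentify a quality line as a record header.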
-
positionAtFirstRecord
protected final int positionAtFirstRecord(org.apache.hadoop.fs.FSDataInputStream stream, org.apache.hadoop.io.compress.CompressionCodec codec) throws IOException
Position the input stream at the start of the first record.
Parameters:
stream - The stream to reposition.
Throws:
IOException
-
initialize
public final void initialize(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
Specified by:
initialize in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException
-
getCurrentKey
public final Void getCurrentKey()
FASTQ has no keys, so we return null.
Specified by:
getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Returns:
Always returns null.
-
getCurrentValue
public final org.apache.hadoop.io.Text getCurrentValue()
Returns the last interleaved FASTQ record.
Specified by:
getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Returns:
The text corresponding to the last read pair.
-
nextKeyValue
public final boolean nextKeyValue() throws IOException, InterruptedException
Seeks ahead in our split to the next key-value pair. Triggers the read of an interleaved FASTQ read pair, and populates internal state.
Specified by:
nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Returns:
True if reading the next read pair succeeded.
Throws:
IOException
InterruptedException
-
close
public final void close() throws IOException
Close this RecordReader to future operations.
Specified by:
close in interface AutoCloseable
Specified by:
close in interface Closeable
Specified by:
close in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Throws:
IOException
-
getProgress
public final float getProgress()
How much of the input has the RecordReader consumed?
Specified by:
getProgress in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Returns:
A value in [0.0, 1.0] giving the fraction of the total bytes to read that have been read so far.
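A byte-based progress value of this kind can be sketched as follows, assuming the slice is the half-open range [start, end) and pos is the current offset. The start parameter is an assumption, since only end and pos appear as fields on this page. The class and method names are hypothetical, and this is not necessarily how the reader computes its result.

```java
/** Illustrative only: fraction of a half-open byte range [start, end) consumed. */
final class ProgressSketch {

    /** Returns (pos - start) / (end - start), clamped to [0.0, 1.0]. */
    static float progress(long start, long end, long pos) {
        if (end <= start) {
            return 1.0f; // empty slice: nothing remains to be read
        }
        float fraction = (float) (pos - start) / (float) (end - start);
        // A reader may run past `end` to finish its last record, so clamp.
        return Math.max(0.0f, Math.min(1.0f, fraction));
    }
}
```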
-
makePositionMessage
protected final String makePositionMessage()
Produces a debugging message with the file position.
Returns:
A string containing {filename}:{index}.
-
lowLevelFastqRead
protected final boolean lowLevelFastqRead(org.apache.hadoop.io.Text readName, org.apache.hadoop.io.Text value) throws IOException
Parses a read from an interleaved FASTQ file. Only reads a single record.
Parameters:
readName - Text record containing read name. Output parameter.
value - Text record containing full record. Output parameter.
Returns:
True if the read was successful (did not hit EOF).
Throws:
RuntimeException - If the FASTQ record is not properly formatted (e.g., the record doesn't start with @).
IOException
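The contract above (two output parameters, false at end of input, RuntimeException on a malformed header) can be sketched in plain Java. An Iterator over lines stands in for the input stream and StringBuilder for Text. The names are hypothetical. Whether the real method keeps the leading '@' in readName is not stated here, so stripping it is an assumption.

```java
import java.util.Iterator;

/** Illustrative only: reads one four-line FASTQ record from an iterator of lines. */
final class FastqRecordSketch {

    /**
     * Overwrites readName with the read name and value with the full record.
     * Returns false if the input is already exhausted.
     */
    static boolean readRecord(Iterator<String> lines, StringBuilder readName, StringBuilder value) {
        if (!lines.hasNext()) {
            return false; // end of input before a record started
        }
        String header = lines.next();
        if (!header.startsWith("@")) {
            throw new RuntimeException("FASTQ record does not start with '@': " + header);
        }
        readName.setLength(0);
        value.setLength(0);
        readName.append(header.substring(1)); // assumption: name stored without '@'
        value.append(header).append('\n');
        // Remaining three lines: bases, separator, qualities.
        for (int i = 0; i < 3; i++) {
            if (!lines.hasNext()) {
                throw new RuntimeException("Truncated FASTQ record: " + header);
            }
            value.append(lines.next()).append('\n');
        }
        return true;
    }
}
```

For an interleaved pair, a caller would invoke such a routine twice and concatenate the two values.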
-
next
protected abstract boolean next(org.apache.hadoop.io.Text value) throws IOException
Reads from the input split.
Parameters:
value - Text record to write input value into.
Returns:
Whether this read was successful or not.
Throws:
IOException
See Also:
lowLevelFastqRead(Text, Text)
-
-