public abstract class FastqRecordReader extends org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>

| Modifier and Type | Field and Description |
|---|---|
| static int | `DEFAULT_MAX_READ_LENGTH` Default maximum read length, 10,000 bp. |
| protected long | `end` First index value beyond the slice, i.e. |
| protected boolean | `isCompressed` True if the underlying data is compressed. |
| protected boolean | `isSplittable` True if the underlying data is splittable. |
| static String | `MAX_READ_LENGTH_PROPERTY` Maximum read length property name. |
| protected long | `pos` Current position in file. |

| Modifier | Constructor and Description |
|---|---|
| protected | `FastqRecordReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapreduce.lib.input.FileSplit split)` Builds a new record reader given a config file and an input split. |

| Modifier and Type | Method and Description |
|---|---|
| protected abstract boolean | `checkBuffer(int bufferLength, org.apache.hadoop.io.Text buffer)` Checks to see whether the buffer is positioned at a valid record. |
| void | `close()` Close this RecordReader to future operations. |
| Void | `getCurrentKey()` FASTQ has no keys, so we return null. |
| org.apache.hadoop.io.Text | `getCurrentValue()` Returns the last interleaved FASTQ record. |
| float | `getProgress()` How much of the input has the RecordReader consumed? |
| void | `initialize(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)` |
| protected boolean | `lowLevelFastqRead(org.apache.hadoop.io.Text readName, org.apache.hadoop.io.Text value)` Parses a read from an interleaved FASTQ file. |
| protected String | `makePositionMessage()` Produces a debugging message with the file position. |
| protected abstract boolean | `next(org.apache.hadoop.io.Text value)` Reads from the input split. |
| boolean | `nextKeyValue()` Seeks ahead in our split to the next key-value pair. |
| protected int | `positionAtFirstRecord(org.apache.hadoop.fs.FSDataInputStream stream, org.apache.hadoop.io.compress.CompressionCodec codec)` Position the input stream at the start of the first record. |
| static void | `setMaxReadLength(org.apache.hadoop.conf.Configuration conf, int maxReadLength)` Set the maximum read length property to maxReadLength. |

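
To illustrate the kind of record-level work that `lowLevelFastqRead` and `getCurrentValue` describe, here is a minimal sketch. It assumes the standard four-line FASTQ layout (an `@` header, bases, a `+` separator, qualities) and assumes that an interleaved value is two consecutive records. `FastqParseSketch`, `readRecord`, and `readPair` are hypothetical names made up for this example; they are not part of the class, and the sketch reads from a plain iterator of lines rather than a Hadoop stream.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical helper for illustration only; not part of FastqRecordReader. */
final class FastqParseSketch {

    /** Reads one four-line record and returns its text, or null at end of input. */
    static String readRecord(Iterator<String> lines) {
        if (!lines.hasNext()) {
            return null; // clean end of input
        }
        String header = lines.next();
        // A well-formed FASTQ record starts with '@'.
        if (!header.startsWith("@")) {
            throw new RuntimeException("FASTQ record must start with '@': " + header);
        }
        StringBuilder record = new StringBuilder(header).append('\n');
        // The remaining three lines: bases, separator, qualities.
        for (int i = 0; i < 3; i++) {
            if (!lines.hasNext()) {
                throw new RuntimeException("Truncated FASTQ record: " + header);
            }
            record.append(lines.next()).append('\n');
        }
        return record.toString();
    }

    /** Reads two consecutive records and joins them (assumed interleaved pair). */
    static String readPair(Iterator<String> lines) {
        String first = readRecord(lines);
        if (first == null) {
            return null; // no more pairs
        }
        String second = readRecord(lines);
        if (second == null) {
            throw new RuntimeException("Unpaired read in interleaved input");
        }
        return first + second;
    }

    public static void main(String[] args) {
        Iterator<String> lines = Arrays.asList(
            "@r1/1", "ACGT", "+", "IIII",
            "@r1/2", "TTGA", "+", "HHHH").iterator();
        System.out.print(readPair(lines));
    }
}
```

The sketch throws an unchecked `RuntimeException` on malformed input, mirroring the documented behavior of `lowLevelFastqRead` when a record does not start with `@`; everything else about it is a simplification.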
public static final int DEFAULT_MAX_READ_LENGTH
Default maximum read length, 10,000 bp.

public static final String MAX_READ_LENGTH_PROPERTY

protected long end

protected long pos

protected boolean isSplittable

protected boolean isCompressed

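
The `pos` and `end` fields suggest a half-open slice: `end` is the first index beyond it. A minimal sketch of how a position can be tested against such a slice and turned into a progress fraction follows. The `start` parameter and the names `SliceProgressSketch`, `inSlice`, and `progress` are assumptions made for this example, and the formula is a common convention rather than the confirmed body of `getProgress()`.

```java
/** Hypothetical helper for illustration only; not part of FastqRecordReader. */
final class SliceProgressSketch {

    /** True if pos lies inside the half-open slice [start, end). */
    static boolean inSlice(long start, long end, long pos) {
        return pos >= start && pos < end;
    }

    /** Fraction of the slice consumed, clamped to the range [0, 1]. */
    static float progress(long start, long end, long pos) {
        if (end <= start) {
            return 0.0f; // empty slice: a choice made by this sketch
        }
        float fraction = (pos - start) / (float) (end - start);
        // A reader may run past end to finish its last record, so clamp.
        return Math.max(0.0f, Math.min(1.0f, fraction));
    }

    public static void main(String[] args) {
        System.out.println(progress(0L, 100L, 50L));
        System.out.println(inSlice(0L, 100L, 100L));
    }
}
```

Clamping matters because a record that begins just before `end` may extend beyond it, which would otherwise report more than 100 percent.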
protected FastqRecordReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapreduce.lib.input.FileSplit split) throws IOException
Parameters:
conf - The Hadoop configuration object. Used for gaining access to the underlying file system.
split - The file split to read.
Throws:
IOException

public static void setMaxReadLength(org.apache.hadoop.conf.Configuration conf, int maxReadLength)
Set the maximum read length property to maxReadLength.
Parameters:
conf - configuration
maxReadLength - maximum read length, in base pairs (bp)

protected abstract boolean checkBuffer(int bufferLength, org.apache.hadoop.io.Text buffer)
Parameters:
bufferLength - The length of the line currently in the buffer.
buffer - A buffer containing a peek at the first line in the current stream.

protected final int positionAtFirstRecord(org.apache.hadoop.fs.FSDataInputStream stream, org.apache.hadoop.io.compress.CompressionCodec codec) throws IOException
Parameters:
stream - The stream to reposition.
Throws:
IOException

public final void initialize(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
initialize in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

public final Void getCurrentKey()
getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>

public final org.apache.hadoop.io.Text getCurrentValue()
getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>

public final boolean nextKeyValue() throws IOException, InterruptedException
nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

public final void close() throws IOException
close in interface Closeable
close in interface AutoCloseable
close in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Throws:
IOException

public final float getProgress()
getProgress in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>

protected final String makePositionMessage()

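
A split can begin in the middle of a record, which is why `positionAtFirstRecord` and `checkBuffer` exist. The sketch below shows one way such a resynchronization could work, assuming four-line records with single-line sequences: a quality line may itself begin with `@`, so a candidate header is accepted only if the line two below it begins with `+`. The names `FirstRecordSketch` and `firstRecordLine` are hypothetical, the sketch returns a line index rather than whatever `positionAtFirstRecord` returns, and this heuristic is not confirmed to be the one the class uses.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of FastqRecordReader. */
final class FirstRecordSketch {

    /**
     * Returns the index of the first line that starts a complete four-line
     * record, or -1 if no such line exists.
     */
    static int firstRecordLine(List<String> lines) {
        // i + 3 must be a valid index so that a full record is available.
        for (int i = 0; i + 3 < lines.size(); i++) {
            boolean looksLikeHeader = lines.get(i).startsWith("@");
            boolean separatorFollows = lines.get(i + 2).startsWith("+");
            if (looksLikeHeader && separatorFollows) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The first three lines are the tail of a record cut by the split.
        List<String> lines = Arrays.asList(
            "GT", "+", "@III",
            "@r2", "ACGT", "+", "IIII");
        System.out.println(firstRecordLine(lines));
    }
}
```

In this example the quality line `@III` is rejected because the line two below it is a sequence line, not a `+` separator, so the search moves on to the true header `@r2`.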
protected final boolean lowLevelFastqRead(org.apache.hadoop.io.Text readName, org.apache.hadoop.io.Text value) throws IOException
Parameters:
readName - Text record containing read name. Output parameter.
value - Text record containing full record. Output parameter.
Throws:
RuntimeException - Throws exception if FASTQ record doesn't have proper formatting (e.g., record doesn't start with @).
IOException

protected abstract boolean next(org.apache.hadoop.io.Text value) throws IOException
Parameters:
value - Text record to write input value into.
Throws:
IOException
See Also:
lowLevelFastqRead(Text, Text)

Copyright © 2019. All rights reserved.