Package ml.shifu.guagua.mapreduce
Class GuaguaMRRecordReader
- java.lang.Object
-
- org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
-
- ml.shifu.guagua.mapreduce.GuaguaMRRecordReader
-
- All Implemented Interfaces:
Closeable, AutoCloseable
public class GuaguaMRRecordReader extends org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
GuaguaMRRecordReader is a mock implementation of the MapReduce record reader interface; it does not read any real data.
To update progress, currentIteration and totalIterations should be set. currentIteration can only be set in GuaguaMapper.run.
Why is currentIteration static? The current iteration of a task cannot be passed to GuaguaRecordReader because the mapper context provides no API for it, so a static field is used here to update the current iteration.
If currentIteration is not set in each iteration, it can only start from 0. This progress update does not work well for task fail-over (TODO).
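As an illustration of the static-field approach described above, here is a minimal sketch that does not depend on Hadoop. The class name StaticProgressSketch and the progress formula (currentIteration divided by totalIterations, capped at 1) are assumptions made for the example, not the actual GuaguaMRRecordReader source.

```java
// Hypothetical stand-in for the reader; not the real GuaguaMRRecordReader.
class StaticProgressSketch {
    // Static so that code holding no reference to the reader instance
    // (for example a mapper) can still publish the current iteration.
    private static volatile int currentIteration = 0;

    private final int totalIterations;

    StaticProgressSketch() {
        this(0); // mirrors the documented default of 0
    }

    StaticProgressSketch(int totalIterations) {
        this.totalIterations = totalIterations;
    }

    static void setCurrentIteration(int iteration) {
        currentIteration = iteration;
    }

    // Assumed formula: fraction of iterations completed, capped at 1.
    float getProgress() {
        if (totalIterations <= 0) {
            return 0.0f; // avoid division by zero when no total is known
        }
        return Math.min(1.0f, currentIteration / (float) totalIterations);
    }

    public static void main(String[] args) {
        StaticProgressSketch reader = new StaticProgressSketch(10);
        StaticProgressSketch.setCurrentIteration(5);
        System.out.println(reader.getProgress());
    }
}
```

Because the field is static, every reader instance in the same JVM sees the same iteration number, which is also why a restarted task would begin again from 0 unless the value is set explicitly.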
-
-
Constructor Summary
Constructors
Constructor | Description
GuaguaMRRecordReader() | Default constructor, totalIterations is set to default 0.
GuaguaMRRecordReader(int totalIterations) | Constructor with totalIterations setting.
-
Method Summary
Modifier and Type | Method | Description
void | close() |
org.apache.hadoop.io.LongWritable | getCurrentKey() | This is a mock to hide Hadoop raw map iteration on map input key.
org.apache.hadoop.io.Text | getCurrentValue() | This is a mock to hide Hadoop raw map iteration on map input value.
float | getProgress() | Each iteration context.nextKeyValue should be called, and currentIteration is updated, so the progress is updated.
void | initialize(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context) |
boolean | nextKeyValue() | Update iteration number.
static void | setCurrentIteration(int currentIteration) | Should only be called in GuaguaMapper Progress callback.
-
-
-
Constructor Detail
-
GuaguaMRRecordReader
public GuaguaMRRecordReader()
Default constructor, totalIterations is set to default 0.
-
GuaguaMRRecordReader
public GuaguaMRRecordReader(int totalIterations)
Constructor that sets totalIterations.
- Parameters:
totalIterations - total number of iterations for the guagua job.
-
-
Method Detail
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
-
getProgress
public float getProgress() throws IOException
In each iteration, context.nextKeyValue should be called and currentIteration is updated, so the progress is updated.
- Specified by:
getProgress in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
-
getCurrentKey
public org.apache.hadoop.io.LongWritable getCurrentKey() throws IOException, InterruptedException
This is a mock to hide Hadoop raw map iteration on map input key.
- Specified by:
getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
InterruptedException
-
getCurrentValue
public org.apache.hadoop.io.Text getCurrentValue() throws IOException, InterruptedException
This is a mock to hide Hadoop raw map iteration on map input value.
- Specified by:
getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
InterruptedException
-
initialize
public void initialize(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
- Specified by:
initialize in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
InterruptedException
-
nextKeyValue
public boolean nextKeyValue() throws IOException, InterruptedException
Update iteration number. This is called once for each iteration. It is used to update Hadoop job progress more precisely.
- Specified by:
nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
InterruptedException
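The once-per-iteration call pattern can be sketched as follows, again without Hadoop. Everything here is an assumption for illustration: the class IterationLoopSketch, the rule that nextKeyValue returns true while iterations remain, and the driver loop standing in for GuaguaMapper.run. The real termination logic may differ.

```java
// Hypothetical sketch; names and the termination rule are assumptions.
class IterationLoopSketch {
    private static int currentIteration = 0;

    private final int totalIterations;

    IterationLoopSketch(int totalIterations) {
        this.totalIterations = totalIterations;
    }

    static void setCurrentIteration(int iteration) {
        currentIteration = iteration;
    }

    // Assumed rule: continue while fewer than totalIterations have completed.
    boolean nextKeyValue() {
        return currentIteration < totalIterations;
    }

    // Stand-in for the mapper's run loop: one nextKeyValue call per iteration.
    static int runIterations(int totalIterations) {
        setCurrentIteration(0); // fresh start; no fail-over recovery is modeled
        IterationLoopSketch reader = new IterationLoopSketch(totalIterations);
        int completed = 0;
        while (reader.nextKeyValue()) {
            completed++;                    // the iteration's work would happen here
            setCurrentIteration(completed); // publish the new iteration number
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println(runIterations(4));
    }
}
```

In this sketch the driver, not the reader, advances the counter, which matches the documented constraint that the iteration number is set from the mapper side.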
-
setCurrentIteration
public static void setCurrentIteration(int currentIteration)
Should only be called in GuaguaMapper Progress callback.
-
-