Class GuaguaMRRecordReader

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public class GuaguaMRRecordReader
    extends org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
    GuaguaMRRecordReader is a mock implementation of the MapReduce RecordReader interface; it does not actually read any input data.

    To report progress, both currentIteration and totalIterations must be set. currentIteration can only be set in GuaguaMapper.run.

    Why is currentIteration static? MapperContext provides no API for passing a task's current iteration to GuaguaRecordReader, so a static field is used to update the current iteration instead.

    If currentIteration is not set in each iteration, it always starts from 0. As a result, progress reporting does not work well after task fail-over (TODO).
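The static-field workaround can be sketched in plain Java without Hadoop on the classpath. This is a minimal illustration under stated assumptions, not Guagua's source: `MockIterationReader` and `MapperSide` are hypothetical names, the Hadoop `RecordReader` superclass is omitted, and the field is marked `volatile` on the assumption that the code writing the iteration and the code reading it may run on different threads.

```java
// Hypothetical stand-in for GuaguaMRRecordReader: no Hadoop types, no real input.
class MockIterationReader {
    // Static because mapper-side code has no API to reach the reader instance
    // the framework created; every instance observes the same value.
    private static volatile int currentIteration = 0;

    private final int totalIterations;

    MockIterationReader() {
        this(0); // mirrors the default constructor: totalIterations = 0
    }

    MockIterationReader(int totalIterations) {
        this.totalIterations = totalIterations;
    }

    // Mirrors setCurrentIteration: intended to be called from mapper-side code only.
    static void setCurrentIteration(int iteration) {
        currentIteration = iteration;
    }

    int currentIteration() {
        return currentIteration;
    }

    int totalIterations() {
        return totalIterations;
    }
}

// Hypothetical mapper-side code: it holds no reference to any reader,
// yet it can still publish the iteration through the static setter.
class MapperSide {
    static void runIterations(int upTo) {
        for (int i = 1; i <= upTo; i++) {
            MockIterationReader.setCurrentIteration(i);
        }
    }
}
```

Because the value lives in a static field of the task's JVM, a restarted task begins again at 0 unless something sets it, which is the fail-over limitation described above.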

    • Constructor Detail

      • GuaguaMRRecordReader

        public GuaguaMRRecordReader()
        Default constructor; totalIterations defaults to 0.
      • GuaguaMRRecordReader

        public GuaguaMRRecordReader(int totalIterations)
        Constructor that sets totalIterations.
        Parameters:
        totalIterations - the total number of iterations for this Guagua job.
    • Method Detail

      • close

        public void close()
                   throws IOException
        Specified by:
        close in interface AutoCloseable
        Specified by:
        close in interface Closeable
        Specified by:
        close in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
        Throws:
        IOException
      • getProgress

        public float getProgress()
                          throws IOException
        context.nextKeyValue() should be called once per iteration; each call updates currentIteration, which in turn updates the reported progress.
        Specified by:
        getProgress in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
        Throws:
        IOException
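A minimal sketch of how progress could be derived from the two counters. The formula `currentIteration / totalIterations`, clamped to [0, 1] and returning 0 when `totalIterations` is 0 (the default constructor's value), is an assumption for illustration; the javadoc above does not state it. The clamp reflects Hadoop's `RecordReader.getProgress` contract of returning a value between 0.0 and 1.0. `ProgressSketch` is a hypothetical name.

```java
// Hypothetical helper; assumes progress = currentIteration / totalIterations.
final class ProgressSketch {
    private ProgressSketch() {}

    static float progress(int currentIteration, int totalIterations) {
        if (totalIterations <= 0) {
            // The default constructor leaves totalIterations at 0: avoid dividing by zero.
            return 0.0f;
        }
        float ratio = (float) currentIteration / totalIterations;
        // Clamp so that a stale or over-counted iteration never reports more than 100%.
        return Math.max(0.0f, Math.min(1.0f, ratio));
    }
}
```

For example, iteration 5 of a 10-iteration job would report 0.5 under this assumption.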
      • getCurrentKey

        public org.apache.hadoop.io.LongWritable getCurrentKey()
                                                        throws IOException,
                                                               InterruptedException
        A mock that hides Hadoop's raw map iteration over the map input key.
        Specified by:
        getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
        Throws:
        IOException
        InterruptedException
      • getCurrentValue

        public org.apache.hadoop.io.Text getCurrentValue()
                                                  throws IOException,
                                                         InterruptedException
        A mock that hides Hadoop's raw map iteration over the map input value.
        Specified by:
        getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
        Throws:
        IOException
        InterruptedException
      • initialize

        public void initialize(org.apache.hadoop.mapreduce.InputSplit inputSplit,
                               org.apache.hadoop.mapreduce.TaskAttemptContext context)
                        throws IOException,
                               InterruptedException
        Specified by:
        initialize in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
        Throws:
        IOException
        InterruptedException
      • nextKeyValue

        public boolean nextKeyValue()
                             throws IOException,
                                    InterruptedException
        Updates the iteration number. It is called once per iteration and is used to report Hadoop job progress more precisely.
        Specified by:
        nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
        Throws:
        IOException
        InterruptedException
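The once-per-iteration contract can be illustrated with a Hadoop-free mock. Both the class name `IterationDrivenReader` and the choice to return `false` once `totalIterations` has been reached are assumptions for illustration; the javadoc only says that the method updates the iteration number once per iteration.

```java
// Hypothetical mock: one nextKeyValue() call per iteration, no real input is read.
class IterationDrivenReader {
    private final int totalIterations;
    private int currentIteration;

    IterationDrivenReader(int totalIterations) {
        this.totalIterations = totalIterations;
    }

    // Advances the iteration counter; assumed to return false once all
    // iterations are done so that a map-style while loop terminates.
    boolean nextKeyValue() {
        if (currentIteration >= totalIterations) {
            return false;
        }
        currentIteration++;
        return true;
    }

    // currentIteration never exceeds totalIterations, so the result stays in [0, 1].
    float getProgress() {
        return totalIterations <= 0 ? 0.0f : (float) currentIteration / totalIterations;
    }
}
```

With this design, a loop of the form `while (reader.nextKeyValue()) { ... }` runs exactly `totalIterations` times, and progress rises by `1 / totalIterations` on each call.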
      • setCurrentIteration

        public static void setCurrentIteration(int currentIteration)
        Should only be called from the GuaguaMapper progress callback.