ml.shifu.guagua.mapreduce
Class GuaguaMRRecordReader

java.lang.Object
  extended by org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
      extended by ml.shifu.guagua.mapreduce.GuaguaMRRecordReader
All Implemented Interfaces:
Closeable

public class GuaguaMRRecordReader
extends org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>

GuaguaMRRecordReader is a mock of the MapReduce reader interface; it does not read any real data.

To update progress, both currentIteration and totalIterations must be set. currentIteration can only be set in GuaguaMapper.run.

Why is currentIteration static? The current iteration of a task cannot be passed to GuaguaRecordReader because MapperContext provides no API for it, so a static field is used to update the current iteration.

If currentIteration is not set in each iteration, it can only start from 0. This progress update does not work well for task fail-over (TODO).
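
The progress mechanism described above can be sketched without Hadoop on the classpath. The class below is a hypothetical stand-in, not the actual GuaguaMRRecordReader source: the name ProgressSketch, the volatile modifier, the clamp to 1.0, and returning 0 when totalIterations is 0 are all assumptions made for illustration.

```java
// Hypothetical stand-in for the progress logic; it does not extend Hadoop's RecordReader.
class ProgressSketch {
    // Static because, per the description above, the mapper has no API to hand
    // the value to a reader instance. volatile is an assumption, added so that a
    // separate progress-reporting thread would see updates.
    private static volatile int currentIteration;

    private final int totalIterations;

    ProgressSketch(int totalIterations) {
        this.totalIterations = totalIterations;
    }

    static void setCurrentIteration(int iteration) {
        currentIteration = iteration;
    }

    // Assumption: report 0 when totalIterations is unset, and never exceed 1.0.
    float getProgress() {
        if (totalIterations <= 0) {
            return 0f;
        }
        return Math.min(1f, currentIteration / (float) totalIterations);
    }

    public static void main(String[] args) {
        ProgressSketch reader = new ProgressSketch(4);
        for (int i = 1; i <= 4; i++) {
            ProgressSketch.setCurrentIteration(i);
            System.out.println("iteration " + i + " progress " + reader.getProgress());
        }
    }
}
```

Because the field is static, every reader instance in the same JVM shares one iteration counter, which is consistent with the fail-over limitation noted above: a restarted task begins again from whatever value is set first.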


Constructor Summary
GuaguaMRRecordReader()
          Default constructor; totalIterations defaults to 0.
GuaguaMRRecordReader(int totalIterations)
          Constructor that sets totalIterations.
 
Method Summary
 void close()
           
 org.apache.hadoop.io.LongWritable getCurrentKey()
          A mock that hides Hadoop's raw map iteration over the map input key.
 org.apache.hadoop.io.Text getCurrentValue()
          A mock that hides Hadoop's raw map iteration over the map input value.
 float getProgress()
          context.nextKeyValue should be called in each iteration so that currentIteration, and therefore the progress, is updated.
 void initialize(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
           
 boolean nextKeyValue()
          Update iteration number.
static void setCurrentIteration(int currentIteration)
          Should only be called in the GuaguaMapper progress callback.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GuaguaMRRecordReader

public GuaguaMRRecordReader()
Default constructor; totalIterations defaults to 0.


GuaguaMRRecordReader

public GuaguaMRRecordReader(int totalIterations)
Constructor that sets totalIterations.

Parameters:
totalIterations - the total number of iterations for the Guagua job.
Method Detail

close

public void close()
           throws IOException
Specified by:
close in interface Closeable
Specified by:
close in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException

getProgress

public float getProgress()
                  throws IOException
context.nextKeyValue should be called in each iteration so that currentIteration, and therefore the progress, is updated.

Specified by:
getProgress in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException

getCurrentKey

public org.apache.hadoop.io.LongWritable getCurrentKey()
                                                throws IOException,
                                                       InterruptedException
A mock that hides Hadoop's raw map iteration over the map input key.

Specified by:
getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

getCurrentValue

public org.apache.hadoop.io.Text getCurrentValue()
                                          throws IOException,
                                                 InterruptedException
A mock that hides Hadoop's raw map iteration over the map input value.

Specified by:
getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

initialize

public void initialize(org.apache.hadoop.mapreduce.InputSplit inputSplit,
                       org.apache.hadoop.mapreduce.TaskAttemptContext context)
                throws IOException,
                       InterruptedException
Specified by:
initialize in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

nextKeyValue

public boolean nextKeyValue()
                     throws IOException,
                            InterruptedException
Updates the iteration number. This is called once per iteration and is used to update Hadoop job progress more precisely.

Specified by:
nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

setCurrentIteration

public static void setCurrentIteration(int currentIteration)
Should only be called in the GuaguaMapper progress callback.
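
One way to picture the call order this implies: a per-iteration callback first publishes the iteration number through the static setter, and the reader is then advanced once so that progress can be recomputed. The sketch below is hypothetical throughout. CallbackSketch, FakeReader, and the run helper are invented for illustration, no GuaguaMapper or Hadoop context types are used, and the boolean returned by nextKeyValue here is an assumption about its meaning.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the call order only; no Hadoop or Guagua types are used.
class CallbackSketch {

    // Stand-in for the reader: it yields no records and only tracks iterations.
    static class FakeReader {
        private static volatile int currentIteration;
        private final int totalIterations;

        FakeReader(int totalIterations) {
            this.totalIterations = totalIterations;
        }

        static void setCurrentIteration(int iteration) {
            currentIteration = iteration;
        }

        // Assumption: true while iterations remain; the real return value may differ.
        boolean nextKeyValue() {
            return currentIteration < totalIterations;
        }

        // Assumption: 0 when the total is unset, clamped to 1.0 otherwise.
        float getProgress() {
            if (totalIterations <= 0) {
                return 0f;
            }
            return Math.min(1f, currentIteration / (float) totalIterations);
        }
    }

    // Simulates the per-iteration callback and records progress after each call.
    static List<Float> run(int totalIterations) {
        FakeReader reader = new FakeReader(totalIterations);
        List<Float> observed = new ArrayList<>();
        for (int iteration = 1; iteration <= totalIterations; iteration++) {
            // Step 1: publish the iteration number through the static setter.
            FakeReader.setCurrentIteration(iteration);
            // Step 2: advance the reader once, as the framework would.
            reader.nextKeyValue();
            observed.add(reader.getProgress());
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(run(5));
    }
}
```

If the setter were skipped in step 1, the counter would keep its previous value and the reported progress would not move for that iteration.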



Copyright © 2014. All Rights Reserved.