java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
      org.apache.hadoop.mapreduce.lib.input.TextInputFormat
        ml.shifu.guagua.mapreduce.GuaguaInputFormat
public class GuaguaInputFormat extends org.apache.hadoop.mapreduce.lib.input.TextInputFormat
GuaguaInputFormat is used to determine how many mappers run in a guagua MapReduce job.

In getSplits(JobContext), we add one GuaguaInputSplit instance as the master; the others are workers. This makes sure that one master and multiple workers are started as mapper tasks. If multiple masters are needed, add new GuaguaInputSplit instances in getSplits(JobContext). Note, however, that fail-over with multiple masters is sometimes worse than letting Hadoop restart the failed master task: in the multiple-masters case, if one master goes down, ZooKeeper waits for the session timeout before it detects the failed master. If the session timeout is too large, it may exceed the time Hadoop needs to restart a task.

By default guagua relies on Hadoop's default split implementation, but it also provides a mechanism to combine several splits together. Set GuaguaConstants.GUAGUA_SPLIT_COMBINABLE to true and GuaguaConstants.GUAGUA_SPLIT_MAX_COMBINED_SPLIT_SIZE to a number to combine splits up to that size:

-Dguagua.split.combinable=true -Dguagua.split.maxCombinedSplitSize=268435456
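The combining behavior can be illustrated with a small, self-contained sketch. Plain long sizes stand in for InputSplit objects here, and the greedy grouping strategy is an assumption used for illustration, not the actual getCombineGuaguaSplits implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class CombineSplitsSketch {
    /**
     * Greedily group split sizes so that each group's total stays at or
     * below maxCombinedSplitSize. A split larger than the limit gets its
     * own group. This mirrors the idea behind
     * GuaguaConstants.GUAGUA_SPLIT_MAX_COMBINED_SPLIT_SIZE.
     */
    static List<List<Long>> combine(List<Long> splitSizes, long maxCombinedSplitSize) {
        List<List<Long>> combined = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0L;
        for (long size : splitSizes) {
            // Start a new group when adding this split would exceed the limit.
            if (!current.isEmpty() && currentSize + size > maxCombinedSplitSize) {
                combined.add(current);
                current = new ArrayList<>();
                currentSize = 0L;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            combined.add(current);
        }
        return combined;
    }

    public static void main(String[] args) {
        // 268435456 bytes = 256 MB, the value used in the -D example above.
        List<Long> sizes = List.of(100L << 20, 100L << 20, 100L << 20, 300L << 20);
        List<List<Long>> groups = combine(sizes, 256L << 20);
        System.out.println(groups.size()); // number of combined splits
    }
}
```

With three 100 MB splits and one 300 MB split under a 256 MB limit, the first two 100 MB splits are combined, and the remaining two splits each form their own group, so fewer mappers are launched for many small files.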
| Constructor Summary |
|---|
| GuaguaInputFormat() |

| Method Summary | |
|---|---|
| org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) |
| static List<List<org.apache.hadoop.mapreduce.InputSplit>> | getCombineGuaguaSplits(List<org.apache.hadoop.mapreduce.InputSplit> oneInputSplits, long maxCombinedSplitSize) |
| protected List<org.apache.hadoop.mapreduce.InputSplit> | getFinalCombineGuaguaSplits(List<org.apache.hadoop.mapreduce.InputSplit> newSplits, long combineSize) Copied from the Pig implementation; the code logic still needs to be checked. |
| protected List<org.apache.hadoop.mapreduce.InputSplit> | getGuaguaSplits(org.apache.hadoop.mapreduce.JobContext job) Generate the list of files and make them into FileSplits. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext job) Split building logic, including master setting; also includes a combine-inputs feature like Pig's. |
| protected boolean | isPigOrHadoopMetaFile(org.apache.hadoop.fs.Path path) Whether the given path is a Pig or Hadoop meta output file. |
| protected boolean | isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path file) |
| Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
|---|
| addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|

public GuaguaInputFormat()
| Method Detail |
|---|

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job)
                                                       throws IOException

Overrides:
getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException
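The master/worker arrangement that getSplits produces can be sketched in plain Java. The Split record below is a stand-in for GuaguaInputSplit (a boolean flag marking the master is an assumption for illustration; this is not guagua's code):

```java
import java.util.ArrayList;
import java.util.List;

public class MasterWorkerSplitsSketch {
    /** Stand-in for GuaguaInputSplit: a master flag plus an optional data path. */
    record Split(boolean isMaster, String dataPath) {}

    /**
     * Build the final split list: one master split (carrying no data) first,
     * then one worker split per input file, matching the one-master,
     * multiple-workers layout described in the class comment.
     */
    static List<Split> buildSplits(List<String> inputFiles) {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split(true, null)); // the single master mapper task
        for (String file : inputFiles) {
            splits.add(new Split(false, file)); // one worker mapper per file
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Split> splits = buildSplits(List.of("part-00000", "part-00001"));
        System.out.println(splits.size()); // 3: one master + two workers
        System.out.println(splits.get(0).isMaster()); // true
    }
}
```

Adding further master splits at the front of the list is how the class comment suggests supporting multiple masters, with the fail-over caveat noted above.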
protected List<org.apache.hadoop.mapreduce.InputSplit> getFinalCombineGuaguaSplits(List<org.apache.hadoop.mapreduce.InputSplit> newSplits,
                                                                                   long combineSize)
                                                                            throws IOException

Throws:
IOException

protected List<org.apache.hadoop.mapreduce.InputSplit> getGuaguaSplits(org.apache.hadoop.mapreduce.JobContext job)
                                                                throws IOException

Throws:
IOException
public static List<List<org.apache.hadoop.mapreduce.InputSplit>> getCombineGuaguaSplits(List<org.apache.hadoop.mapreduce.InputSplit> oneInputSplits,
long maxCombinedSplitSize)
throws IOException,
InterruptedException
Throws:
IOException
InterruptedException

protected boolean isPigOrHadoopMetaFile(org.apache.hadoop.fs.Path path)
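A hedged sketch of what such a meta-file check might look like. The _SUCCESS example and the leading-underscore/leading-dot convention are assumptions based on common Pig/Hadoop output layouts, not guagua's actual implementation, and a plain String stands in for org.apache.hadoop.fs.Path:

```java
public class MetaFileCheckSketch {
    /**
     * Heuristic check for Pig/Hadoop meta output files. By convention,
     * Hadoop and Pig write bookkeeping entries such as _SUCCESS and _logs,
     * and hidden files start with '.'; these should not become input splits.
     */
    static boolean isPigOrHadoopMetaFile(String pathName) {
        // Look only at the last path component, e.g. "_SUCCESS" in "/out/_SUCCESS".
        int slash = pathName.lastIndexOf('/');
        String name = slash >= 0 ? pathName.substring(slash + 1) : pathName;
        return name.startsWith("_") || name.startsWith(".");
    }

    public static void main(String[] args) {
        System.out.println(isPigOrHadoopMetaFile("/output/_SUCCESS"));   // true
        System.out.println(isPigOrHadoopMetaFile("/output/part-00000")); // false
    }
}
```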
protected boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context,
                              org.apache.hadoop.fs.Path file)

Overrides:
isSplitable in class org.apache.hadoop.mapreduce.lib.input.TextInputFormat
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                                                org.apache.hadoop.mapreduce.TaskAttemptContext context)

Overrides:
createRecordReader in class org.apache.hadoop.mapreduce.lib.input.TextInputFormat