public class GuaguaInputFormat
extends org.apache.hadoop.mapreduce.lib.input.TextInputFormat
GuaguaInputFormat determines how many mappers are started in a guagua MapReduce job.
In getSplits(JobContext), one GuaguaInputSplit instance is added as the master and the others
are workers. This ensures that one master and multiple workers are started as mapper tasks.
If multiple masters are needed, add new GuaguaInputSplit instances in getSplits(JobContext).
Note, however, that fail-over with multiple masters is sometimes no better than letting Hadoop restart a failed master mapper task.
With multiple masters, if one master goes down, ZooKeeper waits for its session timeout to detect the failed
master. If the session timeout is too large, it may exceed the time Hadoop needs to restart the task.
By default guagua relies on Hadoop's default split implementation, but guagua also provides a mechanism to
combine several splits together. Set GuaguaConstants.GUAGUA_SPLIT_COMBINABLE to true and
GuaguaConstants.GUAGUA_SPLIT_MAX_COMBINED_SPLIT_SIZE to a byte size to combine splits up to that size:
-Dguagua.split.combinable=true -Dguagua.split.maxCombinedSplitSize=268435456
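The one-master-plus-workers arrangement described above can be sketched as follows. This is a simplified model of the split list built in getSplits(JobContext), not the real GuaguaInputSplit class:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the split list built in getSplits(JobContext):
// the first split is flagged as the master, the rest are workers, so the
// job launches exactly one master mapper plus one mapper per worker split.
// (Sketch only; the real GuaguaInputSplit carries more state.)
public class MasterWorkerSketch {
    static List<Boolean> buildMasterFlags(int workerSplitCount) {
        List<Boolean> isMaster = new ArrayList<>();
        isMaster.add(true);                       // one master split
        for (int i = 0; i < workerSplitCount; i++) {
            isMaster.add(false);                  // worker splits
        }
        return isMaster;
    }

    public static void main(String[] args) {
        List<Boolean> flags = buildMasterFlags(3);
        System.out.println(flags.size());         // 4 mapper tasks in total
    }
}
```

Starting more masters would mean adding more `true` entries up front, which is exactly what adding extra GuaguaInputSplit instances in getSplits(JobContext) does.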
| Constructor and Description |
|---|
| `GuaguaInputFormat()` |
| Modifier and Type | Method and Description |
|---|---|
| `org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>` | `createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)` |
| `static List<List<org.apache.hadoop.mapreduce.InputSplit>>` | `getCombineGuaguaSplits(List<org.apache.hadoop.mapreduce.InputSplit> oneInputSplits, long maxCombinedSplitSize)` |
| `protected List<org.apache.hadoop.mapreduce.InputSplit>` | `getFinalCombineGuaguaSplits(List<org.apache.hadoop.mapreduce.InputSplit> newSplits, long combineSize)` Copied from the Pig implementation; this code logic still needs review. |
| `protected List<org.apache.hadoop.mapreduce.InputSplit>` | `getGuaguaSplits(org.apache.hadoop.mapreduce.JobContext job)` Generates the list of files and makes them into FileSplits. |
| `List<org.apache.hadoop.mapreduce.InputSplit>` | `getSplits(org.apache.hadoop.mapreduce.JobContext job)` Split building logic including the master split setting; also supports combining input splits, as Pig does. |
| `protected boolean` | `isPigOrHadoopMetaFile(org.apache.hadoop.fs.Path path)` Whether the given path is a Pig or Hadoop meta output file. |
| `protected boolean` | `isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path file)` |
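The split-combining behavior summarized above, where `getCombineGuaguaSplits` groups splits under a maximum combined size, can be sketched with plain byte sizes. The greedy grouping strategy below is an assumption for illustration, not the exact guagua logic:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the guagua implementation): greedily group split
// sizes so each group stays at or under maxCombinedSplitSize, mirroring how
// getCombineGuaguaSplits produces a List<List<InputSplit>>.
public class CombineSketch {
    static List<List<Long>> combine(List<Long> splitSizes, long maxCombinedSplitSize) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0L;
        for (long size : splitSizes) {
            // Close the current group if adding this split would exceed the cap.
            if (!current.isEmpty() && currentSize + size > maxCombinedSplitSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0L;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        long max = 268435456L; // 256 MB, as in the -D example above
        List<Long> sizes = List.of(100_000_000L, 150_000_000L, 120_000_000L);
        // First two sizes fit under 256 MB together; the third starts a new group.
        System.out.println(combine(sizes, max).size()); // 2
    }
}
```

Each resulting group then becomes one combined split, so fewer mapper tasks are launched for many small input files.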
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
`addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize`

`public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job) throws IOException`
Overrides: `getSplits` in class `org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>`
Throws: `IOException`

`protected List<org.apache.hadoop.mapreduce.InputSplit> getFinalCombineGuaguaSplits(List<org.apache.hadoop.mapreduce.InputSplit> newSplits, long combineSize) throws IOException`
Throws: `IOException`

`protected List<org.apache.hadoop.mapreduce.InputSplit> getGuaguaSplits(org.apache.hadoop.mapreduce.JobContext job) throws IOException`
Throws: `IOException`

`public static List<List<org.apache.hadoop.mapreduce.InputSplit>> getCombineGuaguaSplits(List<org.apache.hadoop.mapreduce.InputSplit> oneInputSplits, long maxCombinedSplitSize) throws IOException, InterruptedException`
Throws: `IOException`, `InterruptedException`

`protected boolean isPigOrHadoopMetaFile(org.apache.hadoop.fs.Path path)`
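A plausible name-based filter for `isPigOrHadoopMetaFile` might look like the sketch below. The assumption that Pig/Hadoop meta outputs (such as `_SUCCESS`, `_logs`, or `.pig_header`) start with `_` or `.` is ours for illustration, not taken from the guagua source:

```java
// Hypothetical name-based check; the real isPigOrHadoopMetaFile takes a
// org.apache.hadoop.fs.Path and may use different rules. Typical Hadoop/Pig
// meta outputs (_SUCCESS, _logs, .pig_header) start with '_' or '.'.
public class MetaFileSketch {
    static boolean looksLikeMetaFile(String fileName) {
        return fileName.startsWith("_") || fileName.startsWith(".");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeMetaFile("_SUCCESS"));   // true
        System.out.println(looksLikeMetaFile("part-00000")); // false
    }
}
```

Filtering such files out during split generation prevents empty or bookkeeping files from becoming worker splits.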
`protected boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path file)`
Overrides: `isSplitable` in class `org.apache.hadoop.mapreduce.lib.input.TextInputFormat`

`public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)`
Overrides: `createRecordReader` in class `org.apache.hadoop.mapreduce.lib.input.TextInputFormat`

Copyright © 2019. All Rights Reserved.