Class KMeansWorker

  • All Implemented Interfaces:
    ml.shifu.guagua.worker.WorkerComputable<KMeansMasterParams,​KMeansWorkerParams>

    public class KMeansWorker
    extends ml.shifu.guagua.worker.AbstractWorkerComputable<KMeansMasterParams,​KMeansWorkerParams,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.LongWritable>,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.Text>>
    KMeansWorker re-tags each record with its new category, based on the latest k centers.

    So that the master can calculate the new k centers, KMeansWorker also accumulates per-worker information for each center in a sum list and a count list.
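
The tagging and accumulation described above can be sketched as follows. This is a minimal illustration, not the actual KMeansWorker code: the class KMeansAccumulationSketch, its Partial holder, and the methods nearestCenter and accumulate are hypothetical names, and the sketch assumes squared Euclidean distance and that records marked invalid (null) are skipped.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not part of the guagua API.
public class KMeansAccumulationSketch {

    /** Per-worker partial result: one coordinate sum and one record count per center. */
    static final class Partial {
        final double[][] sums;
        final int[] counts;

        Partial(int k, int dims) {
            this.sums = new double[k][dims];
            this.counts = new int[k];
        }
    }

    /** Returns the index of the center closest to the record (assumed: squared Euclidean distance). */
    static int nearestCenter(double[] record, double[][] centers) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < record.length; d++) {
                double diff = record[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Tags every valid record with its nearest center and accumulates sums and counts per center. */
    static Partial accumulate(List<double[]> records, double[][] centers) {
        Partial partial = new Partial(centers.length, centers[0].length);
        for (double[] record : records) {
            if (record == null) {
                continue; // assumption: invalid records were stored as null and are ignored
            }
            int tag = nearestCenter(record, centers);
            for (int d = 0; d < record.length; d++) {
                partial.sums[tag][d] += record[d];
            }
            partial.counts[tag]++;
        }
        return partial;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        List<double[]> records = Arrays.asList(
                new double[] {1.0, 1.0},
                new double[] {0.0, 1.0},
                new double[] {9.0, 10.0},
                null,
                new double[] {11.0, 10.0});
        Partial partial = accumulate(records, centers);
        System.out.println(Arrays.deepToString(partial.sums));
        System.out.println(Arrays.toString(partial.counts));
    }
}
```

With sums and counts from every worker, a master could obtain each new center by adding the per-worker sums, adding the per-worker counts, and dividing one by the other. That is the usual k-means update, stated here as an assumption about how the two lists are consumed.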

    • Constructor Detail

      • KMeansWorker

        public KMeansWorker()
    • Method Detail

      • initRecordReader

        public void initRecordReader​(ml.shifu.guagua.io.GuaguaFileSplit fileSplit)
                              throws IOException
        Reads the input line by line.
        Specified by:
        initRecordReader in class ml.shifu.guagua.worker.AbstractWorkerComputable<KMeansMasterParams,​KMeansWorkerParams,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.LongWritable>,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.Text>>
        Throws:
        IOException
      • init

        public void init​(ml.shifu.guagua.worker.WorkerContext<KMeansMasterParams,​KMeansWorkerParams> context)
        Specified by:
        init in class ml.shifu.guagua.worker.AbstractWorkerComputable<KMeansMasterParams,​KMeansWorkerParams,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.LongWritable>,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.Text>>
      • doCompute

        public KMeansWorkerParams doCompute​(ml.shifu.guagua.worker.WorkerContext<KMeansMasterParams,​KMeansWorkerParams> context)
        Uses the new k centers to tag each record with the index of the category it belongs to.
        Specified by:
        doCompute in class ml.shifu.guagua.worker.AbstractWorkerComputable<KMeansMasterParams,​KMeansWorkerParams,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.LongWritable>,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.Text>>
      • postLoad

        protected void postLoad​(ml.shifu.guagua.worker.WorkerContext<KMeansMasterParams,​KMeansWorkerParams> context)
        Overrides:
        postLoad in class ml.shifu.guagua.worker.AbstractWorkerComputable<KMeansMasterParams,​KMeansWorkerParams,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.LongWritable>,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.Text>>
      • load

        public void load​(ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.LongWritable> currentKey,
                         ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.Text> currentValue,
                         ml.shifu.guagua.worker.WorkerContext<KMeansMasterParams,​KMeansWorkerParams> workerContext)
        Loads data into memory. Any invalid data is set to null.
        Specified by:
        load in class ml.shifu.guagua.worker.AbstractWorkerComputable<KMeansMasterParams,​KMeansWorkerParams,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.LongWritable>,​ml.shifu.guagua.hadoop.io.GuaguaWritableAdapter<org.apache.hadoop.io.Text>>
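
The load step above turns one line of input text into an in-memory record and marks invalid data as null. A minimal sketch of that idea follows. The class KMeansLoadSketch and the method parseLine are hypothetical, and the separator and the expected number of dimensions are assumed parameters; the real input format and validation rules are not stated on this page.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: not part of the guagua API.
public class KMeansLoadSketch {

    /**
     * Parses one line of delimited numeric fields into a record.
     * Returns null when the line is empty, has the wrong number of fields,
     * or contains a field that is not a number.
     */
    static double[] parseLine(String line, String separator, int expectedDims) {
        if (line == null || line.trim().isEmpty()) {
            return null;
        }
        // Quote the separator so characters such as '|' are not treated as regex syntax.
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (fields.length != expectedDims) {
            return null;
        }
        double[] values = new double[expectedDims];
        for (int i = 0; i < expectedDims; i++) {
            try {
                values[i] = Double.parseDouble(fields[i].trim());
            } catch (NumberFormatException e) {
                return null; // invalid field: the whole record is treated as invalid
            }
        }
        return values;
    }

    public static void main(String[] args) {
        double[] ok = parseLine("1.5,2.0", ",", 2);
        double[] bad = parseLine("1.5,abc", ",", 2);
        System.out.println(ok != null);
        System.out.println(bad == null);
    }
}
```

Keeping a null placeholder rather than throwing lets a later pass over the loaded data skip bad records with a simple null check.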