Class DistributedIdentifyContents


  • public class DistributedIdentifyContents
    extends java.lang.Object
    Identifies expired and live contents in a distributed way, using Spark and bloom filters, by walking all references (both dead and live).
    • Method Summary

      Modifier and Type Method Description
      java.util.Map<java.lang.String,​ContentBloomFilter> getLiveContentsBloomFilters​(java.util.List<java.lang.String> references, long bloomFilterSize, java.util.Map<java.lang.String,​java.time.Instant> droppedRefTimeMap)
      Computes a bloom filter per content id by walking all live references in a distributed way using Spark.
      java.lang.String identifyExpiredContents​(java.util.Map<java.lang.String,​ContentBloomFilter> liveContentsBloomFilterMap, java.util.List<java.lang.String> references)
      Gets the expired contents per content id by walking all live and dead references in a distributed way using Spark, checking each content against the live bloom filter results.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • DistributedIdentifyContents

        public DistributedIdentifyContents​(org.apache.spark.sql.SparkSession session,
                                           GCParams gcParams)
    • Method Detail

      • getLiveContentsBloomFilters

        public java.util.Map<java.lang.String,​ContentBloomFilter> getLiveContentsBloomFilters​(java.util.List<java.lang.String> references,
                                                                                                    long bloomFilterSize,
                                                                                                    java.util.Map<java.lang.String,​java.time.Instant> droppedRefTimeMap)
        Computes a bloom filter per content id by walking all live references in a distributed way using Spark.
        Parameters:
        references - list of all the references (JSON serialized)
        bloomFilterSize - size of the bloom filter to use
        droppedRefTimeMap - map from reference@hash (JSON serialized) to the time at which that reference was dropped
        Returns:
        map from content id to its ContentBloomFilter.
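
The idea of keeping one bloom filter per content id can be illustrated with a small single-process sketch. This is not the Nessie implementation and does not use Spark: `SimpleBloomFilter`, `LiveContentsSketch`, `buildLiveFilters`, the fixed number of hash functions, and the `{contentId, value}` pair representation are all hypothetical stand-ins chosen for illustration, and `bloomFilterSize` is assumed here to mean the number of bits.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ContentBloomFilter; NOT the Nessie class.
class SimpleBloomFilter {
    private static final int NUM_HASHES = 3; // assumption: fixed number of hash functions
    private final BitSet bits;
    private final int size;

    SimpleBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Derive the i-th bit position from the value's hash code and a seed.
    private int index(String value, int seed) {
        int h = value.hashCode() * 31 + seed * 0x9E3779B9;
        return Math.floorMod(h, size);
    }

    void put(String value) {
        for (int i = 0; i < NUM_HASHES; i++) {
            bits.set(index(value, i));
        }
    }

    // May return true for a value that was never added (false positive),
    // but never returns false for a value that was added.
    boolean mightContain(String value) {
        for (int i = 0; i < NUM_HASHES; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }
}

class LiveContentsSketch {
    // Each entry is a {contentId, value} pair observed while walking a live reference.
    static Map<String, SimpleBloomFilter> buildLiveFilters(
            List<String[]> liveContents, int bloomFilterSize) {
        Map<String, SimpleBloomFilter> filters = new HashMap<>();
        for (String[] entry : liveContents) {
            filters
                .computeIfAbsent(entry[0], id -> new SimpleBloomFilter(bloomFilterSize))
                .put(entry[1]);
        }
        return filters;
    }
}
```

In a distributed setting, each worker could build filters like these for its share of the references and the per-content-id filters could then be merged; how the actual class partitions and merges the work is not described on this page.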
      • identifyExpiredContents

        public java.lang.String identifyExpiredContents​(java.util.Map<java.lang.String,​ContentBloomFilter> liveContentsBloomFilterMap,
                                                        java.util.List<java.lang.String> references)
        Gets the expired contents per content id by walking all live and dead references in a distributed way using Spark, checking each content against the live bloom filter results.
        Parameters:
        liveContentsBloomFilterMap - live contents bloom filter per content id.
        references - list of all the references (JSON serialized) to walk (live and dead)
        Returns:
        run id of the completed GC task
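
The check against the live bloom filters can be sketched as follows. This is a minimal single-process illustration, not the Nessie implementation: `ExpiredContentsSketch`, `findExpired`, and the `{contentId, value}` pair representation are hypothetical, and each live filter is modelled as a plain `Predicate<String>` standing in for a "might contain" test. Treating a content id with no live filter as fully expired is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class ExpiredContentsSketch {
    // liveFilters: per content id, a "might be live" test (stand-in for a bloom filter).
    // allContents: {contentId, value} pairs seen while walking live and dead references.
    static Map<String, List<String>> findExpired(
            Map<String, Predicate<String>> liveFilters, List<String[]> allContents) {
        Map<String, List<String>> expired = new HashMap<>();
        for (String[] entry : allContents) {
            String contentId = entry[0];
            String value = entry[1];
            Predicate<String> live = liveFilters.get(contentId);
            // Assumption: no live filter for this content id means nothing is live.
            boolean possiblyLive = live != null && live.test(value);
            if (!possiblyLive) {
                expired.computeIfAbsent(contentId, id -> new ArrayList<>()).add(value);
            }
        }
        return expired;
    }
}
```

Because a bloom filter can report false positives but never false negatives, a check of this shape can occasionally keep a content that is actually expired, but it does not mark a live content as expired.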