Package org.projectnessie.gc.base
Class DistributedIdentifyContents
java.lang.Object
    org.projectnessie.gc.base.DistributedIdentifyContents
public class DistributedIdentifyContents extends Object
Identifies the expired and live contents in a distributed way, using Spark and bloom filters, by walking all the references (both dead and live).
Constructor Summary
Constructor: DistributedIdentifyContents(org.apache.spark.sql.SparkSession session, GCParams gcParams)
Method Summary
Modifier and Type: IdentifiedResult
Method: getIdentifiedResults(Map<String,ContentBloomFilter> liveContentsBloomFilterMap, List<org.projectnessie.model.Reference> references)
Description: Gets the expired contents per content id by walking all the live and dead references in a distributed way using Spark and checking the contents against the live bloom filter results.

Modifier and Type: Map<String,ContentBloomFilter>
Method: getLiveContentsBloomFilters(List<org.projectnessie.model.Reference> references, long bloomFilterSize, Map<org.projectnessie.model.Reference,Instant> droppedRefTimeMap)
Description: Computes a bloom filter per content id by walking all the live references in a distributed way using Spark.
Constructor Detail
DistributedIdentifyContents
public DistributedIdentifyContents(org.apache.spark.sql.SparkSession session, GCParams gcParams)
Method Detail
getLiveContentsBloomFilters
public Map<String,ContentBloomFilter> getLiveContentsBloomFilters(List<org.projectnessie.model.Reference> references, long bloomFilterSize, Map<org.projectnessie.model.Reference,Instant> droppedRefTimeMap)
Computes a bloom filter per content id by walking all the live references in a distributed way using Spark.
Parameters:
references - list of all the references
bloomFilterSize - size of the bloom filter to be used
droppedRefTimeMap - map of the dropped time for each reference@hash
Returns:
map of ContentBloomFilter per content id.
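As an illustration of the idea behind this method, here is a minimal, self-contained sketch of building one bloom filter per content id from the contents reachable from live references. It does not use Spark or any Nessie class. `SimpleBloomFilter`, `LiveBloomFilterSketch`, the `{contentId, snapshotId}` pairs, the `int` filter size, and the choice of three hash probes are all hypothetical stand-ins chosen for this example; the real `ContentBloomFilter` may work differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for ContentBloomFilter; not the Nessie implementation. */
final class SimpleBloomFilter {
  private final BitSet bits;
  private final int size;
  private final int probes;

  SimpleBloomFilter(int size, int probes) {
    this.bits = new BitSet(size);
    this.size = size;
    this.probes = probes;
  }

  /** Records a value by setting one bit per probe. */
  void put(String value) {
    for (int i = 0; i < probes; i++) {
      bits.set(index(value, i));
    }
  }

  /** Returns false only if the value was definitely never added. */
  boolean mightContain(String value) {
    for (int i = 0; i < probes; i++) {
      if (!bits.get(index(value, i))) {
        return false;
      }
    }
    return true;
  }

  // Double hashing: combine String.hashCode() with an FNV-1a hash of the bytes.
  private int index(String value, int probe) {
    int h1 = value.hashCode();
    int h2 = 0x811C9DC5;
    for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
      h2 ^= (b & 0xFF);
      h2 *= 16777619;
    }
    return Math.floorMod(h1 + probe * (h2 | 1), size);
  }
}

final class LiveBloomFilterSketch {
  /** Builds one filter per content id; each entry of liveContents is {contentId, snapshotId}. */
  static Map<String, SimpleBloomFilter> build(List<String[]> liveContents, int bloomFilterSize) {
    Map<String, SimpleBloomFilter> filters = new HashMap<>();
    for (String[] content : liveContents) {
      filters
          .computeIfAbsent(content[0], id -> new SimpleBloomFilter(bloomFilterSize, 3))
          .put(content[1]);
    }
    return filters;
  }

  public static void main(String[] args) {
    List<String[]> live = List.of(
        new String[] {"table-1", "snap-1"},
        new String[] {"table-1", "snap-2"},
        new String[] {"table-2", "snap-9"});
    Map<String, SimpleBloomFilter> filters = build(live, 1024);
    // Prints true: a bloom filter never reports a false negative for an added value.
    System.out.println(filters.get("table-1").mightContain("snap-2"));
  }
}
```

In this sketch the walk over references is a plain loop; in a distributed setting each worker would presumably build partial filters that are merged per content id afterwards.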
getIdentifiedResults
public IdentifiedResult getIdentifiedResults(Map<String,ContentBloomFilter> liveContentsBloomFilterMap, List<org.projectnessie.model.Reference> references)
Gets the expired contents per content id by walking all the live and dead references in a distributed way using Spark and checking the contents against the live bloom filter results.
Parameters:
liveContentsBloomFilterMap - live contents bloom filter per content id.
references - list of all the references to walk (live and dead)
Returns:
IdentifiedResult object.
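The check against the live bloom filters can be sketched as follows. This is a simplified, single-process illustration, not the Nessie implementation. The class `ExpiredContentsSketch`, the `{contentId, snapshotId}` pairs, and the use of a `Predicate<String>` in place of the membership check of `ContentBloomFilter` are assumptions made for this example. It also assumes that a content id with no live filter has all of its contents treated as expired.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class ExpiredContentsSketch {
  /**
   * Returns the expired snapshot ids per content id.
   * Each entry of allContents is {contentId, snapshotId}, collected from live and dead references.
   * liveFilters maps a content id to a "might be live" check (a stand-in for a bloom filter).
   */
  static Map<String, List<String>> identifyExpired(
      Map<String, Predicate<String>> liveFilters, List<String[]> allContents) {
    Map<String, List<String>> expired = new HashMap<>();
    for (String[] content : allContents) {
      String contentId = content[0];
      String snapshotId = content[1];
      Predicate<String> live = liveFilters.get(contentId);
      // Assumption: no filter for a content id means nothing of it is live.
      boolean mightBeLive = live != null && live.test(snapshotId);
      if (!mightBeLive) {
        expired.computeIfAbsent(contentId, id -> new ArrayList<>()).add(snapshotId);
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    Map<String, Predicate<String>> liveFilters = new HashMap<>();
    // An exact set is used here instead of a real bloom filter to keep the demo deterministic.
    liveFilters.put("table-1", Set.of("snap-2", "snap-3")::contains);
    List<String[]> allContents = List.of(
        new String[] {"table-1", "snap-1"},
        new String[] {"table-1", "snap-2"},
        new String[] {"table-2", "snap-7"});
    System.out.println(identifyExpired(liveFilters, allContents));
  }
}
```

With a real bloom filter, a false positive makes an expired content look live, so it is kept rather than collected; a live content is never reported as expired, because bloom filters have no false negatives.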