C D G I M O P S V W 

C

collectExpiredContentsAsDataSet(String) - Method in class org.projectnessie.gc.base.IdentifiedResultsRepo
Collect the expired contents for the given run ID as a Spark dataset.
computeLiveContentsFunc(long, Map<String, Instant>) - Method in class org.projectnessie.gc.base.IdentifyContentsPerExecutor
 
ContentBloomFilter - Class in org.projectnessie.gc.base
A utility class wrapping Bloom filter functionality.
ContentBloomFilter(long, double) - Constructor for class org.projectnessie.gc.base.ContentBloomFilter
 
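ContentBloomFilter(long, double) presumably takes the expected number of entries and the target false-positive probability, matching GCParams.getBloomFilterExpectedEntries() and GCParams.getBloomFilterFpp(). The hypothetical BloomFilterSizing helper below is a sketch of how those two numbers usually size a Bloom filter. It applies the textbook formulas m = ceil(-n ln p / (ln 2)^2) bits and k = round((m / n) ln 2) hash functions. It is an illustration, not the Nessie implementation.

```java
// Hypothetical helper for illustration; not part of org.projectnessie.gc.base.
final class BloomFilterSizing {
    private BloomFilterSizing() {}

    /** Number of bits m for n expected entries at target false-positive probability p. */
    static long optimalNumBits(long expectedEntries, double fpp) {
        if (expectedEntries <= 0) throw new IllegalArgumentException("expectedEntries must be > 0");
        if (fpp <= 0.0 || fpp >= 1.0) throw new IllegalArgumentException("fpp must be in (0, 1)");
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
    }

    /** Number of hash functions k that minimises the false-positive rate for m bits and n entries. */
    static int optimalNumHashFunctions(long expectedEntries, long numBits) {
        return Math.max(1, (int) Math.round((double) numBits / expectedEntries * Math.log(2)));
    }

    /** Approximate false-positive probability after n insertions: (1 - e^(-k n / m))^k. */
    static double expectedFpp(long insertedEntries, long numBits, int numHashFunctions) {
        double fractionUnset = Math.exp(-(double) numHashFunctions * insertedEntries / numBits);
        return Math.pow(1 - fractionUnset, numHashFunctions);
    }
}
```

For n = 1,000,000 and p = 0.01 this yields about 9.59 million bits (roughly 1.2 MB) and k = 7. Inserting more than n entries pushes the real false-positive rate above p, and expectedFpp makes that visible.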

D

deserializeReference(String) - Static method in class org.projectnessie.gc.base.GCUtil
Deserialize JSON String to Reference object.
DistributedIdentifyContents - Class in org.projectnessie.gc.base
Identify the expired and live contents in a distributed way, using Spark and a Bloom filter, by walking all the references (both dead and live).
DistributedIdentifyContents(SparkSession, GCParams) - Constructor for class org.projectnessie.gc.base.DistributedIdentifyContents
 
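DistributedIdentifyContents spreads the reference walk across Spark. The sketch below only illustrates the shape of that computation, under stated assumptions:

- A hypothetical RefSnapshot record stands in for the result of walking one reference.
- A parallel stream stands in for Spark partitions.
- Exact HashSets stand in for the per-content-ID Bloom filters.

The driver-side mergeAll step is where real Bloom filters would be OR-ed together (compare ContentBloomFilter.merge(ContentBloomFilter)).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical result of walking one reference: live content versions keyed by content ID.
record RefSnapshot(String refName, Map<String, Set<String>> liveContentsById) {}

final class DistributedLiveContents {
    private DistributedLiveContents() {}

    /** Splits references round-robin into groups, standing in for Spark partitions. */
    static List<List<RefSnapshot>> partition(List<RefSnapshot> refs, int partitions) {
        if (partitions < 1) throw new IllegalArgumentException("partitions must be >= 1");
        List<List<RefSnapshot>> groups = new ArrayList<>();
        for (int i = 0; i < partitions; i++) groups.add(new ArrayList<>());
        for (int i = 0; i < refs.size(); i++) groups.get(i % partitions).add(refs.get(i));
        return groups;
    }

    /** "Executor" step: one set per content ID for a single partition of references. */
    static Map<String, Set<String>> perPartition(List<RefSnapshot> refs) {
        Map<String, Set<String>> out = new HashMap<>();
        for (RefSnapshot ref : refs) {
            ref.liveContentsById().forEach((id, versions) ->
                out.computeIfAbsent(id, k -> new HashSet<>()).addAll(versions));
        }
        return out;
    }

    /** "Driver" step: unions partial results by content ID (Bloom filters would be OR-ed here). */
    static Map<String, Set<String>> mergeAll(List<Map<String, Set<String>>> partials) {
        Map<String, Set<String>> merged = new HashMap<>();
        for (Map<String, Set<String>> partial : partials) {
            partial.forEach((id, versions) ->
                merged.computeIfAbsent(id, k -> new HashSet<>()).addAll(versions));
        }
        return merged;
    }

    /** Runs the per-partition step in parallel, then merges the partial results. */
    static Map<String, Set<String>> liveContents(List<RefSnapshot> refs, int partitions) {
        List<Map<String, Set<String>>> partials = partition(refs, partitions).parallelStream()
            .map(DistributedLiveContents::perPartition)
            .collect(Collectors.toList());
        return mergeAll(partials);
    }
}
```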

G

GCImpl - Class in org.projectnessie.gc.base
Encapsulates the logic to retrieve expired contents by walking over all commits in all named-references.
GCImpl(GCParams) - Constructor for class org.projectnessie.gc.base.GCImpl
Instantiates a new GCImpl.
GCParams - Interface in org.projectnessie.gc.base
Config params for GC.
GCUtil - Class in org.projectnessie.gc.base
 
getBloomFilterExpectedEntries() - Method in interface org.projectnessie.gc.base.GCParams
Optional expected number of live-commit entries per reference, used to size the Bloom filter.
getBloomFilterFpp() - Method in interface org.projectnessie.gc.base.GCParams
Optional Bloom filter false-positive probability (fpp).
getCommitProtectionDuration() - Method in interface org.projectnessie.gc.base.GCParams
Commit protection duration, to avoid expiring ongoing or recent commits.
getCutOffTimestampPerRef() - Method in interface org.projectnessie.gc.base.GCParams
Optional cutoff time per live reference.
getDeadReferenceCutOffTimeStamp() - Method in interface org.projectnessie.gc.base.GCParams
Optional cutoff time for all the dead references.
getDefaultCutOffTimestamp() - Method in interface org.projectnessie.gc.base.GCParams
Default cutoff time for all the references.
getExpectedFpp() - Method in class org.projectnessie.gc.base.ContentBloomFilter
 
getExpiredContentRowsFunc(Map<String, ContentBloomFilter>, String, Timestamp) - Method in class org.projectnessie.gc.base.IdentifyContentsPerExecutor
 
getInstantFromMicros(long) - Static method in class org.projectnessie.gc.base.GCUtil
 
getLatestCompletedRunID() - Method in class org.projectnessie.gc.base.IdentifiedResultsRepo
 
getLiveContentsBloomFilters(List<String>, long, Map<String, Instant>) - Method in class org.projectnessie.gc.base.DistributedIdentifyContents
Compute the Bloom filter per content ID by walking all the live references in a distributed way using Spark.
getNessieCatalogName() - Method in interface org.projectnessie.gc.base.GCParams
Nessie catalog name to be used with Spark to create the output results table.
getNessieClientConfigs() - Method in interface org.projectnessie.gc.base.GCParams
Nessie client configurations from NessieConfigConstants.
getOutputBranchName() - Method in interface org.projectnessie.gc.base.GCParams
Name of the branch to be used for creating the output table.
getOutputTableIdentifier() - Method in interface org.projectnessie.gc.base.GCParams
Output table identifier (namespace and table name) to be used for storing the results in GCParams.getOutputBranchName().
getSchema() - Method in class org.projectnessie.gc.base.IdentifiedResultsRepo
 
getSparkPartitionsCount() - Method in interface org.projectnessie.gc.base.GCParams
Optional Spark partition count to be used for distributing references.
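GCParams exposes several cutoff settings. The hypothetical CutoffResolver below sketches one plausible way they combine. The precedence is an assumption, not documented behaviour:

- A per-reference cutoff overrides the default for live references.
- Dead references use the dead-reference cutoff when it is set, and the default otherwise.
- The commit protection duration clamps every cutoff, so commits younger than now minus that duration are never treated as expirable.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical resolver: inputs mirror GCParams getters (getDefaultCutOffTimestamp,
// getCutOffTimestampPerRef, getDeadReferenceCutOffTimeStamp, getCommitProtectionDuration),
// but the precedence rules below are assumptions, not Nessie's documented behaviour.
final class CutoffResolver {
    private final Instant defaultCutoff;
    private final Map<String, Instant> cutoffPerLiveRef;
    private final Optional<Instant> deadRefCutoff;
    private final Duration commitProtection;

    CutoffResolver(Instant defaultCutoff, Map<String, Instant> cutoffPerLiveRef,
                   Optional<Instant> deadRefCutoff, Duration commitProtection) {
        this.defaultCutoff = Objects.requireNonNull(defaultCutoff);
        this.cutoffPerLiveRef = Map.copyOf(cutoffPerLiveRef);
        this.deadRefCutoff = Objects.requireNonNull(deadRefCutoff);
        this.commitProtection = Objects.requireNonNull(commitProtection);
    }

    /** Returns the instant before which commits on the given reference may be expired. */
    Instant cutoffFor(String refName, boolean deadReference, Instant now) {
        // Assumption: per-ref cutoffs apply to live refs only; dead refs share one cutoff.
        Instant configured = deadReference
            ? deadRefCutoff.orElse(defaultCutoff)
            : cutoffPerLiveRef.getOrDefault(refName, defaultCutoff);
        // Assumption: the cutoff is clamped to no later than now - commitProtection,
        // so recent or ongoing commits are never expirable.
        Instant protectionStart = now.minus(commitProtection);
        return configured.isBefore(protectionStart) ? configured : protectionStart;
    }
}
```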

I

IdentifiedResultsRepo - Class in org.projectnessie.gc.base
DDL + DML functionality for the "IdentifiedResult" table.
IdentifiedResultsRepo(SparkSession, String, String, String) - Constructor for class org.projectnessie.gc.base.IdentifiedResultsRepo
 
IdentifyContentsPerExecutor - Class in org.projectnessie.gc.base
Contains the methods that execute in a Spark executor for GCImpl.identifyExpiredContents(SparkSession).
IdentifyContentsPerExecutor(GCParams) - Constructor for class org.projectnessie.gc.base.IdentifyContentsPerExecutor
 
identifyExpiredContents(Map<String, ContentBloomFilter>, List<String>) - Method in class org.projectnessie.gc.base.DistributedIdentifyContents
Gets the expired contents per content ID by walking all the live and dead references in a distributed way using Spark and checking the contents against the live Bloom filter results.
identifyExpiredContents(SparkSession) - Method in class org.projectnessie.gc.base.GCImpl
Identify the expired contents using a two-step traversal algorithm.
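The two-step traversal can be sketched with exact sets:

1. Walk the live references and record every content that is still live.
2. Walk all references, live and dead, and report anything not recorded in step 1 as expired.

Three elements are assumptions made for this illustration: the Commit record, the content@version key, and the rule that the newest version before the cutoff stays live (so the state at the cutoff remains readable). Nessie GC itself uses per-content-ID Bloom filters and Spark rather than a HashSet.

```java
import java.time.Instant;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: each commit records one new version of one content (e.g. a table's metadata file).
record Commit(Instant time, String contentId, String contentVersion) {}

final class TwoStepExpiry {
    private TwoStepExpiry() {}

    static String key(Commit c) {
        return c.contentId() + "@" + c.contentVersion();
    }

    /** Step 1: adds the live contents of one live reference (commits ordered newest first). */
    static void collectLive(List<Commit> newestFirst, Instant cutoff, Set<String> live) {
        Set<String> seenBeforeCutoff = new HashSet<>();
        for (Commit c : newestFirst) {
            if (!c.time().isBefore(cutoff)) {
                live.add(key(c)); // committed at or after the cutoff: live
            } else if (seenBeforeCutoff.add(c.contentId())) {
                live.add(key(c)); // newest version before the cutoff: live (assumption)
            }
        }
    }

    /** Step 2: every content seen on any reference, live or dead, that is not live is expired. */
    static Set<String> expired(Collection<List<Commit>> allRefs, Set<String> live) {
        Set<String> out = new TreeSet<>();
        for (List<Commit> commits : allRefs) {
            for (Commit c : commits) {
                if (!live.contains(key(c))) out.add(key(c));
            }
        }
        return out;
    }
}
```

Swapping the HashSet for a Bloom filter keeps the algorithm safe. A false positive can only make an expired content look live, so that content is retained; a live content is never wrongly reported as expired.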

M

merge(ContentBloomFilter) - Method in class org.projectnessie.gc.base.ContentBloomFilter
 
mightContain(Content) - Method in class org.projectnessie.gc.base.ContentBloomFilter
 
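ContentBloomFilter's put, mightContain, merge, wasMerged and getExpectedFpp methods suggest a conventional Bloom filter. The hypothetical SimpleBloomFilter below sketches those operations with a java.util.BitSet. It is keyed by plain strings instead of Nessie Content objects and uses double hashing (String.hashCode plus FNV-1a) to derive the k bit positions. Merging ORs two filters of identical geometry. A merged filter may therefore hold more entries than it was sized for and show a higher false-positive rate.

```java
import java.util.BitSet;

// Hypothetical stand-in for ContentBloomFilter: keys are plain strings, not Nessie Content objects.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    private boolean merged;

    SimpleBloomFilter(long expectedEntries, double fpp) {
        if (expectedEntries <= 0 || fpp <= 0.0 || fpp >= 1.0)
            throw new IllegalArgumentException("need expectedEntries > 0 and 0 < fpp < 1");
        double ln2 = Math.log(2);
        long m = (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
        if (m > Integer.MAX_VALUE) throw new IllegalArgumentException("too large for this sketch");
        this.numBits = (int) m;
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedEntries * ln2));
        this.bits = new BitSet(numBits);
    }

    void put(String key) {
        for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) return false;
        }
        return true;
    }

    /** ORs another filter of identical geometry into this one. */
    void merge(SimpleBloomFilter other) {
        if (other.numBits != numBits || other.numHashes != numHashes)
            throw new IllegalArgumentException("filters must have the same size and hash count");
        bits.or(other.bits);
        merged = true;
    }

    boolean wasMerged() {
        return merged;
    }

    /** Estimated false-positive probability from the fraction of set bits: (setBits / m)^k. */
    double expectedFpp() {
        return Math.pow((double) bits.cardinality() / numBits, numHashes);
    }

    // Double hashing: position_i = h1 + i * h2 (mod m); h2 is forced odd so it is never zero.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = fnv1a(key) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // 32-bit FNV-1a over the string's UTF-16 code units.
    private static int fnv1a(String s) {
        int h = 0x811C9DC5;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x01000193;
        }
        return h;
    }
}
```

expectedFpp uses the estimate (set bits / m)^k, which rises as merges add entries. This is one reasonable reading of the wasMerged() warning, not a description of Nessie's code.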

O

org.projectnessie.gc.base - package org.projectnessie.gc.base
 

P

put(Content) - Method in class org.projectnessie.gc.base.ContentBloomFilter
 

S

SerializableFunction1<T,U> - Interface in org.projectnessie.gc.base
Interface that makes scala.Function1 Serializable.
serializeReference(Reference) - Static method in class org.projectnessie.gc.base.GCUtil
Serialize Reference object using JSON Serialization.
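Spark serializes the functions passed to its transformations and ships them to executors, so a function object must be Serializable. That is presumably why SerializableFunction1 exists. The sketch below shows the same idea in plain Java, with java.util.function.Function standing in for scala.Function1. A functional interface that also extends Serializable makes lambdas targeting it serializable, while an ordinary Function lambda fails with NotSerializableException. The SerializableFunction and ClosureShipping names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical Java analogue of SerializableFunction1, using java.util.function.Function
// in place of scala.Function1.
@FunctionalInterface
interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

final class ClosureShipping {
    private ClosureShipping() {}

    /** Serializes and deserializes an object, roughly what happens when a closure is shipped to an executor. */
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj); // throws NotSerializableException for non-serializable objects
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }
}
```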

V

validate() - Method in interface org.projectnessie.gc.base.GCParams
 

W

wasMerged() - Method in class org.projectnessie.gc.base.ContentBloomFilter
A merged Bloom filter might indicate decreased filter quality.