C D G I M O P S V W
All Classes All Packages
C
- collectExpiredContentsAsDataSet(String) - Method in class org.projectnessie.gc.base.IdentifiedResultsRepo
-
Collect the expired contents for the given run id as a Spark dataset.
- computeLiveContentsFunc(long, Map<String, Instant>) - Method in class org.projectnessie.gc.base.IdentifyContentsPerExecutor
- ContentBloomFilter - Class in org.projectnessie.gc.base
-
A utility class wrapping bloom filter functionality.
- ContentBloomFilter(long, double) - Constructor for class org.projectnessie.gc.base.ContentBloomFilter
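
The ContentBloomFilter entries in this index (constructor taking a long and a double, plus put, mightContain, merge and wasMerged) suggest a conventional bloom filter sized from an expected entry count and a false positive probability. The following is a minimal sketch of that idea only. It is not the Nessie implementation: the class name `BloomFilterSketch`, the String keys (the real methods take `Content`), the FNV-1a hashing and the sizing formulas are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative bloom filter; hypothetical, NOT the ContentBloomFilter implementation. */
final class BloomFilterSketch {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;
  private boolean merged; // mirrors the idea behind wasMerged()

  BloomFilterSketch(long expectedEntries, double fpp) {
    if (expectedEntries <= 0 || fpp <= 0.0 || fpp >= 1.0) {
      throw new IllegalArgumentException("need expectedEntries > 0 and 0 < fpp < 1");
    }
    // Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
    double ln2 = Math.log(2);
    long m = (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
    this.numBits = (int) Math.min(Integer.MAX_VALUE, Math.max(64L, m));
    this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedEntries * ln2));
    this.bits = new BitSet(numBits);
  }

  /** Records a key (assumption: a String stands in for a Content object). */
  void put(String key) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(index(key, i));
    }
  }

  /** Returns false only if the key was definitely never put; true may be a false positive. */
  boolean mightContain(String key) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(index(key, i))) {
        return false;
      }
    }
    return true;
  }

  /** Unions another filter of identical shape into this one. */
  void merge(BloomFilterSketch other) {
    if (other.numBits != numBits || other.numHashes != numHashes) {
      throw new IllegalArgumentException("filters must share size and hash count");
    }
    bits.or(other.bits);
    merged = true; // the union holds more entries than either filter was sized for
  }

  boolean wasMerged() {
    return merged;
  }

  // Double hashing: the i-th index is h1 + i * h2, reduced to the bit range.
  private int index(String key, int i) {
    byte[] data = key.getBytes(StandardCharsets.UTF_8);
    int h1 = fnv1a(data, 0x811C9DC5);
    int h2 = fnv1a(data, 0x01000193) | 1;
    return Math.floorMod(h1 + i * h2, numBits);
  }

  private static int fnv1a(byte[] data, int seed) {
    int h = seed;
    for (byte b : data) {
      h ^= (b & 0xff);
      h *= 0x01000193;
    }
    return h;
  }
}
```

In this sketch, merging two filters that were each sized for the same entry count can push the combined filter past its design load, which is one plausible reading of why a merged filter "might indicate decreased filter quality".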
D
- deserializeReference(String) - Static method in class org.projectnessie.gc.base.GCUtil
-
Deserialize a JSON String to a Reference object.
- DistributedIdentifyContents - Class in org.projectnessie.gc.base
-
Identify the expired and live contents in a distributed way, using Spark and bloom filters, by walking all the references (both dead and live).
- DistributedIdentifyContents(SparkSession, GCParams) - Constructor for class org.projectnessie.gc.base.DistributedIdentifyContents
G
- GCImpl - Class in org.projectnessie.gc.base
-
Encapsulates the logic to retrieve expired contents by walking over all commits in all named-references.
- GCImpl(GCParams) - Constructor for class org.projectnessie.gc.base.GCImpl
-
Instantiates a new GCImpl.
- GCParams - Interface in org.projectnessie.gc.base
-
Config params for GC.
- GCUtil - Class in org.projectnessie.gc.base
- getBloomFilterExpectedEntries() - Method in interface org.projectnessie.gc.base.GCParams
-
Optional expected number of live commit entries per reference for the bloom filter.
- getBloomFilterFpp() - Method in interface org.projectnessie.gc.base.GCParams
-
Optional bloom filter false positive probability (fpp).
- getCommitProtectionDuration() - Method in interface org.projectnessie.gc.base.GCParams
-
Commit protection duration to avoid expiring ongoing or recent commits.
- getCutOffTimestampPerRef() - Method in interface org.projectnessie.gc.base.GCParams
-
Optional cutoff time per live reference.
- getDeadReferenceCutOffTimeStamp() - Method in interface org.projectnessie.gc.base.GCParams
-
Optional cutoff time for all the dead references.
- getDefaultCutOffTimestamp() - Method in interface org.projectnessie.gc.base.GCParams
-
Default cutoff time for all the references.
- getExpectedFpp() - Method in class org.projectnessie.gc.base.ContentBloomFilter
- getExpiredContentRowsFunc(Map<String, ContentBloomFilter>, String, Timestamp) - Method in class org.projectnessie.gc.base.IdentifyContentsPerExecutor
- getInstantFromMicros(long) - Static method in class org.projectnessie.gc.base.GCUtil
- getLatestCompletedRunID() - Method in class org.projectnessie.gc.base.IdentifiedResultsRepo
- getLiveContentsBloomFilters(List<String>, long, Map<String, Instant>) - Method in class org.projectnessie.gc.base.DistributedIdentifyContents
-
Compute the bloom filter per content id by walking all the live references in a distributed way using Spark.
- getNessieCatalogName() - Method in interface org.projectnessie.gc.base.GCParams
-
Nessie catalog name to be used with Spark to create the output results table.
- getNessieClientConfigs() - Method in interface org.projectnessie.gc.base.GCParams
-
Nessie client configurations from NessieConfigConstants.
- getOutputBranchName() - Method in interface org.projectnessie.gc.base.GCParams
-
Name of the branch to be used for creating the output table.
- getOutputTableIdentifier() - Method in interface org.projectnessie.gc.base.GCParams
-
Output table identifier (namespace and table name) to be used for storing the results in GCParams.getOutputBranchName().
- getSchema() - Method in class org.projectnessie.gc.base.IdentifiedResultsRepo
- getSparkPartitionsCount() - Method in interface org.projectnessie.gc.base.GCParams
-
Optional Spark partitions count to be used for distributing references.
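
The G entries describe a default cutoff time for all references, an optional cutoff per live reference, and a GCUtil.getInstantFromMicros(long) helper. The sketch below shows one way those pieces could fit together. It is an illustration under stated assumptions, not the library's code: the class `CutoffSketch` and both method names are hypothetical, the long is assumed to be microseconds since the Unix epoch, and the per-reference cutoff is assumed to take precedence over the default when present.

```java
import java.time.Instant;
import java.util.Map;

/** Hypothetical helpers; names and fallback rule are assumptions, not the GCUtil/GCParams code. */
final class CutoffSketch {

  /** Converts microseconds since the epoch to an Instant (also correct for negative values). */
  static Instant instantFromMicros(long microsSinceEpoch) {
    long seconds = Math.floorDiv(microsSinceEpoch, 1_000_000L);
    long nanos = Math.floorMod(microsSinceEpoch, 1_000_000L) * 1_000L;
    return Instant.ofEpochSecond(seconds, nanos);
  }

  /** Picks the per-reference cutoff if one is configured, otherwise the default cutoff. */
  static Instant cutoffFor(String refName, Map<String, Instant> perRef, Instant defaultCutoff) {
    Instant specific = perRef.get(refName);
    return specific != null ? specific : defaultCutoff;
  }
}
```

For example, `CutoffSketch.cutoffFor("main", perRef, defaultCutoff)` returns the entry for `main` when the map has one and `defaultCutoff` otherwise.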
I
- IdentifiedResultsRepo - Class in org.projectnessie.gc.base
-
DDL + DML functionality for the "IdentifiedResult" table.
- IdentifiedResultsRepo(SparkSession, String, String, String) - Constructor for class org.projectnessie.gc.base.IdentifiedResultsRepo
- IdentifyContentsPerExecutor - Class in org.projectnessie.gc.base
-
Contains the methods that execute in the Spark executor for GCImpl.identifyExpiredContents(SparkSession).
- IdentifyContentsPerExecutor(GCParams) - Constructor for class org.projectnessie.gc.base.IdentifyContentsPerExecutor
- identifyExpiredContents(Map<String, ContentBloomFilter>, List<String>) - Method in class org.projectnessie.gc.base.DistributedIdentifyContents
-
Gets the expired contents per content id by walking all the live and dead references in a distributed way using Spark and checking the contents against the live bloom filter results.
- identifyExpiredContents(SparkSession) - Method in class org.projectnessie.gc.base.GCImpl
-
Identify the expired contents using a two-step traversal algorithm.
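
The two-step traversal mentioned above can be pictured as follows, based on the descriptions of getLiveContentsBloomFilters and identifyExpiredContents in this index: first collect the live contents by walking the live references, then walk all references (live and dead) and report whatever is not live. This is a single-process sketch under stated assumptions, not the GCImpl code: `TwoStepGcSketch`, its `Commit` type and both method names are hypothetical, an exact `Set` stands in for the bloom filters, Spark distribution is omitted, and "live" is simplified to "committed at or after the cutoff on a live reference".

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified model of a two-step expiry scan; not the GCImpl implementation. */
final class TwoStepGcSketch {

  /** Assumed commit shape: a commit time plus the content keys written by that commit. */
  static final class Commit {
    final Instant commitTime;
    final Set<String> contentKeys;

    Commit(Instant commitTime, Set<String> contentKeys) {
      this.commitTime = commitTime;
      this.contentKeys = contentKeys;
    }
  }

  /** Step 1: walk only live references and keep contents committed at or after the cutoff. */
  static Set<String> collectLive(Map<String, List<Commit>> liveRefs, Instant cutoff) {
    Set<String> live = new HashSet<>();
    for (List<Commit> commits : liveRefs.values()) {
      for (Commit commit : commits) {
        if (!commit.commitTime.isBefore(cutoff)) {
          live.addAll(commit.contentKeys);
        }
      }
    }
    return live;
  }

  /** Step 2: walk all references (live and dead); anything not in the live set is expired. */
  static Set<String> identifyExpired(Map<String, List<Commit>> allRefs, Set<String> live) {
    Set<String> expired = new TreeSet<>();
    for (List<Commit> commits : allRefs.values()) {
      for (Commit commit : commits) {
        for (String key : commit.contentKeys) {
          if (!live.contains(key)) {
            expired.add(key);
          }
        }
      }
    }
    return expired;
  }
}
```

Swapping the exact set for a bloom filter would trade memory for occasional false positives in the live check. In this sketch a false positive only causes an expired content to be kept, never a live one to be reported as expired.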
M
- merge(ContentBloomFilter) - Method in class org.projectnessie.gc.base.ContentBloomFilter
- mightContain(Content) - Method in class org.projectnessie.gc.base.ContentBloomFilter
O
- org.projectnessie.gc.base - package org.projectnessie.gc.base
P
- put(Content) - Method in class org.projectnessie.gc.base.ContentBloomFilter
S
- SerializableFunction1<T,U> - Interface in org.projectnessie.gc.base
-
Interface that makes scala.Function1 Serializable.
- serializeReference(Reference) - Static method in class org.projectnessie.gc.base.GCUtil
-
Serialize a Reference object using JSON serialization.
V
- validate() - Method in interface org.projectnessie.gc.base.GCParams
W
- wasMerged() - Method in class org.projectnessie.gc.base.ContentBloomFilter
-
A merged bloom filter might indicate decreased filter quality.