Class OffsetCommitter

java.lang.Object
pl.allegro.tech.hermes.consumers.consumer.offset.OffsetCommitter

public class OffsetCommitter extends Object
Note on the algorithm used to calculate which offsets are actually committed. The idea behind this algorithm is that we would like to commit:
  • the maximal offset marked as committed,
  • but no larger than the smallest inflight offset (smallest inflight - 1).

Important note! This class is a Kafka OffsetCommitter, so it treats offsets the Kafka way. Most importantly, the committed offset marks the message that is read first on Consumer restart (the offset is inclusive for reading and exclusive for writing).
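To make the Kafka-style convention concrete, here is a minimal single-partition sketch. The class and method names are hypothetical and are not part of OffsetCommitter; the sketch only assumes what the note above states, namely that the committed value is the offset of the first message to read after a restart.

```java
import java.util.OptionalLong;

// Hypothetical helper for illustration only, not part of Hermes.
class KafkaOffsetSemanticsSketch {

    // maxMarkedCommitted: largest offset a Consumer reported as ready to commit.
    // minInflight: smallest offset still being sent, if there is one.
    static long offsetToCommit(long maxMarkedCommitted, OptionalLong minInflight) {
        // Kafka expects the offset of the NEXT message to read, hence the +1.
        long nextToRead = maxMarkedCommitted + 1;
        // Never move past a message that is still inflight.
        return minInflight.isPresent()
                ? Math.min(nextToRead, minInflight.getAsLong())
                : nextToRead;
    }

    public static void main(String[] args) {
        // Offsets 6 and 7 are done, but 5 is still inflight: restart must re-read 5.
        System.out.println(offsetToCommit(7, OptionalLong.of(5)));   // prints 5
        // Nothing inflight: restart continues right after offset 7.
        System.out.println(offsetToCommit(7, OptionalLong.empty())); // prints 8
    }
}
```

In the first call the value 5 is committed, so the last message treated as done is offset 4, which is the "smallest inflight - 1" from the list above.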

There are two queues which are used by Consumers to report message state:

  • inflightOffsets: message offsets that are currently being sent (inflight),
  • committedOffsets: message offsets that are ready to get committed.

This committer class holds its internal state in the form of the inflightOffsets and maxCommittedOffsets collections.

  • inflightOffsets are all offsets that are currently in inflight state,
  • maxCommittedOffsets are offsets (maximum per partition) of messages already marked as committed that could not yet be committed to Kafka due to an existing inflight offset on the same partition.

The commit algorithm runs at scheduled intervals. It has three phases. The first phase drains the queues and performs reductions:

  • drain the committedOffsets queue to a collection; this must be done before draining inflights, so that the collection of commits no longer grows and the only possible mismatch is an inflight unmatched by a commit; commits are incremented by 1 to match the Kafka commit definition,
  • update the maxCommittedOffsets map with largest committed offsets,
  • drain inflightOffsets.
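The first phase can be sketched roughly as follows. This is not the Hermes implementation: the PartitionOffset type, the String partition key, and the use of ConcurrentLinkedQueue are assumptions made for illustration. The final step, removing reported commits from the inflight state, is also an assumption; the description above does not spell it out, but without it an offset would stay inflight forever.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the drain phase, not the actual OffsetCommitter code.
class DrainPhaseSketch {

    // Hypothetical value type: one offset on one subscription/partition.
    static final class PartitionOffset {
        final String partition;
        final long offset;

        PartitionOffset(String partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    // Queues written by Consumers (assumed lock-free queue type).
    final Queue<PartitionOffset> inflightQueue = new ConcurrentLinkedQueue<>();
    final Queue<PartitionOffset> committedQueue = new ConcurrentLinkedQueue<>();

    // Internal state, touched only by the committer thread.
    final Map<String, Set<Long>> inflightOffsets = new HashMap<>();
    final Map<String, Long> maxCommittedOffsets = new HashMap<>();

    private static List<PartitionOffset> drain(Queue<PartitionOffset> queue) {
        List<PartitionOffset> drained = new ArrayList<>();
        PartitionOffset next;
        while ((next = queue.poll()) != null) {
            drained.add(next);
        }
        return drained;
    }

    void drainQueues() {
        // 1. Drain commits first, so every drained commit already has its inflight reported.
        List<PartitionOffset> committed = drain(committedQueue);

        // 2. Keep the largest commit per partition, incremented by 1 (Kafka commit definition).
        for (PartitionOffset c : committed) {
            maxCommittedOffsets.merge(c.partition, c.offset + 1, Long::max);
        }

        // 3. Drain inflights into the internal state.
        for (PartitionOffset i : drain(inflightQueue)) {
            inflightOffsets.computeIfAbsent(i.partition, p -> new TreeSet<>()).add(i.offset);
        }

        // 4. Assumption: an offset reported as committed is no longer inflight.
        for (PartitionOffset c : committed) {
            Set<Long> inflight = inflightOffsets.get(c.partition);
            if (inflight != null) {
                inflight.remove(Long.valueOf(c.offset));
            }
        }
    }

    public static void main(String[] args) {
        DrainPhaseSketch sketch = new DrainPhaseSketch();
        for (long offset = 5; offset <= 7; offset++) {
            sketch.inflightQueue.add(new PartitionOffset("p0", offset));
        }
        sketch.committedQueue.add(new PartitionOffset("p0", 6));
        sketch.committedQueue.add(new PartitionOffset("p0", 7));

        sketch.drainQueues();
        System.out.println(sketch.maxCommittedOffsets); // prints {p0=8}
        System.out.println(sketch.inflightOffsets);     // prints {p0=[5]}
    }
}
```

Because only the committer thread touches the two maps, plain HashMap is enough in this sketch; the queues are the only structures shared with Consumers.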

The second phase calculates the offsets:

  • calculate maximal committed offset for each subscription and partition,
  • calculate minimal inflight offset for each subscription and partition.

The third phase chooses which offset to commit for each subscription/partition. This is the minimum of:

  • maximum committed offset,
  • minimum inflight offset.
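The second and third phases might look like the sketch below when applied across several partitions. All names are hypothetical, and partitions are identified by plain strings for brevity. The sketch also assumes that a partition missing from one of the two maps simply takes the value from the other, which the description above does not state explicitly.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of phases two and three, not the actual OffsetCommitter code.
class ChooseOffsetsSketch {

    // Phase two: smallest inflight offset per partition (empty sets are skipped).
    static Map<String, Long> minInflightPerPartition(Map<String, Set<Long>> inflightOffsets) {
        Map<String, Long> minInflight = new HashMap<>();
        inflightOffsets.forEach((partition, offsets) -> {
            if (!offsets.isEmpty()) {
                minInflight.put(partition, Collections.min(offsets));
            }
        });
        return minInflight;
    }

    // Phase three: for each partition, commit min(max committed, min inflight).
    // maxCommitted is expected to already hold Kafka-style (incremented) offsets.
    static Map<String, Long> offsetsToCommit(Map<String, Long> maxCommitted,
                                             Map<String, Long> minInflight) {
        Set<String> partitions = new HashSet<>(maxCommitted.keySet());
        partitions.addAll(minInflight.keySet());

        Map<String, Long> result = new HashMap<>();
        for (String partition : partitions) {
            // Assumption: a missing entry places no limit, so the other value wins.
            long offset = Math.min(
                    maxCommitted.getOrDefault(partition, Long.MAX_VALUE),
                    minInflight.getOrDefault(partition, Long.MAX_VALUE));
            result.put(partition, offset);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> maxCommitted = new HashMap<>();
        maxCommitted.put("p0", 8L);
        maxCommitted.put("p1", 3L);

        Map<String, Set<Long>> inflight = new HashMap<>();
        inflight.put("p0", new HashSet<>(Collections.singleton(5L)));
        inflight.put("p2", new HashSet<>(Collections.singleton(4L)));

        Map<String, Long> chosen = offsetsToCommit(maxCommitted, minInflightPerPartition(inflight));
        System.out.println(new TreeMap<>(chosen)); // prints {p0=5, p1=3, p2=4}
    }
}
```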

This algorithm is very simple and memory efficient, can be performed in a single thread, and introduces no locks.