The maximum number of elements that a queue can hold.
Note that unbounded queues can still implement this interface
with capacity = MAX_INT.
The number of elements that have ever been taken from the queue.
If you know how long the queue has been alive, you can calculate the rate at which elements are being dequeued.
The number of elements that have ever been added to the queue.
If you know how long the queue has been alive, you can calculate the rate at which elements are being enqueued.
Note that `Long` is used here, since an `Int` would
overflow really quickly for busy queues.
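As a minimal illustration of the rate calculation these counters enable, here is a hedged sketch; the sample counter values and the `ratePerSecond` helper are hypothetical, not part of the queue itself:

```java
public class QueueRates {
    // Average rate (elements/second) given a monotonic Long counter
    // and the time the queue has been alive.
    static double ratePerSecond(long count, double aliveSeconds) {
        return count / aliveSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical sample values; note they exceed Int range,
        // which is why the counters are Longs.
        long enqueuedCount = 5_000_000_000L;
        long dequeuedCount = 4_999_999_000L;
        double aliveSeconds = 250.0;

        System.out.println(ratePerSecond(enqueuedCount, aliveSeconds)); // enqueue rate
        System.out.println(ratePerSecond(dequeuedCount, aliveSeconds)); // dequeue rate
        System.out.println(enqueuedCount - dequeuedCount);              // current backlog
    }
}
```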
A non-blocking enqueue.
whether the enqueue was successful.
A non-blocking dequeue.
either an element from the queue, or the default parameter.
Note that if there's no meaningful default for your type, you
can always use poll(null). Not the best, but a reasonable price
to pay for the lower heap churn of not using Option here.
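To illustrate the `offer`/`poll` contract described above, here is a toy, single-threaded bounded queue with the same semantics. The `Bounded` class and its use of `ArrayDeque` are illustrative assumptions only, not the lock-free implementation:

```java
import java.util.ArrayDeque;

public class BoundedQueueDemo {
    static final class Bounded<A> {
        private final ArrayDeque<A> buf = new ArrayDeque<>();
        private final int capacity;
        Bounded(int capacity) { this.capacity = capacity; }

        // Non-blocking enqueue: returns whether the enqueue succeeded.
        boolean offer(A a) {
            if (buf.size() == capacity) return false; // full: fail, don't block
            buf.addLast(a);
            return true;
        }

        // Non-blocking dequeue: returns an element, or `deflt` if empty.
        A poll(A deflt) {
            A a = buf.pollFirst();
            return a == null ? deflt : a;
        }
    }

    public static void main(String[] args) {
        Bounded<String> q = new Bounded<>(2);
        System.out.println(q.offer("a"));      // true
        System.out.println(q.offer("b"));      // true
        System.out.println(q.offer("c"));      // false: queue is full
        System.out.println(q.poll("<empty>")); // a
        System.out.println(q.poll("<empty>")); // b
        // No meaningful default? poll(null) avoids the heap churn of an Option:
        System.out.println(q.poll(null));      // null: queue is empty
    }
}
```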
the current number of elements inside the queue.
Note that in a concurrent setting this method can be non-atomic and return only an approximate number.
A lock-free, array-based bounded queue. It is thread-safe and can be used in a multiple-producer/multiple-consumer (MPMC) setting.
Main concepts
A simple array-based queue of size N uses an array `buf` of size N as the underlying storage. There are two pointers, `head` and `tail`. An element is enqueued into `buf` at position `tail % N` and dequeued from position `head % N`. Each time an enqueue happens `tail` is incremented; similarly, when a dequeue happens `head` is incremented. Since the pointers wrap around the array as they are incremented, such a data structure is also called a circular buffer or a ring buffer.
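The concept above can be sketched as a minimal single-threaded ring buffer (names follow the text; concurrency is deliberately not handled here):

```java
public class RingBufferSketch {
    static final int N = 4;
    static final Object[] buf = new Object[N];
    static long head = 0; // next position to dequeue from
    static long tail = 0; // next position to enqueue into

    static boolean enqueue(Object a) {
        if (tail - head == N) return false; // full
        buf[(int) (tail % N)] = a;          // wraps around buf as tail grows
        tail++;
        return true;
    }

    static Object dequeue() {
        if (tail == head) return null;      // empty
        Object a = buf[(int) (head % N)];
        head++;
        return a;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 6; i++) enqueue(i); // only 0..3 fit; 4 and 5 are rejected
        System.out.println(dequeue());   // 0
        System.out.println(dequeue());   // 1
        enqueue(42);                     // reuses the freed slot: "circular"
        System.out.println(tail - head); // current size: 3
    }
}
```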
Because the queue is bounded, enqueue and dequeue may fail, which is captured in the semantics of the `offer` and `poll` methods.

Using `offer` as an example, the algorithm can be broken down roughly into three steps:

1. Find a place to insert an element.
2. Reserve this place, put the element there, and make it visible to other threads (store and publish).
3. If no place was found in step 1, return false; otherwise return true.

Steps 1 and 2 are usually done in a loop to accommodate the possibility of failure due to a race. Depending on the implementation of these steps the resulting queue will have different characteristics. For instance, the more sub-steps there are between reserve and publish in step 2, the higher the chance that one thread will delay other threads by being descheduled.
Notes on the design
The queue uses a `buf` array to store elements, and a `seq` array of longs which serves as:

1. an indicator to producer/consumer threads whether a slot is ready for enqueue/dequeue,
2. an indicator whether the queue is empty/full,
3. a mechanism to publish changes to `buf` via a volatile write (this can even be relaxed to an ordered store).

See the comments in the `offer`/`poll` methods for more details on `seq`.

The benefit of using `seq` plus the `head`/`tail` counters is that there are no allocations during enqueue/dequeue and very little overhead. The downside is that it doubles (on 64-bit) or triples (with compressed OOPs) the amount of memory needed for the queue.

Concurrent enqueues and concurrent dequeues are possible. However, there is no helping, so threads can delay other threads, and thus the queue doesn't provide the full set of lock-free guarantees. In practice this is usually not a problem, since the benefits are simplicity, zero GC pressure, and speed.
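As a hedged, compact sketch of the `seq`-based reserve/publish scheme, following D. Vyukov's bounded MPMC design: the real implementation uses `AtomicLongFieldUpdater` and padded fields for performance, whereas this version uses plain `AtomicLong`/`AtomicLongArray` (and a volatile `set` rather than an ordered store) to stay self-contained. The class name and constructor are illustrative assumptions:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

public class VyukovSketch<A> {
    private final Object[] buf;
    private final AtomicLongArray seq;  // per-slot readiness + publication
    private final AtomicLong head = new AtomicLong(0);
    private final AtomicLong tail = new AtomicLong(0);
    private final int mask;

    public VyukovSketch(int desiredCapacity) {
        // Real capacity: next power of 2, so `% N` becomes a cheap mask.
        int n = 1 << (32 - Integer.numberOfLeadingZeros(desiredCapacity - 1));
        buf = new Object[n];
        seq = new AtomicLongArray(n);
        for (int i = 0; i < n; i++) seq.set(i, i); // slot i is free for tail == i
        mask = n - 1;
    }

    public boolean offer(A a) {
        while (true) {
            long curTail = tail.get();
            int idx = (int) (curTail & mask);
            long s = seq.get(idx);
            if (s == curTail) {                                 // slot ready for enqueue
                if (tail.compareAndSet(curTail, curTail + 1)) { // reserve
                    buf[idx] = a;
                    seq.set(idx, curTail + 1);                  // publish (volatile write)
                    return true;
                }                                               // lost the race: retry
            } else if (s < curTail) {
                return false;                                   // queue is full
            }                                                   // s > curTail: stale tail, retry
        }
    }

    @SuppressWarnings("unchecked")
    public A poll(A deflt) {
        while (true) {
            long curHead = head.get();
            int idx = (int) (curHead & mask);
            long s = seq.get(idx);
            if (s == curHead + 1) {                             // slot holds a published element
                if (head.compareAndSet(curHead, curHead + 1)) { // reserve
                    A a = (A) buf[idx];
                    buf[idx] = null;
                    seq.set(idx, curHead + buf.length);         // free slot for a future enqueue
                    return a;
                }
            } else if (s < curHead + 1) {
                return deflt;                                   // queue is empty
            }
        }
    }

    public static void main(String[] args) {
        VyukovSketch<Integer> q = new VyukovSketch<>(2);
        System.out.println(q.offer(1)); // true
        System.out.println(q.offer(2)); // true
        System.out.println(q.offer(3)); // false: full
        System.out.println(q.poll(-1)); // 1
    }
}
```

Note how the three steps map onto the code: finding a place is the `seq.get(idx)` check, reserving is the CAS on `tail`, and publishing is the write to `seq` after storing the element.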
The real capacity of the queue is the next power of 2 above the `desiredCapacity`. The reason is that `head % N` and `tail % N` are rather cheap when they can be done as a simple bitmask (N is a power of 2), and fairly expensive when they involve an `idiv` instruction. The queue could be made to work with arbitrary sizes, but the user would have to suffer a ~20% performance loss.

To ensure good performance, reads and writes to
the `head` and `tail` fields need to be independent, i.e. they shouldn't fall on the same (or an adjacent) cache line. We could make those counters regular volatile long fields and space them out, but we still need a way to do CAS on them. The only way to do this, other than `Unsafe`, is to use an `AtomicLongFieldUpdater`, which is exactly what we have here. See scalaz.zio.internal.impls.padding.MutableQueueFieldsPadding for more details on padding and the object's memory layout.

The design is heavily inspired by libraries such as https://github.com/LMAX-Exchange/disruptor and https://github.com/JCTools/JCTools, which are in turn based on D. Vyukov's design: http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue

Compared to JCTools, this implementation doesn't rely on `sun.misc.Unsafe`, so it is arguably more portable and should be easier to read. It is also very extensively commented, including the reasoning, assumptions, and hacks involved.

Alternative designs
There is an alternative design described in the paper "A Portable Lock-Free Bounded Queue" by Pirkelbauer et al. It provides full lock-free guarantees, which generally means that at least one out of many contending threads is guaranteed to make progress in a finite number of steps. That design is thus not susceptible to threads delaying other threads. However, the helping scheme is rather involved and cannot be implemented without allocations (at least I couldn't come up with a way yet). This translates into worse performance on average, but better performance in some very specific situations.