Queueing in Erlang

Author

Max Bucknell

Published

February 22, 2025

Last year, I needed a queue to implement a breadth-first search for an Advent of Code problem. I was using Elixir, and I wasn’t sure how to implement a queue myself. Luckily, Erlang provides queue, and I was off to the races. I’ve been wondering off and on since then exactly how it works, since the data structure for it was both transparent and also bizarre (to me). So I’m going to read the source and annotate it here. Erlang is not a language I know, so hopefully it makes sense to me. We are all learning here.

Basic Queueing Operations

The first thing that we see from the constructor is that a queue is a tuple of two lists. This is further backed up by is_queue/1:

new() -> {[],[]}. %{RearList,FrontList}

%% ...

is_queue({R,F}) when is_list(R), is_list(F) ->
    true;
is_queue(_) ->
    false.

Similar to Elixir, Erlang supports and encourages branching behaviour based on different argument patterns. Essentially, the first clause of the function returns true, but only when the supplied argument is a tuple of two lists. If it is, it is a valid representation of a queue in Erlang! The second clause, with the underscore argument, matches everything else. In general, when a set of arguments matches multiple clauses of a function, the first one is taken. I find this an enjoyable and very expressive form of programming. I love me some pattern matching, truly.
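
A quick shell check of both clauses:

queue:is_queue({[3],[1,2]}).
%% true

queue:is_queue(not_a_queue).
%% false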

We also get a clue in here that the first list is for the rear, and the second one for the front. Just below this, is_empty/1 also confirms that {[], []} is the empty queue.

len/1 shows us that there is no duplication between the two lists, meaning the representation is memory efficient. If elements were stored in both lists, the number of items would not simply be the sum of the two lengths:

len({R,F}) when is_list(R), is_list(F) ->
    length(R)+length(F);
len(Q) ->
    erlang:error(badarg, [Q]).
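
A quick sanity check in a shell, with three items in the rear list and two in the front:

queue:len(queue:from_list([1,2,3,4,5])).
%% 5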

An example in the documentation for to_list/1 shows us the usage of its counterpart from_list/1, and also that from_list/1 turns [1, 2, 3, 4, 5] into {[5, 4, 3], [1, 2]}. The big question is “Why?”.

queue:from_list([1,2,3,4,5]).
%% {[5,4,3],[1,2]}

to_list/1 goes on to show that for any queue, its list representation is the second part followed by the first part, backwards:

to_list({In,Out}) when is_list(In), is_list(Out) ->
    Out++lists:reverse(In, []);
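
Feeding that representation back through it gets us where we started (the representation is officially opaque, but it is only a tuple, so the shell doesn’t mind):

queue:to_list({[5,4,3],[1,2]}).
%% [1,2,3,4,5]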

Things get really interesting when we look at the source of its counterpart from_list/1:

from_list(L) when is_list(L) ->
    f2r(L);

What the hell is f2r/1? Right down at the bottom of the file, there are two internal functions, f2r/1 and r2f/1. They are documented in the code, but come under a header “Internal workers”, and are nowhere to be found in the documentation. They appear relatively simple. Considering a queue {R, F}, where R and F denote Rear and Front lists respectively, f2r/1 moves half of the elements from F to R. Conversely, r2f/1 moves half of the elements of R into F.

Here they are, in their entirety:

%% Move half of elements from R to F, if there are at least three
r2f([]) ->
    {[],[]};
r2f([_]=R) ->
    {[],R};
r2f([Y,X]) ->
    {[Y],[X]};
r2f(List) ->
    {RR,FF} = lists:split(length(List) div 2, List),
    {RR,lists:reverse(FF, [])}.

%% Move half of elements from F to R, if there are enough
f2r([]) ->
    {[],[]};
f2r([_]=F) ->
    {F,[]};
f2r([X,Y]) ->
    {[Y],[X]};
f2r(List) ->
    {FF,RR} = lists:split(length(List) div 2, List),
    {lists:reverse(RR, []),FF}.

So these functions will balance the two ends of the queue for us when they are called. One interesting feature of these two functions is that they take a list, rather than a queue, but return a queue. That means they should only be called when the other list is empty, since its contents are not passed in and would otherwise be lost.

Reading these, it makes sense to me now why R is reversed: prepending to a list is O(1), meaning that enqueueing is similarly cheap. However, if we push n items into our queue, getting the first of them back out means reversing the rear, which is O(n). Hence, at some point we wish to rebalance our queue so that ripping items off the front of F is cheap. By reading these functions, it seems that this operation is performed when we run out of items to pop from F, and in that case we take half of the items from R and put them into F. This means that if a queue contains n items and we drain it, we can expect r2f/1 to be called roughly log(n) times, and since each call touches half as many elements as the one before, the total rebalancing work across the whole drain is proportional to n, i.e. constant per element on average. I now understand the documentation’s meaning when it states that these operations “have an amortized O(1) running time”. Every so often, you get lumped with a worst case of O(n). As long as this is rare, it is not a problem.

There are a few other interesting things about this code:

  • I’m pretty sure that f2r/1 is merely a relabelling of r2f/1 with the order of the tuple swapped. I probably would have written it like that.
  • If the list has one item only, it is swapped to the other side. That is to say that f2r/1 puts it in R, and r2f/1 puts it in F, as the shell example below shows for f2r/1.
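
Since from_list/1 goes through f2r/1, that second point is easy to see from a shell:

queue:from_list([1]).
%% {[1],[]}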

It’s relatively clear to me that we force ourselves to rebalance the queue if we run out of items at either end (out/1 will call r2f/1, and out_r/1 will call f2r/1 to pull from the back). But when else? My instinct is that adding items will never rebalance the queue, because we cannot anticipate how a queue will be drained until the client starts draining it. However, there clearly is some general goal of keeping things ready for anything, since from_list/1 calls f2r/1 to rebalance on creation. That’s how we even got here. Let’s take a look at in/2 to see:

in(X, {[_]=In,[]}) ->
    {[X], In};
in(X, {In,Out}) when is_list(In), is_list(Out) ->
    {[X|In],Out};
in(X, Q) ->
    erlang:error(badarg, [X,Q]).

Well, it’s true that in/2 doesn’t call either of the rebalancing helpers, and I would guess that in_r/2 behaves similarly. There are a couple of interesting clauses here, though:

  1. The first deals with a queue whose rear list contains a single item and whose front list is empty, such as the queue you get by adding a single item to an empty queue. In this case, the existing item is moved to F, and the new item is added to R, leaving the queue balanced.
  2. Otherwise, we simply add the new item to the front of R, which represents the back of the queue.

So, if we append 1000 items to an empty queue, we would end up with one item in F, and 999 items in R.
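
We can see that shape in a shell by folding queue:in/2 over a short list (lists:foldl/3 happens to pass its arguments in the right order for this); five items stand in for the thousand:

lists:foldl(fun queue:in/2, queue:new(), lists:seq(1,5)).
%% {[5,4,3,2],[1]}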

If we turn our attention to out/1, we don’t see anything particularly surprising:

out({[],[]}=Q) ->
    {empty,Q};
out({[V],[]}) ->
    {{value,V},{[],[]}};
out({[Y|In],[]}) ->
    [V|Out] = lists:reverse(In, []),
    {{value,V},{[Y],Out}};
out({In,[V]}) when is_list(In) ->
    {{value,V},r2f(In)};
out({In,[V|Out]}) when is_list(In) ->
    {{value,V},{In,Out}};
out(Q) ->
    erlang:error(badarg, [Q]).

The return value here is a 2-tuple whose first element is either the atom empty, or a tuple of the atom value and the dequeued item, and whose second element is the new queue. Atoms are unique symbols that exist in the Erlang runtime, named constants that are used kind of like global enum values.
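
For instance, popping from an empty queue gives us the empty atom alongside the unchanged (empty) queue:

queue:out(queue:new()).
%% {empty,{[],[]}}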

My guess above was that r2f/1 would only be called by out/1 if F was empty. That is not the case. The third clause deals with an empty F, and it does so by emptying all but one element of R into F. r2f/1 is still called, it is just called when F has one element left in it, rather than none. Although, once that last front element is popped, F would indeed be empty, so I wasn’t too far off.
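
Here is that rebalance in a shell. Building a balanced eight-element queue and draining it, the front list runs down to a single element, and the next out/1 reverses half of the rear over (the tuples follow from the clauses above):

Q0 = queue:from_list([1,2,3,4,5,6,7,8]).
%% {[8,7,6,5],[1,2,3,4]}

%% The first three outs just pop the head of the front list.
{{value,1},Q1} = queue:out(Q0).
{{value,2},Q2} = queue:out(Q1).
{{value,3},Q3} = queue:out(Q2).
Q3.
%% {[8,7,6,5],[4]}

%% The front is down to one element, so this out/1 calls r2f/1 and
%% half of the rear comes back over, reversed.
{{value,4},Q4} = queue:out(Q3).
Q4.
%% {[8,7],[5,6]}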

As an interesting side-quest: pushing a single item into an empty queue does produce a queue with an empty F and one item in R, which is exactly what the second clause of out/1 handles. But it is not immediately clear to me how we would ever reach the third clause’s case, where F is empty and R holds two or more items, through only these two functions. I know that there are advanced functions for deletions and filtering and mapping and suchlike, so maybe this will be revealed to me later on.
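
That first case is easy to produce in a shell:

queue:in(1, queue:new()).
%% {[1],[]}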

This covers basic queueing! We have in/2, out/1, and even in_r/2 and out_r/1, all powered by an ordered pair of lists that are kinda balanced out sometimes. But there is more to this module. A lot more, in fact. There are the basic iteration functions, things like all/1 and fold/3. These seem relatively simple to implement, though I wonder whether they convert the queue to a list or run two passes, one over each list. My bet is the latter. Then there are the functions that seem harder to implement to me. Things like split, join, and delete. I feel like these require some amount of rebalancing and reversing things. Finally, the documentation contains an “Okasaki” API, which links to a 1996 work (a PhD thesis of 130-odd pages, later expanded into a book) called “Purely Functional Data Structures”. This is up my alley, but will not be covered in this post. I am already reading a book right now. So, let’s look at splitting and joining.

Splitting

(In the following examples, I am going to omit the degenerate cases that handle errors and empty inputs. They are not algorithmically relevant.)

The API of split doesn’t just halve the queue: it puts the first N items into the first queue, and the remainder into the second. As I suspected, the code is a little gnarly, certainly more so than anything hitherto presented:

split(N, {R,F}=Q) when is_integer(N), N >= 1, is_list(R), is_list(F) ->
    Lf = erlang:length(F),
    if  N < Lf -> % Lf >= 2
            [X|F1] = F,
            split_f1_to_r2(N-1, R, F1, [], [X]);
        N > Lf ->
            Lr = length(R),
            M = Lr - (N-Lf),
            if  M < 0 ->
                    erlang:error(badarg, [N,Q]);
                M > 0 ->
                    [X|R1] = R,
                    split_r1_to_f2(M-1, R1, F, [X], []);
                true -> % M == 0
                    {Q,{[],[]}}
            end;
        true -> % N == Lf
            {f2r(F),r2f(R)}
    end;

The first thing is that this is some pretty fun indenting of if statements! The second thing is that this calls two little helper functions that are defined just below:

%% Move N elements from F1 to R2
split_f1_to_r2(0, R1, F1, R2, F2) ->
    {{R2,F2},{R1,F1}};
split_f1_to_r2(N, R1, [X|F1], R2, F2) ->
    split_f1_to_r2(N-1, R1, F1, [X|R2], F2).

%% Move N elements from R1 to F2
split_r1_to_f2(0, R1, F1, R2, F2) ->
    {{R1,F1},{R2,F2}};
split_r1_to_f2(N, [X|R1], F1, R2, F2) ->
    split_r1_to_f2(N-1, R1, F1, R2, [X|F2]).

Now I can actually look at this. This function principally branches on whether or not we can satisfy the first queue solely by pulling elements off the front of F, namely whether F has at least N elements. There are three possible outcomes:

We have exactly N elements in F

In this case, we take F and R, and return them. We also go to the trouble of rebalancing them.

We have more than N elements in F

So I thought this would be the simple operation of pulling N elements off F, and maybe calling f2r on the result. But that’s not what we do here. For reasons surpassing my understanding, we pull one element off the front of F and pop it into the front of the helper’s second queue. Then we take the next N-1 elements of F and put them into the rear of that same queue.

This is achieved by that handy helper function split_f1_to_r2/5. This has a very nice recursive definition, but one thing to note is that the base case swaps its arguments around, returning {Q2, Q1}. This is where the swapping is done, ensuring that the front N items go into the front queue.

We have fewer than N elements in F

This means we are going to have to plunder R in order to fill the first queue, and that is going to be at least somewhat slow. In this case, we further branch by M. M is rather confusingly written here; I would prefer it be defined as Lr + Lf - N, because double negatives are hard for me. But I’m a big boy and I’m used to not getting what I want at this point.

If M is less than 0, it means the queue does not have N elements to give, and we simply bail with a badarg error. This feels cowardly to me; I would probably go for just putting the input queue into the first result and leaving the second one empty, but I’m not about to come in and tell the standard library authors for a language I don’t even know how to decorate their house. I’m an idiot!

As it happens, though, that is exactly what we do if M is exactly 0, meaning that the input queue contains exactly N items.

In the final case, we actually do some splitting. We again delegate to a helper function, this time split_r1_to_f2/5. It begins in much the same way: we take the first element of R and put it into the rear of the second queue, then we ship the next M-1 elements into the front of the second queue. Since M is the number of elements left over after the first N items, what remains of R and F already is the first queue according to the API contract, so no swapping occurs.

All of these structures are immutable, but it might be helpful to think of the previous case as moving the first N elements into a new queue, and this case as moving the last M elements into a new queue.
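
To check my reading of the three branches, here is a five-element queue (whose front list holds two elements) split with N below, equal to, and above that length; the results follow from tracing the clauses above:

queue:split(1, queue:from_list([1,2,3,4,5])).
%% {{[],[1]},{[5,4,3],[2]}}

queue:split(2, queue:from_list([1,2,3,4,5])).
%% {{[2],[1]},{[5],[3,4]}}

queue:split(4, queue:from_list([1,2,3,4,5])).
%% {{[4,3],[1,2]},{[5],[]}}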

Joining

join({R1,F1}, {R2,F2}) when is_list(R1), is_list(F1), is_list(R2), is_list(F2) ->
    {R2,F1++lists:reverse(R1,F2)};

One thing I’ve learned through this analysis is that there are a number of valid representations for the same notional queue. And nothing throws this idea into quite such sharp relief as this snippet.

This function might seem complicated. It did to me, but it turns out that lists:reverse/2 only reverses its first argument, and tacks the second argument onto the end unchanged.
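
A quick example, with both inputs built by from_list/1:

queue:join(queue:from_list([1,2,3]), queue:from_list([4,5])).
%% {[5],[1,2,3,4]}

The second queue’s rear list carries over untouched, while the whole first queue, flattened into order, becomes the start of the new front list.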

And with that, I am out of questions about how Erlang’s queue works! There are other functions, and an entire API that I don’t understand (to be fair, it’s the names that make no sense to me; the behaviour seems relatively straightforward). The remaining functions usually involve visiting all elements. If the order is important, that in turn involves reversing one of the lists. In cases where we are searching for a single item and need to traverse the queue in order (delete/2 et al), the front portion of the queue will be traversed first, only reversing the rear if needed. Finally, one beautiful feature of this design is that it is totally symmetric. For all relevant functions, there exist _r variants that go from the other end. But what I find particularly pleasing here is that inverting a queue is a simple matter of swapping the front and rear component lists.
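
One last shell check of that final point, assuming reverse/1 really is just that swap of the two lists:

queue:reverse(queue:from_list([1,2,3,4,5])).
%% {[1,2],[5,4,3]}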