Unexpected Iterations

Mustafa Haddara on March 29, 2016

Here at Vena, we build a multidimensional database (also known as an OLAP Cube, or simply “The Cube”) for our customers to store and analyze their data. Part of the functionality we provide is a programming language to express relationships between different pieces of data. For example, a customer might want to do a currency conversion — if some of their data is input in USD, but they are a Canadian company, they will want to convert those values into CAD for reporting purposes.

The Vena Calc language includes methods for users to traverse the hierarchy of their OLAP Cube and derive some simple results. For example, the snippet [Q2].PrevPeriod().Leaves().Sum() would be broken down into [Q1].Leaves().Sum() and then into {[Jan], [Feb], [Mar]}.Sum() and then the sum would be calculated. As you can see, these methods can be chained after one another, similar to most standard programming languages.

Don’t Use a Stack for Stacking

As we link the script against the Vena Cube, we need to evaluate these methods in the correct order. The most natural way to do this is to push each method onto a Stack and then evaluate the methods in order. After parsing the snippet above, for example, our stack of methods might look like:

+---+------------+
| 2 | PrevPeriod |
+---+------------+
| 1 | Leaves     |
+---+------------+
| 0 | Sum        |
+---+------------+

Now we begin at the top of the stack and evaluate it with the [Q2] member.

Written in Java, that code would look a little like the following.

Stack<Vena.lang.Method> methods = new Stack<>();

// populate the stack
Vena.lang.Member member = new Vena.lang.Member("Q2");  
for (Vena.lang.Method method: methods) {  
    logger.log(method);
    member = method.invoke(member);
}

We store the result in the member object and thereby “push forward” the result of one method as the argument to the next method in the chain.

However, when we turned on our logging, we noticed something odd. The output, using the above example, was

<"Sum">
<"Leaves">
<"PrevPeriod">

Our first reaction was that we populated the stack in the wrong order. So: let’s check the stack. We dumped the contents of the stack before entering the for loop. To our surprise, the stack was ordered exactly as we expected it to be.

Now the question is: does java.util.Stack iterate in the order we’re expecting? Our initial assumption was that a stack would iterate in Last-In-First-Out (LIFO) order, since that is what makes it a stack. This seemed the most intuitive; the most in line with the Principle of Least Surprise. However, seeing the output here brought that assumption to question. But…there’s no way that a stack iterates in First-In-First-Out (FIFO) fashion. Right? That would make it a queue. Right?

Wrong.

A Small Detour into .NET

Let’s take a small detour here. In, for example, C#, a stack can be iterated over like so:

Stack s = new Stack();  
s.Push("3");  
s.Push("4");  
s.Push("5");

foreach (String c in s) {  
    Console.WriteLine(c);
}

As expected, the output here is

5
4
3

That is to say, the iteration order is in LIFO-order, as a stack is expected to behave.

But How Does Java do it?

In Java, however, the same snippet would be written

Stack<String> s = new Stack();  
s.push("3");  
s.push("4");  
s.push("5");

for (String c : s) {  
    System.out.println(c);
}

And you would see the following output:

3
4
5

What.

A quick google search turns up this bug report from 2001, from when Sun owned Java. The submitter produces a similar example to the above. I’ll quote from the response:

EVALUATION
It was an incorrect design decision to have Stack extend Vector (“is-a” rather than “has-a”). We sympathize with the submitter but cannot fix this because of compatibility.

What.

This means that under the covers, java.util.Stack is implemented a little like so:

class Stack<T> extends Vector<T> {  
    public T pop() {
        T last = this.lastElement();
        this.remove(this.size()-1);
        return last;
    }

    public void push(T elem) {
        this.add(elem);
    }
}

And so the iterator follows the standard Vector iterator, from beginning to end, in FIFO-fashion, instead of going in LIFO order as would be expected from a Stack.

Some Computer Science Theory

Now, in theory, you should never be able to access elements of the Stack without popping them off the top. And iteration as we were doing above actually does not remove any elements from the Stack, which makes it an undefined operation. So, technically, there is no specified order to the iteration over a Stack that we do above. This just boils down to a violation of the Principle of Least Surprise: when asked, most developers would assume that a Stack iterates in pop() order. A small minority might assume that iteration over a Stack is forbidden, which is also a valid (and theoretically correct) assumption. But to have the Stack iterate in bottom-up order, just like a Queue? It comes as a complete surprise to pretty much everyone who hasn’t been bitten by this problem.

With the introduction of the java.util.Deque class, Oracle/Sun advise against using a Stack at all, and in fact have a note advising against its use in the documentation:

A more complete and consistent set of LIFO stack operations is provided by the Deque interface and its implementations, which should be used in preference to this class.

The problem is now that the java.util.Deque class is almost like a cross between a Queue and Stack — one can insert and read from either end of the data structure. And using a Deque in place of a Stack requires using methods like addFirst() and removeFirst() in place of push() and pop(), and iteration requires calling descendingIterator().

It works, to an extent, but at that point, what benefit does it provide over using a plain array? When we use a Stack, we use it because we know, in advance, that we will only be accessing the contents of that Stack in a specific order. Using an array, or even a Deque, violates that contract and forces the programmer to be conscious of iteration order and insertion order, all of which are supposed to be abstracted away behind the Stack interface.

Now, if you thought iteration over a Stack was awful, you’ll love to see how iteration over a PriorityQueue works.

PriorityQueues Lacking Priorities

Destructive Iteration

A PriorityQueue is a data structure which holds elements along with a measure of their importance. Unlike a regular Queue, where asking for the next item in the Queue will result in the oldest item, asking for the next item in a PriorityQueue results in the item with the highest priority.

In Java, a PriorityQueue will use either the natural order of objects, or a comparator provided in the constructor of the PriorityQueue, to determine priority of objects.

In the following example, we will rely on the natural ordering of elements.

PriorityQueue<Integer> pqueue = new PriorityQueue();  
pqueue.add(9);  
pqueue.add(3);  
pqueue.add(7);  
pqueue.add(4);

while (!pqueue.isEmpty()) {  
    System.out.println(pqueue.poll());
}

The poll() method removes the item with the highest priority (ie. the “smallest”) from the PriorityQueue and returns it. Thus, the output is

So far, so good.

The Non-Destructive Case

The only problem is that this is destructive. Once the while loop completes, our PriorityQueue is empty, because each call to poll() removes an object from the PriorityQueue. What if we wanted to iterate non-destructively? We might write something like:

PriorityQueue<Integer> pqueue = new PriorityQueue();  
pqueue.add(9);  
pqueue.add(3);  
pqueue.add(7);  
pqueue.add(4);

for (Integer i: pqueue) {  
    System.out.println(i);
}

In this case, actually, everything comes out as one would expect. Our output is:

It looks alright, doesn’t it? Well, this is one of those times that looks are deceiving.

Suppose our initialization of the PriorityQueue was slightly different. Suppose our code looked like this:

PriorityQueue<Integer> pqueue = new PriorityQueue();  
pqueue.add(9);  
pqueue.add(3);  
pqueue.add(4);  //  Swap these two lines  
pqueue.add(7);  //  Swap these two lines

for (Integer i: pqueue) {  
    System.out.println(i);
}

Now our output is

Interesting.

We can keep going with this, keep flipping around the order of the initialization, and sometimes we get a different order to our output, and sometimes we do not.

So it’s not quite a queue, not quite a stack, insertion order isn’t the only factor, and it doesn’t even iterate according to the priorities of the elements. What’s going on?

Under the covers, a PriorityQueue uses a binary heap to store the elements with their priorities. This means that asking for the first item is easy (just grab the root node of the tree) but getting the rest of the items requires mutating the rest of the tree. We speculate that the mistake was made to not walk that tree in order and instead return flat arrays of nodes.

Essentially, when iterating over a PriorityQueue like this, you can be reasonably certain that the first element out will be the element with the highest priority. After that? The elements might come out in any order at all.

In the documentation, actually, they tell us that

The Iterator provided in method iterator() is not guaranteed to traverse the elements of the priority queue in any particular order. If you need ordered traversal, consider using Arrays.sort(pq.toArray()).

That’s…useful, I suppose, but now if we want to iterate over a PriorityQueue non-destructively, it costs us O(n) memory and O(n*log(n)) time to compute, and that is only if our object comparator can compare in O(1) time. If, for example, we have a PriorityQueue of strings, then iteration costs O(k*n*log(n)) time, where k is the length of the longest string and n is the number of strings in the PriorityQueue.

Ouch.

Category: Vena Engineering
Tag: computer science, data structures, java

Posted by Mustafa Haddara

Professional copypaster fascinated with GPU programming and infosec. I also love rock climbing, video games, and interesting bits of design.

All Posts Website