The Inflatable Transpose Tree: a highly compact data structure

In honour of Claude Shannon's 100th birthday, I want to tell you about two ideas I had for storing data efficiently. These ideas are pretty nifty individually, but together, they form a data structure that allows a large amount of data to be stored in a remarkably small space.

First idea: Inflatable data structures

The first idea was one I had maybe eight years ago or so, and I referred to it as inflatable data structures. Suppose you have a class with a reference to an enum having three values. In Java, that reference occupies 32 bits, yet its three values could be represented using just two bits. This isn't only true for enums: it's true anywhere you have an object reference that can point only to objects within a certain restricted set; for instance, a company with 210 employees needs only 8 bits to store an employee reference. However, it would be unwise to declare an employee field of type byte: what happens when employee #256 is hired? Long gone are the days when a program can simply terminate due to an implementation limit.

But suppose you could start with a one-byte field, and enlarge it later if you discover that one byte is not enough? This is the core concept behind inflatable data structures. Many structures, like ArrayList, can grow longer to accommodate more data entries; an inflatable data structure can grow wider to accommodate bigger entries.

I tried to implement this idea using template tricks in C++. That did not go well. In fact, after 13 years as a professional C++ programmer, it prompted me to post this:

Ok, that's enough. I have absolutely had it with C++. Writing software in C++ feels exactly like writing an essay for a teacher who has disallowed words containing the letter "e": you expend all your ingenuity trying to figure out how to express yourself rather than solving the problem in front of you.

So, needless to say, I put the idea of inflatable data structures on the back burner.

Second idea: Transposed data structures

The second idea is one I'm sure many people have had, if they work in a language like Java whose objects possess an unavoidable memory overhead. No matter how small the overhead, it can become prohibitive if the data payload in the object is sufficiently small, and you have sufficiently many objects.

In Java, an object header consumes 12 bytes, and since they're aligned to 8 bytes, you have a bare minimum of 16 bytes per object, even with no fields. That may sound small, but it puts a hard limit of 62 million objects per gigabyte of heap. Furthermore, sub-integer types consume 4 bytes each, so you save no storage by using short, byte, or even boolean fields.

A particularly egregious example of this is TreeMap.Entry:

    static final class Entry<K,V> implements Map.Entry<K,V> {
        K key;
        V value;
        Entry<K,V> left, right, parent;
        boolean color;

With a 12-byte header and 4 bytes per reference, this adds up to 32 bytes without the color flag. Thanks to the 8-byte object alignment requirement, the boolean flag brings the total size up to 40 bytes. That boolean costs eight bytes!

So what do we do instead?

The big idea is to turn things sideways. In C parlance, instead of an array of structs, let's have a struct of arrays. Instead of a million objects with six fields each, how about six arrays with a million elements each?

Where the old layout would look like this, with an object header per entry:

One object per entry

...the new layout looks like this, with one object header per field-array:

One array per field

(This reminiscent of a column-oriented database, though our emphasis is on the total size of the data, rather than the amount of data loaded to satisfy queries.)

Here's a hypothetical transposed version of TreeMap, with its six fields per Entry replaced by six field-arrays, in C++-like pseudocode:

class TreeMap<K,V> {  
        K key[NUM_ENTRIES];
        V value[NUM_ENTRIES];
        int left[NUM_ENTRIES];
        int right[NUM_ENTRIES];
        int parent[NUM_ENTRIES];
        boolean color[NUM_ENTRIES];

What used to be an Entry object is now identified by an array index, and the same index is used to get data from all six of these arrays. Entry #123 would have its key in key[123], its value in value[123], and so on. For large trees, the object header overhead is now negligible; the storage cost for this structure is 21 bytes per entry—about half that of TreeMap. And if your keys and/or values are primitives, you can easily save even more storage by using arrays of primitives in place of the K and V arrays.

It's especially important to notice how the left, right, and parent arrays work. These arrays contain indexes of other tree entries, which means they are serving the purpose that object references would serve in a normal (not transposed) data structure. In this example, these references are ints, which are actually the same size as references, but they could have been shorts, or bytes. This is taking Java's compressed references to a new level: instead of using just 32 bits for a reference, these super-compressed references could be using 16 bits, or 8 bits.

In fact, they can use even fewer bits. I wrote a class called BitBlock, which acts like a growable array (like ArrayList) of n-bit entries. This way, our hypothetical enum with three values would consume just 2 bits per entry. Under the covers, it packs as many entries as possible into each 64-bit long, which it stores compactly in a growable array.

However, what happens if you discover you were wrong about how many bits you need? What happens when employee #256 joins the company, and now you wish you had used 9 bits per entry instead of 8?

Good question...

Transposition makes inflation simple

At this point, a thought occurred to me that is perhaps obvious by now to the astute reader: wouldn't it be simpler to enlarge a particular field if all occurrences of that field were collected together into a single object that you could replace? Well, shoot, it turns out this is exactly what transposition does!

So I implemented an InflatableBlock that looks from the outside like an ArrayList of longs. Under the covers, though, it actually packs the entries into a BitBlock; it starts with a small entries, and then graduates to larger entries whenever an oversized entry is added, copying all existing entries over to a new BitBlock transparently.

If you use an InflatableBlock for one or more of your field-arrays, then tada! Your transposed data structure is inflatable!

Putting it all together: the Transpose Tree

The Transpose Tree was the first data structure I implemented this way. It is a red-black tree implemented using inflatable field-arrays in place of object references. To use a Transpose Tree, you extend TransposeTree and add additional field-arrays to hold whatever would have been in fields, had you used ordinary key and value objects. If these field-arrays are InflatableBlocks, then your data structure is inflatable.

Now, there is a down-side to all this memory efficiency: we have abandoned Java's object-oriented paradigm in favour of bit-twiddling, so using TransposeTree is not as straightforward as using a data structure from the class library. There's no way to use TransposeTree without extending it; in fact, TransposeTree has no public methods—a deliberate design decision that gives subclasses complete freedom to define their own public interface.

The subclass must implement a compare method that allows TransposeTree to set up the tree references without ever manipulating the keys directly. This way, a key can be any primitive, or any combination of primitives. In other words, compound keys are straightforward. (And hey—the keys can even be objects, if you want.)

A contrived example

To show the benefit this technique can have on scalar data, here's a silly example: keeping property information on one million longs. For some reason, we want to create a tree that maps each long to a collection of properties:

  • its last digit,
  • the number of digits it has, and
  • its sign.

Using objects, we might store the information this way:

    static class Properties {
        final byte finalDigit, numDigits;
        final boolean signBit;
    TreeMap<Long, Properties> treeMap = new TreeMap<>();

The memory consumption per entry is as follows:

  • 40 bytes for a TreeMap.Entry
  • 16 bytes for a Long key
  • 24 bytes for a Properties value

Total: 80 bytes per entry.

In contrast, to use TransposeTree, we must extend it. Here are the first few lines of a NumberPropertiesTree class implementing an inflatable transpose tree:

    static class NumberPropertiesTree extends TransposeTree {

        final PrimitiveBlock values, finalDigit, numDigits, signBit;

        protected NumberPropertiesTree() {
            this.values     = new LongBlock();
            this.finalDigit = new InflatableBlock();
            this.numDigits  = new InflatableBlock();
            this.signBit    = new InflatableBlock();

        protected int compare(int index1, int index2) {
            return, value.get(index2));

(I've omitted some constructor arguments here for brevity.)

We've added field-arrays for value (the key, in this case), finalDigit, numDigits, and signBit. Together, the memory consumption per element is as follows:

  • 64 bits for value;
  • 42 bits for the red-black tree's left and right field-arrays (inflated to 21 bits each, to accommodate array indexes up to a million)
  • 1 bit for isBlack information; and
  • 10 bits for the properties.

Total: 117 bits = 14.7 bytes per entry.

Of these 117 bits, 43 bits are for managing the tree; 64 bits are for storing the original long value; and the remaining 10 bits store the properties.

If we stop at this point, NumberPropertiesTree contains no public methods. To make the class usable, we'll add two: the first is propertiesFor, which produces a stream of Properties objects matching a given query; the Properties objects will be constructed on-the-fly from the data stored in the array-fields.

        public Stream<Properties> propertiesFor(NodeLocator locator) {
            return super.allIndexesMatching(locator).mapToObj(i->
                new Properties(values.get(i), (byte)finalDigit.get(i), (byte)numDigits.get(i), signBit.get(i) == 1)

The allIndexesMatching method is inherited from TransposeTree, and provides a stream containing the indexes of all the tree nodes matching the given criterion, in the form of a NodeLocator. The NodeLocator is a bit like Java's Comparator: by returning a negative or positive value, it tells the tree search algorithm whether to go left or right; and given this information, allIndexesMatching returns the indexes of all nodes for which the locator returns zero.

The second public method is add, which computes properties for a given value and adds them to the tree:

        void add(long value) {
            long abs = Math.abs(value);
            values.set(insertionPoint(), value);
            finalDigit.set(insertionPoint(), abs % 10);
            numDigits.set(insertionPoint(), Long.toString(abs).length());
            signBit.set(insertionPoint(), value >>> 63);

Note how this operates: at any time, TransposeTree.insertionPoint() returns an unused index where the field values for a new tree entry can be stored. The subclass stores those field values, and then calls insert() to add the new item into the tree. Afterward, insertionPoint points to a new location for the next insertion.

TransposeTree also has a method called shrinkwrap that re-packs the data into smaller arrays, much like ArrayList.trimToSize(). On average, this can be expected to free about a quarter of each array, which is significant. By overriding shrinkwrap, our subclass can participate in this savings by shrinkwrapping all the array-fields too:

        public void shrinkwrap() {

At this point, we have the fewest possible entries, each of which occupies the fewest possible bits. We're ready to run some tests.

A disclaimer, though: Whatever we determine about memory consumption, never forget that the one using TreeMap was fully implemented in a dozen lines of code. Using TransposeTree is quite complicated by comparison, so it should be reserved for very large, regular data structures whose size would otherwise be prohibitive.

And now, on with the show.

Experimental method and results

I measured the size of the two data structures by creating an object that points to one of each, inserting the same 1,000,000 random numbers, and then triggering a heap dump. The insertion code looked like this:

    static class ThingsToMeasure {
        TreeMap<Long, Properties> treeMap = new TreeMap<>();
        NumberPropertiesTree transposeTree = new NumberPropertiesTree(1);

        void build() {
            // Fill up a traditional TreeMap
            numbers().forEach(number->treeMap.put(number, Properties.from(number)));

            // Fill up a transpose tree

Here's the resulting heap analysis:

Top heap objects

As you can see, treeMap consumes 80MB, which is 640 bits per entry; while transposeTree consumes just 118 bits per entry. The TransposeTree is 5.4 times smaller than the TreeMap. The total overhead of object headers and references amounts to one-twentieth of one percent.

The price for this compact representation is that Properties objects need to be constructed on demand, while the TreeMap stores them all pre-built. There's also a price to be paid in complexity: until we reach a glorious future in which the JVM can automatically transpose your data structures for you, the code to do this explicitly is undoubtedly cumbersome.

Prototype code

I've placed a prototype of the TransposeTree code on in my personal GitHub account here.

This is not a ready-made library. It is the prototype-quality code I used to run these experiments, so you can better understand exactly how an inflatable transpose tree works.

I hope it will mature into a proper library one day, with a proper Maven/Ant setup, and documentation, and unit tests. It currently has none of these things.

Use at your own risk.

Addendum: Penny Pinching

Wait a minute—we predicted 117 bits per entry for the TransposeTree, and measured 118. Where's the missing bit?

The answer is internal fragmentation inside BitBlock. It can fit only three 21-bit values in a single 64-bit long, and only twelve 5-bit values. In each case, this wastes one-third of a bit per value, and since we have three such fields (prev, next, and numDigits) we waste a total of one bit per entry.

In fact, now that we're counting bits, we could get by with 20 bits for the prev and next fields, though we could still fit only three of them in a 64-bit long, so there's really no point in ever having 20-bit fields in a BitBlock.

But if we're counting bits to this extreme, where do we stop?

The finalDigit field can take on only ten different values. Information theory tells us that 10 equally likely values require only 3.322 bits each, but we allotted 4 bits, so there's another 0.678 bits wasted. Perhaps we could pack three of them into ten bits, thereby using just 3⅓ bits each!

Shall we keep going?

The numDigits field can take any value from 1 to 19, which requires only 4.3 bits each, but we allotted 5 bits, wasting another 0.7 bits. Moreover, the values that this field takes on are not equally likely; in fact, they are highly skewed. Most of the numbers coming from Random.nextLong() have 19 or 18 digits. My back-of-the-envelope calculation gives a Shannon entropy of just 0.54 bits for this field!

There's really no end to this process. Data compression is a fascinating topic that can employ endless ingenuity to squeeze more and more bits out of your data. Compression algorithms can and do become so advanced that they qualify as artificial intelligence.

But we've got to stop somewhere.

† Assuming compressed references, or a 32-bit JVM. For a 64-bit JVM with uncompressed references, this enum reference would occupy 64 bits.

‡ This is in Oracle and OpenJDK 64-bit with compressed references. For 32-bit, the header is 8 bytes; for uncompressed references, it's 16 bytes. IBM's Java has 8 bytes of overhead per object, for both 32-bit and 64-bit compressed; and if you promise never to synchronize on your objects, you can reduce this to 4 bytes using the (very oddly documented) gurus-only -Xlockword option. In all cases, arrays consume 4 extra bytes for the size.

Discuss on Hacker News

Header photo by Bill Wrigley [Public domain], via Wikimedia Commons

Vena is hiring in Toronto!
Learn about our culture, if you think you're a good fit, apply!