# The Inflatable Transpose Tree: a highly compact data structure

In honour of Claude Shannon’s 100th birthday, I want to tell you about two ideas I had for storing data efficiently. These ideas are pretty nifty individually, but together, they form a data structure that allows a large amount of data to be stored in a remarkably small space.

# First idea: Inflatable data structures

The first idea was one I had maybe eight years ago or so, and I referred to it as inflatable data structures. Suppose you have a class with a reference to an enum having three values. In Java, that reference occupies 32 bits, yet its three values could be represented using just two bits. This isn’t only true for enums: it’s true anywhere you have an object reference that can point only to objects within a certain restricted set; for instance, a company with 210 employees needs only 8 bits to store an employee reference. However, it would be unwise to declare an employee field of type `byte`: what happens when employee #256 is hired? Long gone are the days when a program can simply terminate due to an implementation limit.

But suppose you could start with a one-byte field, and enlarge it later if you discover that one byte is not enough? This is the core concept behind inflatable data structures. Many structures, like `ArrayList`, can grow longer to accommodate more data entries; an inflatable data structure can grow wider to accommodate bigger entries.

I tried to implement this idea using template tricks in C++. That did not go well. In fact, after 13 years as a professional C++ programmer, it prompted me to post this:

Ok, that’s enough. I have absolutely had it with C++. Writing software in C++ feels exactly like writing an essay for a teacher who has disallowed words containing the letter “e”: you expend all your ingenuity trying to figure out how to express yourself rather than solving the problem in front of you.

So, needless to say, I put the idea of inflatable data structures on the back burner.

# Second idea: Transposed data structures

The second idea is one I’m sure many people have had, if they work in a language like Java whose objects possess an unavoidable memory overhead. No matter how small the overhead, it can become prohibitive if the data payload in the object is sufficiently small, and you have sufficiently many objects.

In Java, an object header consumes 12 bytes, and since they’re aligned to 8 bytes, you have a bare minimum of 16 bytes per object, even with no fields. That may sound small, but it puts a hard limit of 62 million objects per gigabyte of heap. Furthermore, sub-integer types consume 4 bytes each, so you save no storage by using `short``byte`, or even `boolean` fields.

A particularly egregious example of this is `TreeMap.Entry`:

``````    static final class Entry<K,V> implements Map.Entry<K,V> {
K key;
V value;
Entry<K,V> left, right, parent;
boolean color;
``````

With a 12-byte header and 4 bytes per reference, this adds up to 32 bytes without the `color` flag. Thanks to the 8-byte object alignment requirement, the boolean flag brings the total size up to 40 bytes. That boolean costs eight bytes!

So what do we do instead?

The big idea is to turn things sideways. In C parlance, instead of an array of structs, let’s have a struct of arrays. Instead of a million objects with six fields each, how about six arrays with a million elements each?

Where the old layout would look like this, with an object header per entry:

…the new layout looks like this, with one object header per field-array:

(This reminiscent of a column-oriented database, though our emphasis is on the total size of the data, rather than the amount of data loaded to satisfy queries.)

Here’s a hypothetical transposed version of `TreeMap`, with its six fields per `Entry` replaced by six field-arrays, in C++-like pseudocode:

``````class TreeMap<K,V> {
K key[NUM_ENTRIES];
V value[NUM_ENTRIES];
int left[NUM_ENTRIES];
int right[NUM_ENTRIES];
int parent[NUM_ENTRIES];
boolean color[NUM_ENTRIES];
}
``````

What used to be an `Entry` object is now identified by an array index, and the same index is used to get data from all six of these arrays. Entry #123 would have its key in `key[123]`, its value in `value[123]`, and so on. For large trees, the object header overhead is now negligible; the storage cost for this structure is 21 bytes per entry—about half that of `TreeMap`. And if your keys and/or values are primitives, you can easily save even more storage by using arrays of primitives in place of the `K` and `V` arrays.

It’s especially important to notice how the `left``right`, and `parent` arrays work. These arrays contain indexes of other tree entries, which means they are serving the purpose that object references would serve in a normal (not transposed) data structure. In this example, these references are `int`s, which are actually the same size as references, but they could have been `short`s, or `byte`s. This is taking Java’s compressed references to a new level: instead of using just 32 bits for a reference, these super-compressed references could be using 16 bits, or 8 bits.

In fact, they can use even fewer bits. I wrote a class called `BitBlock`, which acts like a growable array (like `ArrayList`) of n-bit entries. This way, our hypothetical enum with three values would consume just 2 bits per entry. Under the covers, it packs as many entries as possible into each 64-bit `long`, which it stores compactly in a growable array.

However, what happens if you discover you were wrong about how many bits you need? What happens when employee #256 joins the company, and now you wish you had used 9 bits per entry instead of 8?

Good question…

# Transposition makes inflation simple

At this point, a thought occurred to me that is perhaps obvious by now to the astute reader: wouldn’t it be simpler to enlarge a particular field if all occurrences of that field were collected together into a single object that you could replace? Well, shoot, it turns out this is exactly what transposition does!

So I implemented an `InflatableBlock` that looks from the outside like an `ArrayList` of longs. Under the covers, though, it actually packs the entries into a `BitBlock`; it starts with a small entries, and then graduates to larger entries whenever an oversized entry is added, copying all existing entries over to a new `BitBlock` transparently.

If you use an `InflatableBlock` for one or more of your field-arrays, then tada! Your transposed data structure is inflatable!

# Putting it all together: the Transpose Tree

The Transpose Tree was the first data structure I implemented this way. It is a red-black tree implemented using inflatable field-arrays in place of object references. To use a Transpose Tree, you extend `TransposeTree` and add additional field-arrays to hold whatever would have been in fields, had you used ordinary key and value objects. If these field-arrays are `InflatableBlocks`, then your data structure is inflatable.

Now, there is a down-side to all this memory efficiency: we have abandoned Java’s object-oriented paradigm in favour of bit-twiddling, so using `TransposeTree` is not as straightforward as using a data structure from the class library. There’s no way to use `TransposeTree` without extending it; in fact, `TransposeTree` has no public methods—a deliberate design decision that gives subclasses complete freedom to define their own public interface.

The subclass must implement a `compare` method that allows `TransposeTree` to set up the tree references without ever manipulating the keys directly. This way, a key can be any primitive, or any combination of primitives. In other words, compound keys are straightforward. (And hey—the keys can even be objects, if you want.)

# A contrived example

To show the benefit this technique can have on scalar data, here’s a silly example: keeping property information on one million longs. For some reason, we want to create a tree that maps each long to a collection of properties:

• its last digit,
• the number of digits it has, and
• its sign.

Using objects, we might store the information this way:

``````    static class Properties {
final byte finalDigit, numDigits;
final boolean signBit;
}
⋮
TreeMap<Long, Properties> treeMap = new TreeMap<>();
``````

The memory consumption per entry is as follows:

• 40 bytes for a `TreeMap.Entry`
• 16 bytes for a `Long` key
• 24 bytes for a `Properties` value

Total: 80 bytes per entry.

In contrast, to use `TransposeTree`, we must extend it. Here are the first few lines of a `NumberPropertiesTree` class implementing an inflatable transpose tree:

``````    static class NumberPropertiesTree extends TransposeTree {

final PrimitiveBlock values, finalDigit, numDigits, signBit;

protected NumberPropertiesTree() {
this.values     = new LongBlock();
this.finalDigit = new InflatableBlock();
this.numDigits  = new InflatableBlock();
this.signBit    = new InflatableBlock();
}

@Override
protected int compare(int index1, int index2) {
return Long.compare(value.get(index1), value.get(index2));
}
⋮
}
``````

(I’ve omitted some constructor arguments here for brevity.)

We’ve added field-arrays for `value` (the key, in this case), `finalDigit``numDigits`, and `signBit`. Together, the memory consumption per element is as follows:

• 64 bits for `value`;
• 42 bits for the red-black tree’s `left` and `right` field-arrays (inflated to 21 bits each, to accommodate array indexes up to a million)
• 1 bit for `isBlack` information; and
• 10 bits for the properties.

Total: 117 bits = 14.7 bytes per entry.

Of these 117 bits, 43 bits are for managing the tree; 64 bits are for storing the original `long` value; and the remaining 10 bits store the properties.

If we stop at this point, `NumberPropertiesTree` contains no public methods. To make the class usable, we’ll add two: the first is `propertiesFor`, which produces a stream of `Properties` objects matching a given query; the `Properties` objects will be constructed on-the-fly from the data stored in the array-fields.

``````        public Stream<Properties> propertiesFor(NodeLocator locator) {
return super.allIndexesMatching(locator).mapToObj(i->
new Properties(values.get(i), (byte)finalDigit.get(i), (byte)numDigits.get(i), signBit.get(i) == 1)
);
}
``````

The `allIndexesMatching` method is inherited from `TransposeTree`, and provides a stream containing the indexes of all the tree nodes matching the given criterion, in the form of a `NodeLocator`. The `NodeLocator` is a bit like Java’s `Comparator`: by returning a negative or positive value, it tells the tree search algorithm whether to go left or right; and given this information, `allIndexesMatching` returns the indexes of all nodes for which the locator returns zero.

The second public method is `add`, which computes properties for a given value and adds them to the tree:

``````        void add(long value) {
long abs = Math.abs(value);
values.set(insertionPoint(), value);
finalDigit.set(insertionPoint(), abs % 10);
numDigits.set(insertionPoint(), Long.toString(abs).length());
signBit.set(insertionPoint(), value >>> 63);
insert();
}
``````

Note how this operates: at any time, `TransposeTree.insertionPoint()` returns an unused index where the field values for a new tree entry can be stored. The subclass stores those field values, and then calls `insert()` to add the new item into the tree. Afterward, `insertionPoint` points to a new location for the next insertion.

`TransposeTree` also has a method called `shrinkwrap` that re-packs the data into smaller arrays, much like `ArrayList.trimToSize()`. On average, this can be expected to free about a quarter of each array, which is significant. By overriding `shrinkwrap`, our subclass can participate in this savings by shrinkwrapping all the array-fields too:

``````        @Override
public void shrinkwrap() {
super.shrinkwrap();
values.shrinkwrap();
finalDigit.shrinkwrap();
numDigits.shrinkwrap();
signBit.shrinkwrap();
}
``````

At this point, we have the fewest possible entries, each of which occupies the fewest possible bits. We’re ready to run some tests.

A disclaimer, though: Whatever we determine about memory consumption, never forget that the one using `TreeMap` was fully implemented in a dozen lines of code. Using `TransposeTree` is quite complicated by comparison, so it should be reserved for very large, regular data structures whose size would otherwise be prohibitive.

And now, on with the show.

# Experimental method and results

I measured the size of the two data structures by creating an object that points to one of each, inserting the same 1,000,000 random numbers, and then triggering a heap dump. The insertion code looked like this:

``````    static class ThingsToMeasure {
TreeMap<Long, Properties> treeMap = new TreeMap<>();
NumberPropertiesTree transposeTree = new NumberPropertiesTree(1);

void build() {
// Fill up a traditional TreeMap
//
numbers().forEach(number->treeMap.put(number, Properties.from(number)));

// Fill up a transpose tree
//
transposeTree.shrinkwrap();
}
}
``````

Here’s the resulting heap analysis:

As you can see, `treeMap` consumes 80MB, which is 640 bits per entry; while `transposeTree` consumes just 118 bits per entry. The `TransposeTree` is 5.4 times smaller than the `TreeMap`. The total overhead of object headers and references amounts to one-twentieth of one percent.

The price for this compact representation is that `Properties` objects need to be constructed on demand, while the `TreeMap` stores them all pre-built. There’s also a price to be paid in complexity: until we reach a glorious future in which the JVM can automatically transpose your data structures for you, the code to do this explicitly is undoubtedly cumbersome.

# Prototype code

I’ve placed a prototype of the `TransposeTree` code on in my personal GitHub account here.

This is not a ready-made library. It is the prototype-quality code I used to run these experiments, so you can better understand exactly how an inflatable transpose tree works.

I hope it will mature into a proper library one day, with a proper Maven/Ant setup, and documentation, and unit tests. It currently has none of these things.

Wait a minute—we predicted 117 bits per entry for the `TransposeTree`, and measured 118. Where’s the missing bit?

The answer is internal fragmentation inside `BitBlock`. It can fit only three 21-bit values in a single 64-bit `long`, and only twelve 5-bit values. In each case, this wastes one-third of a bit per value, and since we have three such fields (`prev``next`, and `numDigits`) we waste a total of one bit per entry.

In fact, now that we’re counting bits, we could get by with 20 bits for the `prev` and `next` fields, though we could still fit only three of them in a 64-bit `long`, so there’s really no point in ever having 20-bit fields in a `BitBlock`.

But if we’re counting bits to this extreme, where do we stop?

The `finalDigit` field can take on only ten different values. Information theory tells us that 10 equally likely values require only 3.322 bits each, but we allotted 4 bits, so there’s another 0.678 bits wasted. Perhaps we could pack three of them into ten bits, thereby using just 3⅓ bits each!

Shall we keep going?

The `numDigits` field can take any value from 1 to 19, which requires only 4.3 bits each, but we allotted 5 bits, wasting another 0.7 bits. Moreover, the values that this field takes on are not equally likely; in fact, they are highly skewed. Most of the numbers coming from `Random.nextLong()` have 19 or 18 digits. My back-of-the-envelope calculation gives a Shannon entropy of just 0.54 bits for this field!

There’s really no end to this process. Data compression is a fascinating topic that can employ endless ingenuity to squeeze more and more bits out of your data. Compression algorithms can and do become so advanced that they qualify as artificial intelligence.

But we’ve got to stop somewhere.

† Assuming compressed references, or a 32-bit JVM. For a 64-bit JVM with uncompressed references, this enum reference would occupy 64 bits.

‡ This is in Oracle and OpenJDK 64-bit with compressed references. For 32-bit, the header is 8 bytes; for uncompressed references, it’s 16 bytes. IBM’s Java has 8 bytes of overhead per object, for both 32-bit and 64-bit compressed; and if you promise never to synchronize on your objects, you can reduce this to 4 bytes using the (very oddly documented) gurus-only `-Xlockword` option. In all cases, arrays consume 4 extra bytes for the size.

### Posted by Patrick Doyle

Director of Engineering at Vena