OCaml mimicry in Java

Yaron Minsky from Jane Street gave a great talk on OCaml that serves as a good introduction for someone who is accustomed to Java. In particular, he gives an example for manipulating boolean algebra in just a few lines of code; the corresponding Java program is easily ten times longer.

This led me to ask: how close can we get to the succinctness of OCaml for this example, using only pure Java?

What “pure Java” means depends a little on your point of view, so to be clear, let me lay down the ground rules:

  • No native code that doesn’t ship with the JVM.
  • No “preprocessing step” generating code that is then compiled against the user’s code.
  • Dynamic bytecode generation is OK, because that can be done entirely in pure Java.
  • External libraries are OK if they follow these rules.

Boolean expression processing

Here’s the OCaml code I’ll be trying to emulate. This is Yaron’s code plus a few lines that evaluate an expression:

type 'a expr = | True  
               | False
               | And of 'a expr * 'a expr
               | Or  of 'a expr * 'a expr
               | Not of 'a expr
               | Base of 'a

let rec eval eval_base expr =  
    let eval' x = eval eval_base x in
    match expr with
    | True  -> true
    | False -> false
    | Base base  -> eval_base base
    | And (x,y)  -> eval' x && eval' y
    | Or  (x,y)  -> eval' x || eval' y
    | Not  x     -> not (eval' x)

let myExpression = Or(True, Not False)  
let eval_base = (fun f -> not (f = 0))  
let result = eval eval_base myExpression  

The first clause lists all the possible types of boolean expressions, and the second clause describes how to evaluate each one.

Our objective is to achieve the same level of type safety and pattern matching as this OCaml code, as succinctly as possible.

Type switches in Java

The OCaml code is using a simple form of pattern matching called a type switch. This coding style collects in one place all implementations of one function for all possible types. What’s more, it verifies that you have indeed listed all possible types; OCaml will report an error if you’ve forgotten any.

Java does not use type switches as an organizing principle. Instead, it uses class-based polymorphism. Rather than collect together one function for all possible types, it collects together all functions for one type, and calls that a “class”.

Both organizations have their uses: type switches make it easy to add new functions, while class-based polymorphism makes it easy to add new types.

Java has no built-in type switch syntax. Implementing a type switch in a class-based language entails writing a visitor interface that lists every possible type, and then teaching each type how to call the right method of that interface.

Let’s start writing the above example in Java by creating a visitor class that we’ll call BoolSwitch, because we’re going to use it as a type switch on boolean expressions. We’ll also declare the type BoolExpr for the boolean expressions themselves. The one thing we know all expressions have in common is that they know how to apply themselves to a given visitor object, by calling the appropriate visitor method.

So far, the code would look like this:

interface BoolSwitch<T> {  
    T True();
    T False();
    T And(BoolExpr x, BoolExpr y);
    T Or(BoolExpr x, BoolExpr y);
    T Not(BoolExpr x);
    T Base(Supplier<Boolean> x);
}

interface BoolExpr {  
    <T> T apply(BoolSwitch<T> arg);
}

The BoolExpr.apply method’s job is simply to call the appropriate method of BoolSwitch, and then return whatever that method returns. BoolSwitch is declared with a generic type parameter T, thereby allowing you to create type-switch statements returning any type of value. For example, you could write a BoolSwitch<String> that converts an expression to a string, and BoolSwitch<Boolean> that evaluates the expression.

Note that this BoolSwitch declaration looks a lot like the OCaml type declaration. If you squint.

Having declared this BoolSwitch interface, we can now use it to evaluate boolean expressions, like this:

// Declare the evaluation function
Function<BoolExpr, Boolean> eval = b->b.apply(new BoolSwitch<Boolean>() {  
    public Boolean True()                      { return true; }
    public Boolean False()                     { return false; }
    public Boolean And(BoolExpr x, BoolExpr y) { return x.apply(this) & y.apply(this); }
    public Boolean Or(BoolExpr x, BoolExpr y)  { return x.apply(this) | y.apply(this); }
    public Boolean Not(BoolExpr x)             { return !x.apply(this); }
    public Boolean Base(Supplier<Boolean> x)   { return x.get(); }
});

// Evaluate an expression
Boolean result = eval.apply(myExpression);  

But where did this myExpression come from?

Instantiating BoolExpr

One neat trick is to use a BoolSwitch to build the expression trees in the first place. In other words, this one visitor interface can serve double duty: not only can it process boolean expressions; it can also build them:

BoolSwitch<BoolExpr> bb = createBuilder();  
BoolExpr myExpression = bb.Or(bb.True(), bb.Not(bb.False()));  

Now, what is inside createBuilder?

Well, it needs to return an object that implements BoolSwitch<BoolExpr>. That object’s implementations of TrueFalseAnd, and so on, must return instances of BoolExpr, as shown in the example above. Those BoolExpr instances must, in turn, provide implementations of the apply method that call the corresponding method (TrueFalseAnd, etc.) on a given visitor. Those BoolExpr objects must hold on to the method arguments supplied when they were created, so they can pass those arguments along when applied to a visitor.

Yadda yadda yadda. Yep, this amounts to a lot of code, but it is all fairly routine. In fact, it is so routine that it could be automated. And we can generate it using pure Java!

So that’s what I did.

I’ve used the javassist library to write an implementation of createBuilder that generates all the necessary classes, with the necessary methods and fields. Using this tool, the code above is all you need to write!

The Java solution

To recap, here is the entire Java solution:

interface BoolSwitch<T> {  
    T True();
    T False();
    T And(BoolExpr x, BoolExpr y);
    T Or(BoolExpr x, BoolExpr y);
    T Not(BoolExpr x);
    T Base(Supplier<Boolean> x);
}

interface BoolExpr {  
    <T> T apply(BoolSwitch<T> arg);
}

public void testBoolean() throws Exception {  
    Function<BoolExpr, Boolean> eval = b->b.apply(new BoolSwitch<Boolean>() {
        public Boolean True()                      { return true; }
        public Boolean False()                     { return false; }
        public Boolean And(BoolExpr x, BoolExpr y) { return x.apply(this) & y.apply(this); }
        public Boolean Or(BoolExpr x, BoolExpr y)  { return x.apply(this) | y.apply(this); }
        public Boolean Not(BoolExpr x)             { return !x.apply(this); }
        public Boolean Base(Supplier<Boolean> x)   { return x.get(); }
    });

    BoolSwitch<BoolExpr> bb = createBuilder(BoolSwitch.class, BoolExpr.class);
    BoolExpr myExpression = bb.Or(bb.True(), bb.Not(bb.False()));
    Boolean result = eval.apply(myExpression);
}

This is the best I could do. If you compare it against Yaron’s original OCaml code, mine is much more verbose, because this is not Java’s usual mode of operation, and because Java doesn’t have type inference; but the important thing is that the two solutions have the same form. Unlike a more idiomatic Java solution, this code doesn’t contain a horde of declarations that OCaml doesn’t need. It just has a lot more syntactic vinegar sprinkled all over it.

It should also be noted that the Java solution is far more limited. For one thing, there’s no actual pattern matching going on. You can only match trivial patterns based on the type of the object at the top of the expression tree. That’s because we’re using method dispatch for pattern matching, and Java dispatches methods only on their first argument (the method’s receiver).

Another shortcoming is that the base expressions are assumed to be Supplier<Boolean>, while the OCaml solution can take base expressions of any type. I could have supported this in Java using more generic type parameters, but without inference, this gets pretty unwieldy pretty quickly.

Finally, and perhaps most importantly, the Java version also doesn’t actually declare subtypes for the various boolean expressions: they’re just BoolExpr. A serious consequence of this is that you can’t access members of a BoolExpr subtype (because, again, there actually are no subtypes). For example, if you have an And expression and you want to access its first child, the only way to do that is to use a BoolSwitch to fully unpack the And object and get its first child, which is exceedingly cumbersome.

So no, I don’t recommend you actually use this technique. But it was fun to try it.

Addendum: The builder-generator

This method is pretty complicated, but at least you only need to write it once…

static <T> T createBuilder(Class<T> switchClassInterface, Class<?> closureInterface) throws NotFoundException, CannotCompileException, InstantiationException, IllegalAccessException {  
    ClassPool pool = ClassPool.getDefault();
    CtClass jlObject = pool.get(Object.class.getName());
    CtClass switchClass = pool.get(switchClassInterface.getName());

    // Create the builder class
    //
    CtClass builder = pool.makeClass(switchClass.getName() + "$$_builder");
    builder.addInterface(switchClass);
    builder.addConstructor(CtNewConstructor.defaultConstructor(builder));

    // Create classes/methods for each method in switchClass
    //
    int uniqueNumber = 0;
    for (CtMethod method: switchClass.getMethods()) {
        if ((method.getModifiers() & Modifier.ABSTRACT) == 0) {
            continue;
        }

        // Create the closure class for this method
        //
        CtClass closureClass = pool.makeClass(switchClass.getName() + "$$_closure_" + method.getName() + "_" + (uniqueNumber++));
        closureClass.addInterface(pool.get(closureInterface.getName()));
        int parmCount = 0;
        for (CtClass parm: method.getParameterTypes()) {
            String fieldDecl = "final " + parm.getName() + " p" + (++parmCount) + ";";
            closureClass.addField(CtField.make(fieldDecl, closureClass));
        }

        // Implement the constructor
        //
        CtConstructor closureConstructor = new CtConstructor(method.getParameterTypes(), closureClass);
        closureClass.addConstructor(closureConstructor);
        StringBuilder constructorBody = new StringBuilder("{\n");
        for (int i = 1; i <= method.getParameterTypes().length; i++) {
            constructorBody.append("p" + i + " = $" + i + ";\n");
        }
        closureConstructor.setBody(constructorBody.append("}").toString());

        // Implement apply
        //
        CtMethod concreteUnpack = new CtMethod(
            jlObject,
            "apply",
            new CtClass[]{ switchClass },
            closureClass);
        closureClass.addMethod(concreteUnpack);
        StringBuilder unpackBody = new StringBuilder("{ return ((" + switchClass.getName() + ")$1)." + method.getName() + "(");
        String sep = "";
        for (int i = 1; i <= method.getParameterTypes().length; i++) {
            unpackBody.append(sep).append("p").append(i);
            sep = ", ";
        }
        concreteUnpack.setBody(unpackBody.append("); }").toString());

        // Implement the factory method
        //
        CtMethod factory = new CtMethod(
            method.getReturnType(),
            method.getName(),
            method.getParameterTypes(),
            builder);
        builder.addMethod(factory);
        StringBuilder factoryBody = new StringBuilder("{ return new " + closureClass.getName() + "(");
        sep = "";
        for (int i = 1; i <= method.getParameterTypes().length; i++) {
            factoryBody.append(sep).append("$").append(i);
            sep = ", ";
        }
        String bodyString = factoryBody.append("); }").toString();
        factory.setBody(bodyString);

        // Undo the abstractification and load the class
        //
        closureClass.setModifiers(closureClass.getModifiers() & ~Modifier.ABSTRACT);
        closureClass.toClass();
    }

    // Undo more abstractification
    //
    builder.setModifiers(builder.getModifiers() & ~Modifier.ABSTRACT);

    // Create the class, then instantiate it
    //
    return (T) builder.toClass().newInstance();
}

Posted by Patrick Doyle

Director of Engineering at Vena