Andreas Lundblad
Aug 24, 2016

Why it's wrong to explain Java semantics by looking at bytecode

There are many Java questions on Stack Overflow regarding how various language features work. Answering them by compiling some code and looking at the resulting bytecode is often misleading and, formally speaking, a logical fallacy. Here’s why…

The behavior of any given Java program is determined by the Java Language Specification (JLS). If you compile some Java code to bytecode, the result is just a “projection” of the original program: a projection whose meaning is determined not by the JLS (the JLS does in fact not even mention bytecode) but by another specification, the Java Virtual Machine Specification. Since there’s no strict 1-to-1 correspondence between the features of the two languages, any conclusions you draw from reasoning about the bytecode don’t necessarily translate directly back to Java.

Here’s an analogy: Suppose someone asks

What does “你好,世界” mean in Chinese?

Then you can’t (in general) answer this reliably by first translating it to English and then using an English dictionary to explain the translated words. There may be multiple correct translations (just as when translating Java code to bytecode), leading to contradictory conclusions. The only sound approach is to look up the words in a Chinese dictionary, assuming you can read it.

Examples of fallacies

When is it reasonable to refer to the bytecode?

There are cases when it’s logically sound to reason about the Java language by looking at the bytecode generated by a spec compliant compiler. Here are a couple of examples:

  • You can illustrate that two snippets of Java code are equivalent by showing that both compile to the same bytecode.
  • You can show that it’s possible for a language feature to function in a certain way. One can for instance use a spec compliant compiler to compile "a" + 1 to show that this doesn’t necessarily create a new string object.
  • You can prove that certain things are impossible. You can for instance show that generic types are not accessible at runtime by demonstrating how type erasure works in a spec compliant compiler.
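
The last two bullets can also be observed from plain Java, without reading any bytecode. Here is a minimal sketch; the class and method names are made up for illustration and are not from any library. It relies on two things the JLS itself states: "a" + 1 is a constant expression and is therefore treated like the literal "a1", and a parameterized type such as List<String> shares its runtime class with List<Integer>.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, names chosen only for this example.
class SpecBackedDemo {

    // "a" + 1 is a constant expression, so the compiler folds it and the
    // result refers to the same interned object as the literal "a1".
    static boolean concatIsSameObjectAsLiteral() {
        String s = "a" + 1;
        return s == "a1"; // reference comparison on purpose
    }

    // The type arguments are erased: both lists report the same runtime class.
    static boolean listsShareRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(concatIsSameObjectAsLiteral()); // prints "true"
        System.out.println(listsShareRuntimeClass());      // prints "true"
    }
}
```

Note that this only demonstrates what one compliant implementation does for these two snippets. Whether the behavior is guaranteed in general is still a question for the JLS.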