Why it's wrong to explain Java semantics by looking at bytecode
There are many Java questions on Stack Overflow regarding how various language features work. Answering them by compiling some code and looking at the resulting bytecode is often misleading and, formally speaking, a logical fallacy. Here’s why…
The behavior of any given Java program is determined by the Java Language Specification (JLS). If you compile some Java code to bytecode, the result is just a “projection” of the original program, one whose meaning is determined not by the JLS (which does not define bytecode at all) but by another specification: the Java Virtual Machine Specification. Since there’s no strict 1-to-1 correspondence between the features of the two languages, the conclusions you draw from reasoning about the bytecode don’t necessarily translate back to Java.
Here’s an analogy: Suppose someone asks
What does “你好，世界” mean in Chinese?
Then you can’t (in general) answer this reliably by first translating it to English and then using an English dictionary to explain the translated words. There may be multiple correct translations (just as when translating Java code to bytecode), leading to contradictory conclusions. The only sound approach is to look up the words in a Chinese dictionary, assuming you can read it.
Examples of fallacies
- Do interfaces inherit from Object class in java
The first comment suggests that OP should look at the generated class file, but the fact that a class file containing an interface has `java/lang/Object` set as the super class does not mean that interfaces inherit from `Object`.
- String concatenation: concat() vs + operator
The accepted answer shows some compiled bytecode and presents that as “the” method of string concatenation. The only thing that can be said with certainty about string concatenation in the Java language is what §15.18.1. String Concatenation Operator + says (hint: it doesn’t mention `StringBuilder`). In fact, the javac implementation of string concatenation has changed a lot over time, and the accepted answer is hopelessly outdated.
- Do Ternary and If/Else compile to the same thing, why?
The accepted answer referred to some bytecode to illustrate that if/else and ternary operators did not compile to the same thing. This statement is not supported by the JLS. (This has been clarified since I pointed it out, though.)
- Most questions on performance of various Java snippets
The JLS doesn’t specify how long statements should take to execute. If you ask me whether `i++` is faster than `i--`, I will ask you whether you’re executing the Java code using pen and paper or some other method. If you want a decent answer, you’ll have to pin down which Java compiler you’re using, which JVM you’re using (assuming the aforementioned compiler targets bytecode), which hardware you’re executing on, and so on.
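To make the first example in the list above concrete, here is a minimal sketch of how the interface question can be probed from the Java side, without opening a class file. The JLS (§9.2) says that an interface without direct superinterfaces implicitly declares members matching the public methods of `Object`, which is not the same thing as inheriting from `Object`. The `Greeter` interface and the method names are hypothetical, introduced only for this illustration.

```java
public class InterfaceSuperclassDemo {

    // Hypothetical interface, used only for this illustration.
    interface Greeter {
        String greet();
    }

    // Class.getSuperclass() is documented to return null for interfaces,
    // regardless of what the class file records as super class.
    public static boolean greeterHasNoSuperclass() {
        return Greeter.class.getSuperclass() == null;
    }

    // Object's public methods can still be called through an
    // interface-typed reference, because the interface implicitly
    // declares matching members (JLS §9.2).
    public static boolean objectMethodsCallable() {
        Greeter g = () -> "hello";
        return g.equals(g) && g.toString() != null;
    }

    public static void main(String[] args) {
        System.out.println(greeterHasNoSuperclass()); // prints "true"
        System.out.println(objectMethodsCallable());  // prints "true"
    }
}
```

Both checks agree with what the specification says, and neither relies on how a particular compiler happens to lay out the class file.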
When is it reasonable to refer to the bytecode?
There are cases when it’s logically sound to reason about the Java language by looking at the bytecode generated by a spec-compliant compiler. Here are a couple of examples:
- You can illustrate that two snippets of Java code are equivalent by showing that both compile to the same bytecode.
- You can show that it’s possible for a language feature to function in a certain way. One can, for instance, use a spec-compliant compiler to compile `"a" + 1` to show that this doesn’t necessarily create a new string object.
- You can prove that certain things are impossible. You can, for instance, show that generic types are not accessible at runtime by demonstrating how type erasure works in a spec-compliant compiler.
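The last two points in the list can also be cross-checked from Java itself. The sketch below is an illustration under stated assumptions, not a proof: the class and method names are hypothetical. It relies on the JLS rules that constant expressions of type `String` are interned, and that a non-constant concatenation yields a newly created `String` (§15.18.1).

```java
import java.util.ArrayList;
import java.util.List;

public class LanguageLevelChecks {

    // "a" + 1 is a constant expression, so it refers to the same
    // interned object as the literal "a1"; no new string is created.
    public static boolean constantConcatIsInterned() {
        return ("a" + 1) == "a1";
    }

    // 'a' is not declared final, so a + 1 is not a constant expression
    // and the result is a newly created String with equal contents.
    public static boolean runtimeConcatIsFreshObject() {
        String a = "a";
        String joined = a + 1;
        return joined != "a1" && joined.equals("a1");
    }

    // Both lists share one runtime class: the type arguments
    // String and Integer are not visible through getClass().
    public static boolean typeArgumentsErased() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(constantConcatIsInterned());   // prints "true"
        System.out.println(runtimeConcatIsFreshObject()); // prints "true"
        System.out.println(typeArgumentsErased());        // prints "true"
    }
}
```

Looking at the bytecode for the same snippets would then serve as supporting evidence for what one compiler does, with the JLS remaining the authority on what must happen.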