crush depth

Mathematics With An Axe

I've reached peak frustration with jtensors.

The API is riddled with inconsistencies due to mistakes caused by the ridiculous amount of hand-specialization. The design of the API is also suboptimal on modern JVMs due to the use of interface types to abstract over vector implementations: Vector method call sites become megamorphic which prevents inlining and harms the ability of the JIT to produce good code.

The API also distinguishes between immutable and mutable vectors and matrices, the latter of which really only exist to allow for avoiding the allocation of temporary objects when working with vectors (and, via interface types, to mutate vectors held in off-heap memory). However, on modern JVMs that employ escape analysis, short-lived objects don't entail any allocations at all as long as the call sites that refer to them are at most bimorphic. The sheer number of interfaces and implementations prevents this important optimization. Without mutable vectors, these interfaces would most likely be pointless. If the API allowed the JVM's escape analysis to work well, the mutable vectors likely wouldn't be needed at all.

The API provides interfaces that abstract over readable and writable vectors so that APIs that use types from the jtensors package can specify types such as "any readable 4-element vector" and the like, without caring what the specific underyling type of vector is used. The interface types were originally introduced because I wanted to have lots of different vector implementations that had different approaches to storage. For example, some vectors might be backed by a ByteBuffer that contains IEEE754 Binary16-encoded ("half precision") values. Other vectors might be represented by pointers into large off-heap arrays.

So what's actually good about jtensors?

Personally, I find the use of static methods in the API to be more readable than other Java vector algebra libraries. For example, to me, this:

return add(v1, subtract(v2, v3));

... Reads a lot better than this:

return v1.add(v2.subtract(v3));

The API strongly distinguishes between immutable and mutable types to allow programmers to pick which guarantees they want. The API contains hand-specialized variants of vector and matrix types for float, double, long, and int. Finally, the API provides phantom typed variants of all of the types for enforcing the correctness of your mathematics at compile-time. I'm not aware of any other vector algebra package that provides this. This is extremely valuable when working with graphics systems! Matrix multiplication is not commutative and it's very easy to accidentally perform a multiplication in the wrong order. The usual result will be strange visual results or, even worse, a blank screen. Trying to track down bugs like this is mind-bendingly horrible so preventing as many of them as possible at compile-time is a must. The use of phantom types allows for writing code like this:

MatrixM4x4<Object, World> m_model;
MatrixM4x4<World, View> m_view;
MatrixM4x4<Object, View> m_modelview;

MatrixM4x4.multiply(m_view, m_model, m_modelview);

The multiply method takes a matrix of type MatrixM4x4<T, U>, a matrix of type MatrixM4x4<U, V> and writes the resulting multiplication to a matrix of type Matrix<T, V>. Any programmer familiar with something like OpenGL will have experienced the horror of accidentally switching the order of the matrices; the result is silent failure and blank screens. The use of phantom types in the jtensors API makes the above mistake a compile-time error. You are physically prevented from giving the matrices in the wrong order because the types won't line up. Additionally, they act as documentation. It's immediately obvious to anyone looking at the above that m_modelview is a matrix that transforms positions in Object space to their equivalent representation in View space. I've lost track of the number of times that I've been implementing graphics algorithms and have gotten coordinate spaces wrong because the original papers helpfully failed to specify them (and any example code had no way of expressing the coordinate spaces). The classic literature on normal mapping actually contained a serious error of this type as explained on The Tenth Planet blog and evidently nobody noticed it for years. Stronger types would have prevented it!

Finally, the implementation is heavily tested. The test suite may be the largest I've ever written and contains over 8000 test cases with 100% branch coverage. Algorithms have been checked against multiple textbook sources, all assumptions and conventions have been made explicit and documented, and the implementation results have been tested against results produced by multiple third-party implementations.

I have a ton of code that already depends on jtensors but I just can't bear to maintain it in its current form. Other Java vector algebra libraries do not have a feature set comparable to jtensors, so I can't just switch to one of those. In particular, I use the phantom typed API heavily. I'd like to do a clean-room rewrite of jtensors, fixing all of the above issues, generating as much of the code as possible, and drastically simplifying the implementation. I can't wait around for Java 10's value types, but I can at least reorganize things so that a transition to value types will be easier than it would be currently. I also now know much more about the shapes of code that modern JVMs like to consume than I did when I first started writing jtensors back in 2011. Indeed, those code shapes have changed since 2011! Don't forget that, at that time, the most commonly deployed version of Java was still Java 5! Escape analysis was added fairly early in Java 6's lifetime and has been heavily improved ever since.

So, what should a modern jtensors rewrite look like?

  1. Separate the types of tensors used for computation and storage.

    In other words, make any code that computes with tensors work purely with immutable tensors and keep that code strictly monomorphic. The package can still have mutable vectors and matrices for storage and can still abstract over storage tensors with interfaces, but the APIs for computing with tensors and matrices must yield monomorphic call sites to static methods for maximum performance.

    Additionally, because the types of computation and storage tensors are cleanly separated, the range of types of computation tensors can be limited to those directly supported by the JVM. In other words, tensors over int, long, float, and double because those are the four types that have bytecode instructions on the JVM. The API can also require that operations such as the dot product return a value of the highest-precision type variant applicable to the current type. That is, the dot product for int-typed vectors will be returned in a long value. The dot product for float-typed vectors will be returned in a double value, and so on. This will eliminate the annoying API inconsistencies I mentioned earlier.

    External APIs that used the interface types to accept "any readable 4-element vector" or "any writable vector" and the like should just accept immutable vectors of specific types. Tough luck.

  2. Generate as much as possible.

    The tensor types should be generated by Immutables and the computation APIs (including the test suite) should be generated using a template. No hand-specializaton. No hand-written equals, hashCode, toString, etc.

    This is the best that can be done without value types.

  3. Keep the phantom-typed variants.

    I actually use these more than I use the tensors that don't have type parameters.

  4. Provide a range of storage types.

    The computation types can be kept simple, immutable, and in a form that the JVM loves to compile as described above. The storage types, however, can be as JIT-hostile as they like without causing performance problems. IEEE754b16 matrices. Matrices stored in direct ByteBuffers. sun.misc.Unsafe! In addition, this may address performance problems like ticket 7 because intermediate computations won't incur the cost of reading from or writing to tensors with unusual storage characteristics.

  5. Handedness?

    I work in a right-handed coordinate system. jtensors has no support for anything else. Perhaps it'd be a good idea to mark those methods that give explicitly right-handed results as doing so, and then provide left-handed variants too?

    Even if no left-handed variants are provided at first, it'd make sense to do this to make the API clearer and to allow for the addition of left-handed variants at a later date whilst keeping the API consistent.

  6. Get it done in less than a month

    I've rewritten the jtensors codebase at least five times. With the addition of templating, I should be able to get the whole implementation done very quickly as there are essentially no unknowns. The main issue will then be updating all of the other packages that depend on jtensors. It'll be an enormously backwards-incompatible change, so I'll do the naming convention changes at the same time.

jtensors is dead. Long live jtensors.

Plans