Had a change of heart. Doing all of the package renames now rather than waiting for Java 9. I wrote:
There is the possibility that changing the entire name of a project could be considered a non-compatibility-breaking change according to semantic versioning...
I'm choosing to believe this is true and am renaming projects and modules without incrementing the major version number. I'm using japicmp to verify that I'm not introducing binary- or source-incompatible changes.
Sometimes, what you really need is a mutable, boxed integer.
While updating jcanephora, I discovered that I needed to update jpra to use the new jtensors types. Whilst doing this, I discovered that the new simplified implementation of the ByteBuffer-based storage tensors was too simple: the jpra package made use of the cursor-like API that the old jtensors-bytebuffered package provided. I'd not provided anything analogous to this in the new API, so I had to do some rewriting. In the process, I discovered that the code that jpra generated was using an AtomicLong value to store the current byte offset. The reason it used an AtomicLong was simply that there was no mutable, boxed long value in the Java standard library. To remedy this, I've created a trivial mutable numbers package upon which the com.io7m.jtensors.storage.bytebuffered and com.io7m.jpra.runtime.java modules now depend. I should have done this years ago but didn't, for whatever reason.
https://github.com/io7m/jmutnum
It may be the least interesting software package I've ever written.
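For illustration, a mutable boxed long can be sketched in a few lines. This is a hypothetical example and not the actual jmutnum API: the names MutableLongBox and DoubleCursor, and their methods, are assumptions made up for this sketch. It shows a cursor-like reader sharing its byte offset through the box, which is the role the AtomicLong was playing in the generated code.

```java
import java.nio.ByteBuffer;

final class MutableLongExample
{
  /** A hypothetical mutable box around a long. Deliberately not thread-safe. */
  static final class MutableLongBox
  {
    private long value;

    MutableLongBox(final long initial)
    {
      this.value = initial;
    }

    long get()
    {
      return this.value;
    }

    void set(final long v)
    {
      this.value = v;
    }
  }

  /** A hypothetical cursor that reads a double at the shared byte offset. */
  static final class DoubleCursor
  {
    private final ByteBuffer buffer;
    private final MutableLongBox offset;

    DoubleCursor(final ByteBuffer buffer, final MutableLongBox offset)
    {
      this.buffer = buffer;
      this.offset = offset;
    }

    double read()
    {
      // ByteBuffer indices are ints, so the long offset is narrowed with a check.
      return this.buffer.getDouble(Math.toIntExact(this.offset.get()));
    }
  }

  /** Reads two doubles by moving the shared offset between reads. */
  static double demo()
  {
    final ByteBuffer b = ByteBuffer.allocate(16);
    b.putDouble(0, 1.5);
    b.putDouble(8, 2.5);

    final MutableLongBox offset = new MutableLongBox(0L);
    final DoubleCursor cursor = new DoubleCursor(b, offset);
    final double first = cursor.read();
    offset.set(8L);
    final double second = cursor.read();
    return first + second;
  }
}
```

Unlike AtomicLong, a box like this makes no promises about atomicity or memory visibility, which is all a single-threaded cursor needs.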
Going to start working on moving jcanephora to jtensors 8.0.0-SNAPSHOT in order to flush out any problems with jtensors before I try to do a stable 8.0.0 release.
The jtensors implementation is basically done. I need to release the 1.0.0 version of the primogenitor, though, and I can't do this until the 0.10.0 version of japicmp is released.
I like this sort of pure code because it allows for property-based testing à la QuickCheck. The general idea is to specify mathematical properties of the code abstractly and then check to see if those properties hold concretely for a large set of randomly selected inputs. In the absence of tools to formally prove properties about code, this kind of property-based testing is useful for checking the likelihood that the code is correct. For example, the test suite now has methods such as:
    /**
     * ∀ v0 v1. add(v0, v1) == add(v1, v0)
     */

    @Test
    @PercentagePassing
    public void testAddCommutative()
    {
      final Generator<Vector4D> gen = createGenerator();

      final Vector4D v0 = gen.next();
      final Vector4D v1 = gen.next();

      final Vector4D vr0 = Vectors4D.add(v0, v1);
      final Vector4D vr1 = Vectors4D.add(v1, v0);

      checkAlmostEquals(vr0.x(), vr1.x());
      checkAlmostEquals(vr0.y(), vr1.y());
      checkAlmostEquals(vr0.z(), vr1.z());
      checkAlmostEquals(vr0.w(), vr1.w());
    }
Of course, in Haskell this would be somewhat less verbose:
quickCheck (\(v0, v1) -> almostEquals (add v0 v1) (add v1 v0))
The @PercentagePassing annotation marks the test as being executed 2000 times (by default) with at least 95% (by default) of the executions being required to pass in order for the test to pass as a whole. The percentage isn't 100% because of numerical imprecision: the nature of floating point numbers means that it's really only practical to try to determine if two numbers are equal to each other within an acceptable margin of error. Small (acceptable) errors can creep in during intermediate calculations such that if the two results were to be compared for exact equality, the tests would almost always fail. Sometimes, the errors are large enough that although the results are "correct", they fall outside of the acceptable range of error for the almost equals check to succeed.
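The pass-rate idea can be sketched without any test framework. The following is not how @PercentagePassing is implemented; it is a minimal illustration in which the class name, the passesOften method, and the 1.0e-12 tolerance are all assumptions made for the example. A trial is run a fixed number of times and the whole check succeeds if the fraction of passing trials meets a threshold.

```java
import java.util.Random;
import java.util.function.BooleanSupplier;

final class PercentagePassingSketch
{
  /**
   * Runs the trial the given number of times and returns true if the
   * fraction of passing runs is at least the required fraction.
   */
  static boolean passesOften(
    final BooleanSupplier trial,
    final int iterations,
    final double required)
  {
    if (iterations <= 0) {
      throw new IllegalArgumentException("iterations must be positive");
    }

    int passed = 0;
    for (int i = 0; i < iterations; ++i) {
      if (trial.getAsBoolean()) {
        ++passed;
      }
    }
    return ((double) passed / (double) iterations) >= required;
  }

  /** One randomized trial: x + y should almost equal y + x. */
  static boolean additionCommutes(final Random random)
  {
    final double x = random.nextDouble();
    final double y = random.nextDouble();
    // The tolerance here is an arbitrary value chosen for the sketch.
    return Math.abs((x + y) - (y + x)) <= 1.0e-12;
  }
}
```

A call such as passesOften(() -> additionCommutes(random), 2000, 0.95) mirrors the defaults described above.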
There's a classic (and pretty mathematically intense) paper on this called "What Every Computer Scientist Should Know About Floating-Point Arithmetic". This was given an extensive treatment by Bruce Dawson and his explanations formed the basis for my jequality package. I actually tried to use junit's built-in floating point comparison assertions for the test suite at first, but they turned out to be way too unreliable.
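To make "equal within an acceptable margin of error" concrete, here is a minimal sketch of a comparison that combines an absolute tolerance (for values near zero) with a relative tolerance (for large values). This is not the jequality API: the method name, parameters, and the tolerances used in the example are assumptions chosen purely for illustration.

```java
final class AlmostEqualSketch
{
  /**
   * Returns true if a and b differ by at most absEps, or by at most
   * relEps times the larger of their magnitudes.
   */
  static boolean almostEqual(
    final double a,
    final double b,
    final double absEps,
    final double relEps)
  {
    // NaN is never equal to anything, including itself.
    if (Double.isNaN(a) || Double.isNaN(b)) {
      return false;
    }
    // Infinities are only equal to infinities of the same sign.
    if (Double.isInfinite(a) || Double.isInfinite(b)) {
      return a == b;
    }

    final double diff = Math.abs(a - b);
    if (diff <= absEps) {
      return true;
    }
    final double largest = Math.max(Math.abs(a), Math.abs(b));
    return diff <= largest * relEps;
  }
}
```

With this, almostEqual(0.1 + 0.2, 0.3, 1.0e-12, 1.0e-9) holds even though 0.1 + 0.2 == 0.3 is false in double arithmetic.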
Update: Without even an hour having passed since this post was published, japicmp 0.10.0 has been released!
I've reached peak frustration with jtensors.
The API is riddled with inconsistencies due to mistakes caused by the ridiculous amount of hand-specialization. The design of the API is also suboptimal on modern JVMs due to the use of interface types to abstract over vector implementations: Vector method call sites become megamorphic which prevents inlining and harms the ability of the JIT to produce good code.
The API also distinguishes between immutable and mutable vectors and matrices, the latter of which really only exist to allow for avoiding the allocation of temporary objects when working with vectors (and, via interface types, to mutate vectors held in off-heap memory). However, on modern JVMs that employ escape analysis, short-lived objects don't entail any allocations at all as long as the call sites that refer to them are at most bimorphic. The sheer number of interfaces and implementations prevents this important optimization. Without mutable vectors, these interfaces would most likely be pointless. If the API allowed the JVM's escape analysis to work well, the mutable vectors likely wouldn't be needed at all.
The API provides interfaces that abstract over readable and writable vectors so that APIs that use types from the jtensors package can specify types such as "any readable 4-element vector" and the like, without caring which specific underlying vector type is used. The interface types were originally introduced because I wanted to have lots of different vector implementations that had different approaches to storage. For example, some vectors might be backed by a ByteBuffer that contains IEEE754 Binary16-encoded ("half precision") values. Other vectors might be represented by pointers into large off-heap arrays.
So what's actually good about jtensors?
Personally, I find the use of static methods in the API to be more readable than other Java vector algebra libraries. For example, to me, this:
return add(v1, subtract(v2, v3));
... Reads a lot better than this:
return v1.add(v2.subtract(v3));
The API strongly distinguishes between immutable and mutable types to allow programmers to pick which guarantees they want. The API contains hand-specialized variants of vector and matrix types for float, double, long, and int. Finally, the API provides phantom typed variants of all of the types for enforcing the correctness of your mathematics at compile-time. I'm not aware of any other vector algebra package that provides this. This is extremely valuable when working with graphics systems! Matrix multiplication is not commutative and it's very easy to accidentally perform a multiplication in the wrong order. The usual result will be strange visual results or, even worse, a blank screen. Trying to track down bugs like this is mind-bendingly horrible so preventing as many of them as possible at compile-time is a must. The use of phantom types allows for writing code like this:
    MatrixM4x4<Object, World> m_model;
    MatrixM4x4<World, View> m_view;
    MatrixM4x4<Object, View> m_modelview;

    MatrixM4x4.multiply(m_view, m_model, m_modelview);
The multiply method takes a matrix of type MatrixM4x4<U, V>, a matrix of type MatrixM4x4<T, U>, and writes the resulting product to a matrix of type MatrixM4x4<T, V>. Any programmer familiar with something like OpenGL will have experienced the horror of accidentally switching the order of the matrices; the result is silent failure and blank screens. The use of phantom types in the jtensors API makes the above mistake a compile-time error. You are physically prevented from giving the matrices in the wrong order because the types won't line up. Additionally, the types act as documentation. It's immediately obvious to anyone looking at the above that m_modelview is a matrix that transforms positions in Object space to their equivalent representation in View space.
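The mechanism is easy to reproduce in miniature. The sketch below is not the jtensors API: it uses a hypothetical immutable 2x2 matrix to stay short, returns the product instead of writing to an output matrix, and the names Matrix2x2, ObjectSpace, WorldSpace, and ViewSpace are assumptions made for the example. As in the model/view example above, the first argument is the transform that is applied last.

```java
final class PhantomTypesSketch
{
  // Coordinate-space tags. They are never instantiated and exist only
  // to be used as type parameters.
  interface ObjectSpace { }
  interface WorldSpace { }
  interface ViewSpace { }

  /**
   * A 2x2 matrix that transforms from space A to space B. Neither A nor B
   * appears in any field, which is what makes them phantom parameters.
   */
  static final class Matrix2x2<A, B>
  {
    final double r0c0;
    final double r0c1;
    final double r1c0;
    final double r1c1;

    Matrix2x2(
      final double r0c0,
      final double r0c1,
      final double r1c0,
      final double r1c1)
    {
      this.r0c0 = r0c0;
      this.r0c1 = r0c1;
      this.r1c0 = r1c0;
      this.r1c1 = r1c1;
    }
  }

  /**
   * Computes m0 * m1. The matrix m1 (T to U) is applied first and m0
   * (U to V) second, so the product transforms T to V.
   */
  static <T, U, V> Matrix2x2<T, V> multiply(
    final Matrix2x2<U, V> m0,
    final Matrix2x2<T, U> m1)
  {
    return new Matrix2x2<>(
      (m0.r0c0 * m1.r0c0) + (m0.r0c1 * m1.r1c0),
      (m0.r0c0 * m1.r0c1) + (m0.r0c1 * m1.r1c1),
      (m0.r1c0 * m1.r0c0) + (m0.r1c1 * m1.r1c0),
      (m0.r1c0 * m1.r0c1) + (m0.r1c1 * m1.r1c1));
  }

  static Matrix2x2<ObjectSpace, ViewSpace> demo()
  {
    final Matrix2x2<ObjectSpace, WorldSpace> model =
      new Matrix2x2<>(2.0, 0.0, 0.0, 2.0);
    final Matrix2x2<WorldSpace, ViewSpace> view =
      new Matrix2x2<>(0.0, -1.0, 1.0, 0.0);

    // multiply(model, view) would be rejected by the compiler, because
    // no choice of U can be both ObjectSpace and ViewSpace.
    return multiply(view, model);
  }
}
```

The type parameters cost nothing at run time; they are erased and only constrain which calls compile.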
I've lost track of the number of times that I've been implementing
graphics algorithms and have gotten coordinate spaces wrong because
the original papers helpfully failed to specify them (and any example
code had no way of expressing the coordinate spaces). The classic
literature on normal mapping actually contained a serious error of this
type as explained on The Tenth Planet
blog and evidently nobody noticed it for years. Stronger types would
have prevented it!
Finally, the implementation is heavily tested. The test suite may be the largest I've ever written and contains over 8000 test cases with 100% branch coverage. Algorithms have been checked against multiple textbook sources, all assumptions and conventions have been made explicit and documented, and the implementation results have been tested against results produced by multiple third-party implementations.
I have a ton of code that already depends on jtensors but I just can't bear to maintain it in its current form. Other Java vector algebra libraries do not have a feature set comparable to jtensors, so I can't just switch to one of those. In particular, I use the phantom typed API heavily. I'd like to do a clean-room rewrite of jtensors, fixing all of the above issues, generating as much of the code as possible, and drastically simplifying the implementation.
I can't wait around for Java 10's value types, but I can at least
reorganize things so that a transition to value types will be easier
than it would be currently. I also now know much more about the
shapes of code that modern JVMs like to consume than I did when I
first started writing jtensors back in 2011. Indeed, those code
shapes have changed since 2011! Don't forget that, at that time,
the most commonly deployed version of Java was still Java 5! Escape
analysis was added fairly early in Java 6's lifetime and has been
heavily improved ever since.
So, what should a modern jtensors rewrite look like?
Separate the types of tensors used for computation and storage.
In other words, make any code that computes with tensors work purely with immutable tensors and keep that code strictly monomorphic. The package can still have mutable vectors and matrices for storage and can still abstract over storage tensors with interfaces, but the APIs for computing with tensors and matrices must yield monomorphic call sites to static methods for maximum performance.
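As a rough sketch of that split (none of these names come from jtensors; Vector3D, Vectors3D, VectorStorage3D, and ArrayStorage3D are hypothetical): computation works only on a final immutable class through static methods, while storage sits behind an interface and is touched only when loading and storing values.

```java
final class ComputationStorageSketch
{
  /** Immutable computation type. It is final, so calls on it have one target. */
  static final class Vector3D
  {
    final double x;
    final double y;
    final double z;

    Vector3D(final double x, final double y, final double z)
    {
      this.x = x;
      this.y = y;
      this.z = z;
    }
  }

  /** Computation API: static methods over the immutable type only. */
  static final class Vectors3D
  {
    private Vectors3D() { }

    static Vector3D add(final Vector3D a, final Vector3D b)
    {
      return new Vector3D(a.x + b.x, a.y + b.y, a.z + b.z);
    }

    static Vector3D scale(final Vector3D v, final double s)
    {
      return new Vector3D(v.x * s, v.y * s, v.z * s);
    }
  }

  /** Storage API: an interface that may have many implementations. */
  interface VectorStorage3D
  {
    Vector3D get();

    void set(Vector3D v);
  }

  /** One possible storage implementation, backed by a plain array. */
  static final class ArrayStorage3D implements VectorStorage3D
  {
    private final double[] data = new double[3];

    @Override
    public Vector3D get()
    {
      return new Vector3D(this.data[0], this.data[1], this.data[2]);
    }

    @Override
    public void set(final Vector3D v)
    {
      this.data[0] = v.x;
      this.data[1] = v.y;
      this.data[2] = v.z;
    }
  }

  static Vector3D demo()
  {
    final VectorStorage3D storage = new ArrayStorage3D();
    storage.set(new Vector3D(1.0, 2.0, 3.0));

    // Load once, compute with immutable values, store once.
    final Vector3D loaded = storage.get();
    final Vector3D result =
      Vectors3D.scale(Vectors3D.add(loaded, new Vector3D(1.0, 1.0, 1.0)), 2.0);
    storage.set(result);
    return storage.get();
  }
}
```

Whether the temporaries are actually eliminated depends on the JIT and its escape analysis; the sketch only shows the code shape, not a measured result.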
Additionally, because the types of computation and storage tensors are cleanly separated, the range of types of computation tensors can be limited to those directly supported by the JVM. In other words, tensors over int, long, float, and double, because those are the four numeric types that have their own arithmetic bytecode instructions on the JVM. The API can also require that operations such as the dot product return a value of the highest-precision type variant applicable to the current type. That is, the dot product for int-typed vectors will be returned in a long value. The dot product for float-typed vectors will be returned in a double value, and so on. This will eliminate the annoying API inconsistencies I mentioned earlier.
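A small sketch of the widening rule, using hypothetical two-element vector types (Vector2I and Vector2F are names invented for this example, not jtensors types). Each component is widened before multiplying so that the intermediate products cannot overflow or lose precision at the narrower type.

```java
final class WideningDotSketch
{
  /** Hypothetical immutable int vector. */
  static final class Vector2I
  {
    final int x;
    final int y;

    Vector2I(final int x, final int y)
    {
      this.x = x;
      this.y = y;
    }
  }

  /** Hypothetical immutable float vector. */
  static final class Vector2F
  {
    final float x;
    final float y;

    Vector2F(final float x, final float y)
    {
      this.x = x;
      this.y = y;
    }
  }

  /** The int dot product is computed and returned as a long. */
  static long dot(final Vector2I a, final Vector2I b)
  {
    return ((long) a.x * (long) b.x) + ((long) a.y * (long) b.y);
  }

  /** The float dot product is computed and returned as a double. */
  static double dot(final Vector2F a, final Vector2F b)
  {
    return ((double) a.x * (double) b.x) + ((double) a.y * (double) b.y);
  }
}
```

For a vector whose components are both Integer.MAX_VALUE, the dot product with itself overflows int arithmetic but fits comfortably in a long.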
External APIs that used the interface types to accept "any readable 4-element vector" or "any writable vector" and the like should just accept immutable vectors of specific types. Tough luck.
Generate as much as possible.
The tensor types should be generated by Immutables and the computation APIs (including the test suite) should be generated using a template. No hand-specialization. No hand-written equals, hashCode, toString, etc.
This is the best that can be done without value types.
Keep the phantom-typed variants.
I actually use these more than I use the tensors that don't have type parameters.
Provide a range of storage types.
The computation types can be kept simple, immutable, and in a form that the JVM loves to compile as described above. The storage types, however, can be as JIT-hostile as they like without causing performance problems. IEEE754b16 matrices. Matrices stored in direct ByteBuffers. sun.misc.Unsafe! In addition, this may address performance problems like ticket 7 because intermediate computations won't incur the cost of reading from or writing to tensors with unusual storage characteristics.
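As an illustration of one such storage type, here is a sketch of a vector stored in a direct ByteBuffer at a fixed byte offset. The class names and the layout (two consecutive 8-byte doubles in native byte order) are assumptions made for the example and are not taken from jtensors. Values are read out into an immutable vector, computed with, and written back once.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteBufferStorageSketch
{
  /** Hypothetical immutable computation vector. */
  static final class Vector2D
  {
    final double x;
    final double y;

    Vector2D(final double x, final double y)
    {
      this.x = x;
      this.y = y;
    }
  }

  /** Storage for two doubles starting at a byte offset within a buffer. */
  static final class VectorByteBuffered2D
  {
    private final ByteBuffer buffer;
    private final int base;

    VectorByteBuffered2D(final ByteBuffer buffer, final int base)
    {
      this.buffer = buffer;
      this.base = base;
    }

    Vector2D get()
    {
      return new Vector2D(
        this.buffer.getDouble(this.base),
        this.buffer.getDouble(this.base + 8));
    }

    void set(final Vector2D v)
    {
      this.buffer.putDouble(this.base, v.x);
      this.buffer.putDouble(this.base + 8, v.y);
    }
  }

  static Vector2D add(final Vector2D a, final Vector2D b)
  {
    return new Vector2D(a.x + b.x, a.y + b.y);
  }

  static Vector2D demo()
  {
    // Room for two vectors of two doubles each.
    final ByteBuffer buffer =
      ByteBuffer.allocateDirect(32).order(ByteOrder.nativeOrder());
    final VectorByteBuffered2D s0 = new VectorByteBuffered2D(buffer, 0);
    final VectorByteBuffered2D s1 = new VectorByteBuffered2D(buffer, 16);

    s0.set(new Vector2D(1.0, 2.0));
    s1.set(new Vector2D(10.0, 20.0));

    // The addition itself never touches the buffer.
    s1.set(add(s0.get(), s1.get()));
    return s1.get();
  }
}
```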
Handedness?
I work in a right-handed coordinate system. jtensors has no support for anything else. Perhaps it'd be a good idea to mark those methods that give explicitly right-handed results as doing so, and then provide left-handed variants too?
Even if no left-handed variants are provided at first, it'd make sense to do this to make the API clearer and to allow for the addition of left-handed variants at a later date whilst keeping the API consistent.
Get it done in less than a month
I've rewritten the jtensors codebase at least five times. With the addition of templating, I should be able to get the whole implementation done very quickly as there are essentially no unknowns. The main issue will then be updating all of the other packages that depend on jtensors. It'll be an enormously backwards-incompatible change, so I'll do the naming convention changes at the same time.
jtensors is dead. Long live jtensors.