crush depth

Mathematics With An Axe

I've reached peak frustration with jtensors.

The API is riddled with inconsistencies due to mistakes caused by the ridiculous amount of hand-specialization. The design of the API is also suboptimal on modern JVMs due to the use of interface types to abstract over vector implementations: Vector method call sites become megamorphic which prevents inlining and harms the ability of the JIT to produce good code.

The API also distinguishes between immutable and mutable vectors and matrices, the latter of which really only exist to allow for avoiding the allocation of temporary objects when working with vectors (and, via interface types, to mutate vectors held in off-heap memory). However, on modern JVMs that employ escape analysis, short-lived objects don't entail any allocations at all as long as the call sites that refer to them are at most bimorphic. The sheer number of interfaces and implementations prevents this important optimization. Without mutable vectors, these interfaces would most likely be pointless. If the API allowed the JVM's escape analysis to work well, the mutable vectors likely wouldn't be needed at all.

The API provides interfaces that abstract over readable and writable vectors so that APIs that use types from the jtensors package can specify types such as "any readable 4-element vector" and the like, without caring about the specific underlying type of vector used. The interface types were originally introduced because I wanted to have lots of different vector implementations that had different approaches to storage. For example, some vectors might be backed by a ByteBuffer that contains IEEE754 Binary16-encoded ("half precision") values. Other vectors might be represented by pointers into large off-heap arrays.

So what's actually good about jtensors?

Personally, I find the use of static methods in the API to be more readable than other Java vector algebra libraries. For example, to me, this:

return add(v1, subtract(v2, v3));

... Reads a lot better than this:

return v1.add(v2.subtract(v3));

The API strongly distinguishes between immutable and mutable types to allow programmers to pick which guarantees they want. The API contains hand-specialized variants of vector and matrix types for float, double, long, and int. Finally, the API provides phantom typed variants of all of the types for enforcing the correctness of your mathematics at compile-time. I'm not aware of any other vector algebra package that provides this. This is extremely valuable when working with graphics systems! Matrix multiplication is not commutative and it's very easy to accidentally perform a multiplication in the wrong order. The usual result will be strange visual results or, even worse, a blank screen. Trying to track down bugs like this is mind-bendingly horrible so preventing as many of them as possible at compile-time is a must. The use of phantom types allows for writing code like this:

MatrixM4x4<Object, World> m_model;
MatrixM4x4<World, View> m_view;
MatrixM4x4<Object, View> m_modelview;

MatrixM4x4.multiply(m_view, m_model, m_modelview);

The multiply method takes a matrix of type MatrixM4x4<U, V>, a matrix of type MatrixM4x4<T, U>, and writes the result of the multiplication to a matrix of type MatrixM4x4<T, V>. Any programmer familiar with something like OpenGL will have experienced the horror of accidentally switching the order of the matrices; the result is silent failure and blank screens. The use of phantom types in the jtensors API makes that mistake a compile-time error. You are physically prevented from giving the matrices in the wrong order because the types won't line up. Additionally, the type parameters act as documentation. It's immediately obvious to anyone looking at the above that m_modelview is a matrix that transforms positions in Object space to their equivalent representation in View space. I've lost track of the number of times that I've been implementing graphics algorithms and have gotten coordinate spaces wrong because the original papers helpfully failed to specify them (and any example code had no way of expressing the coordinate spaces). The classic literature on normal mapping actually contained a serious error of this type as explained on The Tenth Planet blog and evidently nobody noticed it for years. Stronger types would have prevented it!
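A minimal sketch of the phantom type idea, assuming a tiny immutable 2x2 matrix rather than the real jtensors types. Every name here (PMatrix2x2, ObjectSpace, WorldSpace, ViewSpace, PhantomDemo) is hypothetical, and unlike the mutable API shown above, this version returns its result instead of writing to an output matrix. The first argument is the transform applied last, matching the m_view, m_model order used above.

```java
/** Marker types for coordinate spaces. They are never instantiated. */
interface ObjectSpace {}
interface WorldSpace {}
interface ViewSpace {}

/** An immutable 2x2 matrix that transforms coordinates from space A to space B. */
final class PMatrix2x2<A, B> {
  final double r0c0, r0c1, r1c0, r1c1;

  PMatrix2x2(double r0c0, double r0c1, double r1c0, double r1c1) {
    this.r0c0 = r0c0;
    this.r0c1 = r0c1;
    this.r1c0 = r1c0;
    this.r1c1 = r1c1;
  }

  /**
   * Computes m0 * m1. Because m1 is applied first, it must map T to U,
   * and m0 must map U to V. The result therefore maps T to V.
   */
  static <T, U, V> PMatrix2x2<T, V> multiply(
    PMatrix2x2<U, V> m0,
    PMatrix2x2<T, U> m1) {
    return new PMatrix2x2<>(
      m0.r0c0 * m1.r0c0 + m0.r0c1 * m1.r1c0,
      m0.r0c0 * m1.r0c1 + m0.r0c1 * m1.r1c1,
      m0.r1c0 * m1.r0c0 + m0.r1c1 * m1.r1c0,
      m0.r1c0 * m1.r0c1 + m0.r1c1 * m1.r1c1);
  }
}

final class PhantomDemo {
  private PhantomDemo() {}

  static PMatrix2x2<ObjectSpace, ViewSpace> modelView() {
    // A uniform scale by 2, and a 90 degree rotation.
    PMatrix2x2<ObjectSpace, WorldSpace> model = new PMatrix2x2<>(2.0, 0.0, 0.0, 2.0);
    PMatrix2x2<WorldSpace, ViewSpace> view = new PMatrix2x2<>(0.0, -1.0, 1.0, 0.0);

    // PMatrix2x2.multiply(model, view) is rejected by the compiler:
    // no single choice of U satisfies both arguments.
    return PMatrix2x2.multiply(view, model);
  }
}
```

The type parameters carry no runtime data at all; they exist purely so that the compiler can check that the spaces line up.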

Finally, the implementation is heavily tested. The test suite may be the largest I've ever written and contains over 8000 test cases with 100% branch coverage. Algorithms have been checked against multiple textbook sources, all assumptions and conventions have been made explicit and documented, and the implementation results have been tested against results produced by multiple third-party implementations.

I have a ton of code that already depends on jtensors but I just can't bear to maintain it in its current form. Other Java vector algebra libraries do not have a feature set comparable to jtensors, so I can't just switch to one of those. In particular, I use the phantom typed API heavily. I'd like to do a clean-room rewrite of jtensors, fixing all of the above issues, generating as much of the code as possible, and drastically simplifying the implementation. I can't wait around for Java 10's value types, but I can at least reorganize things so that a transition to value types will be easier than it would be currently. I also now know much more about the shapes of code that modern JVMs like to consume than I did when I first started writing jtensors back in 2011. Indeed, those code shapes have changed since 2011! Don't forget that, at that time, the most commonly deployed version of Java was still Java 5! Escape analysis was added fairly early in Java 6's lifetime and has been heavily improved ever since.

So, what should a modern jtensors rewrite look like?

  1. Separate the types of tensors used for computation and storage.

    In other words, make any code that computes with tensors work purely with immutable tensors and keep that code strictly monomorphic. The package can still have mutable vectors and matrices for storage and can still abstract over storage tensors with interfaces, but the APIs for computing with tensors and matrices must yield monomorphic call sites to static methods for maximum performance.

    Additionally, because the types of computation and storage tensors are cleanly separated, the range of types of computation tensors can be limited to those directly supported by the JVM. In other words, tensors over int, long, float, and double because those are the four types that have bytecode instructions on the JVM. The API can also require that operations such as the dot product return a value of the highest-precision type variant applicable to the current type. That is, the dot product for int-typed vectors will be returned in a long value. The dot product for float-typed vectors will be returned in a double value, and so on. This will eliminate the annoying API inconsistencies I mentioned earlier.

    External APIs that used the interface types to accept "any readable 4-element vector" or "any writable vector" and the like should just accept immutable vectors of specific types. Tough luck.

  2. Generate as much as possible.

    The tensor types should be generated by Immutables and the computation APIs (including the test suite) should be generated using a template. No hand-specialization. No hand-written equals, hashCode, toString, etc.

    This is the best that can be done without value types.

  3. Keep the phantom-typed variants.

    I actually use these more than I use the tensors that don't have type parameters.

  4. Provide a range of storage types.

    The computation types can be kept simple, immutable, and in a form that the JVM loves to compile as described above. The storage types, however, can be as JIT-hostile as they like without causing performance problems. IEEE754b16 matrices. Matrices stored in direct ByteBuffers. sun.misc.Unsafe! In addition, this may address performance problems like ticket 7 because intermediate computations won't incur the cost of reading from or writing to tensors with unusual storage characteristics.

  5. Handedness?

    I work in a right-handed coordinate system. jtensors has no support for anything else. Perhaps it'd be a good idea to mark those methods that give explicitly right-handed results as doing so, and then provide left-handed variants too?

    Even if no left-handed variants are provided at first, it'd make sense to do this to make the API clearer and to allow for the addition of left-handed variants at a later date whilst keeping the API consistent.

  6. Get it done in less than a month.

    I've rewritten the jtensors codebase at least five times. With the addition of templating, I should be able to get the whole implementation done very quickly as there are essentially no unknowns. The main issue will then be updating all of the other packages that depend on jtensors. It'll be an enormously backwards-incompatible change, so I'll do the naming convention changes at the same time.
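Points 1 and 4 of the list above can be sketched roughly as follows. This is only an illustration of the intended code shape, not the eventual jtensors API: Vector3I, Vectors3I, VectorStorage3I and VectorByteBuffered3I are hypothetical names. The computation type is final and immutable and is only touched through static methods, the dot product widens to long, and the storage type hides a ByteBuffer behind an interface.

```java
import java.nio.ByteBuffer;

/** Computation type: final, immutable, and used only through static methods. */
final class Vector3I {
  final int x, y, z;

  Vector3I(int x, int y, int z) {
    this.x = x;
    this.y = y;
    this.z = z;
  }
}

/** Static computation functions; every call site here is monomorphic. */
final class Vectors3I {
  private Vectors3I() {}

  static Vector3I add(Vector3I a, Vector3I b) {
    return new Vector3I(a.x + b.x, a.y + b.y, a.z + b.z);
  }

  /** The dot product of int vectors is returned as a long to avoid overflow. */
  static long dotProduct(Vector3I a, Vector3I b) {
    return (long) a.x * b.x + (long) a.y * b.y + (long) a.z * b.z;
  }
}

/** Storage type: free to be polymorphic, because no arithmetic happens here. */
interface VectorStorage3I {
  Vector3I get();

  void set(Vector3I v);
}

/** One possible storage implementation: three ints at an offset in a ByteBuffer. */
final class VectorByteBuffered3I implements VectorStorage3I {
  private final ByteBuffer buffer;
  private final int offset;

  VectorByteBuffered3I(ByteBuffer buffer, int offset) {
    this.buffer = buffer;
    this.offset = offset;
  }

  @Override
  public Vector3I get() {
    return new Vector3I(
      buffer.getInt(offset),
      buffer.getInt(offset + 4),
      buffer.getInt(offset + 8));
  }

  @Override
  public void set(Vector3I v) {
    buffer.putInt(offset, v.x);
    buffer.putInt(offset + 4, v.y);
    buffer.putInt(offset + 8, v.z);
  }
}
```

Code that computes reads a Vector3I out of storage once, works entirely with immutable values, and writes the final result back, so the cost of unusual storage is paid only at the edges.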

jtensors is dead. Long live jtensors.


Distraction Scenario

I have to admit: Reorganizing a codebase to move to generating code that I've already written (and rewritten several times over the past five years) is on the far side of tedious.

On the plus side, I just heard about Project Amber. This is almost certainly the start of the process to get algebraic data types into Java (and hopefully, the JVM infrastructure to allow for a common representation of those types between JVM languages).

Foolish Inconsistency

Been working on moving the jtensors codebase over to source generation as I mentioned previously. I've discovered some annoying inconsistencies in the API that are making it harder to generate the sources from a single template. For example, the VectorM4I type has a method that takes a double-typed parameter as a scaling value, but has a scaleInPlace method that takes an int-typed parameter as a scaling value.



The fact that this hasn't been noticed up until now is both evidence of the lack of utility of the latter method, and a testament to the fallibility of humans when it comes to performing repetitive tasks.

In any case, fixing the above would be a backwards-incompatible change and I'm really trying to avoid those until Java 9 appears. Likely going to deprecate the old methods and add new ones with the correct types.

japicmp update

Big thanks to Martin Mois for implementing a recent feature request to relax the rules for semantic versioning enforcement in japicmp when the current project version is less than 1.0.0.

My basic problem was that I wanted to configure the plugin once in the primogenitor POM and then have the semantic versioning check automatically start working when the API is marked as stable (in other words, when it reaches 1.0.0). Without the feature above, I'd have had to redeclare the plugin's configuration in every project, disable it, and then remember to re-enable it every time a project reached 1.0.0.
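The rule being asked for can be sketched as a small pure function. This is not japicmp's implementation or configuration, just the decision it has to make: SemanticVersionGate and its methods are hypothetical names, and the sketch assumes versions of the form major.minor.patch with an optional qualifier.

```java
final class SemanticVersionGate {
  private SemanticVersionGate() {}

  /** Parses the major component of a version such as "1.2.3" or "0.0.2-SNAPSHOT". */
  static int majorOf(String version) {
    int dot = version.indexOf('.');
    String major = dot < 0 ? version : version.substring(0, dot);
    return Integer.parseInt(major);
  }

  /**
   * Semantic versioning makes no compatibility promises for 0.y.z versions,
   * so the check only needs to be enforced once the major version reaches 1.
   */
  static boolean shouldEnforce(String currentVersion) {
    return majorOf(currentVersion) >= 1;
  }
}
```

With a rule like this, the same configuration can sit in a parent POM and simply start biting when a project is first released as 1.0.0.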

jregions 0.0.2

First public release of jregions.

Unfortunately, the Travis CI build is failing because 0.0.1 was never deployed to Central. This means that when japicmp tries to analyze the API against the old version of the library, it can't find the old version. This will self-correct when 0.0.3 is released.

The next step is to move any libraries that were using jareas or jboxes over to jregions. That's jcanephora, r2, and jsycamore, at the very least.

That bin directory. No, not that one, the other one.

For about a week, I've been having DNS resolution issues on one server. The machine runs a tinydns server for publishing internal domain names, and it seemed that after roughly 24 hours of operation, the server would simply stop responding to DNS requests. After exhausting all of the obvious solutions, I restarted the jail that housed the daemon and everything mysteriously started working.

I checked the logs and suddenly realized that there were no messages in the log newer than about a week. I checked the process list for s6-log instances and noticed that no, there were no s6-log instances running in the jail. I checked /service/tinydns/log/run, which looked fine. I tried executing /service/tinydns/log/run and saw:

exec: /usr/local/sbin/s6-setuidgid: not found

OK. So...

# which s6-setuidgid

Apparently, at some point, the s6 binaries were moved from /usr/local/sbin to /usr/local/bin. This is not something I did! There was no indication of this happening in any recent port change entry nor anything in the s6 change log.

The "outage" was being caused by the way that logging is handled. The tinydns binary logs to stderr instead of using something like syslog, with the error messages being piped into a logging process in the manner of traditional UNIX pipes. This is normally a good thing, because syslog implementations haven't traditionally been very reliable. The problem occurs when the process that's reading from the standard error output of a preceding process stops reading. Sooner or later, any attempt made by the preceding process to write data to the output will block indefinitely (presumably, it doesn't happen immediately due to internal buffering by the operating system kernel). In a simple single-threaded design like that used by tinydns, this essentially means the process stops working as the write operation never completes and no other work can be performed in the meantime.
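The same failure can be reproduced in miniature inside a single JVM. This is only an analogy, assuming Java's in-process PipedInputStream stands in for the kernel pipe and a thread stands in for tinydns; BlockedWriterDemo is a hypothetical name. The writer gets as far as the buffer allows and then stalls, because nothing ever reads from the other end.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.atomic.AtomicInteger;

final class BlockedWriterDemo {
  private BlockedWriterDemo() {}

  /**
   * Starts a writer that tries to write {@code total} bytes into a pipe with a
   * {@code capacity}-byte buffer that nobody reads, waits briefly, and returns
   * how many bytes had been accepted by the time the wait ended.
   */
  static int bytesAcceptedBeforeStall(int capacity, int total) {
    AtomicInteger written = new AtomicInteger();
    try {
      PipedInputStream unreadEnd = new PipedInputStream(capacity);
      PipedOutputStream writeEnd = new PipedOutputStream(unreadEnd);

      Thread writer = new Thread(() -> {
        try {
          for (int i = 0; i < total; ++i) {
            writeEnd.write('x'); // Blocks once the buffer is full.
            written.incrementAndGet();
          }
        } catch (IOException e) {
          // Not expected in this sketch; the pipe is never closed.
        }
      });
      writer.setDaemon(true); // Let the JVM exit even though the writer is stuck.
      writer.start();
      writer.join(500L);
      return written.get();
    } catch (IOException | InterruptedException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Calling bytesAcceptedBeforeStall(4, 16) leaves the writer thread parked inside write with most of its work undone, which is essentially the state the DNS server was in.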

I'm currently going through all of the service entries to see if anything else has quietly broken. Perhaps I need process supervision for my process supervision.

Fighting fires


Had a minor outage yesterday apparently due to a bug in the older FreeBSD images that DigitalOcean provide. To their credit, they responded extremely quickly to my support request and provided a link to a GitHub repository with a fix that could be applied to live systems.

Glossaries And Modularization

A close friend of mine suggested that this blog might be more readable with a glossary, and I agreed. Computer science is fraught with terms that may have one of several similar-but-not-quite-the-same meanings depending upon the context in which they're used. I've added a glossary that I'll try to keep updated and will link to from the first uses of significant terms when they arise. Note that the glossary tends towards definitions from the perspective of this blog. That is, it lists only my intended definitions of terms as opposed to listing every possible accepted definition of each of them. I've gone back through old posts and tidied up the usages somewhat. I'll be making an effort to use more consistent terminology from now on.

I also took this opportunity to aggressively modularize the zeptoblog codebase and introduce a general API for implementing post generators. The glossary page is an implementation of a post generator that builds a static page from a database of definitions.


The Maven POM files in io7m projects contain a fair amount of duplication with respect to each other. The (possibly not entirely rational) reason for this is that when I started moving projects to Maven about five years ago, I had an instinctive lack of trust for project inheritance after seeing what a disaster inheritance usually implies in object-oriented programming languages. Five years of experience, however, have taught me that in a non-Turing-complete description language such as Maven's POM, the problems usually caused by inheritance tend not to occur. I can't give any formal reasoning for this; it's purely anecdotal.

I've introduced inheritance in steps: The root POM of each project does most of the work, and then each of the POMs in the modules of the project specify the bare minimum extra information such as dependencies, plugin executions, etc. In practice, most modules just specify some dependencies and an OSGi manifest.

This is fine, but it does mean that the root POMs of all of the projects mostly contain the same few hundred lines of XML. If I make a change to the logic in one POM that I think would be useful in other projects, I have to make the same change in those projects too. Therefore, I'd like to complete the progression and move towards all projects inheriting from a common primogenitor POM. In addition I'd like to add in some extra information such as inserting the current Git revision into produced JAR files, and statically analyzing the bytecode of compiled files to ensure that semantic versioning rules are being followed. I had a fairly productive conversation with Curtis Rueden about some practical aspects of this (Curtis maintains a very large collection of projects that inherit from a common root POM) and I've made a first attempt at a new primogenitor POM. I'll try moving a few of the more recent leaf projects such as jregions to this new root and see how things work out.

Source Generation

Spent most of yesterday and a fairly decent amount of time today replacing the bulk of the hand-specialized jregions code with code generated from a template instead. I don't typically like templating as a rule: If I'm going to be producing text for a language (such as Java code, XML, etc) that's then going to be parsed and consumed by an external system (such as a compiler, an XHTML renderer, etc) then constructing the text step-by-step using an AST representation guarantees that the output will be well-formed. Templating systems don't provide any guarantees. In this case though, the output of the templating system is going to be immediately consumed by the Java compiler, which is then going to indicate any syntax and/or type errors before any other system has a chance to consume the code.

Of the templating systems I know about, only StringTemplate seems to have been developed with any kind of discipline: The primary concept is the strict separation of presentation and logic. A StringTemplate template does not contain any logic and simply defines formal parameters that must be supplied with values when the template is rendered. Note the must here: Failing to provide a value for a parameter causes an error to be raised at rendering time. Most template systems (notably Maven's resource filtering) tend to silently fail or insert garbage in this situation.
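To make the "must" concrete, here is a minimal sketch of a renderer that refuses to render when a parameter has no value. It is deliberately not StringTemplate and does not use StringTemplate's syntax or API: StrictTemplate, the $name$ placeholder form, and the Area template are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StrictTemplate {
  /** Placeholders look like $name$ in this sketch. */
  private static final Pattern PARAMETER = Pattern.compile("\\$(\\w+)\\$");

  private StrictTemplate() {}

  /** Renders the template, failing loudly if any parameter has no value. */
  static String render(String template, Map<String, String> values) {
    Matcher matcher = PARAMETER.matcher(template);
    StringBuilder out = new StringBuilder();
    int last = 0;
    while (matcher.find()) {
      String name = matcher.group(1);
      String value = values.get(name);
      if (value == null) {
        throw new IllegalArgumentException("No value for template parameter: " + name);
      }
      out.append(template, last, matcher.start()).append(value);
      last = matcher.end();
    }
    return out.append(template, last, template.length()).toString();
  }

  /** Specializes one hypothetical class declaration for a given coordinate type. */
  static String areaClassFor(String suffix, String type) {
    Map<String, String> values = new HashMap<>();
    values.put("suffix", suffix);
    values.put("type", type);
    return render(
      "final class Area$suffix$ { final $type$ width; final $type$ height; }",
      values);
  }
}
```

Calling areaClassFor once per coordinate type yields one declaration per specialization from a single source of truth, and the Java compiler then checks the generated text.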

I used Kevin Birch's StringTemplate Maven plugin to generate sources as part of Maven's generate-sources phase. It appears to work well, but has some nasty failure modes when individual templates can't be found. Basically, if you tell the plugin to open a template file that doesn't exist, or if the name of the template doesn't match the name of the file within which it is defined, you'll get an unhelpful error like this:

[ERROR] Failed to execute goal com.webguys:string-template-maven-plugin:1.1:render (generate-D) on project com.io7m.jregions.core: Unable to execute template. -> [Help 1]

Even with Maven's -X switch (enables the display of exception stack traces and other debugging output) there was no useful information available. I ended up using the little-known mvnDebug executable (bundled with every Maven install but apparently undocumented) to step through the execution of the plugin and work out what was going wrong. For those that don't know, mvnDebug loads a Java agent into the JVM that causes Maven to wait until an external debugger (in my case, IntelliJ IDEA) connects before running the build. It turned out that the Maven plugin wasn't exactly at fault: StringTemplate's internal APIs indicate failure by returning null and maybe writing an error message to a provided mutable sequence. In my case, there were several mistakes:

  1. I specified the name of a template file including the suffix: Internally, StringTemplate appends its own suffix, so the API was looking for a file name with the suffix applied twice, failing to find it, and not providing an error message.

  2. I specified the name of a template file that didn't exist at all. Same failure mode as the above.

  3. I specified the name of a template P. StringTemplate looked for the corresponding file, found it, and parsed it, but it turned out that the file actually contained a template called Q. Again, this resulted in StringTemplate being unable to find the template I'd named, but refusing to tell me about it.

When those mistakes were found and corrected, the rest of the error reporting was decent. Failing to provide template parameters, making syntax errors within templates, etc, all provided good error messages with line and column numbers.

The result of this is that a lot of the jregions source code (and test suite) is now generated from a common template. No more manual specialization. I also added float and BigDecimal specialized types to exercise the generation system during development. I suspect that I'm going to apply this same methodology to rewrite the jtensors codebase when Java gets value types.

Better Than 100%

Was amused by the pingdom test results for

The lesson is that a lot of engineering problems can be solved by refusing to do anything.


Have started unifying jareas and jboxes into a new project: jregions.

The original projects were written about three years apart and I'd not realized how much overlap there was between them until it was too late. This sort of code is a prime target for value types: There are four sets of specialized classes for int, long, double, and BigInteger coordinates because Java's generics don't allow for abstraction over primitive types without boxing. This is something that Brian Goetz has complained about frequently. To paraphrase, "you sometimes end up writing the same code eight times".

The jregions project is also a first attempt at moving to the OSGi conventions I mentioned previously. Thought I might as well use them for all new code and migrate the old code when JDK 9 appears.

The Question

Breaking compatibility in a patch release

Broke a pure-ftpd install this morning by recklessly failing to read the change log before upgrading. Missed this note for 1.0.44:

The Perl and Python wrappers are gone. The daemon can now use a configuration file without requiring external dependencies.

This meant that the s6 run script had to be updated from the first form below to the second:

exec /usr/local/sbin/ /ftpd/pure-ftpd.conf 2>&1


exec /usr/local/sbin/pure-ftpd /ftpd/pure-ftpd.conf 2>&1

The documentation was not updated. I had to work out how to get the server to consume the configuration file by guessing, and had to trace the executable with ktrace to make sure that it actually was reading the file.

I tend to forget that not all projects use semantic versioning and what I expected to be a simple bug-fix update from 1.0.43 to 1.0.45 turned out to be a service-disrupting change.

If you maintain software and you're reading this, please make your version numbers mean something!


Java Module Renaming

Right now, all io7m modules are consistently named. For version 1.0.0 of a given project p, the project usually has artifacts with coordinates such as the following:


I follow the Maven conventions with the addition of an io7m- prefix on artifact names. This helps ensure uniqueness with respect to other Java projects when considering artifact names in isolation; people who aren't me are relatively unlikely to prefix their project names with io7m-.

However, the conventions used for OSGi projects typically look something like:


Examples on Maven Central

I suspect that this naming convention is rooted in the way that OSGi implementations typically deploy bundles: Bundles are placed into a single directory which is polled frequently by the container, with new bundles being automatically deployed. With the old Maven convention, the artifact names can come into conflict when placed in a single directory:

com.acme.math:math:1.0.0    -> math-1.0.0.jar
org.example.math:math:1.0.0 -> math-1.0.0.jar

With the OSGi naming conventions, this would not occur:

com.acme.math:com.acme.math:1.0.0       -> com.acme.math-1.0.0.jar
org.example.math:org.example.math:1.0.0 -> org.example.math-1.0.0.jar
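The collision can be checked mechanically. A minimal sketch, assuming coordinates are written as groupId:artifactId:version and that the deployed file is named artifactId-version.jar as in the listings above; BundleFileNames is a hypothetical helper, not part of Maven or any OSGi container.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BundleFileNames {
  private BundleFileNames() {}

  /** Returns the JAR file name for coordinates of the form "groupId:artifactId:version". */
  static String fileNameOf(String coordinates) {
    String[] parts = coordinates.split(":");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected groupId:artifactId:version: " + coordinates);
    }
    return parts[1] + "-" + parts[2] + ".jar";
  }

  /** Returns the file names that more than one artifact would be written to. */
  static List<String> collisions(List<String> coordinates) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String c : coordinates) {
      counts.merge(fileNameOf(c), 1, Integer::sum);
    }
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      if (entry.getValue() > 1) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```

Because the group ID never appears in the file name, only the OSGi-style artifact IDs keep the two math bundles apart in a flat directory.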

As I move all of my projects over to OSGi, I suspect that I'm going to make sweeping major-version-incrementing changes to all projects by changing their names to use the OSGi naming conventions. Personally, I find it more aesthetically pleasing anyway.

There is the possibility that changing the entire name of a project could be considered a non-compatibility-breaking change according to semantic versioning: If I change the name of the project, I can't be breaking anyone's code because there could be no code in existence that has been compiled against the new name. In order to signal a clean break, however, I'm going to treat it as one and increment the major version numbers everywhere. One minor issue is that JDK 9 is due out in a few months and when that happens, I'm going to move all projects to using Java 9 as a minimum requirement. That is, they're going to require JDK 9 to build and all produced artifacts will be Java 9 bytecode. This is without a doubt a compatibility-breaking change. It may be better to wait until JDK 9 is out and then do the project renames, module descriptors, and the bytecode version increment all in one go.

No One Loves Assembly

Deeply unimpressed by the complete lack of response to a report of a fairly serious bug in the Maven Assembly plugin. Polite requests for assistance on the developer mailing list went ignored. Hopefully this is just down to everyone being busy with the 3.5.0 release and not indicative of the Assembly plugin basically being abandonware.

Right now, if you want to create a distribution archive when using version ranges to refer to artifacts within the current reactor, you're basically out of luck.

2017-02-28: The title of this post has been modified in order to protect the pedantic.

2017-06-17: This turned out to be my fault. See: Assembly Redux


Huge respect for the LWJGL project with one of the best responses to a bug report I've ever had the pleasure to be involved with.

As a result, I'm now maintaining OSGi bundles for LWJGL. I also have working OSGi bundles for JogAmp but the JogAmp projects appear to be on life support at best.

Independent Module Versioning

Been experimenting with independent module versioning by developing an OSGi IRC bot. The idea is to expose any deficiencies in tools when modules within a single Maven project have different version numbers.

Initial indications are good!

One serious issue is that with independent versions, sooner or later there's going to be a release of the project where one or more modules haven't been updated and will therefore have the same version numbers as existing already-deployed modules. It's therefore going to be necessary to work out how to prevent Maven deploying bundles that already exist. Apparently, Charles Honton has a plugin for this.

Conceptually, moving to independently versioned modules means that a given project version now describes a set of module versions as opposed to simply defining a single version for all modules. I might rewrite changelog to better fit with this fundamental conceptual change.
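A rough sketch of that model, assuming a release is represented as a map from module name to module version and that the set of already-deployed module:version pairs is known from somewhere. ReleasePlan and the module names in the usage note are hypothetical, and this is not how the plugin mentioned above works.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class ReleasePlan {
  private ReleasePlan() {}

  /**
   * Returns the modules of a release that still need deploying: those whose
   * "name:version" pair is not already present in the deployed set.
   */
  static Map<String, String> modulesToDeploy(
    Map<String, String> release,
    Set<String> alreadyDeployed) {
    Map<String, String> pending = new TreeMap<>();
    for (Map.Entry<String, String> module : release.entrySet()) {
      String id = module.getKey() + ":" + module.getValue();
      if (!alreadyDeployed.contains(id)) {
        pending.put(module.getKey(), module.getValue());
      }
    }
    return pending;
  }
}
```

For a release containing bot.api at 1.0.0 and bot.core at 1.1.0, where bot.api:1.0.0 is already deployed, only bot.core would be returned.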