It's been about eight months since the release of Java 9.
I won't bore anyone with the details of everything it introduced as that information is available just about anywhere you care to look. The main thing it added, however, is the subject of this post: The Java Platform Module System.
The JPMS was introduced, and programmers reacted in the normal way that people working in a field that's supposed to demand a rational and critical mind react: They started frothing at the mouth, claiming that Oracle were trying to kill Java, and planning a mass exodus to other more hip languages. Kotlin or Rust, probably. Needless to say, Oracle weren't trying to kill Java. If Oracle wanted to kill Java, they could certainly find a way to do it that didn't require seven expensive years of intense, careful, and painful software engineering. So far, the exodus appears to have been quietly called off.
I'm guessing the people that complained the loudest are the sort of people that write a ton of fragile, error-prone, unsupported reflection hacks and then spew bile when some tiny aspect of the platform changes and their code breaks.
I have a ton of code, none of which I'd describe as legacy. I'm somewhat invested in OSGi for reasons I'll go into shortly.
Today: I'm conflicted and slightly nervous!
I decided when Java 9 came out that I was going to pursue full modularization for all of my projects. As I've said before, my code is already modular because I've been designing it around OSGi. However, in order for it to become modular in the JPMS sense, I'd have to write module descriptors for each project.
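For anyone who hasn't seen one, a module descriptor is a small module-info.java file placed at the root of the source tree and compiled into the JAR. A minimal sketch of what one might look like (the module and package names here are hypothetical, not taken from any real project):

// module-info.java — a minimal, hypothetical module descriptor.
module com.example.engine.audio
{
  // Packages that other modules are allowed to use.
  exports com.example.engine.audio.api;

  // Modules that this module needs at compile-time and run-time.
  requires java.logging;
  requires com.example.engine.core;
}

Note that the descriptor names the modules it requires but says nothing about which versions of them are acceptable; more on that below.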
I'm developing a commercial (but open-source) game, where the game itself and third-party modifications are handled by a well-specified, strongly-versioned module system. Consider something that (from the user's perspective) behaves a bit like the system in OpenTTD, except with the entire game delivered this way, not just third-party addons.
I briefly considered stopping using OSGi and using the JPMS to power the whole system. That experiment fairly quickly ended, though. Let's look at some of the things that OSGi does or has that are important to me:
OSGi has a full repository specification that gives applications the ability to download and install packages at run-time. You tell the system what packages you want, and it fetches those and all of their dependencies. Everything is standardized, from the API to the actual metadata served by repository implementations. Here's an example of a standard repository index (albeit styled with an XSL stylesheet).
OSGi bundles have rich metadata associated with them, including very fine-grained and versioned dependency metadata. If you have a bundle, you can find out very easily what else you need in order to be able to use it. The metadata can also express things beyond simple package dependencies.
OSGi can load and unload modules at run-time, and has standard APIs to do so (and standard APIs that application code can implement in order to react to modules being loaded and unloaded). A sketch of what this looks like in code appears after this list.
OSGi has wide support; it's fairly rare these days to find a Java project that isn't also implicitly an OSGi project. Making something OSGi-compatible is often just a case of adding a single Maven plugin invocation (assuming that the code doesn't make a lot of unfounded assumptions about class loaders).
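As a rough illustration of the dynamic loading and unloading mentioned above, this is approximately what installing, reacting to, and removing a bundle looks like using the standard OSGi APIs (the bundle location here is hypothetical):

import org.osgi.framework.Bundle;
import org.osgi.framework.BundleContext;
import org.osgi.framework.BundleEvent;
import org.osgi.framework.BundleException;

// A rough sketch; assumes a BundleContext obtained from a BundleActivator.
public final class DynamicExample
{
  public static void installAndRemove(final BundleContext context) throws BundleException
  {
    // Standard API for reacting to modules being loaded and unloaded.
    context.addBundleListener(event -> {
      if (event.getType() == BundleEvent.STARTED) {
        System.out.println("started: " + event.getBundle().getSymbolicName());
      }
      if (event.getType() == BundleEvent.UNINSTALLED) {
        System.out.println("uninstalled: " + event.getBundle().getSymbolicName());
      }
    });

    // Install and start a bundle at run-time...
    final Bundle bundle = context.installBundle("file:/path/to/some-bundle.jar");
    bundle.start();

    // ... and later remove it again.
    bundle.stop();
    bundle.uninstall();
  }
}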
Let's see what the JPMS has (or hasn't), in comparison:
The JPMS doesn't have anything like a repository specification. The closest analogy is probably fetching things from Maven Central, but there's no standard API for doing this. Essentially, obtaining modules is Someone Else's Problem.
JPMS modules don't have much metadata associated with them. A module says what packages it exports and upon which other modules it depends. However, it doesn't say anything about which versions of those modules are required. Essentially, you're expected to use a system like Maven. This is Someone Else's Problem.
The JPMS can, in some sense, load modules at run-time via ModuleLayers. There's no support for explicitly unloading a module. There are no standard APIs that code can use to be notified that a module has been loaded. Essentially, you're expected to build your own ad-hoc OSGi-like system on top of module layers. This is Someone Else's Problem. A sketch of layer-based loading appears after this list.
Most projects haven't even begun to modularize their code. Even in a well-staffed and extremely fundamental project like Google Protobuf, I'm still waiting after six months for them to make a harmless one-line JAR manifest update. I cannot begin to imagine what it would take to push full modularization through a system that slow moving. If a project you depend on hasn't modularized, you're not going to be modularizing either.
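For comparison, this is roughly what run-time loading looks like with module layers (the module name and directory here are hypothetical). Note that there's nothing resembling an unload operation or a notification API:

import java.lang.module.Configuration;
import java.lang.module.ModuleFinder;
import java.nio.file.Paths;
import java.util.Set;

// A rough sketch of run-time loading via ModuleLayer.
public final class LayerExample
{
  public static void main(final String[] args) throws Exception
  {
    // Find modules in a directory of modular JARs.
    final ModuleFinder finder = ModuleFinder.of(Paths.get("/path/to/modules"));

    // Resolve a configuration with the boot layer as the parent.
    final ModuleLayer parent = ModuleLayer.boot();
    final Configuration configuration =
      parent.configuration().resolve(finder, ModuleFinder.of(), Set.of("com.example.plugin"));

    // Define the new layer, giving all of its modules a single class loader.
    final ModuleLayer layer =
      parent.defineModulesWithOneLoader(configuration, ClassLoader.getSystemClassLoader());

    // Load a class from the newly loaded module.
    final Class<?> clazz =
      layer.findLoader("com.example.plugin").loadClass("com.example.plugin.Main");
    System.out.println(clazz);

    // There is no layer.unload(), and no listener to tell other code what happened;
    // a layer only goes away when nothing references it or its class loaders.
  }
}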
The first three points mean that I'd have to invent my own OSGi-like system. I'd have to come up with a uniform, platform-agnostic metadata standard that could be inserted into JAR files in order to specify dependency information. Third-party projects (at least those that aren't written specifically for my game engine) aren't going to be interested in adding metadata for the sake of one man's low-budget module system. I can't trust JAR files to contain Maven pom.xml files (although many do), as there are plenty of other build systems in use that don't include that information. Even then, you can't just parse a Maven pom.xml, you need to evaluate it. This would mean pulling in Maven APIs as a dependency, and those work neither in a JPMS modular context (because they're not modularized and never will be) nor in an OSGi context (because they use their own incompatible module system called Plexus).
Let's assume by some miracle that I get a good percentage of the Java ecosystem to include dependency metadata in a format I can consume. I now need to come up with an API and tools to fetch modules from a repository. That's not hard, but it's still more than not having to do it at all. I'd want version ranges, so I'd also need to write code to solve the versioning problem, which is NP-complete. Tricky.
So let's assume by now that I can, given the name of a module, fetch the dependencies at run-time using my shiny new API and metadata. I still need to be able to load and unload that set of modules at run-time, and have code in other modules react properly to that occurring. There aren't any standard APIs to do this in the JPMS, and so I'd have to write my own. Right now, the JPMS is not actually capable of expressing the kind of things that are possible with OSGi bundles, so any system I built would be strictly less capable than the existing OSGi system. At least part of the problem appears to come down to missing APIs in the JPMS, so at least some of this might be less of an issue in the future. Still, it's work I'd have to do.
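To make the "write my own" part concrete, even the simplest possible notification mechanism would be something I'd have to define, implement, and maintain myself. A purely hypothetical sketch (the names are made up; nothing like this exists in the JPMS):

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Entirely hypothetical: the sort of minimal notification API that the JPMS
// does not provide and that I would have to invent.
public final class ModuleEvents
{
  public interface ModuleEventListener
  {
    void onLayerLoaded(ModuleLayer layer);

    void onLayerUnloading(ModuleLayer layer);
  }

  private final List<ModuleEventListener> listeners = new CopyOnWriteArrayList<>();

  public void addListener(final ModuleEventListener listener)
  {
    this.listeners.add(listener);
  }

  // Whatever code creates and discards layers has to remember to call these.
  public void fireLoaded(final ModuleLayer layer)
  {
    this.listeners.forEach(listener -> listener.onLayerLoaded(layer));
  }

  public void fireUnloading(final ModuleLayer layer)
  {
    this.listeners.forEach(listener -> listener.onLayerUnloading(layer));
  }
}

And that only covers notifications; resolution, version ranges, and actually unloading anything are still unaddressed.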
So let's assume that I can fetch, load, and unload modules at run-time, and code in modules can react to this happening. I still need modules to put in the system, and not many Java projects are modularized. I tried, for a few months, to use only projects that have either added Automatic-Module-Name entries to their JAR manifests, or have fully modularized. It was fairly painful. I'd be limiting myself to a tiny percentage of the vast Java ecosystem if I did this. This, however, is something that OSGi also went through in the early days and is no longer a problem. It's just a matter of time.
So why am I nervous and still conflicted?
The default effect suggests that developers will turn to the JPMS simply because it is the default Java module system. Tool support for JPMS modules is already vastly better in IntelliJ IDEA than it is for OSGi. In Eclipse, the support is about equal.
The OSGi R7 JAR files will not contain Automatic-Module-Name entries, which means that if you have any dependency on any part of OSGi (including parts that are not even used at run-time, like the annotations), your code cannot be fully modularized. If developers are forced to choose between supporting that old module system that nobody uses and that shiny new default module system that everyone's talking (or possibly frothing) about, which one do you think they'll pick?
I'm also aware that this kind of dynamic module loading and unloading is not something that most applications need. For years people were happy with a static (although extremely error-prone) class path mechanism, and the majority of applications are just recompiled and re-deployed when updated. The JPMS can support this kind of deployment sufficiently. Previously, if developers wanted any kind of safe module system at all, they had to turn to OSGi. They might only use it in the context of a static application that is not dynamically updated, but they'd still use it. Why would those same developers still turn to OSGi when the JPMS exists?
Finally, I pay attention to the various new developments in the Java language and JVM and, fairly often, new features are proposed that have subtle (or not-so-subtle) dependencies on the new module system. There are various optimization strategies, for example, that can be enabled if you know that instances of a given type are completely confined to a single module. Unless the VM can be magically persuaded that OSGi bundles are JPMS modules (which is unlikely to happen), code running in OSGi is very much going to become a second-class citizen.
So, I'm nervous and conflicted. I don't want to build some sort of ad-hoc OSGi-lite system on the JPMS. I don't want to do the work, I don't want to maintain it, and I don't think the result would be very good anyway. I also, however, am unsure about continuing to base my code on a system that's going to have to work hard not to be considered irrelevant. I believe OSGi is the superior choice, but it's not the default choice, and I think that's going to matter more than it should.
I suspect I'm going to finish assisting any remaining projects that I've started helping to modularize, and not do any more. I had decided that I was going to push hard to move all of my projects to requiring Java 9 as part of the modularization effort, but unfortunately this would leave me unable to deploy code on FreeBSD as there's no JDK 9 there and likely won't be. With the new six-month release cycle (and 18-month long-term release cycle), porting efforts will likely be directed towards JDK 11. This means that I won't be able to deploy anything newer than Java 8 bytecode on FreeBSD for at least another six months.
I wrote a while back about issues with IPv6 on Linux. It turns out that most of the pain occurs for two reasons:
Linux doesn't accept router advertisements by default. If you configure your router to tell everyone on the network that it has a nonstandard MTU, Linux will ignore the advertisements.
Linux will act upon any Packet Too Big messages it receives and in fact will create a temporary route (visible from the ip route command) that has the correct reduced MTU but, for whatever reason, most programs won't use the new route without a restart. That is, if my mail client hangs due to a Packet Too Big message, it won't be able to send any mail until I restart the program.
The first point can be addressed by adding the following to /etc/sysctl.conf:
net.ipv6.conf.default.accept_ra=1
net.ipv6.conf.default.accept_ra_mtu=1
Usually, net.ipv6.conf.default.accept_ra_mtu is already set to 1, but it's worth being explicit about it. I also add net.ipv6.conf.default.autoconf=0 because I statically assign addresses.
The second point can be addressed by restarting the program, as sad as that is.
https://github.com/io7m/modulechaser
I'm trying to get to the point where all of my projects are fully modularized as JPMS modules. My code already follows a very modular architecture thanks to designing it around OSGi, but it's necessary to write module descriptors to complete the picture. To write module descriptors, the projects upon which a project depends must first be modularized. This can either mean writing module descriptors for those projects, or it can simply mean assigning an Automatic-Module-Name. Writing a full module descriptor is better, because this means that the project can be used in combination with jlink to produce tiny custom JVM distributions.
My problem is that I have rather a large number of dependencies across all of my projects, and I need to know the most efficient order in which to approach maintainers in order to get them to modularize their projects. If a project A depends on project B, then project A can't be modularized before project B, so it's a waste of my time to go asking project A's maintainers before B is modularized.
I wrote a Maven plugin to assist with this problem. It produces reports like this.
The plugin tells me if the current version of a dependency is modularized, if the latest version of the dependency on Maven Central is modularized, and whether the dependency has been fully modularized or simply assigned a name. The table of dependencies is shown in reverse topological order: Start at the top of the table and work downwards, contacting each project maintainer as you go.
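The basic classification behind the report (is a given JAR a real module, an automatic module, or not a module at all?) can be sketched with the platform's own APIs. This is only an illustration of the idea, not the plugin's actual code:

import java.lang.module.FindException;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// A rough sketch of classifying a single JAR file; the real plugin does
// considerably more work (Maven resolution, report generation, and so on).
public final class ModuleCheck
{
  public static void main(final String[] args)
  {
    final Path jar = Paths.get(args[0]);

    try {
      final Optional<ModuleReference> reference =
        ModuleFinder.of(jar).findAll().stream().findFirst();

      if (!reference.isPresent()) {
        System.out.println("no module found");
        return;
      }

      final ModuleDescriptor descriptor = reference.get().descriptor();
      if (descriptor.isAutomatic()) {
        // Either an Automatic-Module-Name entry or a name derived from the filename.
        System.out.println("automatic module: " + descriptor.name());
      } else {
        System.out.println("fully modularized: " + descriptor.name());
      }
    } catch (final FindException e) {
      // Thrown for JARs from which a module name can't even be derived.
      System.out.println("invalid or unnamable jar: " + e.getMessage());
    }
  }
}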
Some dependencies will, of course, never be modularized. Whoever published the javax.inject jar, for example, didn't even bother to create a manifest in the jar file. I'm not sure that even constitutes a valid jar file according to the specification. Some dependencies, like javaslang, were renamed (javaslang became vavr), and so code should move to using the newer names instead. Some projects can barely manage to publish to Maven Central (like Xerces) and still appear to use tools and processes from the previous century, so are unlikely to be modularizing any time soon.
In a crippling bout of sinusitis, and after reading that Chrome is going to mark http sites as insecure, I decided to put good sense aside and deploy Let's Encrypt certificates on the io7m servers.
I've complained about the complexity of this before, so I started thinking about how to reduce the number of moving parts, and the number of protection boundary violations implied by the average ACME setup.
I decided the following invariants must hold:
The web server must not have write access to its own configuration, certificates, or logs. This is generally a given in any server setup. Logging is actually achieved by piping log messages to a log program such as svlogd, which runs as a different user. In my case, I can actually go much further and state that the web server must not have write access to anything in the filesystem. This means that if the web server is compromised (by a buffer overflow in the TLS library, for example), it's not possible for an attacker to write to any data or code without first having to find a way to escalate privileges.
The web server must have read access to a challenge directory. This is the directory containing files created in response to a Let's Encrypt (LE) challenge.
The acme client must have read and write access to the certificates, and it must have write access to a challenge directory, but nothing else in the filesystem. Specifically, the client must not have write access to the LE account key or the directory that the web server is serving. This means that if the client is compromised, it can corrupt or reissue certificates but it can't substitute its own account key and request certificates on behalf of someone else.
The acme client must not have any kind of administrative access to the web server. I don't want the acme client helpfully restarting the web server because it thinks it's time to do so.
There are some contradictions here: The acme client must not be able to write to the directory that the web server is serving, and yet the web server must be able to serve challenge responses to the LE server. The acme client must not be able to restart the web server, and yet the web server must be restarted in order to pick up new certificates when they're issued.
I came up with the following:
An httpd-reloader service sends a SIGHUP signal to the web server every 30 minutes. This causes the web server to re-read its own configuration data and reload certificates, but does not kill any requests that are in the process of being served and specifically does not actually restart the web server.
The acme client writes to a private challenge directory, and a private certificates directory. It doesn't know anything about the web server and is prevented from accessing anything other than those directories via a combination of access controls and chrooting.
The web server reads from a read-only nullfs mount of a wwwdata directory, and the challenge directory above is placed in wwwdata/.well-known/acme-challenge via another read-only nullfs mount. The web server also reads from a read-only nullfs mount of the certificates directory above in order to access the certificates that the acme client creates.
The acme client is told to update certificates hourly, but the acme client itself decides if it's really necessary to update the certificates each time (based on the time remaining before the certificates expire).
The intrusion detection system has to be told that the web server's certificates are permitted to change. The account key is never permitted to change. I don't want notifications every ~90 days telling me the certificates have been modified.
I set up all of the above and also took some time to harden the configuration by enabling various HTTP headers such as Strict-Transport-Security, Content-Security-Policy, Referrer-Policy, etc. I'm not going to require https to read io7m.com and I'm not going to automatically redirect traffic from the http site to the https site. As far as I'm concerned, it's up to the people reading the site to decide whether or not they want https. There are plenty of browser addons that can tell the browser to try https first, and I imagine Chrome is already doing this without addons.
The Qualys SSL Labs result:
Now we can all sleep soundly in the knowledge that a third party that we have no reason whatsoever to trust is telling us that io7m.com is safe.
I'm going to start making all projects use a common set of Checkstyle rules rather than having each project carry its own rules around. I can't remember exactly why I avoided doing this in the beginning. I think it may've been that I wasn't confident that I could write one set of rules that would work everywhere. I've decided instead that I'll beat code with a shovel until it follows the rules, rather than beat the rules with a shovel until they follow the code.