I do a bit of 3D rendering now and then using Blender. Blender has two rendering engines: Blender Internal and Cycles. Blender Internal is a 1990s-style CPU-based raytracer, and Cycles is a physically-based renderer that can be executed on either CPUs or GPUs. Cycles is technologically superior to Blender Internal in every way except one: its performance when running on CPUs is essentially hopeless. To get reasonable render times, you really need a powerful GPU with an appropriate level of driver support. I refuse to use proprietary hardware drivers, for both practical reasons and reasons of principle. This has meant that, to date, I haven't been able to use Cycles with GPU acceleration and have therefore stuck with Blender Internal. As I don't aim for photorealism in anything I render (I prefer a sort of obviously-rendered pseudo-impressionist style), Blender Internal has been mostly sufficient.
However, I've recently moved to rendering 1080p video at 60fps. This means that even if it only takes ~10 seconds to render a single frame, that's still about 35 hours of rendering time to produce a 3.5 minute video.
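For the record, the arithmetic behind that estimate (frame count and per-frame time as above):

```shell
# Back-of-the-envelope check: 3.5 minutes of 60fps footage at
# roughly 10 seconds of rendering per frame.
frames=$((210 * 60))              # 3.5 min = 210 s of footage at 60fps
render_seconds=$((frames * 10))   # ~10 s to render each frame
echo "${frames} frames, ~$((render_seconds / 3600)) hours"
# prints: 12600 frames, ~35 hours
```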
I have a few machines around the place here that spend a fair amount
of time mostly idle. I decided to look for ways to distribute rendering
tasks across machines to use up the idle CPU time. The first thing I
looked at was netrender.
Unfortunately, netrender
is both broken and abandonware. I spent
a good few hours spinning up VM instances and trying to get it to
work, but it didn't happen.
I've been using Jenkins for a while now to build and test code from hundreds of repositories that I maintain. I decided to see if it could be useful here... After a day or so of experimentation, it turns out that it makes an acceptable driver for a render farm!
I created a set of nodes used for rendering tasks. A node in
Jenkins speak is an agent program running on a computer that accepts
commands such as "check out this code now", "build this code now",
etc. I won't bother to go into detail on this, as setting up nodes
is something anyone with any Jenkins experience already knows how
to do. Nodes can be assigned labels so that tasks can be assigned
to specific machines. For example, you could add a linux
label to
machines running Linux, and then if you had software that would only
build on Linux, you could set that job to only run on nodes that
are labelled with linux
. Basically, I created one node for each
idle machine here and labelled them all with a blender
label to
distinguish them from the nodes I use to build code.
I then placed my Blender project and all of the required assets into a Git repository.
I created a new job in Jenkins that checks out the git repository above and runs the following pipeline definition included in the repository:
#!groovy

// Required: https://plugins.jenkins.io/pipeline-utility-steps

node {
  def nodes = nodesByLabel label:"blender"
  def nodesSorted = nodes.sort().toList()
  def nodeTasks = [:]
  def nodeCount = nodesSorted.size()

  for (int i = 0; i < nodeCount; ++i) {
    def nodeName = nodesSorted[i]
    def thisNodeIndex = i
    nodeTasks[nodeName] = {
      node(nodeName) {
        stage(nodeName) {
          checkout scm
          sh "./render.sh ${thisNodeIndex} ${nodeCount}"
        }
      }
    }
  }

  parallel nodeTasks
}
This uses the pipeline-utility-steps plugin to fetch a list of online nodes with a particular label from Jenkins. I make the simplifying assumption that all online nodes with the blender label will be participating in render tasks. I assign each node a number, and for each node I create a task that runs a render.sh shell script from the repository.
The tasks are executed in parallel and the job is completed once all subtasks have
run to completion.
The render.sh shell script is responsible for executing Blender. Blender has a full command-line interface for rendering images without opening a user interface. The main command-line parameters we're interested in are:
blender \
  --background \
  coldplanet_lofi.blend \
  --scene Scene \
  --render-output "${OUTPUT}/########.png" \
  --render-format PNG \
  --frame-start "${NODE_INDEX}" \
  --frame-end "${FRAME_COUNT}" \
  --frame-jump "${NODE_COUNT}" \
  --render-anim
The --frame-start parameter indicates at which frame the node should start rendering. The --frame-end parameter indicates the last frame to render. The --frame-jump parameter indicates the number of frames to step forward each time. We pass in the node index (starting at 0) as the starting frame, and the number of nodes that are participating in rendering as the frame jump. Let's say there are 4 nodes rendering: Node 0 will start at frame 0, and will then render frame 4, then frame 8, and so on. Node 1 will start at frame 1, then render frame 5, and so on. This means that the work will be divided up equally between the nodes. There are no data dependencies between frames, so the work parallelizes easily.
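The interleaving is easy to sanity-check from a shell: seq with a step value produces exactly the frame list a given node will render. (The 20-frame animation here is just for illustration.)

```shell
# Frames rendered by node 1 of 4 for a hypothetical 20-frame animation:
# start at the node index, step by the node count.
seq 1 4 20
# prints 1, 5, 9, 13, 17 (one per line)
```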
When Jenkins is instructed to run the job, all of the machines pull the git repository and start rendering. At the end of the task, they upload all of their rendered frames to a server here in order to be stitched together into a video using ffmpeg. Initial results seem promising. I'm rendering with three nodes, and rendering times are roughly 30% of what they were with just the one node. I can't really ask for more than that.
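I won't reproduce the exact stitching command, but an ffmpeg invocation along the following lines does this kind of assembly; the frames/ directory, the libx264 codec choice, and the output filename are my assumptions for illustration:

```shell
# Assemble the numbered PNG frames (########.png, i.e. %08d) into a
# 60fps H.264 video; yuv420p keeps the output playable in most players.
ffmpeg \
  -framerate 60 \
  -i "frames/%08d.png" \
  -c:v libx264 \
  -pix_fmt yuv420p \
  output.mp4
```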
There are some weaknesses to this approach:
Rendering can't be easily resumed if the job is stopped. This could possibly be mitigated by building more intelligence into the render.sh file, but maybe not.
Work is divided up equally, but nodes are not equal. I initially tried adding a node to the pool of nodes that was much slower than the others. It was assigned the same amount of work as the other nodes, but took far longer to finish. As a result, the work actually took longer than it would have if that node hadn't been involved at all. In fact, in the initial tests, it took longer than rendering on a single node!
There's not really a pleasant way to estimate how much longer rendering is going to take. It's pretty hard to read the console output from each machine in the Jenkins UI.
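On the first of those weaknesses: one hypothetical way to build that intelligence into render.sh would be to skip any frame whose output PNG already exists, so that a restarted job only renders the missing frames. The needs_render helper and the echo stand-ins for the real Blender call below are illustrative assumptions, not the actual script:

```shell
#!/bin/sh
# Hypothetical resumability logic for render.sh: a frame only needs
# rendering if its zero-padded output file is absent.
OUTPUT="frames"
needs_render() {
  ! [ -e "$(printf '%s/%08d.png' "${OUTPUT}" "$1")" ]
}

mkdir -p "${OUTPUT}"
touch "${OUTPUT}/00000004.png"   # pretend frame 4 survived a stopped job
for frame in 0 4 8; do
  if needs_render "${frame}"; then
    echo "render frame ${frame}"   # the real script would call blender here
  else
    echo "skip frame ${frame}"
  fi
done
```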
Finally: All of this work is probably going to be irrelevant very soon. Blender 2.8 has a new realtime rendering engine - EEVEE - which should presumably mean that rendering 3.5 minutes of video will take me 3.5 minutes.