Render Farm Build 47

DISCLAIMER: This is a continuing series detailing the painful story of a DIY render farm build.  It is terribly technical and
somewhat frustrating.  Those who are unprepared for such “entertainment” are advised to ignore these posts.

It’s been a while.  I have an extensive and somewhat confusing build log for DQOR that is ultimately frustrating.  I seem to have spent time over the last four or five months not getting it to work. I have even emailed the creator of Dr. Queue hoping to get some kind of answer, but that’s not happened yet.

One good thing is that I found a cheap Wifi card for JHVH-1 ($15 from China).  It installed like a dream, and works perfectly.  I set up my school credentials and off I go!  Now making changes and updating things will be a lot easier.

Here’s what else happened.  On my Mac Pro cylinder at home I compiled Dr. Queue in five seconds flat with absolutely no issues.  I copied the “drqman” binary from home to school and set it up on the machine. Works perfectly.

Then, sadly, since I am connected to the network, OSX asked to update and – not thinking – I clicked yes.  If you remember, these old 2006 era towers are only functioning because they use a hacked version of the OS.  Thus with one keystroke I destroyed the PDC and have to start from scratch!  If only I had imaged the startup drive…

Once I redid the master I made a disk image in case I screw it up again.  And it’s at this point that I began my difficult courting of DQOR.  Because when you think about it, I really do want to be able to send jobs from a remote location.  Also, the farm has so much more use at school if students can, on their own, submit to the farm instead of having me do it.  Suffice to say the DQOR experiments – and I started from scratch multiple times – went like this:

  1. Install the old versions of both Ruby and Rails, since the software was designed for Ruby 1.8.7 and Rails 2.3.5 and the software has come a long way since then.
  2. Install dependencies.  Most went OK except mongrel, which is optional, and a previous version worked anyway.
  3. Then the DrQueueRubyBindings would stall, every time, and would not build no matter what.
  4. Doing so manually involved running every process in the Ruby ext.conf script one line at a time through the terminal.  This worked more or less, but every line had adjustments and dependencies.  Chiefly…
  5. Swig, which finally generated a makefile, after really hammering away at it.
  6. But the final Makefile failed in the compiler sixteen hundred thousand ways.  I ran into errors called SEGMENTATION FAULTs which I recognized as being C++ issues and not anything I’m going to solve considering I know next to nothing about this stuff.
  7. Without the ruby bindings, DQOR will not work, no way no how.  I’m dead.

And now AEFX 2014 is crapping out, both on the master and on slaves.  Does everything need to be reimaged?  Everything seems broken…

In the meantime I try a Lux Render job script and find that drqman creates a job script, all right… with nothing in it!  The job script sets a few variables at the beginning and then does nothing.  So now everything is broken.  Terribly broken!
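For what it’s worth, a quick sanity check for this kind of “empty” job script could look something like the sketch below.  It is only a sketch: it assumes the generated script is shell-style text, and it treats blank lines, comments, and variable assignments as “nothing.”  The function name and the sample variable names are made up for illustration and are not part of Dr. Queue.

```python
import re

# Shell-style assignments: FOO=bar, export FOO=bar, or csh-style set FOO=bar.
# Limitation: a line such as `FOO=bar cmd` is also treated as an assignment here.
ASSIGNMENT = re.compile(r"^(export\s+|set\s+)?[A-Za-z_][A-Za-z0-9_]*=.*$")
# csh-style environment assignment: setenv FOO bar
SETENV = re.compile(r"^setenv\s+\S+")


def has_real_commands(script_text):
    """Return True if the script contains any line that is not blank,
    a comment, or a variable assignment (hypothetical helper)."""
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, comment, or shebang
        if ASSIGNMENT.match(line) or SETENV.match(line):
            continue  # only sets a variable
        return True  # found something that looks like a command
    return False


if __name__ == "__main__":
    # Made-up example of a script that sets variables and then does nothing.
    sample = "#!/bin/sh\nFRAME=1\nexport SCENE=/tmp/scene.lxs\n"
    print(has_real_commands(sample))
```

Running a check like this on each generated script before submitting the job would at least flag the empty ones early.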

But you know what?  A power cycle fixed everything.

One more positive note.  I actually did rebuild everything on the 10.9 machine, and now it all seems much more stable.  The crazy errant “child arrived” errors have disappeared entirely.  The PDC and one node have been running for weeks without a single problem, and I’ve run several test jobs through it.  I think we’re good to go.  No remote, but what do you want for nothing?
