Render Farm Build 31

DISCLAIMER: This is a continuing series detailing the painful story of a DIY render farm build.  It is terribly technical and
somewhat frustrating.  Those who are unprepared for such “entertainment” are advised to ignore these posts.

I was having even more connection problems.


I would restart and Gog would not be able to see Cyclops on the network.  In fact, Gog eventually could mount the drqueue folder on Cyclops (which it MUST do to have access to the logs and tmp folders), but only after about thirty minutes!  I had no idea what was going on there.  It seemed as though there was a real problem here beyond just the conf files.

OS X 10.9 Mavericks, I learned, prefers SMB (Server Message Block, the protocol that Samba implements) over AFP (Apple Filing Protocol), so I turned on SMB sharing and asked Gog to connect to Cyclops via SMB on launch.  A careful reader will recall that I actually did turn on SMB on the PDC long, long ago.  But I thought AFP was the better choice at the time, mostly because I did not know I’d be using 10.9 – there did not seem to be any reason why I would be able to.

I was also still having trouble getting everything to start up automatically when the machine was turned on.  To get the environment variables set how they should be, I put a .plist file in ~/Library/LaunchAgents.  It said, essentially:

<key>Program</key>
<string>(the path to .sh file)</string>
<key>RunAtLoad</key>
<true/>

That little snippet was supposed to get launchd to run the shell script that would first set the environment variables and then start the slave.  So far so good, but why did Dr. Queue on Cyclops still not see Gog trying to join the pool?  Was it Cyclops not letting anyone in, or Gog not able to see an open door?  Network problems always seem to double the number of things that can go wrong.
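For readers following along, the kind of startup script described here might look roughly like the sketch below. It is only an illustration: the install path, the hostname cyclops.local, and the location of the slave binary are all assumptions, and DRQUEUE_ROOT and DRQUEUE_MASTER are the variable names I believe DrQueue reads, so check them against your own install.

```shell
#!/bin/sh
# Sketch of a slave startup script. All paths and the hostname are assumptions.

# Where DrQueue is installed (hypothetical location).
export DRQUEUE_ROOT="/Applications/drqueue"
# Hostname of the machine running the master (assumed name for Cyclops).
export DRQUEUE_MASTER="cyclops.local"

slave_bin="$DRQUEUE_ROOT/bin/slave"   # assumed binary location

if [ -x "$slave_bin" ]; then
    echo "starting slave against $DRQUEUE_MASTER"
    # Run in the foreground so launchd (or a login item) keeps tracking it.
    "$slave_bin"
else
    echo "slave binary not found at $slave_bin" >&2
fi
```

The variables are exported before the slave starts so that the slave process inherits them, which is the whole reason for wrapping the launch in a script.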


Sadly, the launchd trick did not seem to work at all.  There was no action from the .plist file.  So I put the .sh script in the list of login items and made it executable with this command:

chmod +x (path to script)

That little bit of Unix is a piece I’m awfully proud of myself for remembering.  It makes any file executable.  So I then rebooted everything…  Again, something seemed to be happening to the master.  When I saw the GUI (drqman) start up, I took that as a sign that everything was OK.  Was it?  I ran the master on its own as a command-line program and found a socket error.
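To see what the chmod step actually changes, here is a small sketch that can be run anywhere.  A temporary file stands in for the real startup script, and its contents are made up for the demonstration.

```shell
# A temp file stands in for the real startup script (hypothetical content).
script="$(mktemp)"
printf '#!/bin/sh\necho "slave would start here"\n' > "$script"

ls -l "$script"      # before: no x in the permission string
chmod +x "$script"   # add the execute bit
ls -l "$script"      # after: an x appears for the owner

if [ -x "$script" ]; then
    echo "now executable"
fi
```

Note that chmod only sets the permission.  The file still needs a valid first line (such as #!/bin/sh) for the system to know how to run it.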


Socket error?  Criminy!  I had no idea what this was about.  Later I discovered that there was a socket error because I had not properly quit the master before I tried running another instance of it.  So the second instance was actually conflicting with the first, hence the socket error.  At the time I did not know this, though, and I took “socket error” as some kind of clue to follow.
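In hindsight, a quick check for an already-running master before launching another would have saved hours.  The sketch below shows one way to do that with pgrep.  The process name "master" is an assumption; the binary may be named differently on your install.

```shell
# Return success if a process with exactly this name is already running.
is_running() {
    pgrep -x "$1" > /dev/null 2>&1
}

# "master" is an assumed process name; check the actual binary name.
if is_running master; then
    echo "master already running; not starting a second instance"
else
    echo "no master found; safe to start one"
fi
```

A second instance fails because the first one is still holding the network port the master listens on, so quitting the first (or rebooting) clears the error.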

Hours and much Googling later, I still did not know much about what a socket error was or how I should rectify it.  It sounded grave.  I worried I would not be able to run the PDC off the mini, and maybe would have to run the master off one of the slaves.  It’s often done this way.  Maybe having a separate PDC was a problem when it was still on 10.5 while the slaves were on 10.9?

But later, “Eureka!”  Master sees slave!

What did I do? I still do not know!

But through some combination of tweaks and restarts (and probably a restart after running too many instances of the master simultaneously), Gog was showing up in the Dr. Queue master window as an available machine.  The next step would be to send a job and get the slave working.

Unfortunately I did not have a job with me.  Also unfortunately, I did not realize that any script, including this one:

echo "Hi there!"

could be sent.  Even worse, I did not know where to look for that output once I did send that script (hint – it’s in the log folder).
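The idea that a job's output lands in a log file rather than on screen can be tried locally.  This sketch is only a stand-in for what a queue does with a job script; the directory and file names are made up, and it is not Dr. Queue's actual log layout.

```shell
# Stand-in for a queue's job handling; directory and file names are hypothetical.
workdir="$(mktemp -d)"
logdir="$workdir/logs"
mkdir -p "$logdir"

# The entire "job": a one-line script.
job="$workdir/test_job.sh"
printf '#!/bin/sh\necho "Hi there!"\n' > "$job"

# Run the job and capture stdout and stderr in a log file, as a queue would.
sh "$job" > "$logdir/test_job.log" 2>&1

# The output never reaches the terminal; read it from the log instead.
cat "$logdir/test_job.log"
```

A one-line job like this makes a handy smoke test, since it proves the master can hand work to a slave without involving a renderer at all.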
