Setting up machines for satellite rendering



I saw a case recently where a customer was asking “what to install on a satellite machine,” and “what do I do for licensing?”

The simplest answer is “just install Softimage in 30-day trial mode” and then set up the satellite service. That’s it.

The satellites don’t use a license. Only the master computer needs a license, and it’s the master that controls the maximum number of satellite machines (4).

If you have a network license, you may want to set up the satellites so you can run Softimage or xsibatch. In that case, you’ll want to point the satellite to your license server. You can either do that during the install, or by editing setenv.bat afterwards.

Enabling and disabling satellites


There was a thread on xsibase the other day about using a batch file to turn satellite rendering on or off by changing the .ray3hosts file.

Instead of overwriting the .ray3hosts file, another way to do it would be to keep two versions of the .ray3hosts file, and use MI_RAY_HOSTSFILE to switch between them.

For example, you would create a batch file on your desktop that does this:

@echo off
call "C:\Program Files\Autodesk\Softimage 2012.SAP\Application\bin\setenv.bat"

rem Override the default MI_RAY_HOSTSFILE
set MI_RAY_HOSTSFILE=%XSI_USERHOME%\.ray3hosts-disabled

start "" "C:\Program Files\Autodesk\Softimage 2012.SAP\Application\bin\XSI.exe" %*

.ray3hosts-disabled would either not exist, or would list different satellites. Softimage tries to load the file you specify, and does not fall back on the other .ray3hosts file.
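If you would rather drive this from a script than from a batch file, the same override can be sketched in Python. This is only an illustration: env_with_hosts_file is a made-up helper name, and it assumes XSI_USERHOME is already in the environment (for example, because setenv.bat was called first).

```python
import ntpath


def env_with_hosts_file(base_env, hosts_file_name):
    """Return a copy of base_env with MI_RAY_HOSTSFILE overridden.

    Assumes XSI_USERHOME is already set (hypothetical helper, not part
    of Softimage). The original mapping is left untouched.
    """
    env = dict(base_env)
    user_home = env["XSI_USERHOME"]
    # ntpath applies Windows path rules regardless of the OS running this.
    env["MI_RAY_HOSTSFILE"] = ntpath.join(user_home, hosts_file_name)
    return env
```

The returned dictionary could then be passed as the env argument of subprocess.Popen when starting XSI.exe, so the override applies only to that one Softimage session.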

Tips for troubleshooting satellite rendering


So, you’ve set up your .ray3hosts file, got the raysat services running, and ping and telnet show that there are no connection problems. But satellite rendering still doesn’t seem to be working. What to do?

  • Check that the raysat service is running on the master too (it has to be running on both the master and the slaves).
  • Use the mental ray diagnostics to check whether the satellites are being used. In the mental ray renderer settings, click the Diagnostics tab, and select the Info and Progress checkboxes.
    If satellite is working, you should see something like this:

    // INFO : JOB  0.n  progr:    89.8%    rendered on MTL-SATELLITE:7020.6
    // INFO : JOB  0.13 progr:    90.9%    rendered on MTL-MASTER.13
    // INFO : JOB  0.6  progr:    91.9%    rendered on MTL-MASTER.6
    // INFO : JOB  0.n  progr:    92.9%    rendered on MTL-SATELLITE:7020.7
    // INFO : JOB  0.12 progr:    93.9%    rendered on MTL-MASTER.12
    // INFO : JOB  0.n  progr:    94.9%    rendered on MTL-SATELLITE:7020.2
    // INFO : JOB  0.n  progr:    95.9%    rendered on MTL-SATELLITE:7020.4
    // INFO : JOB  0.10 progr:    96.9%    rendered on MTL-MASTER.10
    // INFO : JOB  0.n  progr:    97.9%    rendered on MTL-SATELLITE:7020.0
    // INFO : JOB  0.n  progr:    98.9%    rendered on MTL-SATELLITE:7020.5
    
  • Use Process Monitor on the satellite machine to confirm that the master is actually connecting to the satellite.
  • On the master machine, in Softimage, open the script editor (ALT+4) and run this JScript:

    LogMessage( XSIUtils.Environment.Item("MI_RAY_HOSTSFILE") );
    

    This will log the .ray3hosts file that Softimage is using.
    Is this the same .ray3hosts file that you created?

    On Windows XP, UserTools creates the .ray3hosts file in the “wrong” location
    (UserTools puts it one place, but Softimage reads it from another place).
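When the log is long, it can be easier to count the jobs per machine than to eyeball the output. Here is a minimal Python sketch that tallies the "rendered on" lines from the diagnostics. It assumes the lines look like the sample above, and that the number after the last dot is a thread number rather than part of the host name. jobs_per_host is just an illustrative name.

```python
import re
from collections import Counter

# Matches the end of lines such as:
#   // INFO : JOB  0.n  progr:    89.8%    rendered on MTL-SATELLITE:7020.6
# Group 1 is everything before the final ".<digits>" (assumed thread number).
RENDERED_ON = re.compile(r"rendered on\s+(\S+)\.\d+\s*$")


def jobs_per_host(log_lines):
    """Count how many 'rendered on' lines mention each host."""
    counts = Counter()
    for line in log_lines:
        match = RENDERED_ON.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

If every job is counted against the master and none against a satellite, the satellites are probably not being used.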

Troubleshooting satellite rendering with Process Monitor


In this video, I go over what to look for in a Process Monitor log from a satellite rendering computer. If you don’t see what I’m showing, then the master is not connecting to the satellite.

To use Process Monitor to confirm whether the master connects to the satellite:

  1. On the satellite machine, download Process Monitor.
  2. Extract Process Monitor from the downloaded file, and start procmon.exe.
  3. On the master machine, start Softimage. The master connects to the satellite at startup.
  4. After Softimage starts up on the master machine, go back to the satellite machine and stop capturing events in Process Monitor.
  5. Review the Process Monitor log (see the video for more info).

Another Error 1092 when trying to start raysat service


This one is pretty annoying. I remember once spending a lot of time on one case before I figured it out.

If you add an entry like this to the end of the C:\windows\system32\drivers\etc\services file:

mi-raysatsi2012_3_9_1_44	7024/tcp

where there is no line break after the “tcp”, then the raysat service won’t start and will report “Error 1092”.

But if you add a line break after the “tcp”, or add a # comment like this, then you can start the raysat service:

mi-raysatsi2012_3_9_1_44    7024/tcp        #

Aaaargh.
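A quick way to catch this before it costs you an afternoon is to check how the services file ends. The Python sketch below is based only on the behaviour described above (a final entry works if it is followed by a line break or carries a # comment). last_entry_looks_safe is a hypothetical helper, not something that ships with Softimage or Windows.

```python
def last_entry_looks_safe(data: bytes) -> bool:
    """Check the tail of a services file read in binary mode.

    Returns True if the content is empty, ends with a line break,
    or if the unterminated last line contains a '#' comment.
    """
    if not data or data.endswith((b"\n", b"\r")):
        return True
    last_line = data.splitlines()[-1]
    return b"#" in last_line
```

To use it, read the services file with open(path, "rb") and pass the bytes to the function. A False result means the last entry has neither a line break nor a comment after it.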

Satellite rendering licensing


A recent drop of mental ray included a change to the satellite licensing scheme.
Satellite licensing will be per machine, not per CPU. You’ll get four machines (instead of the current four CPUs).

Softimage 2011.5 (Subscription Advantage Pack) doesn’t have this particular drop, but you will see this change in subsequent releases.

ERROR : FATAL: DB 1.0 fatal 041500: interrupted by exception code 0xc0000005 (access violation)


' ERROR : FATAL: DB 1.0 fatal 041500: interrupted by exception code 0xc0000005 (access violation)

The exception code 0xc0000005 (access violation) indicates this is probably a memory access error. The software is trying to access a region of memory that it shouldn’t be accessing. In general, this could be a problem in the code (for example, a NULL pointer), faulty RAM, or even a bad device driver.

If you get this error with any scene (for example, with one of the sample scenes that ships with Softimage), then that may indicate the problem is specific to your computers. Or if you get the error only with a certain computer, that would indicate a possible problem with that one computer.

If you get this with just some scenes, then that points to a problem in the software (or perhaps the scene). It could be that something about the scene triggers certain conditions in the software, and the software then causes the error. In general, I would try breaking down the scene to try and isolate the root cause. It could be the overall complexity of the scene, or a specific element of the scene.

In one case I had recently, the user got this 0xc0000005 (access violation) error and then an endless series of bad message…0xbad0bad errors. We traced the access violation error back to the number of different animated objects that were being instanced through ICE using assemblies.

Assemblies are used when an ICE tree uses one of these compounds:

  • Set Instance Geometry
  • Set Particle Instance Animation Time
  • Control Instance Animation
  • Control Displacement Instance Animation

Using assemblies can be memory-intensive. It appears that when you’ve got a lot of particles and a lot of animated instances of many objects, satellite rendering may fail with a memory access error. (In this case the frame was actually rendered, but XSI hung afterwards because of the errors.)

Ref: Access Violation? How dare you …

ERROR : MSG 0.n error 011326: bad message received from host 1, 0xbad0bad


In a recent case, a customer was getting an endless repetition of this error message when he used satellite rendering:

// ERROR : MSG  0.n  error  011326: bad message received from host 1, 0xbad0bad
// ERROR : MSG  0.n  error  011326: bad message received from host 1, 0xbad0bad
// ERROR : MSG  0.n  error  011326: bad message received from host 1, 0xbad0bad
...

By itself, this message doesn’t tell us much more than that something bad happened and now the master and the slave aren’t communicating.

  • MSG means this message is from the module that handles low-level message passing and thread management.
  • 0.n identifies the machine where the error occurred. Machine 0 is the client machine where the render was started. The dot (.) separates the machine (host) number from the thread number.
  • Thread n is a special network communication thread that keeps contact with the satellite machines if network parallelism is used.
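To make the breakdown above concrete, here is a small Python sketch that splits a log line into module, host number, and thread. The regular expression is my own guess at the layout, based only on the sample lines in this post, and the names RayMessage and parse_message are made up for the example.

```python
import re
from typing import NamedTuple, Optional


class RayMessage(NamedTuple):
    severity: str  # e.g. ERROR or INFO
    module: str    # e.g. MSG or JOB
    host: int      # 0 is the machine where the render was started
    thread: str    # a thread number, or "n" for the network thread
    text: str      # the rest of the line


# Assumed layout: "<SEVERITY> : <MODULE>  <host>.<thread>  <text>"
PATTERN = re.compile(
    r"(?P<severity>[A-Z]+)\s*:\s*(?P<module>[A-Z]+)\s+"
    r"(?P<host>\d+)\.(?P<thread>\w+)\s+(?P<text>.*)$"
)


def parse_message(line: str) -> Optional[RayMessage]:
    """Return the parsed fields, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return RayMessage(
        severity=match.group("severity"),
        module=match.group("module"),
        host=int(match.group("host")),
        thread=match.group("thread"),
        text=match.group("text"),
    )
```

For the bad message line above, this yields module MSG, host 0, and thread n.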

Typically, the real error message is output just before all these bad message errors start. To catch this first error, we redirected the xsibatch output to a log file:

xsibatch.bat -render "\\server\project\Scenes\test.scn" -verbose on > xsibatch.log

The initial error turned out to be a memory access error. More on that later.
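Once the output is in a log file, finding that first error by hand means scrolling past a wall of identical bad message lines. A minimal Python sketch of the search follows. It assumes error lines contain the word ERROR, as in the samples in this post, and first_real_error is just an illustrative name.

```python
def first_real_error(lines, noise="bad message received from host"):
    """Return (line_number, text) of the first ERROR line that is not
    one of the repeated bad-message lines, or None if there is none."""
    for number, line in enumerate(lines, start=1):
        if "ERROR" in line and noise not in line:
            return number, line.rstrip()
    return None
```

You could feed it the log with something like open("xsibatch.log", errors="replace"), since the encoding of the redirected output may vary.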