PVM errors

From time to time we run into problems on our cluster with parallel jobs running under PVM. It mostly happens with GOLD and with programs from OpenEye, such as Omega2 and ROCS.

Jobs simply die with PVM errors such as "/tmp/pvmtmp0113 permission denied". Neither we nor our IT department have been able to work out the cause, but we do have a very painful workaround: ssh to every affected node on the cluster and delete everything starting with pvm that you own. Of course, if you have successfully started a job on a node, don't delete the files there; only clean up the nodes where jobs fail.
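
To make the cleanup less painful, here is a minimal Python sketch of the delete step, meant to be run on each failing node once you have logged in. It rests on assumptions that are not confirmed above: that the stale files live directly in /tmp (as the error message suggests), and that anything there which starts with "pvm" and belongs to you is safe to remove on that node. The function names are made up for illustration, and the script defaults to a dry run that only lists what it would delete.

```python
import os
import shutil
from pathlib import Path


def own_pvm_entries(tmp_dir="/tmp", prefix="pvm"):
    """Return entries in tmp_dir whose name starts with prefix and that we own."""
    uid = os.getuid()
    found = []
    for entry in Path(tmp_dir).iterdir():
        if not entry.name.startswith(prefix):
            continue
        try:
            # lstat so that a symlink is judged by its own owner, not its target's
            if entry.lstat().st_uid == uid:
                found.append(entry)
        except OSError:
            # Entry vanished or is unreadable; skip it
            continue
    return sorted(found)


def remove_pvm_entries(tmp_dir="/tmp", prefix="pvm", dry_run=True):
    """Delete our own pvm* entries; with dry_run=True only report them."""
    names = []
    for entry in own_pvm_entries(tmp_dir, prefix):
        if not dry_run:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
        names.append(entry.name)
    return names


if __name__ == "__main__":
    # Assumption: stale PVM files sit directly in /tmp on the failing node.
    for name in remove_pvm_entries(dry_run=True):
        print("would remove", name)
```

Check the dry-run listing first, then call remove_pvm_entries(dry_run=False) to actually delete. Do not run it on a node where one of your PVM jobs is running happily, since it cannot tell stale files from live ones.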

We are also in the painful situation at the moment where we have commissioned a new cluster and no PVM jobs run on it at all. The jobs give errors explicitly stating that PVM processes could not be spawned, but none of our attempts to change the way PVM starts jobs has worked.

About martin

almost on holidays
This entry was posted in Chem_Comp.