
Bioclusters Digest, Vol 17, Issue 4




On 7 Mar 2006, at 8:05 pm, Shane Brubaker wrote:

Hi, Shane here from the JGI. I wanted to post back and attempt to answer some of these questions about our "disappearing" array job tasks.
I don't know the answers to all of them, but the question about NIS errors stands out. We have been having NIS and NFS problems quite a bit,
so I suspect that could be why.


Soon we will be moving our cluster onto a better network switch, and we have also increased a cache size on our LDAP server. We've been
working to fix our NFS problems too. It seems like that may be helping; lately the problems seem to have gone away. I've also implemented
a "cleanup" step in our workflow system which re-submits missing tasks one at a time, just in case.
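
For readers wondering what such a "cleanup" step might look like, here is a minimal sketch in Python. It is not the JGI workflow code. It assumes a Grid Engine-style scheduler where `qsub -t N` submits a single task N of an array job, and that the workflow already knows which task IDs completed; the function names and `job.sh` are hypothetical.

```python
def missing_task_ids(first, last, completed):
    """Return the task IDs in [first, last] that have no completion record."""
    done = set(completed)
    return [t for t in range(first, last + 1) if t not in done]


def resubmit_commands(job_script, task_ids):
    """Build one single-task submit command per missing task ID.

    Assumes Grid Engine-style syntax: `qsub -t N script` runs only task N.
    """
    return [["qsub", "-t", str(t), job_script] for t in task_ids]


if __name__ == "__main__":
    # Hypothetical example: tasks 1-5 were submitted, 3 and 5 never finished.
    for cmd in resubmit_commands("job.sh", missing_task_ids(1, 5, [1, 2, 4])):
        print(" ".join(cmd))
```

Each command list could then be handed to `subprocess.run` one at a time, which keeps the load on the name service low while the missing tasks are retried.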



[ snip ]

Is there a network issue that I've caused by running too much stuff at the
same time, broken NIS/NFS?
Yes

Are your Linux nodes running the Name Service Caching Daemon (nscd)? We found that running it on all of our cluster nodes quite drastically reduces the pounding the NIS servers receive. It's not without its problems though; because it's a cache, the nodes will sometimes take a while to notice any NIS map updates.
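
As a rough illustration of that trade-off, a fragment of /etc/nscd.conf might look like the following. The time-to-live values are only illustrative assumptions, not the settings from any particular cluster: a longer positive time-to-live means fewer lookups reach the NIS servers, but also a longer wait before a node sees a map update.

```
# /etc/nscd.conf (fragment; values in seconds, illustrative only)
enable-cache            passwd  yes
positive-time-to-live   passwd  600
negative-time-to-live   passwd  20
enable-cache            group   yes
positive-time-to-live   group   3600
negative-time-to-live   group   60
```

If a node needs to pick up a change straight away, running `nscd -i passwd` (or `nscd -i group`) as root on that node invalidates the corresponding cache.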


Is it not also possible to replicate your LDAP server, so that the load from the cluster nodes is distributed over more than one server?

Regards,

Tim

_______________________________________________
Bioclusters maillist  -  Bioclusters (at) bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters