Bioclusters Digest, Vol 17, Issue 4
- From: Tim Cutts <tjrc (at) sanger.ac.uk>
- Date: Wed, 8 Mar 2006 09:17:37 +0000
On 7 Mar 2006, at 8:05 pm, Shane Brubaker wrote:
Hi, Shane here from the JGI. I wanted to post back and attempt to
answer some of these questions about our "disappearing" array job
tasks.
I don't know the answer to all of these, but the question about NIS
errors stands out. We have been having quite a few NIS and NFS
problems, so I suspect that could be why.
Soon we will be moving our cluster onto a better network switch, and
we have also increased a cache size on our LDAP server. We've been
working to fix our NFS problems too. It seems like that may
help - lately the problems seem to have gone away. I've also
implemented a "cleanup" step in our workflow system which re-submits
missing tasks one at a time, just in case.
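A cleanup step of this kind could be sketched roughly as follows. This
is only an illustration, not the actual JGI workflow code: the function
names and the `submit_one` callback are hypothetical, and how a single
task is really re-submitted depends on the scheduler in use.

```python
from typing import Callable, Iterable, List


def find_missing_tasks(expected: Iterable[int], finished: Iterable[int]) -> List[int]:
    """Return the task IDs that were expected but never reported as finished."""
    done = set(finished)
    return sorted(task for task in set(expected) if task not in done)


def resubmit_missing(expected: Iterable[int],
                     finished: Iterable[int],
                     submit_one: Callable[[int], None]) -> List[int]:
    """Re-submit each missing task individually and return the IDs re-submitted.

    submit_one is a hypothetical callback that submits exactly one task;
    in a real system it would wrap the scheduler's own submit command.
    """
    missing = find_missing_tasks(expected, finished)
    for task_id in missing:
        # One task at a time, so a single lost task cannot take others with it.
        submit_one(task_id)
    return missing


if __name__ == "__main__":
    # Assumed example: a 10-task array job where tasks 4 and 7 "disappeared".
    resubmit_missing(range(1, 11),
                     [1, 2, 3, 5, 6, 8, 9, 10],
                     lambda task_id: print(f"re-submitting task {task_id}"))
```

Keeping the comparison separate from the submission makes it possible to
dry-run the cleanup and just list the missing tasks first.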
[ snip ]
Is there a network issue that I've caused by running too much stuff
at the same time, broken NIS/NFS?
Yes
Are your Linux nodes running the Name Service Caching Daemon (nscd)?
We found that running that on all of our cluster nodes quite
drastically reduces the pounding the NIS servers receive. It's not
without its problems though; because it's a cache, it means that the
nodes will sometimes take a while to notice any NIS map updates.
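The trade-off can be illustrated with a minimal sketch of time-to-live
caching in general. This is not how nscd itself is implemented: the
`TTLCache` class, the `lookup` callable, the injectable clock and the
600-second lifetime are all assumptions made up for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache lookup results for a fixed lifetime (hypothetical, for illustration)."""

    def __init__(self,
                 lookup: Callable[[str], str],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup          # stands in for a query to the name server
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the demo can fake time
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.backend_calls = 0         # how often the "server" was actually asked

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            # Still fresh: answer locally, no load on the server.
            return entry[1]
        # Missing or expired: ask the server and remember the answer.
        value = self._lookup(key)
        self.backend_calls += 1
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    directory = {"alice": "/home/alice"}   # pretend name service map
    fake_time = [0.0]
    cache = TTLCache(directory.__getitem__, ttl_seconds=600,
                     clock=lambda: fake_time[0])

    print(cache.get("alice"))              # first lookup goes to the server
    directory["alice"] = "/data/alice"     # the map is updated on the server
    fake_time[0] = 300.0
    print(cache.get("alice"))              # cached entry is returned, update not seen
    fake_time[0] = 601.0
    print(cache.get("alice"))              # entry expired, update finally visible
    print(cache.backend_calls)             # server was asked only twice
```

A shorter lifetime makes nodes notice updates sooner, at the cost of
more queries reaching the server.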
Is it not also possible to replicate your LDAP server, so that the
load from the cluster nodes is distributed over more than one server?
Regards,
Tim
_______________________________________________
Bioclusters maillist - Bioclusters (at) bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters