Send Bioclusters mailing list submissions to
bioclusters (at) bioinformatics.org
To subscribe or unsubscribe via the World Wide Web, visit
https://bioinformatics.org/mailman/listinfo/bioclusters
or, via email, send a message with subject or body 'help' to
bioclusters-request (at) bioinformatics.org
You can reach the person managing the list at
bioclusters-owner (at) bioinformatics.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioclusters digest..."
Today's Topics:
1. Announcement: Sun Discovery Cluster for the Life Sciences
(Stefan Unger)
2. RE: Announcement: Sun Discovery Cluster for the Life Sciences
(Kathleen)
3. SGE Array Job tasks mysteriously disappear (Shane Brubaker)
4. RE: quick look see at fractal computing. (James Cuff)
5. Re: SGE Array Job tasks mysteriously disappear (James Cuff)
6. Re: SGE Array Job tasks mysteriously disappear (Chris Dagdigian)
----------------------------------------------------------------------
Message: 1
Date: Thu, 02 Mar 2006 13:59:59 -0800
From: Stefan Unger <Stefan.Unger (at) Sun.COM>
Subject: [Bioclusters] Announcement: Sun Discovery Cluster for the
Life Sciences
To: bioclusters (at) bioinformatics.org
Message-ID: <44076ADF.3070102 (at) sun.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
I'm not sure whether this is OK to post here. Please let me know:
************
Sun Microsystems™ Announces the Discovery Cluster for the Life Sciences
Exceptional Price/Performance in a Pre-Assembled Rack
Sun Microsystems announces the "Discovery Cluster for the Life
Sciences". The Discovery Cluster is a pre-assembled, base-level
configuration of a Sun Grid Rack System (SGRS) with components selected
especially for the Life Science HPC market.
The Discovery Cluster is Sun's approach to the compute needs of the drug
discovery process. It is based on the Sun Fire™ X2100 64-bit x64 server,
powered by the AMD Opteron™ dual-core processor.
The X2100 delivers up to one-and-a-half times the performance of competing
systems while using about one-third of their power, yet costs a fraction
of their price. Bioinformatics and molecular modeling benchmarks confirm
the exceptional price/performance advantages of the Sun Fire X2100 over
Intel Xeon-based clusters. These highly reliable and energy-efficient
X2100 servers are also the fastest enterprise x64 servers in their class.
At under $94,000 (US list price) per fully populated, pre-assembled
rack, the Discovery Cluster provides 1 TeraFlop of theoretical peak
performance in three racks for under $282,000. In addition, its power,
cooling and management requirements are substantially lower than those
of Intel Xeon-based clusters.
The Discovery Cluster comes pre-assembled, with hardware, cabling,
Solaris™ 10 and Sun Grid Engine. Multiple operating systems (Solaris
10 x64, Linux (Red Hat, SUSE), and Windows) are supported. Many
alternative configurations are available, and Sun's solution partners
provide a range of software options.
For more information, listen to a NetTalk webinar on the Sun Discovery
Cluster for Life Sciences, featuring the designer of the Sun Fire
"Galaxy" series servers, Andy Bechtolsheim, Sun Chief Architect and
Senior Vice President, Network Systems. You can also visit
www.sun.com/nettalk or www.sun.com/discoverycluster, or email
discoverycluster (at) sun.com.
Media contacts:
Stefan Unger, PhD
stefan.unger (at) sun.com
Business Development Manager
Life Sciences
Ulrich Meier, PhD
ulrich.meier (at) sun.com
Industry Marketing Manager
Life Sciences
Sun, Sun Microsystems, the Sun logo, Sun Fire, Solaris are trademarks or
registered trademarks of Sun Microsystems, Inc. in the United States and
other countries. AMD and Opteron are trademarks or registered trademarks
of Advanced Micro Devices.
--
*!*
Stefan Unger, PhD
Business Development Manager Life Sciences
949-682-4388 (x41821) AccessLine
http://www.sun.com/edu/commofinterest/compbio
http://www.sun.com/lifesciences
http://www.sun.com/discoverycluster
CB-SIG: to JOIN/DROP/POST email compbio-sig-info (at) sun.com
* BioIT World, Boston, April 3-5, 2006
* CB-SIG and HPC Consortium, GridAsia, May 14-15, 2006
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: This email message is for the sole use of the intended
recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or
distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply email and destroy
all copies of the original message.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*!*
------------------------------
Message: 2
Date: Thu, 2 Mar 2006 15:08:54 -0700
From: "Kathleen" <kathleen (at) massivelyparallel.com>
Subject: RE: [Bioclusters] Announcement: Sun Discovery Cluster for the
Life Sciences
To: "'Clustering, compute farming & distributed computing in life
science informatics'" <bioclusters (at) bioinformatics.org>
Message-ID: <005c01c63e45$e682b8b0$0300a8c0 (at) KMElaptop>
Content-Type: text/plain; charset="us-ascii"
Does it come pre-loaded with applications? If so, which ones? -K
------------------------------
Message: 3
Date: Thu, 02 Mar 2006 14:12:49 -0800
From: Shane Brubaker <brubaker2 (at) llnl.gov>
Subject: [Bioclusters] SGE Array Job tasks mysteriously disappear
To: bioclusters (at) bioinformatics.org
Message-ID: <6.0.0.22.2.20060302141100.037184e0 (at) mail.llnl.gov>
Content-Type: text/plain; charset="us-ascii"; format=flowed
Hi, Shane from the JGI here.
We are finding some strange behavior in which a few tasks of an array job
never seem to complete.
The tasks do not go into an Error state: they are listed as finished with
an exit status of 0, and they have valid start and end times.
However, the output log clearly stops between two print statements near
the top of the script.
Has anyone seen this? Any ideas?
Thanks,
Shane
------------------------------
Message: 4
Date: Thu, 2 Mar 2006 18:17:50 -0500 (EST)
From: James Cuff <jcuff (at) broad.mit.edu>
Subject: RE: [Bioclusters] quick look see at fractal computing.
To: Nick Robertson <nick (at) massivelyparallel.com>
Cc: "'Clustering, compute farming & distributed computing in life
science informatics'" <bioclusters (at) bioinformatics.org>, 'Kevin Howard'
<kevin (at) massivelyparallel.com>
Message-ID:
<Pine.OSF.4.64.0603021718060.91263 (at) phosphorus.broad.mit.edu>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
On Thu, 2 Mar 2006, Nick Robertson wrote:
> It is still unclear to me why your results are markedly different
> from NCBI and MPT, but it's probably related to search parameters or some
> other difference.
Ahem, that could be my fault. I should have explained; I thought it was
clear from the example command line I supplied.
-n T (run blastall in MegaBlast mode) is the answer you are looking for here.
I used it here as a quick way to show the missing suboptimal alignments.
My reasoning was that if MegaBlast, with its large word size and greedy
alignment algorithm, could find the suboptimals, the standard version
ought to nail them.
I tend to use it automatically for near-exact DNA/DNA searching, which is
what this example test was set up to do. That explains the differences in
the ordering.
However, you are _still_ not reporting the suboptimal alignments in your
report.
This is clear just from the sizes of the two files you provided via your
website. I guess it's just a printing error, since you must be calculating
them. It is probably a simple tweak for you to fix.
node221 /2ndrun/ du -sh ncbi_results.txt
3.4M ncbi_results.txt
node221 /2ndrun/ du -sh qid1597_results_1.txt
516K qid1597_results_1.txt
The example gi|27657458|emb|AL844150.6| at the web link I sent before
shows this.
MegaBlast (jcuff_results_1.txt) finds two such suboptimal alignments, and
regular BLAST (jcuff2.blastn, ncbi_results.txt) finds a whopping 16.
However, qid1597_results_1.txt only shows the first alignment, from bases
682 to 1330, with _no_ suboptimal alignments reported.
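To compare the reports by alignment count rather than by file size, a minimal Python sketch like the one below could tally the alignment blocks (HSPs) listed under each subject in a BLAST text report. It assumes the classic blastall pairwise layout (-m 0), where each subject starts with a '>' line and each HSP starts with a 'Score =' line; the function name count_hsps_per_subject is hypothetical, and the sample text is hand-made, not taken from the result files.

```python
from collections import Counter


def count_hsps_per_subject(report_text):
    """Count HSPs (alignment blocks) per subject in a BLAST pairwise text report.

    Assumption: classic pairwise output (blastall -m 0), where each subject
    starts with a line beginning with '>' and each HSP under that subject
    starts with a line like ' Score =  1287 bits (649), Expect = 0.0'.
    Subjects hit by several queries are merged into one count.
    """
    counts = Counter()
    current = None
    for line in report_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            counts[current] += 0  # record the subject even before its first HSP
        elif current is not None and line.lstrip().startswith("Score ="):
            counts[current] += 1
    return counts


if __name__ == "__main__":
    # Hand-made excerpt in the assumed layout, not real search output.
    sample = "\n".join([
        ">gi|27657458|emb|AL844150.6| example subject line",
        "          Length = 200000",
        "",
        " Score =  1287 bits (649), Expect = 0.0",
        " Identities = 649/649 (100%)",
        "",
        " Score =  99.6 bits (50), Expect = 1e-18",
    ])
    print(count_hsps_per_subject(sample)["gi|27657458|emb|AL844150.6|"])  # 2
```

Run over each report, a subject whose count is 1 in one file and many in another points at exactly the kind of missing suboptimal alignments described above. Multi-query reports would need the counts keyed by query as well.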
Thanks for the update. We probably ought to kill this thread and take it
offline if you want to discuss it further. I doubt it is very interesting
for folks here.
Best,
J.
------------------------------
Message: 5
Date: Thu, 2 Mar 2006 18:34:40 -0500 (EST)
From: James Cuff <jcuff (at) broad.mit.edu>
Subject: Re: [Bioclusters] SGE Array Job tasks mysteriously disappear
To: "Clustering, compute farming & distributed computing in life
science informatics" <bioclusters (at) bioinformatics.org>
Message-ID:
<Pine.OSF.4.64.0603021824410.91263 (at) phosphorus.broad.mit.edu>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Hi Shane,
So you might want to give us a bit more information.
As to seeing weird stuff on clusters, yeah we see a lot of it, *way* too
much of it sometimes :)
Here are a bunch of questions I would ask myself if it happened to me:
Did I isolate it down to just an issue with the job array?
Does this only happen with this program or all programs I execute?
What is the code doing?
Are there "core" files in my output directory?
Are the binaries on an NFS server? If so, is it having issues? Check the
logs for NFS timeouts.
Is a directory (/tmp, /scratch, whatever) filling up?
What do the syslogs on the remote machine say?
Is there a network issue that I've caused by running too much stuff at the
same time (broken NIS/NFS, for example)?
Is the OOM killer running on the remote node? Have I filled up all the
memory?
Is it only happening on one node, on a subset of nodes, or on all of them?
Am I writing to a database and not catching an error?
Does it happen with a really simple example?
Does it only happen on a Tuesday evening (during system maintenance, for
example)?
Etc., etc. It is a pain to debug things like this on a cluster; I feel
your pain.
Maybe have another look at what is going wrong and post back with some
more information. There are lots of people who can probably help, but at
the moment there is not really enough for us to go on; as you can see, it
could be lots of things.
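Since the symptom is an exit status of 0 alongside truncated output, the exit status alone cannot flag the bad tasks. One way to list them, sketched in Python below under stated assumptions, is to have the job script print a completion marker as its very last action and then check every task's stdout file for it. The marker string TASK_DONE and the function find_incomplete_tasks are hypothetical; the file naming assumes Grid Engine's default <job_name>.o<job_id>.<task_id> pattern for array job output.

```python
import os
import tempfile


def find_incomplete_tasks(output_dir, job_name, job_id, first, last, marker="TASK_DONE"):
    """Return the array task IDs whose stdout file is missing or lacks the marker.

    Assumptions (hypothetical, adapt to your setup):
      * the job script prints `marker` on a line of its own as its very last action;
      * stdout files use the default array naming <job_name>.o<job_id>.<task_id>.
    """
    incomplete = []
    for task_id in range(first, last + 1):
        path = os.path.join(output_dir, f"{job_name}.o{job_id}.{task_id}")
        try:
            with open(path, errors="replace") as fh:
                # Keep only non-blank lines so trailing newlines don't matter.
                lines = [line.strip() for line in fh if line.strip()]
        except FileNotFoundError:
            incomplete.append(task_id)
            continue
        if not lines or lines[-1] != marker:
            incomplete.append(task_id)
    return incomplete


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        # Tasks 1 and 3 finish normally, task 2 stops early, task 4 wrote nothing.
        for task_id, body in [(1, "start\nTASK_DONE\n"), (2, "start\n"), (3, "start\nTASK_DONE\n")]:
            with open(os.path.join(out, f"myjob.o4242.{task_id}"), "w") as fh:
                fh.write(body)
        print(find_incomplete_tasks(out, "myjob", 4242, 1, 4))  # [2, 4]
```

The task IDs it returns can then be checked against the accounting records (qacct -j <job_id> lists the host each task ran on), which helps answer the one-node-or-many question above.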
Best,
J.
------------------------------
Message: 6
Date: Thu, 2 Mar 2006 19:03:00 -0500
From: Chris Dagdigian <dag (at) sonsorol.org>
Subject: Re: [Bioclusters] SGE Array Job tasks mysteriously disappear
To: "Clustering, compute farming & distributed computing in life
science informatics" <bioclusters (at) bioinformatics.org>
Message-ID: <15B721DA-E4FA-44C7-BEB1-F99919DD39A1 (at) sonsorol.org>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Debugging odd failures on clusters can really be hard.
For SGE clusters, the best source of debug/failure info is always going
to be the STDOUT/STDERR files produced by the jobs themselves.
Nine times out of ten, this is where you'll find the most useful info.
Since it seems that you are not getting anything useful from those
files, the next place to look is the sge_execd logs from the machines
where the array tasks ran. The execd spool files will be either local
to the compute node or under your $SGE_ROOT/<cell>/spool/<machineName>/
directory if you are running everything off a shared filesystem.
After the execd spool logs, the qmaster and schedd messages files may
also be of use, although they rarely give good info on job-level issues.
A third place to look is /tmp on the compute nodes: when all else
fails and Grid Engine is in a panic situation and unable to spool
normally, it will log to /tmp/ on the host.
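When a task's own output is unhelpful, a quick way to pull the relevant lines out of an execd messages file is sketched below in Python. It assumes each line is pipe-delimited as timestamp|component|host|type|text, which is an assumption about the file layout rather than something stated in this thread; the path is whatever the messages file in the execd spool directory (for example under $SGE_ROOT/<cell>/spool/<machineName>/) resolves to on your nodes, and job_messages is a hypothetical helper name.

```python
import os
import re
import tempfile


def job_messages(messages_path, job_id):
    """Yield (timestamp, component, host, msg_type, text) for lines mentioning job_id.

    Assumption: each line of the messages file looks like
    timestamp|component|host|type|text. Lines that do not split into
    five pipe-separated fields are skipped.
    """
    # Match the job number as a whole word, so 4242 does not match 42421.
    pattern = re.compile(r"\b" + re.escape(str(job_id)) + r"\b")
    with open(messages_path, errors="replace") as fh:
        for line in fh:
            parts = line.rstrip("\n").split("|", 4)
            if len(parts) == 5 and pattern.search(parts[4]):
                yield tuple(part.strip() for part in parts)


if __name__ == "__main__":
    # Made-up lines in the assumed layout, not real Grid Engine output.
    sample = (
        "03/02/2006 14:12:49|execd|node221|E|example message for job 4242.17\n"
        "03/02/2006 14:12:50|execd|node221|I|example message for job 42421.1\n"
        "a line that is not in the assumed layout\n"
    )
    with tempfile.NamedTemporaryFile("w", suffix=".messages", delete=False) as fh:
        fh.write(sample)
    try:
        for entry in job_messages(fh.name, 4242):
            print(entry)
    finally:
        os.remove(fh.name)
```

Running it against the messages file on each host that ran a failed task narrows the search to the lines that actually mention that job.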
Something you should also try:
- Alter the value for "loglevel" in your Grid Engine configuration;
you may want to temporarily set "loglevel=log_info".
This was discussed recently on the SGE users mailing list. The thread is
here:
http://gridengine.sunsource.net/servlets/BrowseList?list=users&by=thread&from=8137
The sge_conf man page has this to say about loglevel:
> loglevel
>    This parameter specifies the level of detail that Grid Engine
>    components such as sge_qmaster(8) or sge_execd(8) use to produce
>    informative, warning or error messages which are logged to the
>    messages files in the master and execution daemon spool directories
>    (see the description of the execd_spool_dir parameter above). The
>    following message levels are available:
>
>    log_err
>       All error events being recognized are logged.
>
>    log_warning
>       All error events being recognized and all detected signs of
>       potentially erroneous behavior are logged.
>
>    log_info
>       All error events being recognized, all detected signs of
>       potentially erroneous behavior and a variety of informative
>       messages are logged.
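The three levels described in the man page are cumulative: each one logs everything the previous one does, plus more. The tiny Python model below, which is purely illustrative and not part of Grid Engine, encodes that rule to make clear why log_info is the level to set while debugging. The event categories are paraphrased from the man page text; the names LOGLEVELS, EVENT_LEVEL and is_logged are hypothetical.

```python
# Illustrative model only: Grid Engine does not expose anything like this.
# Levels ordered from least to most verbose, as listed in sge_conf(5).
LOGLEVELS = ["log_err", "log_warning", "log_info"]

# The lowest loglevel at which each category of event from the man page is logged.
EVENT_LEVEL = {
    "error": "log_err",        # "error events being recognized"
    "warning": "log_warning",  # "detected signs of potentially erroneous behavior"
    "info": "log_info",        # "a variety of informative messages"
}


def is_logged(configured_level, event_kind):
    """Return True if an event of event_kind is logged at configured_level."""
    return LOGLEVELS.index(EVENT_LEVEL[event_kind]) <= LOGLEVELS.index(configured_level)


if __name__ == "__main__":
    for level in LOGLEVELS:
        logged = [kind for kind in EVENT_LEVEL if is_logged(level, kind)]
        print(level, "->", logged)
```

At log_info every category passes the check, which is why it produces the most detail (and the largest messages files).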
The final troubleshooting step is to look into the Grid Engine
"KEEP_ACTIVE" execd parameter setting. While it is set, Grid Engine will
not delete the active_jobs/ directories that it uses to stage info while
a job is active; normally these directories are deleted when the job
drains from the system. Quite a bit of useful environment, pid, trace and
other information can be found in these directories. This is one you'll
have to watch out for, though: disabling the cleanup function could
consume disk space rapidly.
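Because keeping the active_jobs/ directories around can eat disk space quickly, it may help to keep an eye on them while KEEP_ACTIVE is in effect. The Python sketch below totals the bytes under each per-job entry of an active_jobs directory; the location of that directory (for example inside a host's execd spool directory) and the helper name active_job_sizes are assumptions to adapt to your installation.

```python
import os
import sys


def active_job_sizes(active_jobs_dir):
    """Return {entry name: total bytes} for each subdirectory of active_jobs_dir.

    Sizes come from os.lstat and symbolic links are not followed.
    """
    sizes = {}
    for entry in os.scandir(active_jobs_dir):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only per-job directories are of interest
        total = 0
        for root, _dirs, files in os.walk(entry.path):
            for name in files:
                try:
                    total += os.lstat(os.path.join(root, name)).st_size
                except OSError:
                    pass  # file removed while we were walking
        sizes[entry.name] = total
    return sizes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python active_job_sizes.py /path/to/spool/<host>/active_jobs
    ranked = sorted(active_job_sizes(sys.argv[1]).items(),
                    key=lambda item: item[1], reverse=True)
    for name, size in ranked:
        print(f"{size:>12d}  {name}")
```

Run periodically (from cron, say), it shows which jobs' leftover directories are growing fastest, so you can switch the setting back off before a spool filesystem fills up.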
Regards,
Chris
------------------------------
_______________________________________________
Bioclusters maillist - Bioclusters (at) bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters
End of Bioclusters Digest, Vol 17, Issue 4
******************************************