BMERC benchmarks

This page describes the relative performance of many of the Unix machines currently installed at BMERC. The following table summarizes the results of four benchmarks on each machine, with gaps where software was unavailable (or my patience ran out). (It looks best in Netscape on most displays if the window is at least 700 or 725 pixels wide.) The machines are grouped by OS, and sorted in rough order from slowest to fastest. The benchmarks are described in more detail below the table.

Host name    Host description           Benchmark results
             OS name  Release    CPU    new-stat     stat-test    make-core   1bmt-cva
-----------  -------  ---------  -----  -----------  -----------  ----------  ----------
darcy        ULTRIX   4.4        RISC   0.37 (77%)   0.74 (85%)
darwin       SunOS    4.1.3_U1   sun4c  0.70 (81%)   0.60 (90%)
hodgkin      SunOS    4.1.4      sun4m  1.1 (62%)    1.2 (86%)
hinshelwood  SunOS    4.1.4      sun4m  1.1 (62%)    1.1 (81%)
dobzhansky   SunOS    4.1.4      sun4m  0.90 (50%)   1.2 (85%)    0.99 (96%)  1.1 (99%)
gamow        SunOS    4.1.4      sun4m  1.0 (55%)    1.0 (74%)    1.0 (97%)   1.0 (98%)
pauling      SunOS    4.1.4      sun4m  1.1 (60%)    1.2 (86%)    1.0 (97%)
monod        SunOS    4.1.4      sun4m  1.2 (64%)    1.2 (85%)                1.1 (99%)
cyrus        Solaris  5.6        sun4m  1.6 (70%)    1.9 (79%)    6.4 (90%)   2.0 (99%)
sewall       Solaris  5.6        sun4m  1.6 (72%)    2.0 (78%)    6.0 (84%)   2.1 (99%)
mcclintock   Solaris  5.6        sun4m  2.3 (63%)    2.9 (72%)    11 (91%)    2.8 (99%)
feynman      Solaris  5.6        sun4u  5.8 (33%)    6.2 (41%)                11 (98%)
mbcrrc       Solaris  5.7        sun4u  5.8 (35%)    6.6 (46%)    28 (68%)    9.7 (99%)
delbrueck    Solaris  5.6        sun4u  9.1 (51%)    12 (72%)     35 (83%)    11 (98%)
jrsadler     OSF1     V4.0       alpha  0.25 (91%)   0.85 (85%)   6.4 (90%)   1.9 (99%)
dayhoff      OSF1     V4.0       alpha  1.6 (58%)    2.5 (74%)    3.5 (90%)   3.7 (99%)
huxley       OSF1     V4.0       alpha  4.6 (34%)    6.9 (46%)    29 (64%)    16 (98%)
miescher     OSF1     V4.0       alpha  3.4 (15%)    8.9 (30%)                27 (98%)
vavilov      OSF1     V4.0       alpha  3.4 (14%)    8.9 (29%)                27 (99%)
amdk6        Linux    2.2.5-15   i586   6.4 (77.0%)                           6.2 (97%)
sigler       Linux    2.2.16-22  i686   4.1 (12.5%)  8.9 (14.9%)  15 (24.6%)  35 (97.8%)

The next-to-last host in the list, amdk6, is my vintage 1999 home computer, a 300MHz Linux PC with an AMD K6 Pentium-clone processor and 28MB of RAM. The last host, sigler, is a much newer (March 2001) PC with a 1200MHz AMD processor and 512MB of RAM.

For reference, here are miescher and vavilov before they were upgraded (i.e. the performance of the old boxes):

Host name    Host description           Benchmark results
             OS name  Release    CPU    new-stat     stat-test    make-core   1bmt-cva
-----------  -------  ---------  -----  -----------  -----------  ----------  ----------
vavilov      OSF1     V4.0       alpha  3.2 (27%)    6.6 (44%)    28 (65%)
miescher     OSF1     V4.0       alpha  4.6 (37%)    6.3 (42%)    30 (68%)    15 (98%)

Results of testing the new vavilov and miescher are shown in the main table.

General comments

The "Host description" columns were produced by the uname program on each machine:

Timings were produced on an unloaded machine (to the extent possible), as they are intended to give an idea of the best realistic performance for that machine on that class of problem. The shorter tasks (specifically, new-stat and stat-test) were run twice, and the faster time (almost always the second one) was used.
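Concretely, a timing session for one of the shorter tasks looked roughly like the following sketch (using the new-stat commands described below); the second, usually faster, time is the one that went into the table:

    cd ~thread/code/stat/new-stat/
    uptime; make clean; time make install; uptime
    make clean; time make install; uptime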

The values given for each benchmark are of the form

    2.3 (63%)
The first number gives the relative speed of the machine in terms of elapsed time. The second number is the percentage of the CPU that was used during that time. These are derived from the third and fourth numbers given by the time command of csh, e.g.
    12.0u 6.0s 0:28 63% 0+0k 0+0io 0pf+0w
The first two numbers are user and system CPU time, respectively; these are not reported in the table, since elapsed time is more meaningful for figuring out how long it will take to run a given program on a given machine. Notice, however, that the sum of the CPU times is approximately equal to the elapsed time multiplied by the processor utilization fraction. Consult the csh man page if you want to know more about these values.
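As a check, plugging the sample line above into that relation:

    12.0u + 6.0s        =  18.0 CPU seconds
    0:28 elapsed x 63%  =  17.6 seconds

which agree to within roundoff.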

I use gamow as the standard mostly because it is the machine on my desk; since the speed value in the example above is 2.3, that benchmark ran 2.3 times as fast on that machine as it did on gamow (which took 1:03 on this problem). It also helps that gamow is not a server machine, which improves the reliability and reproducibility of timings done there. Finally, picking a slow machine as the standard makes the speed multiples mostly greater than one, which makes comparing them more intuitive.
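To convert a speed multiple back into an elapsed time, divide gamow's time by the multiple; for the 2.3 example:

    63 seconds / 2.3  =  about 27 seconds

which is consistent with the 0:28 elapsed time in the sample time output above.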

The table still has a few blank spots; it has taken quite a while to fill it out even this far, especially since I feel no pressure. Some machines (e.g. pauling and gamow) are almost identical in configuration, and should produce almost identical results.

A note on I/O traffic

Some of these benchmarks (new-stat and stat-test in particular) tend to be I/O-bound, leading to lower speed multiples on the faster machines than for the compute-bound benchmarks (e.g. 1bmt-cva). That is why the percentage of processor utilization is given along with the speed multiple. For a process that is not page-bound on a machine that is not doing much else, processor utilization indicates the degree to which performance is limited by I/O. For an I/O-bound job, this number will be lower, because a greater percentage of time is spent waiting for file access, usually over the network; these jobs are therefore more sensitive to file server loading. (All of these tests use files served exclusively by delbrueck, so the degree of processor loading on delbrueck can introduce error. I have tried to avoid running benchmarks at times when delbrueck is unusually busy, but since transient loading can be hard to detect, some noise of this nature is inevitable.)

It is interesting to compare the times for delbrueck to those of the other machines in its class, feynman and mbcrrc. Since delbrueck is the file server, its speed multiples for the I/O-bound tasks are more in line with its speed multiple for 1bmt-cva. This implies that gamow is not actually I/O-bound for these tasks; delbrueck can supply gamow with data as fast as gamow can accept it (though the low processor utilization for gamow is puzzling). feynman and mbcrrc have 1bmt-cva multiples similar to delbrueck's but, although the data are ambiguous, appear to lose a factor of two when they have to rely heavily on delbrueck for file service.

There may also be some vendor-dependence in file I/O performance, in that Sun will tune their NFS client using Sun servers, and DEC will tune their NFS client using DEC servers, so a DEC machine talking to a Sun server may experience a larger performance hit for an I/O-bound job than a Sun machine performing the same job. This might explain why the Alphas do proportionally less well than the Solaris boxes on these I/O-bound tasks (though the data are still scant).

Explanation of benchmarks

new-stat
The new-stat directory contains 5000+ lines of C source code, so this task tests make and gcc.
    cd ~thread/code/stat/new-stat/
    uptime; make clean; time make install; uptime
The standard time for this task on gamow is 1:03. The elapsed times range from 7 seconds to over four minutes (the latter on jrsadler, on which gcc seems to run painfully slowly). The roundoff error can be more than 10% on the faster machines, since those times only have one significant digit.
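For example, a run reported as 0:07 could really have taken anywhere from about 6.5 to 7.5 seconds, so the corresponding speed multiple could land anywhere between

    63 / 7.5  =  8.4    and    63 / 6.5  =  9.7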

stat-test
Runs the code compiled above (mrf-envs, mrf-counts, and mrf-scores, plus the sing-envs.pl script); this does a little bit of floating point, uses more perl to check the results, and does lots of I/O throughout.

    cd ~thread/code/stat/test
    uptime; make clean > /dev/null; time make check > foo.text; uptime
This takes 8:17 on gamow.

make-core
Runs the DSSP code and the perl scripts necessary to produce the test set of cores included in the needle tools distribution tar file. (May be sensitive to system/version differences in DSSP speed.)
    cd ~thread/code/stat/dist/test-cores
    uptime; make clean > /dev/null; time make; uptime
This takes 32:34 on gamow. Since the perl scripts use internal pipes, it runs markedly faster on machines with multiple processors (the Solaris boxes each have four, as I understand it). It could run even faster on such machines by using the parallel processing capability of GNU make, but I am not doing this (as far as I know).
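For reference, parallelism in GNU make is controlled by the -j flag; the timing line above could be changed as follows (this is not what was used for the table, and the choice of 4 jobs is only an illustration):

    uptime; make clean > /dev/null; time make -j 4; uptime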

1bmt-cva
This runs calculate-vv-all (mostly), which is compute- and floating-point intensive, plus some testing scripts, on a chain of 246 residues.

    cd ~thread/code/stat/test
    uptime; rm -f new-env_1bmtA.pair; time make cmp-1bmtA; uptime
The calibration run on gamow takes fully 1:09:01, so the resolution is quite good even for the fastest machines. Note how the processor utilization figures are all up in the high 90's. This is because calculate-vv-all does little I/O and (apparently) doesn't need to page.
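To put that resolution in perspective: 1:09:01 is 4141 seconds, so even the fastest machine in the table (sigler, at 35 times gamow's speed) needs roughly

    4141 / 35  =  about 118 seconds

which whole-second timing still resolves to better than one percent.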


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Fri Apr 20 16:12:41 EDT 2001