Parallel performance of CM1:

This page presents information about the performance of CM1 on distributed-memory supercomputers. The only formal document on the subject is:

Document: "Performance of the Bryan-Fritsch numerical model on parallel computers". Nov 2002.

It is an excerpt from G. Bryan's PhD dissertation. Because this information is outdated, I am currently running new tests of parallel performance using cm1r13 and cm1r14. Here are some results:


Results for Sharcnet's saw: posted 22 October 2009

System: Sharcnet's saw: Linux, 2.83 GHz Xeon processors, 8 processors per node, InfiniBand interconnect.
Code: cm1r14
Compiler: Intel Fortran Compiler (ifort)
Configuration: Distributed memory (MPI)
Case: three-dimensional hurricane simulation, 1-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 600
Results:

N (# of processors)   Time (s)   Speedup   Efficiency (%)
8 (1 node)              23,059      ----             ----
16                      11,707      1.97               98
32                       5,808      3.97               99
64                       2,972      7.76               97
128                      1,357      16.9              106
256                        915      25.2               79
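
The Speedup and Efficiency columns on this page appear to be computed relative to the smallest run in each table (here, 8 processors). The following is a minimal Python sketch of that calculation, assuming speedup = T_base / T_N and efficiency = speedup / (N / N_base); the function name `scaling_table` is hypothetical and is not part of CM1.

```python
def scaling_table(timings):
    """Compute speedup and efficiency relative to the first (baseline) entry.

    timings: list of (n_procs, wall_seconds), smallest processor count first.
    Returns a list of (n_procs, wall_seconds, speedup, efficiency_percent).
    """
    base_n, base_t = timings[0]
    rows = []
    for n, t in timings[1:]:
        speedup = base_t / t                        # how much faster than the baseline
        ideal = n / base_n                          # speedup expected from perfect scaling
        efficiency = 100.0 * speedup / ideal        # percent of ideal speedup achieved
        rows.append((n, t, speedup, efficiency))
    return rows


# Timings copied from the table for Sharcnet's saw.
saw = [(8, 23059), (16, 11707), (32, 5808), (64, 2972), (128, 1357), (256, 915)]

for n, t, s, e in scaling_table(saw):
    print(f"{n:4d} {t:7,d} {s:6.2f} {e:5.0f}")
```

Under these assumptions the computed values should agree with the tabulated ones to within rounding. An efficiency above 100% simply means the measured speedup exceeded the ratio of processor counts.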


Results for Sharcnet's whale: posted 20 October 2009

System: Sharcnet's whale: Linux, 2.2 GHz Opteron processors, Gigabit Ethernet switch.
Code: cm1r13
Configuration: MPI (PathScale compiler)
Case: three-dimensional hurricane simulation, 1-h integration, 16-km horizontal grid spacing
Domain dimensions: 200 x 200 x 20
Time steps: 60
Results:
NOTE: timing includes I/O.

N (# of processors)   Time (s)   Speedup   Efficiency (%)
1                          420      ----             ----
2                          185       2.3              114
4                          112       3.5               88
8                           69       6.4               80


Results for IBM Power 575: Last updated: 7 November 2009

System: NCAR's bluefire: IBM Power 575, 4.7 GHz Power6 processors (32 processors per node), InfiniBand switch (2.5 GB/sec, 1.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI, using Simultaneous MultiThreading (SMT)
Case: three-dimensional hurricane simulation, 6-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 3,600
Results:
NOTE: timing includes I/O.

N (# of processors)   Time (s)   Speedup   Efficiency (%)
32 (1 node)              9,227      ----             ----
64                       4,436       2.1              104
128                      2,179       4.2              106
256                      1,125       8.2              103
512                        626      14.7               92

Case: Within-node test: three-dimensional hurricane simulation, 1-h integration, 2-km horizontal grid spacing
Domain dimensions: 480 x 480 x 50
Time steps: 600
Results:

N (# of processors)   Time (s)   Speedup   Efficiency (%)
1                                   ----             ----
2
4                        6,027
8
16                       1,510
32                         794


Results for Linux cluster: posted 3 August 2009

System: NCAR's lightning: Linux cluster, 2.2 GHz Opteron processors (2 processors per node), Myrinet switch (248 MB/sec, 6.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI
Case: three-dimensional hurricane simulation, 3-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 1,800
Results:
NOTE: timing includes I/O.

N (# of processors)   Time (s)   Speedup   Efficiency (%)
16                      27,790      ----             ----
32                      14,214      1.96               98
64                       7,557       3.7               92
128                      3,837       7.2               91
160                      3,226       8.6               86
192                      2,891       9.8               81
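
The 480 x 480 x 100 cases above were run for different numbers of time steps, so the raw times are not directly comparable between systems. One rough way to normalize them, which is an assumption of this sketch and not a metric used in the original tests, is processor-seconds per grid point per time step. The function name `core_seconds_per_point_step` is hypothetical.

```python
def core_seconds_per_point_step(n_procs, wall_seconds, nx, ny, nz, steps):
    """Processor-seconds spent per grid point per time step."""
    return n_procs * wall_seconds / (nx * ny * nz * steps)


# Baseline (smallest) run from each 480 x 480 x 100 table on this page:
# (processors, wall-clock seconds, time steps)
runs = {
    "saw": (8, 23059, 600),
    "bluefire": (32, 9227, 3600),
    "lightning": (16, 27790, 1800),
}

cost = {
    name: core_seconds_per_point_step(n, t, 480, 480, 100, steps)
    for name, (n, t, steps) in runs.items()
}

for name, c in cost.items():
    print(f"{name:10s} {c:.2e} processor-seconds per grid point per step")
```

Treat the comparison as indicative only: the bluefire and lightning timings are noted as including I/O, the bluefire runs used SMT, and the runs used different code versions and compilers.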


Last updated: 7 November 2009
