Parallel performance of CM1:
This page presents information about the performance of CM1 on distributed-memory supercomputers. The only formal document on this topic can be accessed here:
Document: "Performance of the Bryan-Fritsch numerical model on parallel computers". Nov 2002.
It is an excerpt from G. Bryan's PhD dissertation. Because this information is outdated, I am currently running some new tests of parallel performance using cm1r13 and cm1r14. Here are some results:
Results for Sharcnet's saw: posted 22 October 2009
System: Sharcnet's saw: Linux, 2.83 GHz Xeon processors, 8 processors per node, Infiniband interconnect.
Code: cm1r14
Compiler: Intel Fortran Compiler (ifort)
Configuration: Distributed memory (MPI)
Case: three-dimensional hurricane simulation, 1-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 600
Results:
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 8 (1 node) | 23,059 | ---- | ---- |
| 16 | 11,707 | 1.97 | 98 |
| 32 | 5,808 | 3.97 | 99 |
| 64 | 2,972 | 7.76 | 97 |
| 128 | 1,357 | 16.9 | 106 |
| 256 | 915 | 25.2 | 79 |
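The Speedup and Efficiency columns are not defined on this page. The sketch below assumes the usual definitions relative to the smallest run in each table: speedup is the baseline time divided by the time on N processors, and efficiency is that speedup divided by the ratio N / N_baseline. The function name `scaling_table` is hypothetical and used only for illustration. With the times from the saw table, these assumed definitions give values close to the ones listed above.

```python
def scaling_table(runs):
    """Return (n, time_s, speedup, efficiency_percent) rows.

    `runs` is a list of (n_processors, wall_time_s) pairs; the first
    entry is treated as the baseline (assumption: the page does not
    state its formulas explicitly).
    """
    base_n, base_time = runs[0]
    rows = []
    for n, time_s in runs:
        speedup = base_time / time_s          # relative to the baseline run
        ideal = n / base_n                    # perfect-scaling speedup
        efficiency = 100.0 * speedup / ideal  # percent of ideal
        rows.append((n, time_s, speedup, efficiency))
    return rows


# Wall-clock times (s) copied from the saw table above.
saw_runs = [(8, 23059), (16, 11707), (32, 5808),
            (64, 2972), (128, 1357), (256, 915)]

for n, time_s, speedup, efficiency in scaling_table(saw_runs):
    print(f"{n:4d} {time_s:7d} {speedup:6.2f} {efficiency:5.0f}")
```

Efficiencies above 100% (as at N = 128) are possible under these definitions because the baseline is a measured run, not an ideal one.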
Results for Sharcnet's whale: posted 20 October 2009
System: Sharcnet's whale: Linux, 2.2 GHz Opteron processors, Gigabit Ethernet switch.
Code: cm1r13
Compiler: PathScale
Configuration: MPI
Case: three-dimensional hurricane simulation, 1-h integration, 16-km horizontal grid spacing
Domain dimensions: 200 x 200 x 20
Time steps: 60
Results:
NOTE: timing includes I/O.
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 1 | 420 | ---- | ---- |
| 2 | 185 | 2.3 | 114 |
| 4 | 112 | 3.5 | 88 |
| 8 | 69 | 6.4 | 80 |
Results for IBM Power 575: Last updated: 7 November 2009
System: NCAR's bluefire: IBM Power 575, 4.7 GHz Power6 processors (32 processors per node), Infiniband switch (2.5 GB/sec, 1.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI, using Simultaneous MultiThreading (SMT)
Case: three-dimensional hurricane simulation, 6-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 3,600
Results:
NOTE: timing includes I/O.
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 32 (1 node) | 9,227 | ---- | ---- |
| 64 | 4,436 | 2.1 | 104 |
| 128 | 2,179 | 4.2 | 106 |
| 256 | 1,125 | 8.2 | 103 |
| 512 | 626 | 14.7 | 92 |
Case: Within-node test: three-dimensional hurricane simulation, 1-h integration, 2-km horizontal grid spacing
Domain dimensions: 480 x 480 x 50
Time steps: 600
Results:
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 1 | | ---- | ---- |
| 2 | | | |
| 4 | 6,027 | | |
| 8 | | | |
| 16 | 1,510 | | |
| 32 | 794 | | |
Results for Linux cluster: posted 3 August 2009
System: NCAR's lightning: Linux cluster, 2.2 GHz Opteron processors (2 processors per node), Myrinet switch (248 MB/sec, 6.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI
Case: three-dimensional hurricane simulation, 3-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 1,800
Results:
NOTE: timing includes I/O.
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 16 | 27,790 | ---- | ---- |
| 32 | 14,214 | 1.96 | 98 |
| 64 | 7,557 | 3.7 | 92 |
| 128 | 3,837 | 7.2 | 91 |
| 160 | 3,226 | 8.6 | 86 |
| 192 | 2,891 | 9.8 | 81 |
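The saw, bluefire, and lightning cases use the same 480 x 480 x 100 domain but different integration lengths (600, 3,600, and 1,800 time steps), so the raw times are not directly comparable. One rough way to put them on a common footing is wall-clock seconds per model time step, sketched below for the N = 128 runs. This is only an illustrative normalization, not a metric used on this page: the helper name `seconds_per_step` is hypothetical, the bluefire and lightning timings include I/O, bluefire uses SMT, and the saw test uses cm1r14 rather than cm1r13.

```python
def seconds_per_step(wall_time_s, n_steps):
    """Average wall-clock time (s) spent per model time step."""
    return wall_time_s / n_steps


# (wall-clock time in s, number of time steps) for the N = 128 rows
# of the 480 x 480 x 100 cases in the tables above.
runs_128 = {
    "saw": (1357, 600),
    "bluefire": (2179, 3600),
    "lightning": (3837, 1800),
}

for system, (wall_time_s, n_steps) in runs_128.items():
    per_step = seconds_per_step(wall_time_s, n_steps)
    print(f"{system:10s} {per_step:.2f} s per time step")
```

Because of the differences in I/O, code version, and configuration noted above, the resulting numbers are better read as an order-of-magnitude comparison than as a benchmark ranking.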
Last updated: 7 November 2009