Parallel performance of CM1:
This page reports the parallel performance of CM1 on distributed-memory supercomputers.
Updated results for NCAR's bluefire using cm1r16: posted 6 February 2012
System: NCAR's bluefire: IBM Power 575, 4.7 GHz Power6 processors (32 cores per node), InfiniBand switch (2.5 GB/sec, 1.3-microsecond MPI latency).
Code: cm1r16
CM1 Configuration: MPI, using simultaneous multithreading (SMT)
Case: supercell simulation, 2-h integration, 250-m horizontal grid spacing
Domain dimensions: 512 x 512 x 64
Time steps: 3,600
Results:
| N (# of cores) | Total Time (s) | Overall Speedup | Overall Efficiency (%) | I/O Time (s) (and % of Total Time) | Non-I/O Time (s) |
|---|---|---|---|---|---|
| 32 | 11,316 | ---- | ---- | 13 (0.1%) | 11,303 |
| 64 | 5,727 | 2.0 | 99 | 22 (0.4%) | 5,705 |
| 128 | 2,994 | 3.8 | 94 | 33 (1.1%) | 2,961 |
| 256 | 1,627 | 7.0 | 87 | 68 (4.2%) | 1,559 |
| 512 | 1,015 | 11.1 | 70 | 158 (15.6%) | 857 |
| 1,024 | 911 | 12.4 | 39 | 424 (46.5%) | 487 |
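
The speedup and efficiency columns are not defined on this page. The values in the table above are consistent with measuring both against the smallest run (32 cores). The following is a minimal Python sketch under that assumption, which also separates I/O from the rest of the run. The function name `scaling` and the `compute_efficiency_pct` field are illustrative and are not part of CM1.

```python
# Rows from the bluefire cm1r16 table: (cores, total time in s, I/O time in s).
RUNS = [
    (32, 11316, 13),
    (64, 5727, 22),
    (128, 2994, 33),
    (256, 1627, 68),
    (512, 1015, 158),
    (1024, 911, 424),
]


def scaling(runs):
    """Return one dict per run, measured against the first (smallest) run.

    Assumed definitions (not stated on the page):
      speedup    = T_base / T_N
      efficiency = speedup / (N / N_base)
    """
    base_cores, base_total, base_io = runs[0]
    base_compute = base_total - base_io
    rows = []
    for cores, total, io in runs:
        ideal = cores / base_cores   # speedup if scaling were perfect
        compute = total - io         # non-I/O time
        speedup = base_total / total
        rows.append({
            "cores": cores,
            "speedup": speedup,
            "efficiency_pct": 100.0 * speedup / ideal,
            "io_pct": 100.0 * io / total,
            # Efficiency of the non-I/O part only (illustrative extra column).
            "compute_efficiency_pct": 100.0 * (base_compute / compute) / ideal,
        })
    return rows


if __name__ == "__main__":
    for r in scaling(RUNS):
        print(f"{r['cores']:5d}  speedup {r['speedup']:5.1f}  "
              f"eff {r['efficiency_pct']:3.0f}%  I/O {r['io_pct']:4.1f}%  "
              f"non-I/O eff {r['compute_efficiency_pct']:3.0f}%")
```

Under these assumptions the 1,024-core row comes out at a speedup of 12.4 and an efficiency of 39%, matching the table, while the non-I/O portion alone scales at roughly 73% efficiency. This suggests that most of the efficiency loss at high core counts in this test comes from I/O.
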
Updated results for Sharcnet's saw using cm1r15: posted 15 May 2011
System: Sharcnet's saw: HP, 2.83 GHz Xeon processors, 8 cores per node, InfiniBand interconnect.
Code: cm1r15
Compiler: Intel Fortran Compiler (ifort), version 11.0
CM1 Configuration: Distributed memory (MPI)
Case: three-dimensional hurricane simulation, 1-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 62
Time steps: 600
Results:
| N (# of cores) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 8 (1 node) | 13,563 | ---- | ---- |
| 16 | 6,933 | 1.96 | 98 |
| 32 | 3,435 | 3.95 | 99 |
| 64 | 1,675 | 8.10 | 101 |
Results for Sharcnet's saw: posted 22 October 2009
System: Sharcnet's saw: Linux, 2.83 GHz Xeon processors, 8 processors per node, InfiniBand interconnect.
Code: cm1r14
Compiler: Intel Fortran Compiler (ifort)
Configuration: Distributed memory (MPI)
Case: three-dimensional hurricane simulation, 1-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 600
Results:
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 8 (1 node) | 23,059 | ---- | ---- |
| 16 | 11,707 | 1.97 | 98 |
| 32 | 5,808 | 3.97 | 99 |
| 64 | 2,972 | 7.76 | 97 |
| 128 | 1,357 | 17.0 | 106 |
| 256 | 915 | 25.2 | 79 |
Results for Sharcnet's whale: posted 20 October 2009
System: Sharcnet's whale: Linux, 2.2 GHz Opteron processors, Gigabit Ethernet switch.
Code: cm1r13
Configuration: MPI (PathScale compiler)
Case: three-dimensional hurricane simulation, 1-h integration, 16-km horizontal grid spacing
Domain dimensions: 200 x 200 x 20
Time steps: 60
Results:
NOTE: timing includes I/O.
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 1 | 420 | ---- | ---- |
| 2 | 185 | 2.3 | 114 |
| 4 | 112 | 3.5 | 88 |
| 8 | 69 | 6.4 | 80 |
Results for IBM Power 575: Last updated: 9 November 2009
System: NCAR's bluefire: IBM Power 575, 4.7 GHz Power6 processors (32 processors per node), InfiniBand switch (2.5 GB/sec, 1.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI, using simultaneous multithreading (SMT)
Case: three-dimensional hurricane simulation, 6-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 3,600
Results:
NOTE: timing includes I/O.
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 32 (1 node) | 9,227 | ---- | ---- |
| 64 | 4,436 | 2.1 | 104 |
| 128 | 2,179 | 4.2 | 106 |
| 256 | 1,125 | 8.2 | 103 |
| 512 | 626 | 14.7 | 92 |
Case: Within-node test: three-dimensional hurricane simulation, 1-h integration, 2-km horizontal grid spacing
Domain dimensions: 480 x 480 x 50
Time steps: 600
Results:
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 1 | 24,118 | ---- | ---- |
| 2 | 12,106 | 2.0 | 100 |
| 4 | 6,027 | 4.0 | 100 |
| 8 | 3,018 | 8.0 | 100 |
| 16 | 1,507 | 16.0 | 100 |
| 32 | 794 | 30.4 | 95 |
Results for Linux cluster: posted 3 August 2009
System: NCAR's lightning: Linux cluster, 2.2 GHz Opteron processors (2 processors per node), Myrinet switch (248 MB/sec, 6.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI
Case: three-dimensional hurricane simulation, 3-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 1,800
Results:
NOTE: timing includes I/O.
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 16 | 27,790 | ---- | ---- |
| 32 | 14,214 | 1.96 | 98 |
| 64 | 7,557 | 3.7 | 92 |
| 128 | 3,837 | 7.2 | 91 |
| 160 | 3,226 | 8.6 | 86 |
| 192 | 2,891 | 9.8 | 81 |
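
The 480 x 480 x 100 hurricane case was run for different lengths on saw (1 h), bluefire (6 h), and lightning (3 h), so the total times in those tables are not directly comparable. One way to put them on a common footing is wall-clock time per model time step. The sketch below does this for the 64-processor rows. The helper names are illustrative, and the comparison is only approximate: the code versions differ (cm1r14 on saw, cm1r13 on the other two), and only the bluefire and lightning timings are noted as including I/O.

```python
# (label, integration length in h, model time steps, wall-clock time in s)
# for the 64-processor rows of the 480 x 480 x 100 hurricane tables on this page.
RUNS_64 = [
    ("saw (cm1r14)", 1, 600, 2972),
    ("bluefire (cm1r13, SMT)", 6, 3600, 4436),
    ("lightning (cm1r13)", 3, 1800, 7557),
]


def model_time_step_s(integration_hours, n_steps):
    """Model time step implied by the integration length and step count."""
    return integration_hours * 3600.0 / n_steps


def seconds_per_step(wall_time_s, n_steps):
    """Average wall-clock cost of one model time step."""
    return wall_time_s / n_steps


if __name__ == "__main__":
    for label, hours, steps, wall in RUNS_64:
        dt = model_time_step_s(hours, steps)
        cost = seconds_per_step(wall, steps)
        print(f"{label:24s} dt = {dt:.0f} s, {cost:.2f} s of wall clock per step")
```

All three runs imply a 6-s model time step, and the per-step cost at 64 processors is about 5.0 s on saw, 1.2 s on bluefire, and 4.2 s on lightning.
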
Last updated: 7 February 2012