Parallel performance of CM1:
This page presents performance results for CM1 on distributed-memory supercomputers.
Results for the UK national HPC facility hector using cm1r16: posted 12 April 2012
System: the UK national HPC facility's hector: Cray XE6, 2.3 GHz AMD processors (32 cores per node).
Code: cm1r16 with new I/O method
CM1 Configuration: MPI
Case: supercell simulation, 2-h integration, 250-m horizontal grid spacing
Domain dimensions: 512 x 512 x 64
Time steps: 3,600
Results:
| N (# of cores) | Total Time (s) | Overall Speedup | Overall Efficiency (%) | I/O Time (s) (% of total) |
|---|---|---|---|---|
| 32 | 24,929 | ---- | ---- | 15 (0.1%) |
| 64 | 13,212 | 1.9 | 94 | 11 (0.1%) |
| 128 | 6,774 | 3.7 | 92 | 10 (0.2%) |
| 256 | 3,668 | 6.8 | 85 | 49 (1.3%) |
| 512 | 1,806 | 13.8 | 86 | 52 (2.9%) |
| 1,024 | 925 | 27.0 | 84 | 24 (2.6%) |
| 4,096 | 277 | 90.0 | 70 | 11 (4.0%) |
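The speedup and efficiency columns here (and in the tables below) are consistent with the usual strong-scaling definitions taken relative to the smallest run: speedup is the baseline time divided by the time at N cores, and efficiency is that speedup divided by the increase in core count. A minimal sketch of the arithmetic, using the hector timings copied from the table above (the script itself is only illustrative):

```python
# Strong-scaling arithmetic for the hector runs above.
# Times (s) and core counts are taken from the table; the formulas are the
# standard strong-scaling definitions relative to the 32-core baseline.
base_cores, base_time = 32, 24929.0
runs = {64: 13212.0, 128: 6774.0, 256: 3668.0, 512: 1806.0, 1024: 925.0, 4096: 277.0}

for cores, t in runs.items():
    speedup = base_time / t                               # relative to the 32-core run
    efficiency = 100.0 * speedup / (cores / base_cores)   # % of ideal linear scaling
    print(f"{cores:5d} cores: speedup {speedup:5.1f}, efficiency {efficiency:3.0f}%")

# For example, at 4,096 cores: 24929/277 = 90.0x, and 90.0/(4096/32) = 70%.
```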
Updated results for NCAR's bluefire using cm1r16: posted 18 April 2012
System: NCAR's bluefire: IBM Power 575, 4.7 GHz Power6 processors (32 cores per node), Infiniband switch (2.5 GB/sec, 1.3-microsecond MPI latency).
Code: cm1r16 with new I/O method
CM1 Configuration: MPI, using Simultaneous MultiThreading (SMT)
Case: supercell simulation, 2-h integration, 250-m horizontal grid spacing
Domain dimensions: 512 x 512 x 64
Time steps: 3,600
Results:
| N (# of cores) | Total Time (s) | Overall Speedup | Overall Efficiency (%) | I/O time (seconds) |
|---|---|---|---|---|
| 32 | 11,430 | ---- | ---- | 73 |
| 64 | 5,795 | 2.0 | 99 | 74 |
| 128 | 3,032 | 3.8 | 94 | 83 |
| 256 | 1,607 | 7.1 | 89 | 55 |
| 512 | 890 | 12.8 | 80 | 27 |
| 1,024 | 506 | 22.6 | 71 | 24 |
Updated results for Sharcnet's saw using cm1r15: posted 15 May 2011
System: Sharcnet's saw: HP, 2.83 GHz Xeon processors, 8 cores per node, Infiniband interconnect.
Code: cm1r15
Compiler: Intel Fortran Compiler (ifort), version 11.0
CM1 Configuration: Distributed memory (MPI)
Case: three-dimensional hurricane simulation, 1-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 62
Time steps: 600
Results:
| N (# of cores) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 8 (1 node) | 13,563 | ---- | ---- |
| 16 | 6,933 | 1.96 | 98 |
| 32 | 3,435 | 3.95 | 99 |
| 64 | 1,675 | 8.10 | 101 |
Results for Sharcnet's saw: posted 22 October 2009
System: Sharcnet's saw: Linux, 2.83 GHz Xeon processors, 8 processors per node, Infiniband interconnect.
Code: cm1r14
Compiler: Intel Fortran Compiler (ifort)
Configuration: Distributed memory (MPI)
Case: three-dimensional hurricane simulation, 1-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 600
Results:
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 8 (1 node) | 23,059 | ---- | ---- |
| 16 | 11,707 | 1.97 | 98 |
| 32 | 5,808 | 3.97 | 99 |
| 64 | 2,972 | 7.76 | 97 |
| 128 | 1,357 | 16.9 | 106 |
| 256 | 915 | 25.2 | 79 |
Results for Sharcnet's whale: posted 20 October 2009
System: Sharcnet's whale: Linux, 2.2 GHz Opteron processors, Gigabit Ethernet switch.
Code: cm1r13
Configuration: MPI (PathScale compiler)
Case: three-dimensional hurricane simulation, 1-h integration, 16-km horizontal grid spacing
Domain dimensions: 200 x 200 x 20
Time steps: 60
Results:
NOTE: timing includes I/O.
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 1 | 420 | ---- | ---- |
| 2 | 185 | 2.3 | 114 |
| 4 | 112 | 3.5 | 88 |
| 8 | 69 | 6.4 | 80 |
Results for NCAR's bluefire (IBM Power 575): last updated 9 November 2009
System: NCAR's bluefire: IBM Power 575, 4.7 GHz Power6 processors (32 processors per node), Infiniband switch (2.5 GB/sec, 1.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI, using Simultaneous MultiThreading (SMT)
Case: three-dimensional hurricane simulation, 6-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 3,600
Results:
NOTE: timing includes I/O.
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 32 (1 node) | 9,227 | ---- | ---- |
| 64 | 4,436 | 2.1 | 104 |
| 128 | 2,179 | 4.2 | 106 |
| 256 | 1,125 | 8.2 | 103 |
| 512 | 626 | 14.7 | 92 |
Case (within-node test): three-dimensional hurricane simulation, 1-h integration, 2-km horizontal grid spacing
Domain dimensions: 480 x 480 x 50
Time steps: 600
Results:
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 1 | 24,118 | ---- | ---- |
| 2 | 12,106 | 2.0 | 100 |
| 4 | 6,027 | 4.0 | 100 |
| 8 | 3,018 | 8.0 | 100 |
| 16 | 1,507 | 16.0 | 100 |
| 32 | 794 | 30.4 | 95 |
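For context on how these runs divide the work, CM1's MPI configuration uses a two-dimensional horizontal domain decomposition (nodex x nodey ranks, set in namelist.input), so each rank advances a rectangular patch of grid columns. The sketch below shows roughly how the 480 x 480 horizontal grid of the within-node test shrinks per rank as cores are added; the near-square factorization is an assumption for illustration, not the exact nodex/nodey used for these benchmarks:

```python
# Illustrative sketch: per-rank subdomain size for a 480 x 480 horizontal grid
# under a 2-D decomposition with nodex * nodey = N ranks. The near-square
# factorization below is an assumption; the actual nodex/nodey for these runs
# is set in CM1's namelist.input and is not recorded on this page.
import math

nx = ny = 480  # horizontal grid points in the within-node test

def near_square_factors(n):
    """Return a factor pair (a, b) of n with a <= b and a as close to sqrt(n) as possible."""
    a = int(math.isqrt(n))
    while n % a:
        a -= 1
    return a, n // a

for ranks in (1, 2, 4, 8, 16, 32):
    nodex, nodey = near_square_factors(ranks)
    print(f"{ranks:2d} ranks: nodex x nodey = {nodex} x {nodey}, "
          f"subdomain ~ {nx // nodex} x {ny // nodey} columns per rank")
```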
Results for NCAR's lightning (Linux cluster): posted 3 August 2009
System: NCAR's lightning: Linux cluster, 2.2 GHz Opteron processors (2 processors per node), Myrinet switch (248 MB/sec, 6.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI
Case: three-dimensional hurricane simulation, 3-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 1,800
Results:
NOTE: timing includes I/O.
| N (# of processors) | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 16 | 27,790 | ---- | ---- |
| 32 | 14,214 | 1.96 | 98 |
| 64 | 7,557 | 3.7 | 92 |
| 128 | 3,837 | 7.2 | 91 |
| 160 | 3,226 | 8.6 | 86 |
| 192 | 2,891 | 9.8 | 81 |
Last updated: 18 April 2012