Parallel performance of CM1:

This page presents performance results for CM1 on distributed-memory supercomputers.


Results for the UK national HPC facility HECToR using cm1r16: posted 12 April 2012

System: HECToR, the UK national HPC facility: Cray XE6, 2.3 GHz AMD processors (32 cores per node).
Code: cm1r16 with new I/O method
CM1 Configuration: MPI
Case: supercell simulation, 2-h integration, 250-m horizontal grid spacing
Domain dimensions: 512 x 512 x 64
Time steps: 3,600
Results:

N (# of cores)   Total Time (s)   Overall Speedup   Overall Efficiency (%)   I/O time in s (and % of total time)
            32           24,929              ----                     ----   15 (0.1%)
            64           13,212               1.9                       94   11 (0.1%)
           128            6,774               3.7                       92   10 (0.2%)
           256            3,668               6.8                       85   49 (1.3%)
           512            1,806              13.8                       86   52 (2.9%)
         1,024              925              27.0                       84   24 (2.6%)
         4,096              277              90.0                       70   11 (4.0%)
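
The Overall Speedup and Overall Efficiency columns above are derived from the raw wall-clock times, with the 32-core run as the baseline. A minimal Python sketch of that arithmetic (illustrative only, not part of CM1) is:

    # Reproduce the speedup and efficiency columns from the raw wall-clock
    # times, using the 32-core run as the baseline.
    runs = [(32, 24929), (64, 13212), (128, 6774), (256, 3668),
            (512, 1806), (1024, 925), (4096, 277)]   # (cores, seconds) from the table above

    base_cores, base_time = runs[0]

    for cores, seconds in runs[1:]:
        speedup = base_time / seconds                        # T(32) / T(N)
        efficiency = 100.0 * speedup / (cores / base_cores)  # percent of ideal scaling
        print(f"{cores:5d} cores: speedup {speedup:5.1f}, efficiency {efficiency:3.0f}%")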


Updated results for NCAR's bluefire using cm1r16: updated 18 April 2012

System: NCAR's bluefire: IBM Power 575, 4.7 GHz Power6 processors (32 cores per node), Infiniband switch (2.5 GB/sec, 1.3-microsecond MPI latency).
Code: cm1r16 with new I/O method
CM1 Configuration: MPI, using Simultaneous MultiThreading (SMT)
Case: supercell simulation, 2-h integration, 250-m horizontal grid spacing
Domain dimensions: 512 x 512 x 64
Time steps: 3,600
Results:

N (# of cores)   Total Time (s)   Overall Speedup   Overall Efficiency (%)   I/O time (seconds)
            32           11,430              ----                     ----                   73
            64            5,795               2.0                       99                   74
           128            3,032               3.8                       94                   83
           256            1,607               7.1                       89                   55
           512              890              12.8                       80                   27
         1,024              506              22.6                       71                   24
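
For context on the per-rank workload in the two supercell tables above: CM1's MPI runs decompose the horizontal grid across ranks (each rank keeps full vertical columns), so the 512 x 512 domain is split into smaller and smaller tiles as the core count grows. The sketch below is illustrative only; it assumes a near-square horizontal decomposition (e.g., via CM1's nodex/nodey namelist settings) is chosen at each core count.

    # Illustrative only: approximate per-rank tile sizes for the
    # 512 x 512 (x 64) supercell domain, assuming a near-square
    # nodex x nodey horizontal decomposition with full columns per rank.
    import math

    nx, ny = 512, 512

    for cores in (32, 64, 128, 256, 512, 1024, 4096):
        # most nearly square factorization nodex * nodey = cores
        nodex = max(d for d in range(1, math.isqrt(cores) + 1) if cores % d == 0)
        nodey = cores // nodex
        print(f"{cores:5d} ranks: {nodex:3d} x {nodey:3d} decomposition, "
              f"~{nx // nodex} x {ny // nodey} grid points per rank")

At 4,096 cores each rank is left with only roughly an 8 x 8 column of grid points, which is consistent with the gradual drop in efficiency at the largest core counts as halo-exchange communication grows relative to computation.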


Updated results for Sharcnet's saw using cm1r15: posted 15 May 2011

System: Sharcnet's saw: HP, 2.83 GHz Xeon processors, 8 cores per node, Infiniband interconnect.
Code: cm1r15
Compiler: Intel Fortran Compiler (ifort), version 11.0
CM1 Configuration: Distributed memory (MPI)
Case: three-dimensional hurricane simulation, 1-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 62
Time steps: 600
Results:

N (# of cores)   Time (s)   Speedup   Efficiency (%)
    8 (1 node)     13,563      ----             ----
            16      6,933      1.96               98
            32      3,435      3.95               99
            64      1,675      8.10              101


Results for Sharcnet's saw: posted 22 October 2009

System: Sharcnet's saw: Linux, 2.83 GHz Xeon processors, 8 processors per node, Infiniband interconnect.
Code: cm1r14
Compiler: Intel Fortran Compiler (ifort)
Configuration: Distributed memory (MPI)
Case: three-dimensional hurricane simulation, 1-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 600
Results:

N (# of processors)   Time (s)   Speedup   Efficiency (%)
         8 (1 node)     23,059      ----             ----
                 16     11,707      1.97               98
                 32      5,808      3.97               99
                 64      2,972      7.76               97
                128      1,357      16.9              106
                256        915      25.2               79


Results for Sharcnet's whale: posted 20 October 2009

System: Sharcnet's whale: Linux, 2.2 GHz Opteron processors, Gigabit Ethernet switch.
Code: cm1r13
Configuration: MPI (PathScale compiler)
Case: three-dimensional hurricane simulation, 1-h integration, 16-km horizontal grid spacing
Domain dimensions: 200 x 200 x 20
Time steps: 60
Results:
NOTE: timing includes I/O.

N (# of processors)   Time (s)   Speedup   Efficiency (%)
                  1        420      ----             ----
                  2        185       2.3              114
                  4        112       3.5               88
                  8         69       6.4               80


Results for NCAR's bluefire (IBM Power 575): last updated 9 November 2009

System: NCAR's bluefire: IBM Power 575, 4.7 GHz Power6 processors (32 processors per node), Infiniband switch (2.5 GB/sec, 1.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI, using Simultaneous MultiThreading (SMT)
Case: three-dimensional hurricane simulation, 6-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 3,600
Results:
NOTE: timing includes I/O.

N (# of processors)   Time (s)   Speedup   Efficiency (%)
        32 (1 node)      9,227      ----             ----
                 64      4,436       2.1              104
                128      2,179       4.2              106
                256      1,125       8.2              103
                512        626      14.7               92

Case: Within-node test: three-dimensional hurricane simulation, 1-h integration, 2-km horizontal grid spacing
Domain dimensions: 480 x 480 x 50
Time steps: 600
Results:

N (# of processors)   Time (s)   Speedup   Efficiency (%)
                  1     24,118      ----             ----
                  2     12,106       2.0              100
                  4      6,027       4.0              100
                  8      3,018       8.0              100
                 16      1,507      16.0              100
                 32        794      30.4               95
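
A convenient machine-independent way to compare runs like the within-node test above is grid-point updates per second (nx x ny x nz x time steps, divided by wall-clock time). The figures below are not reported by CM1; they are simply arithmetic on the table above, shown as an illustration.

    # Illustrative only: convert the within-node wall-clock times above into
    # grid-point updates per second (total updates = nx*ny*nz * time steps).
    nx, ny, nz, nsteps = 480, 480, 50, 600
    total_updates = nx * ny * nz * nsteps

    for cores, seconds in [(1, 24118), (2, 12106), (4, 6027),
                           (8, 3018), (16, 1507), (32, 794)]:
        rate = total_updates / seconds
        print(f"{cores:2d} cores: {rate/1e6:5.1f} M updates/s "
              f"({rate/cores/1e6:4.2f} M per core)")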


Results for Linux cluster: posted 3 August 2009

System: NCAR's lightning: Linux cluster, 2.2 GHz Opteron processors (2 processors per node), Myrinet switch (248 MB/sec, 6.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI
Case: three-dimensional hurricane simulation, 3-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 1,800
Results:
NOTE: timing includes I/O.

N (# of processors)   Time (s)   Speedup   Efficiency (%)
                 16     27,790      ----             ----
                 32     14,214      1.96               98
                 64      7,557       3.7               92
                128      3,837       7.2               91
                160      3,226       8.6               86
                192      2,891       9.8               81


Last updated: 18 April 2012
