Parallel performance of CM1:

This page presents information about the performance of CM1 on distributed-memory supercomputers.


Updated results for NCAR's bluefire using cm1r16: posted 6 February 2012

System: NCAR's bluefire: IBM Power 575, 4.7 GHz Power6 processors (32 cores per node), Infiniband switch (2.5 GB/sec, 1.3-microsecond MPI latency).
Code: cm1r16
CM1 Configuration: MPI, using Simultaneous MultiThreading (SMT)
Case: supercell simulation, 2-h integration, 250-m horizontal grid spacing
Domain dimensions: 512 x 512 x 64
Time steps: 3,600
Results:

N (# of cores)  Total time (s)  Overall speedup  Overall efficiency (%)  I/O time (s) (% of total)  Non-I/O time (s)
32              11,316          ----             ----                    13 (0.1%)                  11,303
64              5,727           2.0              99                      22 (0.4%)                  5,705
128             2,994           3.8              94                      33 (1.1%)                  2,961
256             1,627           7.0              87                      68 (4.2%)                  1,559
512             1,015           11.1             70                      158 (15.6%)                857
1,024           911             12.4             39                      424 (46.5%)                487
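
The derived columns above can be reproduced from the raw timings. The sketch below assumes that speedup is measured relative to the 32-core run and that efficiency is speedup divided by the increase in core count; these definitions are inferred from the table values rather than stated on this page, and the names `RUNS` and `scaling` are illustrative. It also applies the same calculation to the non-I/O time alone, to separate compute scaling from I/O cost.

```python
# (cores, total time in s, I/O time in s), copied from the table above.
RUNS = [
    (32, 11316, 13),
    (64, 5727, 22),
    (128, 2994, 33),
    (256, 1627, 68),
    (512, 1015, 158),
    (1024, 911, 424),
]


def scaling(runs):
    """Return one dict per run with speedup and efficiency vs. the first run.

    Assumed definitions (inferred from the table, not stated by the source):
      speedup    = T(base) / T(N)
      efficiency = speedup / (N / N_base)
    """
    base_cores, base_total, base_io = runs[0]
    base_compute = base_total - base_io
    rows = []
    for cores, total, io in runs:
        compute = total - io          # non-I/O time
        ratio = cores / base_cores    # how many times more cores than the baseline
        rows.append({
            "cores": cores,
            "speedup": base_total / total,
            "efficiency_pct": 100.0 * base_total / total / ratio,
            "io_pct": 100.0 * io / total,
            "compute_efficiency_pct": 100.0 * base_compute / compute / ratio,
        })
    return rows


if __name__ == "__main__":
    for r in scaling(RUNS):
        print(f"{r['cores']:5d}  speedup {r['speedup']:5.1f}  "
              f"eff {r['efficiency_pct']:4.0f}%  I/O {r['io_pct']:5.1f}%  "
              f"non-I/O eff {r['compute_efficiency_pct']:4.0f}%")
```

With these definitions, the 1,024-core run has an overall efficiency of about 39% but a non-I/O efficiency of about 73%, which suggests that a large part of the lost efficiency at that core count comes from I/O.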


Updated results for Sharcnet's saw using cm1r15: posted 15 May 2011

System: Sharcnet's saw: HP, 2.83 GHz Xeon processors, 8 cores per node, Infiniband interconnect.
Code: cm1r15
Compiler: Intel Fortran Compiler (ifort), version 11.0
CM1 Configuration: Distributed memory (MPI)
Case: three-dimensional hurricane simulation, 1-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 62
Time steps: 600
Results:

N (# of cores)  Time (s)  Speedup  Efficiency (%)
8 (1 node)      13,563    ----     ----
16              6,933     1.96     98
32              3,435     3.95     99
64              1,675     8.10     101


Results for Sharcnet's saw: posted 22 October 2009

System: Sharcnet's saw: Linux, 2.83 GHz Xeon processors, 8 processors per node, Infiniband interconnect.
Code: cm1r14
Compiler: Intel Fortran Compiler (ifort)
Configuration: Distributed memory (MPI)
Case: three-dimensional hurricane simulation, 1-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 600
Results:

N (# of processors)  Time (s)  Speedup  Efficiency (%)
8 (1 node)           23,059    ----     ----
16                   11,707    1.97     98
32                   5,808     3.97     99
64                   2,972     7.76     97
128                  1,357     16.9     106
256                  915       25.2     79
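
Because the domain size is fixed while the processor count grows, each processor's share of the grid shrinks from run to run (a strong-scaling test). The sketch below computes that share for the runs above. It assumes only that grid points are divided evenly among processors; it says nothing about how CM1 actually lays out its subdomains, and the function name is illustrative.

```python
NX, NY, NZ = 480, 480, 100                    # domain dimensions of this test
PROCESSOR_COUNTS = [8, 16, 32, 64, 128, 256]  # processor counts from the table


def points_per_processor(nx, ny, nz, nprocs):
    """Average number of grid points per processor, assuming an even split."""
    return nx * ny * nz / nprocs


if __name__ == "__main__":
    for n in PROCESSOR_COUNTS:
        share = points_per_processor(NX, NY, NZ, n)
        print(f"{n:4d} processors: {share:12,.0f} grid points each")
```

Under this assumption each processor holds 2,880,000 grid points in the 8-processor run and 90,000 in the 256-processor run. A smaller share per processor usually means relatively more time spent exchanging data with neighbours, which is one plausible (unverified) reason for the drop in efficiency at 256 processors.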


Results for Sharcnet's whale: posted 20 October 2009

System: Sharcnet's whale: Linux, 2.2 GHz Opteron processors, Gigabit ethernet switch.
Code: cm1r13
Configuration: MPI (PathScale compiler)
Case: three-dimensional hurricane simulation, 1-h integration, 16-km horizontal grid spacing
Domain dimensions: 200 x 200 x 20
Time steps: 60
Results:
NOTE: timing includes I/O.

N (# of processors)  Time (s)  Speedup  Efficiency (%)
1                    420       ----     ----
2                    185       2.3      114
4                    112       3.5      88
8                    69        6.4      80


Results for IBM Power 575: Last updated: 9 November 2009

System: NCAR's bluefire: IBM Power 575, 4.7 GHz Power6 processors (32 processors per node), Infiniband switch (2.5 GB/sec, 1.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI, using Simultaneous MultiThreading (SMT)
Case: three-dimensional hurricane simulation, 6-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 3,600
Results:
NOTE: timing includes I/O.

N (# of processors)  Time (s)  Speedup  Efficiency (%)
32 (1 node)          9,227     ----     ----
64                   4,436     2.1      104
128                  2,179     4.2      106
256                  1,125     8.2      103
512                  626       14.7     92

Case: Within-node test: three-dimensional hurricane simulation, 1-h integration, 2-km horizontal grid spacing
Domain dimensions: 480 x 480 x 50
Time steps: 600
Results:

N (# of processors)  Time (s)  Speedup  Efficiency (%)
1                    24,118    ----     ----
2                    12,106    2.0      100
4                    6,027     4.0      100
8                    3,018     8.0      100
16                   1,507     16.0     100
32                   794       30.4     95
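
Since this test starts from a single processor, the measured speedups can be converted into an experimentally determined serial fraction using the Karp-Flatt metric, e = (1/S - 1/p) / (1 - 1/p), where S is the speedup on p processors. This is a standard diagnostic applied here purely as an illustration; it is not an analysis from the CM1 authors, and the resulting value lumps together truly serial code and parallel overhead. The names below are illustrative.

```python
# (processors, wall-clock time in s) from the within-node table above.
# The first entry must be the 1-processor run, which serves as T(1).
TIMINGS = [(1, 24118), (2, 12106), (4, 6027), (8, 3018), (16, 1507), (32, 794)]


def karp_flatt(speedup, nprocs):
    """Experimentally determined serial fraction for nprocs > 1 processors."""
    if nprocs <= 1:
        raise ValueError("Karp-Flatt metric needs more than one processor")
    return (1.0 / speedup - 1.0 / nprocs) / (1.0 - 1.0 / nprocs)


def serial_fractions(timings):
    """Map processor count -> serial fraction, using the first row as T(1)."""
    t1 = timings[0][1]
    return {p: karp_flatt(t1 / t, p) for p, t in timings[1:]}


if __name__ == "__main__":
    for p, e in serial_fractions(TIMINGS).items():
        print(f"{p:3d} processors: serial fraction {100.0 * e:6.3f}%")
```

For the 32-processor run this gives a serial fraction of roughly 0.2%. Values at smaller counts are close to zero and can be slightly negative where the measured speedup marginally exceeds the processor count.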


Results for Linux cluster: posted 3 August 2009

System: NCAR's lightning: Linux cluster, 2.2 GHz Opteron processors (2 processors per node), Myrinet switch (248 MB/sec, 6.3-microsecond MPI latency).
Code: cm1r13
Configuration: MPI
Case: three-dimensional hurricane simulation, 3-h integration, 1-km horizontal grid spacing
Domain dimensions: 480 x 480 x 100
Time steps: 1,800
Results:
NOTE: timing includes I/O.

N (# of processors)  Time (s)  Speedup  Efficiency (%)
16                   27,790    ----     ----
32                   14,214    1.96     98
64                   7,557     3.7      92
128                  3,837     7.2      91
160                  3,226     8.6      86
192                  2,891     9.8      81
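
Three of the tests on this page use the same 480 x 480 x 100 hurricane domain at 1-km grid spacing (saw with cm1r14, bluefire with cm1r13, and lightning with cm1r13) but integrate for different lengths; dividing integration length by step count gives a 6-s model time step in each case. One way to put their timings on a common footing is wall-clock seconds per model time step. The sketch below does that for the processor counts that all three tables share; the dictionary layout and function names are illustrative.

```python
# name: (time steps, {processors: wall-clock time in s}); all 480 x 480 x 100.
TESTS = {
    "saw (cm1r14)": (600, {32: 5808, 64: 2972, 128: 1357}),
    "bluefire (cm1r13)": (3600, {32: 9227, 64: 4436, 128: 2179}),
    "lightning (cm1r13)": (1800, {32: 14214, 64: 7557, 128: 3837}),
}


def seconds_per_step(total_time, steps):
    """Average wall-clock seconds spent per model time step."""
    return total_time / steps


def per_step_table(tests):
    """Map system name -> {processors: wall-clock seconds per time step}."""
    return {
        name: {n: seconds_per_step(t, steps) for n, t in times.items()}
        for name, (steps, times) in tests.items()
    }


if __name__ == "__main__":
    for name, row in per_step_table(TESTS).items():
        cells = "  ".join(f"N={n}: {s:5.2f} s/step" for n, s in row.items())
        print(f"{name:20s} {cells}")
```

These per-step figures are only a rough basis for comparison: the runs use different CM1 releases and compilers, and the page notes that I/O is included in the bluefire and lightning timings but does not say so for saw.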


Last updated: 7 February 2012
