Background
WRF is intended to be performance portable across architectures, including vector machines.
What storage order should we choose for WRF? In other words, what is the penalty on cache-based machines for choosing a vector-friendly ordering, and are we willing to pay it?
M. Ashworth (Daresbury Lab), ECMWF ‘98 workshop presentation: compared K-inner versus I-inner orderings for array dimensions and loop nesting in a test kernel. (See “Optimisation for Vector and RISC Processors,” in Towards Teracomputing, World Scientific, River Edge, NJ. 1999. pp. 353-359.)
Ashworth found that RISC processors were relatively insensitive to the ordering, but vector processors were very sensitive, by factors of up to 4 on the Fujitsu VPP300 and 8 on the NEC SX-4.
Conduct single-processor experiments using a WRF prototype written in both KIJ and IKJ storage/loop orders:
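To make the K-inner versus I-inner distinction concrete, here is a minimal sketch of how the two storage orders lay out a 3-D field in memory. It assumes Fortran-style column-major storage (first-listed dimension contiguous) and a domain of IX=JX=81, KX=41; the helper name `strides` is hypothetical and is not taken from WRF or from Ashworth's kernel.

```python
# Illustrative sketch only: assumes Fortran-style column-major layout,
# where the first-listed dimension is contiguous in memory.

def strides(order, extents):
    """Return the element stride of each dimension.

    order   -- dimension names, fastest-varying first, e.g. "KIJ"
    extents -- dict mapping dimension name to its size
    """
    result = {}
    step = 1
    for dim in order:
        result[dim] = step      # distance in elements between neighbours
        step *= extents[dim]    # next dimension skips a full slab
    return result

# Assumed domain size for illustration.
extents = {"I": 81, "J": 81, "K": 41}

kij = strides("KIJ", extents)
ikj = strides("IKJ", extents)
print(kij)  # {'K': 1, 'I': 41, 'J': 3321}
print(ikj)  # {'I': 1, 'K': 81, 'J': 3321}
```

With a matching loop nest, the innermost loop is unit-stride in both cases. What changes is which dimension supplies the inner-loop trip count: KX=41 for KIJ versus IX=81 for IKJ in this example.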
- 2nd order Runge-Kutta, low-order advection
- Rudimentary moisture physics (Kessler)
- Initialized from an idealized case, no I/O
Varying domain sizes over IX,JX=(81,41,21); KX=(81,41)
Single tile in the X (minor) horizontal dimension; narrow tiles (1, 2, 3, or 4 wide) in Y.
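The tiling described above can be sketched as follows. This is an illustration under stated assumptions, not the prototype's actual decomposition code: it assumes tile widths are measured in grid rows, that tiles are consecutive with 1-based inclusive bounds, and that each tile spans the full X extent. The function name `y_tiles` is hypothetical.

```python
# Illustrative sketch only: the tile layout is an assumption,
# not taken from the WRF prototype.

def y_tiles(jx, width):
    """Split rows 1..jx into consecutive tiles of at most `width` rows.

    Returns a list of (j_start, j_end) pairs with inclusive bounds.
    The last tile may be narrower when width does not divide jx.
    """
    return [(js, min(js + width - 1, jx)) for js in range(1, jx + 1, width)]

tiles = y_tiles(21, 4)
print(tiles)  # [(1, 4), (5, 8), (9, 12), (13, 16), (17, 20), (21, 21)]
```

Narrow Y tiles keep the working set of each tile small, while the untiled X dimension preserves the full inner-loop length for the I-inner ordering.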