This work is distributed under the Creative Commons Attribution 4.0 License.
Unlocking the Power of Parallel Computing: GPU Technologies for Ocean Forecasting
Abstract. Operational ocean forecasting systems are complex engines that must execute ocean models with high performance in order to deliver timely products and datasets. Significant computational resources are therefore needed to run high-fidelity models and, historically, the technological evolution of microprocessors has constrained data-parallel scientific computation. Today, GPUs offer a valuable source of computing power in addition to traditional CPU-based machines: the exploitation of thousands of threads can significantly accelerate the execution of many models, ranging from traditional HPC workloads of finite-difference/volume/element modelling through to the training of the deep neural networks used in machine learning and artificial intelligence. Despite these advantages, GPU usage in ocean forecasting remains limited, owing to the legacy of CPU-based model implementations and the intrinsic complexity of porting core models to GPU architectures. This review explores the potential use of GPUs in ocean forecasting and how the computational characteristics of ocean models influence the suitability of GPU architectures for executing the overall value chain. It discusses the current approaches to code (and performance) portability from CPU to GPU, distinguishing between tools that perform code transformation to ease the adaptation of Fortran code for GPU execution (such as PSyclone), the direct use of OpenACC directives (as in ICON-O), and the adoption of frameworks that facilitate the management of parallel execution across different architectures.
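To make the directive-based approach mentioned above concrete, the minimal sketch below annotates a hypothetical Fortran loop nest with OpenACC directives; the subroutine, the variable names and the five-point smoothing stencil are illustrative assumptions, not code taken from any model discussed in the paper.

```fortran
! A minimal, hypothetical sketch of OpenACC directives on a Fortran
! loop nest; names (ssh, ssh_new, rdt) are illustrative only.
subroutine update_ssh(ssh, ssh_new, rdt, jpi, jpj)
  implicit none
  integer, intent(in) :: jpi, jpj
  real(kind=8), intent(in)  :: rdt
  real(kind=8), intent(in)  :: ssh(jpi, jpj)
  real(kind=8), intent(out) :: ssh_new(jpi, jpj)
  integer :: ji, jj

  ! Offload the loop nest to the GPU; copyin/copyout manage the
  ! host<->device data transfers for the two arrays.
  !$acc parallel loop collapse(2) copyin(ssh) copyout(ssh_new)
  do jj = 2, jpj - 1
     do ji = 2, jpi - 1
        ! Hypothetical five-point smoothing of sea-surface height
        ssh_new(ji, jj) = ssh(ji, jj) + rdt * 0.25d0 *          &
             (ssh(ji-1, jj) + ssh(ji+1, jj) +                   &
              ssh(ji, jj-1) + ssh(ji, jj+1) - 4.0d0*ssh(ji, jj))
     end do
  end do
end subroutine update_ssh
```

The appeal of this route is that the original serial loop is left intact: with the directives ignored by a non-accelerated compiler, the same source runs unchanged on CPU.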
Status: final response (author comments only)
RC1: 'Comment on sp-2024-32', Anonymous Referee #1, 01 Oct 2024
A good summary of the problems of porting programs, particularly FORTRAN programs intended for CPUs and MPI, to GPUs to gain the advantages of performance and, ideally, reduced power consumption and heat generation.
If the FORTRAN codes are too big, too complex or too brittle to be changed, then the interface to the distributed/parallel processing capability requires considerable work to be sufficiently general to handle a range of application codes on the one hand, and a range of distributed/parallel processing systems on the other.
This difficulty - and possible solutions - should be discussed in the paper and would greatly increase the value of the contribution. These solutions range from using software other than MPI, to defining a new interface (with middleware) between FORTRAN codes and parallel/distributed processing at both node and processor levels, to revising the FORTRAN codes. Developments in processor technology may introduce other opportunities beyond the current generation of GPUs.
Also, it would appear that the problem is being addressed by Open MPI, which provides a version of MPI that takes advantage of GPUs.
Citation: https://doi.org/10.5194/sp-2024-32-RC1
AC2: 'Reply on RC1', Andrew Porter, 16 Dec 2024
Many thanks for the review. Below we address the reviewer's comments and suggestions (reviewer's comments quoted, our responses follow):
"This difficulty - and possible solutions - should be discussed in the paper and would increase greatly the value of the contribution. These solutions range from using software other than MPI, defining a new interface (with middleware) between FORTRAN codes and parallel/distributed processing at both node and processor levels or revising the FORTRAN codes. Developments in processor technology may introduce other opportunities beyond the current generation of GPUs."
We concur with the reviewer regarding challenges and solutions; however, we think that the manuscript does describe various solutions, i.e. we address the various ways in which OOFS can make use of GPUs, ranging from the addition of OpenACC/OpenMP directives to existing Fortran code through to complete re-writes in C++, Python and Julia. We are unsure in what way we should extend the discussion.
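For comparison with the OpenACC sketch given after the abstract, the same hypothetical kernel can be expressed with OpenMP target-offload directives, the other directive-based route mentioned above; all names remain illustrative assumptions.

```fortran
! The same hypothetical kernel, annotated with OpenMP target
! directives instead of OpenACC; names are illustrative only.
subroutine update_ssh_omp(ssh, ssh_new, rdt, jpi, jpj)
  implicit none
  integer, intent(in) :: jpi, jpj
  real(kind=8), intent(in)  :: rdt
  real(kind=8), intent(in)  :: ssh(jpi, jpj)
  real(kind=8), intent(out) :: ssh_new(jpi, jpj)
  integer :: ji, jj

  ! Offload the loop nest to the device; the map clauses play the
  ! role of OpenACC's copyin/copyout.
  !$omp target teams distribute parallel do collapse(2) &
  !$omp   map(to: ssh) map(from: ssh_new)
  do jj = 2, jpj - 1
     do ji = 2, jpi - 1
        ssh_new(ji, jj) = ssh(ji, jj) + rdt * 0.25d0 *          &
             (ssh(ji-1, jj) + ssh(ji+1, jj) +                   &
              ssh(ji, jj-1) + ssh(ji, jj+1) - 4.0d0*ssh(ji, jj))
     end do
  end do
  !$omp end target teams distribute parallel do
end subroutine update_ssh_omp
```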
"Also, it would appear that the problem is being addressed by OpenMPI and that they have a version of MPI that takes advantage of GPUs."
As you say, GPU-aware MPI implementations are important and we have extended the text to make this clear.
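As an illustration of what a GPU-aware MPI enables, the sketch below shows a hypothetical halo exchange in which device-resident buffers are handed directly to MPI via an OpenACC host_data region; the subroutine and all buffer/rank names are assumptions for the sake of the example, not code from the manuscript.

```fortran
! Hypothetical halo exchange with left/right neighbours. With a
! GPU-aware MPI library, the host_data region passes the *device*
! addresses of the buffers to MPI, so the transfer can go directly
! between GPUs without an explicit copy back to the host.
subroutine exchange_halo(sendbuf, recvbuf, n, left, right, comm)
  use mpi
  implicit none
  integer, intent(in) :: n, left, right, comm
  real(kind=8), intent(in)  :: sendbuf(n)
  real(kind=8), intent(out) :: recvbuf(n)
  integer :: ierr

  ! In a real model the buffers would already live on the device;
  ! this data region only makes the sketch self-contained.
  !$acc data copyin(sendbuf) copyout(recvbuf)
  !$acc host_data use_device(sendbuf, recvbuf)
  call MPI_Sendrecv(sendbuf, n, MPI_DOUBLE_PRECISION, right, 0, &
                    recvbuf, n, MPI_DOUBLE_PRECISION, left,  0, &
                    comm, MPI_STATUS_IGNORE, ierr)
  !$acc end host_data
  !$acc end data
end subroutine exchange_halo
```

Without GPU awareness in the MPI library, each exchange would require staging the halos through host memory, which can dominate the cost of a GPU-resident time step.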
Citation: https://doi.org/10.5194/sp-2024-32-AC2
RC2: 'Comment on sp-2024-32', Mark R. Petersen, 30 Nov 2024
The comment was uploaded in the form of a supplement: https://sp.copernicus.org/preprints/sp-2024-32/sp-2024-32-RC2-supplement.pdf
AC1: 'Reply on RC2', Andrew Porter, 16 Dec 2024
Many thanks for the useful review and the additional technical information on the ocean component of E3SM that is being developed. It will be very interesting to see how your work there progresses. We have added this information to the text.
The reviewer comments:
"The AI/ML revolution is causing much greater uncertainty, as it may displace the numerical methods we’ve used for 70 years. I think this point deserves a few more sentences in the text."
This is a good suggestion. We've added some text about this issue.
To respond to the remaining points:
L43: We have updated the Top500 reference (and associated text) to use the latest list as of Nov 2024;
L59: Thanks for pointing out that we'd missed describing inter-processor communication and the additional complexities this brings for GPUs. We've added a paragraph explaining the need for this communication and the implications for GPU.
L98-112: We've added a little more detail here.
The Strauss et al. reference has been updated (thanks).
Citation: https://doi.org/10.5194/sp-2024-32-AC1
Viewed

| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 72 | 51 | 252 | 375 | 3 | 3 |