I have a somewhat frustrating issue:
The WU on my Vega GPU gets stuck at 99%.
Shutting down the process and restarting it reverts the WU to 67%.
I'm using the amdgpu-pro libraries on top of the native amdgpu 5.6 driver, because the chipset on my mainboard does not provide PCIe atomics and is therefore incompatible with ROCm.
The log files do not show anything.
radeontop shows the graphics pipe, texture addresser and shader interpolator at 100% usage, but nothing else.
top does not show any CPU activity at all for the FAHCore running on the GPU.
It seems like the WU is waiting for the GPU to complete a final operation, but it never does.
The kernel log does not show anything either, and the GPU sensors report a temperature of about 70°C, which is OK for a GPU at full load.
Any suggestions?