I don't think the versions of the Pascal headers matter: within the 2.x series the API should be backward compatible, and since the program itself uses no later features, there should be no compatibility issues. When I have time, I will generate a new set of Pascal wrappers anyway.
Regarding the earlier comment about packrecords: besides the fact that it did not help, it is again unlikely to be the root cause. When TensorFlow runs in CPU mode, there has never been a problem. So, without knowing much about TF internals, I would assume that building the graph across Pascal, the C API and TF is fine, and that the problem lies somewhere between TF, CUDA and the GPU.
Anyway, I will raise the issue on the TF GitHub as well, but the setup is so complicated (about ten CUDA libraries, each with its own version, plus a complex hardware architecture) that I am not sure it can be reproduced. If someone with access to an NVIDIA GPU could test this program and reproduce the error, that would help.
Some magic debugging tool would also help, but I am not very familiar with gdb or FpDebug. I tried changing the debugger in Lazarus, but Project Options/Compiler Commands/Show Options (which is where I get the fpc options I use on the HPC) shows no difference. So the binary is the same either way, and I can use gdb on the HPC. My worry is that the TF and CUDA libraries are highly optimized and contain no debug information at all. Recompiling TF with debug info might help, but that is beyond my capabilities.