Based on what I've managed to accomplish with the GPU clients, I would recommend updating the ATI DLL files to those from more recent ATI drivers. Granted, the DLL names have changed slightly in the newer drivers, but by renaming the files to the names the GPU client requires, I have been able to use the ATI 10.x driver versions, which increased the efficiency of the GPU clients significantly.
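For anyone who wants to script the DLL swap described above, here is a minimal sketch in Python. The file names in the usage note are placeholders I made up for illustration, not the real ATI DLL names, and the folder paths are assumptions as well. Substitute whatever your driver version ships and whatever your GPU client actually looks for.

```python
import shutil
from pathlib import Path


def plan_copies(rename_map, driver_dir, client_dir):
    """Pair each newer driver DLL with the file name the client expects.

    rename_map maps {name in the newer driver: name the client looks for}.
    Nothing is touched on disk here; this only builds the list of copies.
    """
    return [
        (Path(driver_dir) / new_name, Path(client_dir) / expected_name)
        for new_name, expected_name in rename_map.items()
    ]


def apply_copies(copies):
    """Copy each source DLL to its destination, backing up any existing file."""
    for src, dst in copies:
        if not src.is_file():
            raise FileNotFoundError(f"driver DLL not found: {src}")
        if dst.exists():
            # Keep the client's old DLL so the change can be undone.
            shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
        shutil.copy2(src, dst)
```

A call such as `apply_copies(plan_copies({"new_driver_name.dll": "name_client_expects.dll"}, driver_dir, client_dir))` would copy the newer DLL into the client folder under the old name. Stop the client first, and keep the `.bak` files in case the newer driver does not behave.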
I would also like to recommend compressing the folding core programs with a packer such as UPX, or something equally free and effective. I've personally managed to shrink the cores by roughly 25-50% while keeping them fully executable.
Another point I would like to make concerns the processor usage required alongside the GPU: it would make sense to offer 64-bit builds, so that on a 64-bit system the processor side can keep up with the faster-running GPU. I expect this would make fuller use of PCI-E graphics cards and may even speed up the cores. Note that if you go with my previous idea of compressing the executables and DLL files with UPX (an executable packer), and your version of UPX does not support 64-bit executables, you would need a different packer that does. Either way, I expect all of these changes would significantly improve the speed of the GPU folding programs.
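To make the compression suggestion concrete, here is a rough sketch of packing a copy of a core with UPX and measuring the saving. It assumes `upx` is installed and on the PATH. The function names are my own, and any core file name you pass in is up to you. `--best`, `-o` and `-t` are standard UPX options (best compression, output file, test the packed file).

```python
import shutil
import subprocess
from pathlib import Path


def size_reduction_percent(original_bytes, packed_bytes):
    """Return how much smaller the packed file is, as a percent of the original."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (original_bytes - packed_bytes) / original_bytes


def pack_copy(core, out_dir):
    """Pack a copy of `core` into `out_dir`, leaving the original untouched."""
    upx = shutil.which("upx")
    if upx is None:
        raise FileNotFoundError("upx was not found on PATH")
    core = Path(core)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    packed = out_dir / core.name
    if packed.exists():
        # UPX refuses to overwrite an existing output file.
        packed.unlink()
    # Write the packed executable to a new file instead of packing in place.
    subprocess.run([upx, "--best", "-o", str(packed), str(core)], check=True)
    # Ask UPX to verify that the packed file is intact.
    subprocess.run([upx, "-t", str(packed)], check=True)
    return packed
```

After `packed = pack_copy(core_path, "packed")`, calling `size_reduction_percent(Path(core_path).stat().st_size, packed.stat().st_size)` gives the percentage saved. Packing a copy rather than the original means the client can be pointed back at the unpacked core if the packed one misbehaves.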
Please see my SMP2 suggestions as well.