Page 2 of 2

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Sat Nov 30, 2013 11:28 pm
by davidcoton
I run Ubuntu, not Mint, and only a single GPU, but here are some thoughts:

1) Core17 needs a CPU to feed it. You have all 16 CPUs allocated to the CPU slot. Try running the CPU slot without GPUs. Get that working first. Then reduce the CPUs allocated to the CPU slot (in advanced control, configure|slots|cpu|edit and set CPUs allocated to 12). Then get the GPU slots working, one at a time. Finally, you could set up a second CPU slot to use the last two CPUs -- or you could leave them for non-folding use. [The number of CPUs allocated to a slot has to factorise without prime numbers above 5, or some WUs will fail. So 16 is good -- but does not allow Core17 to run without interference. 15 will work with no more than one Core17 GPU. 14 is bad (factor 7), 13 awful, 12 good.] If this fails, try running each GPU singly, with CPU folding paused.

2) 86C is high, but not alarming for a GPU (usually safe to 100C -- but most folders prefer cooler), so that is not likely the cause of failure. I would recommend improved cooling, but there are probably other more serious issues.

3) What PSU do you have? Can it supply enough current on the 12V lines for 2 GPUs? Or even one??
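
The CPU-count rule in point 1 (no prime factor above 5) can be checked with a short sketch. The function name is made up for illustration, and it only encodes the rule as stated in this post, not anything taken from the FAH client itself.

```python
def is_allowed_cpu_count(n: int) -> bool:
    """Return True if n has no prime factor larger than 5 (rule as stated above)."""
    if n < 1:
        return False
    # Divide out every factor of 2, 3 and 5; anything left is a larger prime factor.
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


for count in (16, 15, 14, 13, 12):
    verdict = "ok" if is_allowed_cpu_count(count) else "avoid"
    print(count, verdict)
```

This agrees with the examples above: 16, 15 and 12 pass, while 14 (factor 7) and 13 do not.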

David

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Sun Dec 01, 2013 12:26 am
by apaseall
@davidcoton
Hi and thanks for your post.
I have changed down to 12 cores :)
CPU slot 16 works fine, tried that first. Wanted to get gpu going. Then bought a better card for some other reason. Thought I would move the old one over and fold with it.

The temp problem is rather silly really. The 470 is the card that has nothing plugged into it. I want to fold on it. Fair enough. When I fold on it temps look reasonable.
Start up the 760 and the temp soars. Gets quite hot. Thing is though that I was not reading the temps correctly.
It is the 470 that gets toasty NOT the 760.
Turns out that the exhaust from the 760 blows directly onto the back of the 470.

Some form of duct will happen real soon :D

PSU? Corsair AX1200i, so there should be plenty to spare for those two cards after feeding the 2 Xeons :D

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Sun Dec 01, 2013 9:38 pm
by apaseall
Well this is annoying.
I have 2 GPU slots. Both have the triple -1 settings. FAHControl provides a description for each slot which states which gpu it is.

It is WRONG.

I have 2 different GPUs. 470 & 760. FAHControl shows them as GK104 [GeForce GTX 760] & GF100 [GeForce GTX 470]

Both GPU paused.

Say I fold with GK104 [GeForce GTX 760]
Psensor GPU1 temp rises.
Nvidia X Server Settings shows a rise in temperature. It also shows a change in performance level from 0 to 3.
But for GPU1 (Geforce GTX 470).

Manual check - stick hand in each exhaust to see which one is hot.
Yes the 470 is toasty.

So FAHControl lies when it describes the GPU as a 760.

If I pause GK104 [GeForce GTX 760] and fold with GF100 [GeForce GTX 470] ...
Psensor shows the temp rise and fall as one GPU cools under no load whilst the other heats up as it munches.
Same with Nvidia X Server Settings.

FAHControl tells lies :( naughty FAHControl.

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Sun Dec 01, 2013 10:02 pm
by P5-133XL
This is the instance where manually adjusting the opencl-index, and gpu-index (values start at 0 with -1 being automatic) to force the slot descriptions to match the observed temps/video card has some value. Work with one slot at a time till that slot matches the observed and then move to the next. I will agree that it is a pain to do this and I know it shouldn't be necessary, but I know of no better way. Be methodical and you will solve it.
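
One way to stay methodical is to write down every explicit index combination for a slot before trying them. The sketch below only lists the combinations for a two-GPU machine; the function name is hypothetical, and it does not talk to FAHClient or edit any settings. It assumes, as stated above, that explicit values start at 0 and that -1 means automatic.

```python
from itertools import product


def candidate_index_pairs(gpu_count: int):
    """List every explicit (gpu-index, opencl-index) pair, with values starting at 0."""
    return list(product(range(gpu_count), repeat=2))


# Print a checklist for one slot on a machine with two GPUs.
for gpu_index, opencl_index in candidate_index_pairs(2):
    print(f"try gpu-index={gpu_index} opencl-index={opencl_index}")
```

Tick each pair off for one slot, note which physical card heats up, and only then move on to the next slot.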

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Mon Dec 02, 2013 12:23 am
by bruce
Either GPU can be detected first (gpu 0) and the other one will be detected second (gpu 1). FAH may be detecting them in the opposite order from what you want, but that's not the same as a lie. For more information, search for the open-source utility "lspci", which is used by FAH.

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Mon Dec 02, 2013 1:08 am
by 7im
Did you add the 2nd GPU after the client was already installed?

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Mon Dec 02, 2013 6:25 pm
by apaseall
To be clear about this, the lie is the description, not the gpu number.
Both cards were present when the app was installed.

Found another lie. Installed app on laptop, it wrongly reports that the gpu is a GT 555M whereas in reality it is a GT 540M.

I will try using the index values for the 470 & 760 to see if they are described correctly.

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Mon Dec 02, 2013 6:39 pm
by apaseall
lspci reports [among the big list]
0c:00.0 VGA compatible controller: NVIDIA Corporation Device 1187 (rev a1)
08:00.0 VGA compatible controller: NVIDIA Corporation GF100 [GeForce GTX 470] (rev a3)

By bus address, the 470 (08:00.0) is before the 760 (0c:00.0, shown as Device 1187).
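
For anyone wanting to check the bus order without reading the big list by eye, here is a minimal sketch that pulls the VGA lines out of lspci text and sorts them by bus address. The function name and regular expression are my own, and the sample text is just the two lines quoted above. Bus order is not necessarily the order in which FAH or the driver numbers the GPUs.

```python
import re

# Matches e.g. "08:00.0 VGA compatible controller: <description>",
# with an optional 4-digit PCI domain prefix such as "0000:".
LSPCI_VGA = re.compile(
    r"^(?:[0-9a-f]{4}:)?([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])\s+"
    r"VGA compatible controller:\s+(.*)$",
    re.IGNORECASE,
)


def vga_devices_in_bus_order(lspci_text: str):
    """Return VGA device descriptions sorted by (bus, device, function)."""
    devices = []
    for line in lspci_text.splitlines():
        m = LSPCI_VGA.match(line.strip())
        if m:
            address = tuple(int(g, 16) for g in m.group(1, 2, 3))
            devices.append((address, m.group(4)))
    return [description for _, description in sorted(devices)]


SAMPLE = """\
0c:00.0 VGA compatible controller: NVIDIA Corporation Device 1187 (rev a1)
08:00.0 VGA compatible controller: NVIDIA Corporation GF100 [GeForce GTX 470] (rev a3)
"""

for description in vga_devices_in_bus_order(SAMPLE):
    print(description)
```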

Just deleted the existing gpu slots. made new ones. gpu0 is still reported as 760 with gpu1 as 470.
Pausing both and folding with one at a time continues to behave incorrectly.
Namely 470 running actually loads the 760, i.e. temp rise with nvidia-smi reporting memory usage.

So I stand by my comment, the descriptions are wrong, FAHControl is telling lies.

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Mon Dec 02, 2013 7:30 pm
by 7im
apaseall wrote:
Found another lie. Installed app on laptop, it wrongly reports that the gpu is a GT 555M whereas in reality it is a GT 540M.
The GPU vendors used the same Device ID for multiple models of GPUs. The GPUs.txt file is based on the GPU description as listed inside the OEM driver. If the device is listed multiple times, the first example is typically used. And the description in the GPUs.txt file is purely cosmetic, so there is no functionality difference. I wouldn't call that a lie, but to each their own. It is at best a misnomer. ;)
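
To illustrate the "first example wins" behaviour, here is a small sketch. The list layout, the function name and the device ID "0xABCD" are placeholders invented for the example; this is not the real GPUs.txt format, not the real ID of either card, and not the client's actual lookup code.

```python
def first_description(entries, device_id):
    """Return the name of the first entry whose ID matches, or None if none does."""
    for entry_id, name in entries:
        if entry_id == device_id:
            return name
    return None


# Hypothetical entries: two models listed under one placeholder device ID.
ENTRIES = [
    ("0xABCD", "GeForce GT 555M"),
    ("0xABCD", "GeForce GT 540M"),
]

print(first_description(ENTRIES, "0xABCD"))
```

With two names under one ID, whichever is listed first is what gets displayed, regardless of which card is actually installed.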

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Tue Dec 03, 2013 2:14 am
by jimerickson
No, the indexes are merely assigned wrong. Like 7im said, it's a misnomer.

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Tue Dec 03, 2013 4:57 am
by bruce
jimerickson wrote: No, the indexes are merely assigned wrong.
As I already said, wrong compared to what?

Most likely they're 0 and 1, so that makes them right.

If they're in the order preferred by lspci rather than the order preferred by you, all that means is that you two don't agree.

lspci does not follow the order of the slots on your PCI bus.

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Tue Dec 03, 2013 1:55 pm
by PantherX
apaseall wrote:...Pausing both and folding with one at a time continues to behave incorrectly.
Namely 470 running actually loads the 760, i.e. temp rise with nvidia-smi reporting memory usage.

So I stand by my comment, the descriptions are wrong, FAHControl is telling lies.
To clarify, this is what you are seeing:
Physical GTX 470 maps to F@H GPU Slot 760
Physical GTX 760 maps to F@H GPU Slot 470

If yes, then it is a bug, but not a very serious one. The reason is that WUs assigned to either GPU would still fold successfully. However, Keplers are more efficient using FahCore_17 while Fermis are better on FahCore_15.

Furthermore, please note that this is the first time that GPU folding is natively supported on Linux, so a few bugs can be expected.

Please note that the physical layout of the GPU in the PCI-E Slots may not match the numbering in FAHControl, as stated above.

Re: NV 760&470 LinuxMint14 Bad PlatformID Size

Posted: Tue Dec 03, 2013 2:06 pm
by 7im
It's an easy fix. Follow the procedure I linked to on page one of this thread. It will straighten out the indexes.