GPUs mixed up

Moderators: Site Moderators, FAHC Science Team

Peter_Hucker
Posts: 308
Joined: Wed Feb 16, 2022 1:18 am

GPUs mixed up

Post by Peter_Hucker »

I installed Folding on a computer with a 4-core Intel CPU, which has built-in Intel HD graphics, and a discrete AMD GPU. I wanted to run Folding on the AMD GPU, but when I ticked that one, and only that one, in Web Control, only the Intel HD graphics ran a task (obvious from the % usage and temperatures in MSI Afterburner). I then tried ticking the Intel HD box instead, thinking the two were mixed up, but that one gives the error:

15:49:51:E ::WU13:HTTP_SERVICE_UNAVAILABLE: {"error":"No appropriate assignment"}
15:49:51:I1::WU13:Retry #7 in 2 mins 8 secs

Even worse, I then plugged in four AMD GPUs (which had been running folding just fine on another machine), and it can't use any of them. It tries, but one task goes onto the Intel graphics, and the others keep failing with this:

16:24:47:I1::WU28:Failed to create OpenCL context:
16:24:47:I1::WU28:Illegal value for DeviceIndex: 2
16:24:47:I1::WU28:ERROR:125: Failed to create a GPU-enabled OpenMM Context.
16:24:47:I1::WU28:Saving result file ..\logfile_01.txt
16:24:47:I1::WU28:Saving result file science.log
16:24:47:I1::WU28:Folding@home Core Shutdown: BAD_WORK_UNIT
16:24:48:W ::WU28:Core returned BAD_WORK_UNIT (114)

The hardware is set up fine: I can run OpenCL tasks from BOINC without problems.

"Failed to create a GPU-enabled OpenMM Context" suggests a driver is missing, but the drivers are the same ones I use everywhere for those cards, and Boinc is happy with them.
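
For anyone comparing notes, here is a rough Python sketch that pulls the rejected device index out of log lines like the ones above. It assumes the line layout seen in this post (time, level, WU number, message); the names BAD_INDEX and rejected_device_indexes are made up for illustration and are not part of the Folding client.

```python
import re

# Sample lines copied from the log in this post.
LOG = """\
16:24:47:I1::WU28:Failed to create OpenCL context:
16:24:47:I1::WU28:Illegal value for DeviceIndex: 2
16:24:47:I1::WU28:ERROR:125: Failed to create a GPU-enabled OpenMM Context.
16:24:48:W ::WU28:Core returned BAD_WORK_UNIT (114)
"""

# Assumed layout: "<time>:<level>::WU<n>:Illegal value for DeviceIndex: <k>"
BAD_INDEX = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):.*?:WU(?P<wu>\d+):"
    r"Illegal value for DeviceIndex: (?P<index>\d+)\s*$"
)


def rejected_device_indexes(log_text):
    """Return (time, work unit number, device index) for each rejected index."""
    hits = []
    for line in log_text.splitlines():
        m = BAD_INDEX.match(line)
        if m:
            hits.append((m.group("time"), int(m.group("wu")), int(m.group("index"))))
    return hits


for time, wu, index in rejected_device_indexes(LOG):
    print(f"{time} WU{wu}: core rejected OpenCL device index {index}")
```

Comparing the reported index with the list of OpenCL devices the machine actually exposes may show whether the client and the core are counting the cards differently.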
Peter_Hucker
Posts: 308
Joined: Wed Feb 16, 2022 1:18 am

Re: GPUs mixed up

Post by Peter_Hucker »

19:39:08:I1::WU43: opencl-device 2 specified
19:39:20:I1::WU43:WARNING:Console control signal 1 on PID 4172
19:39:20:I1::WU43:Exiting, please wait. . .
19:39:23:I1:OUT14:< vav19.fah.temple.edu:443 HTTP/1.1 200 HTTP_OK
19:39:24:I1::WU44:Received WU
19:39:25:I3::WU43:Dumping NWAt8YSJTROdhR80olH18g0iP6flLCTYa2KXcBxpnq4
19:39:26:I1::WU43:ERROR:102: Core startup was interrupted by client.
19:39:26:I1::WU43:Folding@home Core Shutdown: INTERRUPTED
19:39:26:I3::WU44:Dumping XPQ10vJRdVeasZ5zyat00wi5YoLeYgG6zfHXtNEl_nI
19:39:26:I1::WU44:Sending dump report
19:39:26:I1:OUT15:> POST https://vav19.fah.temple.edu/api/results HTTP/1.1
19:39:26:I3:Connecting to vav19.fah.temple.edu:443
19:39:27:I1:OUT15:< vav19.fah.temple.edu:443 HTTP/1.1 200 HTTP_OK
19:39:27:I1::WU44:Dumped
19:39:27:I1::WU43:Core returned INTERRUPTED (102)
19:39:27:I1::WU43:Sending dump report
calxalot
Site Moderator
Posts: 892
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: GPUs mixed up

Post by calxalot »

Is this fah v8.1.14?
What OS?
Peter_Hucker
Posts: 308
Joined: Wed Feb 16, 2022 1:18 am

Re: GPUs mixed up

Post by Peter_Hucker »

8.1.13, I wasn't aware of an update, I'll go get it.
Windows 11, all updates.
Peter_Hucker
Posts: 308
Joined: Wed Feb 16, 2022 1:18 am

Re: GPUs mixed up

Post by Peter_Hucker »

8.1.14 still has the problem. It may well be the computer's fault. It's a second-hand mining motherboard I got so I could plug many GPUs into it for folding. The board is a bit dodgy: it reports a CMOS checksum error, even after a battery replacement and a CMOS reset, and it only boots from legacy devices, not UEFI, despite me adjusting the settings.

So instead I've got Folding running on a decent machine by plugging 5 cards (soon to be 8) into a splitter card that somebody over at Einstein@Home donated to me. It takes 8 cards on USB risers and connects them to a single PCI-Express x4 port. That's better than the usual 4-way adapters, which only connect to an x1 port, so Folding can't get enough bandwidth to all the cards.
hojomojo
Posts: 2
Joined: Sat Jul 01, 2023 7:27 pm

Re: GPUs mixed up

Post by hojomojo »

I'm getting this same issue on Debian.

It just started happening out of the blue. No hardware or software changes (other than the upgrade to V8).

22:31:43:W ::WU3790:Core returned BAD_WORK_UNIT (114)
22:32:29:W ::WU3791:Core returned BAD_WORK_UNIT (114)
22:32:44:W ::WU3792:Core returned BAD_WORK_UNIT (114)
22:33:12:W ::WU3795:Core returned BAD_WORK_UNIT (114)
22:33:28:W ::WU3796:Core returned BAD_WORK_UNIT (114)
22:34:17:W ::WU3797:Core returned BAD_WORK_UNIT (114)
22:34:37:W ::WU3799:Core returned BAD_WORK_UNIT (114)
22:34:47:W ::WU3800:Core returned BAD_WORK_UNIT (114)
22:35:03:W ::WU3801:Core returned BAD_WORK_UNIT (114)
22:35:52:W ::WU3802:Core returned BAD_WORK_UNIT (114)
22:36:39:W ::WU3803:Core returned BAD_WORK_UNIT (114)
22:37:26:W ::WU3804:Core returned BAD_WORK_UNIT (114)
22:38:18:W ::WU3806:Core returned BAD_WORK_UNIT (114)
22:38:49:W ::WU3809:Core returned BAD_WORK_UNIT (114)
22:38:56:W ::WU3810:Core returned BAD_WORK_UNIT (114)
22:39:06:W ::WU3812:Core returned BAD_WORK_UNIT (114)
22:39:21:W ::WU3814:Core returned BAD_WORK_UNIT (114)
22:39:44:W ::WU3817:Core returned BAD_WORK_UNIT (114)
22:40:09:W ::WU3819:Core returned BAD_WORK_UNIT (114)
22:40:21:W ::WU3821:Core returned BAD_WORK_UNIT (114)
22:42:15:W ::WU3822:Core returned BAD_WORK_UNIT (114)
22:42:29:W ::WU3824:Core returned BAD_WORK_UNIT (114)
22:42:37:W ::WU3825:Core returned BAD_WORK_UNIT (114)
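
To put a number on how fast work units are being burned, a small Python sketch along these lines could summarize a log excerpt. It assumes the line layout shown above and that all timestamps fall on the same day; BAD_WU and summarize_bad_work_units are illustrative names, not anything from the Folding client.

```python
import re
from datetime import timedelta

# First four lines copied from the log above.
LOG = """\
22:31:43:W ::WU3790:Core returned BAD_WORK_UNIT (114)
22:32:29:W ::WU3791:Core returned BAD_WORK_UNIT (114)
22:32:44:W ::WU3792:Core returned BAD_WORK_UNIT (114)
22:33:12:W ::WU3795:Core returned BAD_WORK_UNIT (114)
"""

# Assumed layout: "<hh>:<mm>:<ss>:<level>::WU<n>:Core returned BAD_WORK_UNIT (114)"
BAD_WU = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):.*?:WU(\d+):Core returned BAD_WORK_UNIT \(114\)\s*$"
)


def summarize_bad_work_units(log_text):
    """Return (number of failed work units, time between first and last failure).

    Assumes the lines are in order and do not cross midnight.
    """
    stamps = []
    for line in log_text.splitlines():
        m = BAD_WU.match(line)
        if m:
            hours, minutes, seconds = (int(m.group(i)) for i in (1, 2, 3))
            stamps.append(timedelta(hours=hours, minutes=minutes, seconds=seconds))
    if not stamps:
        return 0, timedelta(0)
    return len(stamps), stamps[-1] - stamps[0]


count, span = summarize_bad_work_units(LOG)
print(f"{count} work units failed within {span}")
```

A steady stream of failures every few seconds to a minute, as in the excerpt, suggests the core is dying at startup rather than partway through a work unit.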

I saw someone had a similar issue and the answer blamed his machine. But that was Windows; mine is Debian Linux.

Any ideas?
Peter_Hucker
Posts: 308
Joined: Wed Feb 16, 2022 1:18 am

Re: GPUs mixed up

Post by Peter_Hucker »

One of my Windows machines did that too. I paused it when I saw it, thinking there was something wrong with it, and put it on another project. But I came back a couple of days later and it's fine. I did notice a lot of errors connecting to servers, 4 attempts to get a work unit. I think it tries different servers to see what's available? Perhaps some problems with one of the servers?
hojomojo
Posts: 2
Joined: Sat Jul 01, 2023 7:27 pm

Re: GPUs mixed up

Post by hojomojo »

Resolved!
Lying in bed, I remembered that when I approached my machine that evening, just before this issue started, it was shut down. I don't know why it was shut down, perhaps I got an update? Anyhow, when I brought it back up, that's when this issue started. So I took the problem device out of the mix, rebooted, then put it back in. And shazam! Problem solved. I hope this posting helps someone who might also have this issue! Good night folks!
Peter_Hucker
Posts: 308
Joined: Wed Feb 16, 2022 1:18 am

Re: GPUs mixed up

Post by Peter_Hucker »

I've seen Folding get mixed up on my machine with 6 GPUs after a reboot. The cards come up in the wrong order, and some work units refuse to run on a different card. I guess it populated them in a different order, and other work units were OK with a different card. In this case it downloads more work, and the fussy ones sit and wait for the right card to become free. It sorts itself out.

I've also seen Folding (and Boinc) not see a card (maybe that's what yours did?) because of some change during a reboot which caused a driver to be reinstalled or updated. The driver wasn't finished installing when Folding started, so the card isn't there, or has no driver and goes wrong. In this case rebooting fixes it (or I guess I could just restart Folding).
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: GPUs mixed up

Post by toTOW »

hojomojo wrote: Fri Jul 07, 2023 5:15 am
Resolved!
Lying in bed, I remembered that when I approached my machine that evening, just before this issue started, it was shut down. I don't know why it was shut down, perhaps I got an update? Anyhow, when I brought it back up, that's when this issue started. So I took the problem device out of the mix, rebooted, then put it back in. And shazam! Problem solved. I hope this posting helps someone who might also have this issue! Good night folks!

Automatic kernel updates tend to break NVIDIA drivers and FAH. Depending on how you installed the drivers, a reboot might be enough, or you may need a reboot followed by a driver reinstallation (this is the case for drivers installed with the .run package from the NVIDIA website, for instance).

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.