Project 18251 very low PPD on RTX 2060s

appepi
Posts: 43
Joined: Wed Mar 18, 2020 2:55 pm
Hardware configuration: HP Z600 (5) HP Z800 (3) HP Z440 (3)
ASUS Turbo GTX 1060, 1070, 1080, RTX 2060 (3)
Dell GTX 1080
Location: Sydney Australia

Project 18251 very low PPD on RTX 2060s

Post by appepi »

I am not fussed about PPD in general, but lately my three RTX 2060s are getting Project 18251 jobs that run for around 10-11 hours and show around 500K PPD, against the usual 1.5-2M PPD these devices achieve on other projects with the same settings. I have just updated to the latest NVIDIA Studio drivers with no obvious difference. LARS shows an "average" PPD of 2M or so for 2060s on this project, but right now I have Z441 at 505K PPD in its 9th hour of Project 18251 (41, 0, 53), and next to it Z442 is at 2.2M PPD on a different project. This is typical. If those running Project 18251 are poverty-stricken researchers who can't afford the usual rate, that's fine, since points aren't really worth anything anyway. But maybe there is something odd about the combination: I note the absence of high-end GPUs working on this project in the LARS list, and maybe it's not suitable for us low-end donors?

PS: By accident, after writing the above I started a Z800 with a Dell GTX 1080 in it, and it picked up a Project 18251 job (206, 3, 25) which is running at 1.5M PPD. This is just a bit better than the LARS average for that project on a 1080. Both my GTX 10xx and RTX 2060 GPUs are using driver 566.14. So why is it different for 2060s?
Last edited by appepi on Thu Nov 21, 2024 3:21 pm, edited 1 time in total.
BobWilliams757
Posts: 524
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: Project 18251 very low PPD on RTX 2060s

Post by BobWilliams757 »

Wow.... I'm not the only one!

From what I understand from Discord, it's a strange project that didn't scale well on larger GPUs, hence the assignment to smaller cards. I've run into the same issue on my 1660 Super, but only on later work units; my earlier runs gave full points, 1.2-1.4M PPD if I recall correctly. The researcher did some runs and could find no errors in the later ones that ran slow for me.

In my case I did find one stick of memory testing bad. So I yanked it out and am currently running in single-channel mode while waiting for replacement memory to arrive. But I picked up another 18251... and it ran slow again. Single-channel memory might cause a slight slowdown, but not a drop to half the usual points. So I'm still at a loss as to what exactly triggers it.

If I figure anything out I'll let you know. I will probably also pass this on to the researcher. He was very helpful in looking into the issue when I first reported it on Discord, and he might want your PRCG info. Also, if you use HFM or have old logs, take a look and see whether all your runs of this project ran at the current speed. Mine were fine until a certain date, then started running at half speed; assuming it was the hardware issue I found later, I shrugged it off.
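I'm no programmer, but something along these lines is the kind of check I mean — a rough Python sketch, assuming the v7 client's log folder and its usual "Completed ... steps" lines (the path and format may well differ on your setup):

Code: Select all

# Scan old F@H logs for Project 18251 and estimate time per 1% of progress,
# to see when (if ever) the slowdown started. Rough sketch only: it assumes
# the v7 client's log folder and line format, and lumps together every
# "Completed ... steps" line in any log that mentions the project.
import re
from datetime import datetime, timedelta
from pathlib import Path

LOG_DIR = Path(r"C:\ProgramData\FAHClient\logs")  # adjust for your install
STEP = re.compile(r"^(\d\d:\d\d:\d\d).*Completed \d+ out of \d+ steps \((\d+)%\)")

for log in sorted(LOG_DIR.glob("log-*.txt")):
    text = log.read_text(errors="ignore")
    if "Project: 18251" not in text:
        continue
    marks = [(m.group(1), int(m.group(2)))
             for m in map(STEP.match, text.splitlines()) if m]
    if len(marks) < 2:
        continue
    t0 = datetime.strptime(marks[0][0], "%H:%M:%S")
    t1 = datetime.strptime(marks[-1][0], "%H:%M:%S")
    span = (t1 - t0) % timedelta(days=1)   # crude midnight-rollover guard
    pct = marks[-1][1] - marks[0][1]
    if pct > 0:
        print(f"{log.name}: ~{span / pct} per 1% of the WU")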

If it makes you feel better, I think some runs on my 1660 Super were almost 14 hours. :shock: But they stayed stable and completed.
Fold them if you get them!
appepi
Posts: 43
Joined: Wed Mar 18, 2020 2:55 pm
Hardware configuration: HP Z600 (5) HP Z800 (3) HP Z440 (3)
ASUS Turbo GTX 1060, 1070, 1080, RTX 2060 (3)
Dell GTX 1080
Location: Sydney Australia

Re: Project 18251 very low PPD on RTX 2060s

Post by appepi »

Hmmm... Very helpful to know it's not just me. And just to put the barista-grade fern on top, I had two of these low-PPD 18251 jobs running when I went to shut down the devices this morning. Ordinarily I try to limit folding to 10pm-7am local time (9 hours/day), when electricity is at the cheapest "off-peak" rate. Lately I have been letting them start 2 hours earlier and use 8pm-10pm, which is charged at "shoulder" rates, 22% more per kWh. I set the LAR Systems timer to finish at 7am, and by the time I wake up (being retired, this is at a more civilised hour) the jobs are usually done. If not, I let them finish at shoulder rates as long as they will end before 2pm, when we hit peak rates at 119% more per kWh. It is so rare for a job to need to run that long after 7am that I noticed the repeated occurrence of these long-running Project 18251 jobs quickly.

One of the Project 18251 jobs will end not long after 2pm, so it is using 7 extra hours at shoulder rates, costing me the equivalent of an extra 8.54 off-peak hours (7 x 1.22) beyond what I aim to donate to Folding. I let it run. The other was going to gobble up several of the expensive peak hours as well, so I paused it and shut the device down. It can start again at 8pm, will yield even fewer points, and will slow down the research. This is not a good outcome from anyone's point of view, but there is no mechanism to prevent a device taking on a job whose ETA is beyond a designated limit.
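For anyone who wants to check my arithmetic, the conversion to "off-peak-equivalent hours" is just a weighted sum; here is a tiny Python sketch (the multipliers are from my own tariff — 22% extra for shoulder, 119% extra for peak — so substitute your own):

Code: Select all

# Express the electricity cost of a work unit in "off-peak-equivalent hours".
# Multipliers are from my tariff: shoulder is 22% dearer than off-peak,
# peak is 119% dearer. Substitute your own rates.
RATE = {"off_peak": 1.00, "shoulder": 1.22, "peak": 2.19}

def offpeak_equivalent_hours(hours_by_band):
    return sum(RATE[band] * hrs for band, hrs in hours_by_band.items())

# The 18251 job above: 7 extra hours, all at shoulder rates.
print(offpeak_equivalent_hours({"shoulder": 7.0}))  # -> 8.54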

I also had a closer look at the LAR Systems "Project PPD by GPU" listing for Project 18251. I don't know how to insert a local image here, but at https://folding.lar.systems/projects/fo ... site_links you can see that there are two very different rankings for a 2060: #6 with 2.3M PPD and 0.985 MPts for 10hr 54min average work, and #22 with 0.7M PPD and 0.32 MPts for 11hr 8min average work. The latter is the sort of performance I am getting; the former is what I would expect. Maybe these diverse averages reflect a change in the project at some point, or else an unknown difference between sub-populations of 2060s. I also note that a 1660 (rank #10) is supposedly averaging 1.7M PPD and 0.12 MPts for 2hr 38min work. Yes, well, maybe this curious interaction is the cyber-equivalent of the penicillin mould in the Petri dish?...
BobWilliams757
Posts: 524
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: Project 18251 very low PPD on RTX 2060s

Post by BobWilliams757 »

Hard to say with the LAR rankings; there are a lot of differing models of many of these GPUs. On top of that, they get bad data at times from people crashing, pausing, or running lower power limits. And it looks like the 1660 data on this project is just flat wrong; I doubt I could hit that with an extreme overclock.

There is one other user on Discord now reporting slow 18251 work units on AMD. I pinged the researcher to see if he wants any info, but being a holiday week it might be a while before he even notices.
Fold them if you get them!
appepi
Posts: 43
Joined: Wed Mar 18, 2020 2:55 pm
Hardware configuration: HP Z600 (5) HP Z800 (3) HP Z440 (3)
ASUS Turbo GTX 1060, 1070, 1080, RTX 2060 (3)
Dell GTX 1080
Location: Sydney Australia

Re: Project 18251 very low PPD on RTX 2060s

Post by appepi »

As suggested, I checked old logs, though the earliest available were from the start of November. For those 2060s, the PPD figures for Project 18251 were all very similar, at about one-third of what the same GPUs do in general. So it's certainly consistent at present.

I did a test run on the weekend, when there are only off-peak and shoulder rates for electricity (no peak rates), and I set the four 10xx devices (1060 6GB, 1070 8GB, 2x 1080) to grab Alzheimer's disease jobs if they could. I only picked up 3 Project 18251 observations with them. The 1080s behaved as usual, and as per 1080s on LARS: no problem with them, at 2x the PPD of the 2060s. The 1070 in Z602, however, performed at about half or less of its usual rate.

I ran the Unigine Superposition 1080p Extreme benchmark on the 1070 with the "folding" settings and it produced much the same result as 2 years ago (3044 vs 2964). On Geekbench (compute/OpenCL) the performance is also much the same. Neither of these is necessarily predictive of FAH performance, but the card is the same one that behaves normally on other projects, and is an ordinary 1070 on other benchmarks, with and without the temperature controls I use for folding. It has processed 1,411 work units. The 2060s have processed 3,484 WUs (Z441), 3,313 WUs (Z442) and 3,221 WUs (Z443). So I don't think it is a fault in the hardware, except in interaction with whatever 18251 does with it. And why would the researcher suppose that all is well with the science of these computations when "the same" software behaves so differently on "the same" hardware?
BobWilliams757
Posts: 524
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: Project 18251 very low PPD on RTX 2060s

Post by BobWilliams757 »

Interesting. So to clarify, it's now both the 2060s and the 1070 that have given higher TPF / lower speeds?

Similar to you, I tried all kinds of things to figure out if it could be on my end. Since I did find some bad memory, I figured that was it. But in single channel mode it still ran slow. Now I've got new memory in the box, so when I pick one up I'll find out if that changed anything.

As for the researcher, he did investigate and ran a bunch of units on a machine trying to figure it out. Maybe my memory replacement will confirm or deny that as a cause, but I have faith that they do the best they can with all the projects. So if more run slowly, I'll just fold them in hopes it helps the science behind all of this figure out why they act that way.
Fold them if you get them!
appepi
Posts: 43
Joined: Wed Mar 18, 2020 2:55 pm
Hardware configuration: HP Z600 (5) HP Z800 (3) HP Z440 (3)
ASUS Turbo GTX 1060, 1070, 1080, RTX 2060 (3)
Dell GTX 1080
Location: Sydney Australia

Re: Project 18251 very low PPD on RTX 2060s

Post by appepi »

Fair enough. This morning an 18251 run had just started on Z441 when I checked at 7am, so I paused it and switched off; it will start again at 8pm, a delay of 13+ hours, but since the points don't matter to me and the $$ cost of electricity does, that's the solution for now. I noted on the project description website that these jobs need more memory (I assume CPU RAM) than usual, and thus access to this project is limited to folk with 8000 MiB or more. Two of my Z440s with the 2060s have 4x8GB RAM (4-channel) and one has only a single 16GB stick (1-channel), but they all have the same problem, so RAM or its configuration does not seem a likely source of the low PPD on this particular project. The 1080s that perform with normal PPDs on 18251 were running with very different configurations: Z803 (2x Xeon X5675, 48GB RAM, all channels, Dell 1080) versus Z805 (1x Xeon 5620, 16GB RAM, 1 channel, ASUS 1080). So a single channel of RAM is unlikely to be the problem. Z602 with the 1070 8GB has 2x Xeon 5660, with RAM in all channels (3x8GB per CPU).

Rev 1: Checked and corrected RAM details.

Rev 2: I had Z441 and Z442 side by side, the former running the 18251 job I shut down this morning, which wants to run for another 10 hours to yield <500K PPD; the other running another project at >2M PPD. The former is running FahCore 24, the latter FahCore 23. The logs say they know there's 32GB of RAM around, most of it available. OK, so I fired up the HP Performance Monitor and looked at memory use, wondering if maybe the 18251 app is using paged/virtual memory when it doesn't need to (a DIY version of this check is sketched at the end of this post). I would also expect that to show up in writes to the Samsung 970 EVO NVMe boot drive, but this level of geekery is well above my pay grade. I also ran the workstation monitor on the FAH processes. Both 2060s are flat out, the CPUs aren't, total physical memory use is only around 13GB of the 32GB available, and there's not much writing to the boot disk going on. The only obvious difference is that when these devices run an 18251 job their output shrinks to a fraction of what it usually is, and achieving it wears out a 2060 for 11 hours when the same work could be done by another device with much less wear and tear and electricity use.

Rev 3: As of 29 Nov I have set the 2060s to prefer cancer jobs, which is the best I can do to avoid running 18251s when other devices can do them better; the 2060s seem to be fine on all other projects.

Rev 4: Corrected RAM of Z803 from 96GB to 48GB (I had forgotten that I took out the other 6x8GB sticks).
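The sketch promised in Rev 2: a minimal DIY check, assuming Python with the psutil package, of whether a FahCore process's committed virtual memory is running well ahead of its resident physical memory — one hint that it may be leaning on the page file. The process-name match is a guess that covers FahCore_22/23/24.

Code: Select all

# Compare resident (physical) vs. committed (virtual) memory for FahCore
# processes. A large gap can hint at page-file use.
# Requires: pip install psutil
import psutil

for p in psutil.process_iter(["name", "memory_info"]):
    name = p.info["name"] or ""
    if name.lower().startswith("fahcore"):
        m = p.info["memory_info"]
        print(f"{name} (pid {p.pid}): resident {m.rss / 2**30:.2f} GiB, "
              f"virtual {m.vms / 2**30:.2f} GiB")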
Last edited by appepi on Tue Dec 10, 2024 2:52 pm, edited 1 time in total.
BobWilliams757
Posts: 524
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: Project 18251 very low PPD on RTX 2060s

Post by BobWilliams757 »

I haven't caught the project starting since replacing my memory, but when on a single 8GB stick there is no doubt it was swapping, and the memory usage pattern was very unusual: it would go up, down, up some more, max out, then finally settle. On my modest CPU it also takes these projects quite a while to load, though that was true of both the full-speed and the slower work units.

I really can't think of anything else hardware related to try. I did 14+ hours of Memtest86 without a single error on the new memory, and have tried Prime95 and a few others to stress it as well. So it's NOT what I was hoping it might be.

I've been playing with anything I can think of while the project is running, and nothing seems to change it much. When testing and monitoring I also noticed that once loaded it really doesn't use much GPU memory at all. Surprising, given the number of atoms.
Fold them if you get them!
appepi
Posts: 43
Joined: Wed Mar 18, 2020 2:55 pm
Hardware configuration: HP Z600 (5) HP Z800 (3) HP Z440 (3)
ASUS Turbo GTX 1060, 1070, 1080, RTX 2060 (3)
Dell GTX 1080
Location: Sydney Australia

Re: Project 18251 very low PPD on RTX 2060s

Post by appepi »

Having set the three Z440/2060 systems to prefer cancer jobs, it seems I have kept them from wasting everyone's time and electricity on 18251 jobs that other devices could do more efficiently, and they are back to producing their usual PPD.

Meantime, I noted in the project description that the protein in 18251 flops about more than usual, so they need more "waters" for it; that uses more memory, and so they limit it to running on machines with a "minimum system memory requirement of 8000 MiB".

I decided to explore this on Z805, the device with 2x8GB of RAM and a GTX 1080 which had previously behaved normally on 18251 jobs, and sent it hunting for Alzheimer's jobs. When it got one it was producing 1.4M PPD or thereabouts, as expected, with the ETA about 6.5 hours away. I fired up HP's Performance Adviser and looked at RAM usage overall and for the FahCore_24 process. All seemed well, with overall usage rarely topping 8GB of physical memory and FahCore_24 not using all of that. I set the Workstation Monitor feature to record performance and let it run.

I note that overall usage might easily exceed a single 8GB stick if enough other things are using RAM.

But the next time I looked, the PPD had dropped by half and the ETA was 15 hours away. The Workstation Monitor log had been set to record for 10 hours, so I expected to see something that might indicate why the change occurred, and decided to let it continue. Next day, Z805 was dark, whereas it should simply have applied the LAR Systems Finish-and-Stop and be sitting paused. When I tried to fire up Z805 I got a 5-beep code, which means memory needs re-seating or replacing, or (worst case) a system board problem. I will skip over the next few hours of repair and simply note that Z805 is running again with a single 4GB stick. It had completed the 18251 job OK. The Workstation Monitor log ran from ~8:24pm on 30 Nov to 6:35am on 1 Dec, and there was another log that ran for about 2 hours afterwards. The summary of the first log shows utilisation as follows:

Xeon 5620 (4C/8T): 74% avg, 100% max
Physical memory (16GB): 52% avg, 55% max
GTX 1080: 87% avg, 100% max
Samsung 860 EVO system disk (500GB): 3% avg, 78% max; 350 reads (0.01GB), 15,839 writes (0.39GB)
(6x 4TB drives in a 12TB RAID10 array: zero reads and writes)

Process watch on FahCore_24: CPU use 12.9% avg, 50.0% max; physical memory 36.2% max; virtual memory 53.7% max
The monitor software noted no problem with memory and simply suggested I needed a more powerful CPU and GPU because of the high utilisation. I assume the 390MB written to the SSD was temporary working saves that did not need to be read back for a successfully completed job, and even though the SSD is on SATA2 it still writes at about 260MB/sec, so writing 0.39GB takes a couple of seconds, not 6 hours! So it is still mysterious. Unfortunately Z805 wasn't an ideal test case. It had a hard life before I rescued it and assigned it to light duties, and even these have included 1,521 FAH WUs with a 99.8% completion rate. So I wouldn't necessarily blame 18251 for the memory loss, which in any case seems to have occurred after the job was completed, but just in case I've left it with the single 4GB stick, so it will be safe from 18251s in future.
BobWilliams757
Posts: 524
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: Project 18251 very low PPD on RTX 2060s

Post by BobWilliams757 »

Strange that one slowed down part way through a work unit.

I've also noticed that for this project anything above about a 70% power limit on my GPU actually slows it down a little. But going any lower doesn't help either... I just have to wait them out.


Thankfully I folded on an iGPU for a couple of years, so I'm used to long runs for few points. :mrgreen: It really doesn't bother me that it returns low points; at this point it's more a curiosity as to why it is happening. And I can't think of anything new to try to rule out hardware stability.
Fold them if you get them!
Nicolas_orleans
Posts: 119
Joined: Wed Aug 08, 2012 3:08 am

Re: Project 18251 very low PPD on RTX 2060s

Post by Nicolas_orleans »

Hello

It folds well on my two rigs, but I remember it was a very special project, based on the researcher's comments in the project description. I had not seen something like this before. Here is the researcher's comment:
N.B. because tau is an intrinsically disordered protein, it can fully unfold and refold quite rapidly. To ensure the protein remains in water the entire simulation, we have included a large number of waters in the system. As a result these simulations are a good deal more RAM intensive than prior FAH simulations. Accordingly, we have implemented a minimum system memory requirement of 8000 MiB to run these simulations.
from: https://stats.foldingathome.org/project/18251

So, maybe a stupid question, but do you have enough free RAM on your rig?

Best regards

Nicolas
MSI Z77A-GD55 - Core i5-3550 - EVGA GTX 980 Ti Hybrid @ 1366 MHz - Ubuntu 24.04 - 6.8 kernel
MSI MPG B550 - Ryzen 5 5600X - PNY RTX 4080 Super @ 2715 MHz - Ubuntu 24.04 - 6.8 kernel
appepi
Posts: 43
Joined: Wed Mar 18, 2020 2:55 pm
Hardware configuration: HP Z600 (5) HP Z800 (3) HP Z440 (3)
ASUS Turbo GTX 1060, 1070, 1080, RTX 2060 (3)
Dell GTX 1080
Location: Sydney Australia

Re: Project 18251 very low PPD on RTX 2060s

Post by appepi »

Thanks Nicolas - as you can see above, we have explored RAM issues to some extent, and there is plenty of free RAM on my three Z440s with RTX 2060s that perform badly on 18251 (two with 32GB and one with 16GB). These devices have been doing little other than folding ~9 hrs/day for a couple of years, and the other concurrent processes use about 7GB. They are running FahCore_22 jobs at the moment, which require around 0.4GB, and returning around 2.1-2.3M PPD, which is the norm for them. The test with Z805 above (16GB RAM, GTX 1080) was simply confusing. In response to your suggestion, however, I have fired up Z803 (HP Z800, Dell GTX 1080, 48GB) and sent it after Alzheimer's disease jobs. It usually produces about 1.0-1.4M PPD, so I will see what happens and revise this post when I have results.

Rev 1: Corrected 1825 to 18251 in 2nd line.
News1: Z803 did not pick up any 18251s despite preferring Alzheimer's disease jobs, using only a small fraction of its 48GB RAM and collecting around 1.0-1.5M PPD under temperature restriction to 70 deg C. This took it to Donor Rank #17,660 with 240M points and 1,557 WUs at a 99.35% completion rate. I have set it running again. I note that this "normal" performance has been with FahCore_22 and _23 jobs, whereas the 18251 jobs were FahCore_24 if I remember rightly. In the LAR Systems list for Project 18251 a 1080 averages 1.4M PPD, so we'll see.

News2: When checking whether it was FahCore_24, Google found a related post that had received no replies:
Z3R0 wrote: Wed Sep 18, 2024 1:00 pm There is something strange about the way core24 handles memory with project 18251. Only 1.3GB of VRAM is being used (out of 16, so there is plenty available) and at the same time FahCore_24.exe uses over 4GB of system RAM. And in the GPU stats I see that the copy engine is working like crazy; my GPU is barely warming up (at 50C), so likely the CUDA cores are waiting for data most of the time.

So the question is: why does core24 spend so much time copying data to and from system memory when it could be much more efficient if the data were stored in VRAM? This might not be the case, but it is what my analysis points toward...

Any other project runs just fine: the GPU copy engine sits idle and GPU temps are closer to 70C.
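If anyone wants to test Z3R0's copy-engine theory on their own card, here is a small Python sketch using the pynvml bindings. One caveat: NVML reports memory-controller utilisation rather than the copy engine itself, so I am treating a busy memory controller on a cool, under-occupied GPU as a rough proxy for what he describes.

Code: Select all

# Sample GPU core vs. memory-controller utilisation and VRAM use every few
# seconds while a WU runs.
# Requires: pip install nvidia-ml-py (imports as pynvml)
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first GPU; adjust if needed
for _ in range(12):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    print(f"core {util.gpu:3d}%  mem-ctrl {util.memory:3d}%  "
          f"vram {mem.used / 2**30:.2f} GiB  {temp}C")
    time.sleep(5)
pynvml.nvmlShutdown()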
News3: While reading around, I found a post saying that ancient devices like 2060s should be replaced with much more efficient 4060s. This led me to check 4060 performance on 18251 jobs at LARS, at https://folding.lar.systems/projects/fo ... file/18251. There is no 4060 data, but I noted that the difference in performance ranking of 2060s that I mentioned in my original post seems to be a "sub-model" variation. That is, TU104-based 2060s average 2.4M PPD, whereas TU106-based 2060s average 0.7M PPD. Mine are TU106s and would have contributed to that average, so it isn't an entirely independent observation. Apparently a TU104 2060 uses a 2070/2080 chip that was partly defective but OK to be re-purposed for 2060 work, and in some tasks it may perform better than your regular TU106 2060. But a >3x performance difference seems a bit high!