Project 13421

Moderators: Site Moderators, FAHC Science Team

bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 13421

Post by bruce »

BobWilliams757 wrote:It was a really smart move for overall project throughput to segregate the various project numbers and restrict them to GPUs that can run them efficiently. Since many projects with small atom counts seem to be popping up, it just made sense.

As a comparison, my little Vega 11 onboard has probably averaged that 155k (give or take a couple K) through the project 13421 runs.
This is only the beginning. Rather than just 2 groups, there will be many, many more divisions and further accuracy in matching the WU up with the best GPU. It's a massive goal, so it will take time to achieve it, but things are already getting better.

The first moonshot sprint has been completed and sprint 2 starts tomorrow. In addition to learning about the anti-virus capabilities of various compounds, the projects will be gathering more data about which WUs match up with which GPUs.
ThWuensche
Posts: 80
Joined: Fri May 29, 2020 4:10 pm

Re: Project 13421

Post by ThWuensche »

bruce wrote: The first moonshot sprint has been completed and sprint 2 starts tomorrow. In addition to learning about the anti-virus capabilities of various compounds, the projects will be gathering more data about which WUs match up with which GPUs.
So let's hope this will run better on AMD GPUs. My experience is not with the 13421 WUs discussed in this thread but with 13420 WUs: on one of my systems, 26 completed successfully while 671 were returned as faulty. After a number of failed WUs, the client stops requesting new WUs altogether and only resumes after I manually pause and restart it, which cuts efficiency even further. The WUs that did complete all had high run numbers; the lowest successful run number was 4779.
mgetz
Posts: 57
Joined: Tue Aug 11, 2020 6:23 pm

Re: Project 13421

Post by mgetz »

I've been keeping an eye on the NaN issue on my GPUs because I've seen it a lot with these WUs (13418-1342x). In my experience, on NVIDIA cards it seems to depend more on driver uptime than on anything else. On my Windows machine (shut down completely every night; driver 451.67 via GeForce Experience, TU104 RTX 2080), NaN failures are nearly unheard of: I see maybe one or two a month. On my Linux box (driver 440.1, "tested", from the Ubuntu repos on 20.04; TU104 RTX 2070 Super), the longest I can go without rebooting is about 5 days before I start to see many more failures. I handle this by pausing WUs and rebooting manually whenever uptime hits 5 days and the updater hasn't required a reboot in the meantime. So while I think the WUs are more likely to trigger the issue, the root cause probably involves something odd in the NVIDIA driver, at least on my GPUs. It could also be a memory error issue, since I don't have ECC RAM and I'm at altitude, but I have no way to test that.
BobWilliams757
Posts: 497
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: Project 13421

Post by BobWilliams757 »

bruce wrote:
BobWilliams757 wrote:It was a really smart move for overall project throughput to segregate the various project numbers and restrict them to GPUs that can run them efficiently. Since many projects with small atom counts seem to be popping up, it just made sense.

As a comparison, my little Vega 11 onboard has probably averaged that 155k (give or take a couple K) through the project 13421 runs.
This is only the beginning. Rather than just 2 groups, there will be many, many more divisions and further accuracy in matching the WU up with the best GPU. It's a massive goal, so it will take time to achieve it, but things are already getting better.

The first moonshot sprint has been completed and sprint 2 starts tomorrow. In addition to learning about the anti-virus capabilities of various compounds, the projects will be gathering more data about which WUs match up with which GPUs.
Just knowing that the powerful GPUs aren't bogged down as often with the small WUs my rig handles reasonably quickly means the big resources are churning out the big WUs. I'm sure it will take a lot of time, but this entire COVID cycle could leave FAH in much better shape, with efficiency improving on many fronts.
Fold them if you get them!