Project 13421

Moderators: Site Moderators, FAHC Science Team

Re: Project 13421

Postby bruce » Sun Aug 16, 2020 6:18 am

BobWilliams757 wrote: It was a really smart move for overall project throughput to segregate the various project numbers and restrict them to GPUs that can run them efficiently. Since many smaller atom-count projects seem to be popping up, it just made sense.

As a comparison, my little onboard Vega 11 has probably averaged that 155k (give or take a couple of K) through the Project 13421 runs.


This is only the beginning. Rather than just two groups, there will be many more divisions and more accurate matching of each WU to the best GPU. It's a massive goal, so it will take time to achieve, but things are already getting better.

The first moonshot sprint has been completed, and sprint 2 starts tomorrow. In addition to learning about the antiviral capabilities of various compounds, the projects will be gathering more data about which WUs match up with which GPUs.
bruce
 
Posts: 20009
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: Project 13421

Postby ThWuensche » Sun Aug 16, 2020 6:44 am

bruce wrote: The first moonshot sprint has been completed, and sprint 2 starts tomorrow. In addition to learning about the antiviral capabilities of various compounds, the projects will be gathering more data about which WUs match up with which GPUs.

So let's hope this will run better on AMD GPUs. My experience is not with 13421 WUs, as in this thread, but with 13420 WUs. On one of my systems, 26 completed successfully while 671 were returned as faulty. As a result, after a number of failed WUs the client stops requesting new WUs altogether. It resumes only after being manually paused and restarted, which cuts efficiency even further. The WUs that did complete all had high run numbers; the lowest successful run number was 4779.
ThWuensche
 
Posts: 73
Joined: Fri May 29, 2020 5:10 pm

Re: Project 13421

Postby mgetz » Mon Aug 17, 2020 1:08 pm

I've been keeping an eye on the NaN issue on my GPUs because I've noticed it a lot with these WUs (13418-1342x). Honestly, for NVIDIA cards it seems in my experience to have more to do with driver uptime than anything else. On my Windows machine (shut down completely every night; driver 451.67 via GeForce Experience, TU104 RTX 2080), NaN failures are nearly unheard of: I see maybe one or two a month. On my Linux box (driver 440.1 tested from the Ubuntu repos on 20.04, TU104 RTX 2070 Super), I can go about 5 days at most without rebooting before I start to see a lot more failures. I work around this by pausing WUs and rebooting manually if uptime hits 5 days and the updater hasn't required a reboot in between. So while I think the WUs are more likely to trigger the issue, the underlying cause probably involves something odd in the NVIDIA driver, at least for my GPUs. It could also be an ECC issue, since I don't have ECC RAM and I am at altitude, but I have no way to test that.
mgetz
 
Posts: 16
Joined: Tue Aug 11, 2020 7:23 pm

Re: Project 13421

Postby BobWilliams757 » Mon Aug 17, 2020 11:59 pm

bruce wrote:
BobWilliams757 wrote: It was a really smart move for overall project throughput to segregate the various project numbers and restrict them to GPUs that can run them efficiently. Since many smaller atom-count projects seem to be popping up, it just made sense.

As a comparison, my little onboard Vega 11 has probably averaged that 155k (give or take a couple of K) through the Project 13421 runs.


This is only the beginning. Rather than just two groups, there will be many more divisions and more accurate matching of each WU to the best GPU. It's a massive goal, so it will take time to achieve, but things are already getting better.

The first moonshot sprint has been completed, and sprint 2 starts tomorrow. In addition to learning about the antiviral capabilities of various compounds, the projects will be gathering more data about which WUs match up with which GPUs.


Just knowing that the powerful GPUs aren't bogged down as often with the small WUs, which my rig handles reasonably quickly, means that the big resources are churning out the big WUs. I'm sure it will take a lot of time, but this entire COVID cycle could leave FAH in much better shape, with efficiency improving on many fronts.
BobWilliams757
 
Posts: 114
Joined: Fri Apr 03, 2020 3:22 pm
