WU's Not Being Assigned by 171.67.108.102/171.67.108.105/?

Moderators: Site Moderators, FAHC Science Team

Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Aurum »

We should be able to specify a Failover work server list.
In Science We Trust
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Aurum »

Adam, you can try using a firewall rule to block the work server. Foldy posted one for Windows and another for Linux. Sometimes toggling the idle GPU from Pause to Fold in Advanced Control gets a WU downloaded. It's really hit or miss either way.
Sn0wy23
Posts: 16
Joined: Mon Dec 12, 2016 1:10 pm

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Sn0wy23 »

Have had this issue for a day or so now :roll:

Thanks to Foldy's IP block in Windows I am back running both machines.

I had tried rebooting, pausing and restarting, clearing the cache, etc., so the only fix is to direct the client away from the offending IP :twisted: until it is back up and running. Cheers Foldy!
rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by rwh202 »

Is the firewall block really confirmed to work?
I've applied the rule in Linux Mint and still get assigned to the offending server 9 out of 10 times. The only solution I've found is continual pausing and un-pausing to reset the throttling delay in requesting assignments, but that is getting tedious on 10 rigs that need new WUs every hour or so.
How hard is it to pull the plug on the server and stop assignments to it?
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Aurum »

How hard is it :?: :?: :?:
Must be excruciating because it's been days.

I agree, the IP blocking does not seem to work. The AS assigns me to 171.67.108.105 every single time. Occasionally it reassigns me to another WS after a while.
Rig by rig, as they go idle, I'm moving them to crunch another project.
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Aurum »

The server status does not make sense. I just caught a WU by getting reassigned to 171.67.108.159, but it says WUs Avail = 0 and WUs To Go = 0.
http://fah-web.stanford.edu/pybeta/serverstat.html
Joe_H
Site Admin
Posts: 7857
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Joe_H »

Aurum wrote: The server status does not make sense. I just caught a WU by getting reassigned to 171.67.108.159, but it says WUs Avail = 0 and WUs To Go = 0.
http://fah-web.stanford.edu/pybeta/serverstat.html
Those fields usually do have zeros on currently active servers. Changes in the work server code since the fields were defined for serverstat have made those fields useless for telling how many WU's a particular WS has available. It has been that way for years. About the only WS lines that show numbers in those fields are for servers that are inactive.

One column that still holds meaningful information about the number of WU's is WUs Rcv. It shows the number of WU's collected since the last update sent to the stats server. The script that collects the logs to update the stats runs once an hour, so the number in the WUs Rcv column climbs as WU's are returned until the stats are collected.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Aurum »

Then please rewrite Troubleshooting Server Connectivity Issues (Do This First) and stop telling us to do useless things.
Joe_H
Site Admin
Posts: 7857
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Joe_H »

Where in that topic does it tell you to check those columns? The columns are only mentioned as having information that might be informative to an expert user.

The actual troubleshooting steps do not include any checking of information in those columns. It does mention checking to see if a particular server is up, and how to do so.

As for rewriting those topics, they are on a long list of material that needs to be updated.
SombraGuerrero
Posts: 118
Joined: Mon Mar 16, 2009 3:06 am

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by SombraGuerrero »

I haven't experimented with the firewall solution in a Linux environment, but I can say from a Windows perspective that it is doing what I would expect. It blocks the work server, which eventually forces the logic to pick a different one. The real reason it doesn't appear to be (and actually isn't) particularly effective is that there's no way to stop the assignment servers from picking the offending work servers. I think it's that logic, not the work server logic, that would need to change to make the failure handling more fluid, and I imagine you'd have to change the whole pool, so it might be a bigger effort than it seems.

Looking back on previous threads in this forum, I have to cut the people who maintain these servers some slack. The servers are no different from any other type of server, really. They can fail for any of the same reasons that any server or computer can. We're all very passionate about folding, and that's awesome, but let's not forget that we're dealing with people at the end of the day: people who work for an academic institution that can at times have a tremendous amount of red tape around getting operational things done.
SteveWillis
Posts: 409
Joined: Fri Apr 15, 2016 12:42 am
Hardware configuration: PC 1:
Linux Mint 17.3
three gtx 1080 GPUs One on a powered header
Motherboard = [MB-AM3-AS-SB-990FXR2] qty 1 Asus Sabertooth 990FX(+59.99)
CPU = [CPU-AM3-FX-8320BR] qty 1 AMD FX 8320 Eight Core 3.5GHz(+41.99)

PC2:
Linux Mint 18
Open air case
Motherboard: ASUS Crosshair V Formula-Z AM3+ AMD 990FX SATA 6Gb/s USB 3.0 ATX AMD
AMD FD6300WMHKBOX FX-6300 6-Core Processor Black Edition with Cooler Master Hyper 212 EVO - CPU Cooler with 120mm PWM Fan
three gtx 1080,
one gtx 1080 TI on a powered header

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by SteveWillis »

rwh202 wrote:Is the firewall block really confirmed to work?
I've applied the rule in Linux Mint and still get assigned to the offending server 9 out of 10 times. The only solution I've found is continual pausing and un-pausing to reset the throttling delay in requesting assignments, but that is getting tedious on 10 rigs that need new WUs every hour or so.
How hard is it to pull the plug on the server and stop assignments to it?
Oh it works all right.
First, are you sure the firewall is enabled? By default it is not. I went back and added that to my earlier post, but to check:
sudo ufw status
and to enable
sudo ufw enable

Also be aware that it will first try to assign to 105 but you'll get a connection error then it will go on to 102. Sometimes it has to go through this cycle several times before you get an assignment. It is going to take longer than what you are used to.

Here is a script I wrote to automatically pause and unpause when the client appears to be hung up. It loops every 15 minutes. I modified it this morning and it hasn't needed to do its thing yet, so it is not thoroughly tested. Use at your own risk.

Code:

#!/bin/bash
# Watch the FAHClient log and pause/unpause slots 0-3 whenever the most
# recent network-related line reports a refused connection or an
# assignment problem.
cd /var/lib/fahclient || exit 1

while true
do
    # Keep only the newest network-related line; the final grep succeeds
    # only if that line mentions "refused" or "assign".
    grep -Ei "Connected|assign|refused|Upload|Download" log.txt | tail -1 | grep -E "refused|assign"
    results=$?
    echo "$(date)    results = $results"
    if [ "$results" -eq 0 ]
    then
        echo "PAUSED *******  $(date) "
        # Pause every slot through the client's command port (36330).
        for slot in 0 1 2 3
        do
            echo -e "pause $slot\nquit" | nc localhost 36330 &> /dev/null
        done
        sleep 10
        # Unpause the same slots so each one requests a new assignment.
        for slot in 0 1 2 3
        do
            echo -e "unpause $slot\nquit" | nc localhost 36330 &> /dev/null
        done
    fi
    # Check again in 15 minutes.
    sleep 900
done



1080 and 1080TI GPUs on Linux Mint
rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by rwh202 »

SteveWillis wrote:
rwh202 wrote:Is the firewall block really confirmed to work?
I've applied the rule in Linux Mint and still get assigned to the offending server 9 out of 10 times. The only solution I've found is continual pausing and un-pausing to reset the throttling delay in requesting assignments, but that is getting tedious on 10 rigs that need new WUs every hour or so.
How hard is it to pull the plug on the server and stop assignments to it?
Oh it works all right.
First, are you sure the firewall is enabled? By default it is not. I went back and added that to my earlier post, but to check:
sudo ufw status
and to enable
sudo ufw enable

Also be aware that it will first try to assign to 105 but you'll get a connection error then it will go on to 102. Sometimes it has to go through this cycle several times before you get an assignment. It is going to take longer than what you are used to.
Yeah, I enabled the firewall, but I see the same behaviour regardless of whether I block the offending server in the firewall or not. It still fails and goes through the loop of usually getting reassigned to the same one, failing, and so on.

Thanks for the pause / unpause script though. I'll brush up on my grep and see how I can run it on a per slot basis and just pause the stalled one.
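
A rough sketch of the per-slot idea, for anyone who wants a starting point. It assumes that each relevant log line carries a slot tag of the form `:FS01:`, which should be checked against your own log.txt before relying on it. The function names `stalled_slots` and `bounce_slot` are made up for illustration; the grep patterns and port 36330 are taken from the script above.

```bash
#!/bin/bash
# Hypothetical helper: read a FAHClient log on stdin and print the slot
# numbers whose most recent network-related line mentions "refused" or
# "assign". Assumes log lines contain a slot tag such as ":FS01:".
stalled_slots() {
    grep -Ei "Connected|assign|refused|Upload|Download" \
        | awk '
            match($0, /:FS[0-9]+:/) {
                # Strip the leading ":FS" and trailing ":", "01" -> 1
                slot = substr($0, RSTART + 3, RLENGTH - 4) + 0
                last[slot] = $0        # remember the newest line per slot
            }
            END {
                for (s in last)
                    if (last[s] ~ /refused|assign/) print s
            }' \
        | sort -n
}

# Hypothetical helper: pause one slot, wait, then unpause it, using the
# same command-port calls as the original script.
bounce_slot() {
    echo -e "pause $1\nquit" | nc localhost 36330 &> /dev/null
    sleep 10
    echo -e "unpause $1\nquit" | nc localhost 36330 &> /dev/null
}
```

For example, `stalled_slots < /var/lib/fahclient/log.txt | while read -r s; do bounce_slot "$s"; done` would pause and unpause only the slots that look stuck, leaving the others alone. Untested against a live client, so use at your own risk.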
SteveWillis
Posts: 409
Joined: Fri Apr 15, 2016 12:42 am
Hardware configuration: PC 1:
Linux Mint 17.3
three gtx 1080 GPUs One on a powered header
Motherboard = [MB-AM3-AS-SB-990FXR2] qty 1 Asus Sabertooth 990FX(+59.99)
CPU = [CPU-AM3-FX-8320BR] qty 1 AMD FX 8320 Eight Core 3.5GHz(+41.99)

PC2:
Linux Mint 18
Open air case
Motherboard: ASUS Crosshair V Formula-Z AM3+ AMD 990FX SATA 6Gb/s USB 3.0 ATX AMD
AMD FD6300WMHKBOX FX-6300 6-Core Processor Black Edition with Cooler Master Hyper 212 EVO - CPU Cooler with 120mm PWM Fan
three gtx 1080,
one gtx 1080 TI on a powered header

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by SteveWillis »

Mine has been running all day without missing a beat and the script hasn't triggered the pause/unpause even once. I'm going to show you my firewall settings. I messed around with them some and maybe it will be some help.

Code:

Status: active

To                         Action      From
--                         ------      ----
Anywhere                   REJECT      171.67.108.105            
Anywhere                   ALLOW       171.67.108.102            

Anywhere                   REJECT OUT  171.67.108.105            
Anywhere                   ALLOW OUT   171.67.108.102            
171.67.108.102             ALLOW OUT   Anywhere                  
171.67.108.105             REJECT OUT  Anywhere                  
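
To confirm that a listing like the one above really contains an outbound reject rule for a given server, the `ufw status` output can be piped through a small filter. This is only a sketch: `has_reject_out` is a made-up helper name, and it assumes nothing beyond the column layout shown in the listing.

```bash
#!/bin/bash
# Hypothetical helper: read "ufw status" output on stdin and succeed if any
# rule row has the action "REJECT OUT" and names the given IP in either the
# To or the From column.
has_reject_out() {
    local ip="$1"
    awk -v ip="$ip" '
        # Rule rows look like: <To>  REJECT OUT  <From>
        $2 == "REJECT" && $3 == "OUT" && ($1 == ip || $4 == ip) { found = 1 }
        END { exit !found }
    '
}
```

Usage would be along the lines of `sudo ufw status | has_reject_out 171.67.108.105 && echo "105 is blocked outbound"`.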
boristsybin
Posts: 50
Joined: Mon Jan 16, 2017 11:40 am
Hardware configuration: 4x1080Ti + 2x1050Ti
Location: Russia, Moscow

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by boristsybin »

still no comments from support team?
Joe_H
Site Admin
Posts: 7857
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Joe_H »

I have heard back that it is being looked into, but there is nothing further to post. The first reports came in on a Friday evening and were reported to PG on Saturday morning. This is a relatively major holiday weekend, so only limited staff would be available to work on this.