Sporadic issues with 171.67.108.xx servers

Moderators: Site Moderators, FAHC Science Team

ComputerGenie
Posts: 236
Joined: Mon Dec 12, 2016 4:06 am

Sporadic issues with 171.67.108.xx servers

Post by ComputerGenie »

Starting a new thread because my posts keep getting merged into threads about issues that are "fixed" (and I get threats that "we do have ways we can deal with that type of activity" if I post in any other 171.67.108.xx thread, because I'm "trying to hijack the topic").

Everyone says I need to provide a log, but when I ask from which system, I get no answer. So we'll start with the Win 7 rig (because it has the lowest IP on my LAN, which is as good a reason as any to pick 1 rig out of several to provide a log for) and go from there. (Yes, verbosity is set to 5, because there are no obvious indicators at 3 or 4.)

Code:

*********************** Log Started 2017-06-12T11:36:03Z ***********************
11:36:03:************************* Folding@home Client *************************
11:36:03:        Website: http://folding.stanford.edu/
11:36:03:      Copyright: (c) 2009-2016 Stanford University
11:36:03:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
11:36:03:           Args: 
11:36:03:         Config: C:\Users\ComputerGenie\AppData\Roaming\FAHClient\config.xml
11:36:03:******************************** Build ********************************
11:36:03:        Version: 7.4.16
11:36:03:           Date: Jan 6 2017
11:36:03:           Time: 00:25:14
11:36:03:     Repository: Git
11:36:03:       Revision: a9e9e27dc2ee6ff01398c439677bc27f6cb74032
11:36:03:         Branch: master
11:36:03:       Compiler: Visual C++ 2008
11:36:03:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox -arch:SSE /MT
11:36:03:       Platform: win32 10
11:36:03:           Bits: 32
11:36:03:           Mode: Release
11:36:03:******************************* System ********************************
11:36:03:            CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
11:36:03:         CPU ID: GenuineIntel Family 6 Model 94 Stepping 3
11:36:03:           CPUs: 8
11:36:03:         Memory: 63.94GiB
11:36:03:    Free Memory: 59.71GiB
11:36:03:        Threads: WINDOWS_THREADS
11:36:03:     OS Version: 6.1
11:36:03:    Has Battery: false
11:36:03:     On Battery: false
11:36:03:     UTC Offset: -5
11:36:03:            PID: 4256
11:36:03:            CWD: C:\Users\ComputerGenie\AppData\Roaming\FAHClient
11:36:03:             OS: Windows 7 Ultimate
11:36:03:        OS Arch: AMD64
11:36:03:           GPUs: 0
11:36:03:  CUDA Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:6.1 Driver:8.0
11:36:03:  CUDA Device 1: Platform:0 Device:1 Bus:1 Slot:0 Compute:6.1 Driver:8.0
11:36:03:OpenCL Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:1.2 Driver:382.53
11:36:03:OpenCL Device 1: Platform:0 Device:1 Bus:1 Slot:0 Compute:1.2 Driver:382.53
11:36:03:  Win32 Service: false
11:36:03:***********************************************************************
11:36:03:<config>
11:36:03:  <service-description v='Folding@home Client'/>
11:36:03:  <service-restart v='true'/>
11:36:03:  <service-restart-delay v='5000'/>
11:36:03:
11:36:03:  <!-- Client Control -->
11:36:03:  <client-threads v='6'/>
11:36:03:  <cycle-rate v='4'/>
11:36:03:  <cycles v='-1'/>
11:36:03:  <data-directory v='.'/>
11:36:03:  <disable-sleep-when-active v='true'/>
11:36:03:  <exec-directory v='C:\Program Files (x86)\FAHClient'/>
11:36:03:  <exit-when-done v='false'/>
11:36:03:  <fold-anon v='false'/>
11:36:03:  <open-web-control v='false'/>
11:36:03:
11:36:03:  <!-- Configuration -->
11:36:03:  <config-rotate v='true'/>
11:36:03:  <config-rotate-dir v='configs'/>
11:36:03:  <config-rotate-max v='16'/>
11:36:03:
11:36:03:  <!-- Debugging -->
11:36:03:  <assignment-servers>
11:36:03:    assign3.stanford.edu:8080 assign4.stanford.edu:80
11:36:03:  </assignment-servers>
11:36:03:  <auth-as v='true'/>
11:36:03:  <capture-directory v='capture'/>
11:36:03:  <capture-on-error v='false'/>
11:36:03:  <capture-packets v='false'/>
11:36:03:  <capture-requests v='false'/>
11:36:03:  <capture-responses v='false'/>
11:36:03:  <capture-sockets v='false'/>
11:36:03:  <core-exec v='FahCore_$type'/>
11:36:03:  <core-wrapper-exec v='FAHCoreWrapper'/>
11:36:03:  <debug-sockets v='false'/>
11:36:03:  <exception-locations v='true'/>
11:36:03:  <gpu-assignment-servers>
11:36:03:    assign-GPU.stanford.edu:80 assign-GPU2.stanford.edu:80
11:36:03:  </gpu-assignment-servers>
11:36:03:  <stack-traces v='false'/>
11:36:03:
11:36:03:  <!-- Error Handling -->
11:36:03:  <max-slot-errors v='10'/>
11:36:03:  <max-unit-errors v='5'/>
11:36:03:
11:36:03:  <!-- Folding Core -->
11:36:03:  <checkpoint v='30'/>
11:36:03:  <core-dir v='cores'/>
11:36:03:  <core-priority v='idle'/>
11:36:03:  <cpu-affinity v='false'/>
11:36:03:  <cpu-usage v='100'/>
11:36:03:  <gpu-usage v='100'/>
11:36:03:  <no-assembly v='false'/>
11:36:03:
11:36:03:  <!-- Folding Slot Configuration -->
11:36:03:  <cause v='ALZHEIMERS'/>
11:36:03:  <client-subtype v='STDCLI'/>
11:36:03:  <client-type v='normal'/>
11:36:03:  <cpu-species v='X86_PENTIUM_II'/>
11:36:03:  <cpu-type v='AMD64'/>
11:36:03:  <cpus v='-1'/>
11:36:03:  <disable-viz v='false'/>
11:36:03:  <gpu v='true'/>
11:36:03:  <max-packet-size v='normal'/>
11:36:03:  <os-species v='WIN_7'/>
11:36:03:  <os-type v='WIN32'/>
11:36:03:  <project-key v='0'/>
11:36:03:  <smp v='true'/>
11:36:03:
11:36:03:  <!-- GUI -->
11:36:03:  <gui-enabled v='true'/>
11:36:03:
11:36:03:  <!-- HTTP Server -->
11:36:03:  <allow v='127.0.0.1'/>
11:36:03:  <connection-timeout v='60'/>
11:36:03:  <deny v='0/0'/>
11:36:03:  <http-addresses v='0:7396'/>
11:36:03:  <https-addresses v=''/>
11:36:03:  <max-connect-time v='900'/>
11:36:03:  <max-connections v='800'/>
11:36:03:  <max-request-length v='52428800'/>
11:36:03:  <min-connect-time v='300'/>
11:36:03:
11:36:03:  <!-- Logging -->
11:36:03:  <log v='log.txt'/>
11:36:03:  <log-color v='false'/>
11:36:03:  <log-crlf v='true'/>
11:36:03:  <log-date v='false'/>
11:36:03:  <log-date-periodically v='21600'/>
11:36:03:  <log-domain v='false'/>
11:36:03:  <log-header v='true'/>
11:36:03:  <log-level v='true'/>
11:36:03:  <log-no-info-header v='true'/>
11:36:03:  <log-redirect v='false'/>
11:36:03:  <log-rotate v='true'/>
11:36:03:  <log-rotate-dir v='logs'/>
11:36:03:  <log-rotate-max v='16'/>
11:36:03:  <log-short-level v='false'/>
11:36:03:  <log-simple-domains v='true'/>
11:36:03:  <log-thread-id v='false'/>
11:36:03:  <log-thread-prefix v='true'/>
11:36:03:  <log-time v='true'/>
11:36:03:  <log-to-screen v='true'/>
11:36:03:  <log-truncate v='false'/>
11:36:03:  <verbosity v='5'/>
11:36:03:
11:36:03:  <!-- Network -->
11:36:03:  <proxy v=':8080'/>
11:36:03:  <proxy-enable v='false'/>
11:36:03:  <proxy-pass v=''/>
11:36:03:  <proxy-user v=''/>
11:36:03:
11:36:03:  <!-- Process Control -->
11:36:03:  <child v='false'/>
11:36:03:  <daemon v='false'/>
11:36:03:  <pid v='false'/>
11:36:03:  <pid-file v='Folding@home Client.pid'/>
11:36:03:  <respawn v='false'/>
11:36:03:  <service v='false'/>
11:36:03:
11:36:03:  <!-- Remote Command Server -->
11:36:03:  <command-address v='0.0.0.0'/>
11:36:03:  <command-allow-no-pass v='127.0.0.1'/>
11:36:03:  <command-deny-no-pass v='0/0'/>
11:36:03:  <command-enable v='true'/>
11:36:03:  <command-port v='36330'/>
11:36:03:
11:36:03:  <!-- Slot Control -->
11:36:03:  <idle v='false'/>
11:36:03:  <max-shutdown-wait v='60'/>
11:36:03:  <pause-on-battery v='false'/>
11:36:03:  <pause-on-start v='true'/>
11:36:03:  <paused v='false'/>
11:36:03:  <power v='full'/>
11:36:03:
11:36:03:  <!-- User Information -->
11:36:03:  <machine-id v='0'/>
11:36:03:  <passkey v='********************************'/>
11:36:03:  <team v='*******'/>
11:36:03:  <user v='********************************'/>
11:36:03:
11:36:03:  <!-- Web Server -->
11:36:03:  <web-allow v='127.0.0.1'/>
11:36:03:  <web-deny v='0/0'/>
11:36:03:  <web-enable v='true'/>
11:36:03:
11:36:03:  <!-- Web Server Sessions -->
11:36:03:  <session-cookie v='sid'/>
11:36:03:  <session-lifetime v='86400'/>
11:36:03:  <session-timeout v='3600'/>
11:36:03:
11:36:03:  <!-- Work Unit Control -->
11:36:03:  <dump-after-deadline v='true'/>
11:36:03:  <max-queue v='16'/>
11:36:03:  <max-units v='0'/>
11:36:03:  <next-unit-percentage v='99'/>
11:36:03:  <stall-detection-enabled v='false'/>
11:36:03:  <stall-percent v='5'/>
11:36:03:  <stall-timeout v='1800'/>
11:36:03:
11:36:03:  <!-- Folding Slots -->
11:36:03:  <slot id='0' type='GPU'>
11:36:03:    <next-unit-percentage v='98'/>
11:36:03:    <paused v='true'/>
11:36:03:  </slot>
11:36:03:  <slot id='1' type='GPU'>
11:36:03:    <next-unit-percentage v='98'/>
11:36:03:  </slot>
11:36:03:</config>
11:36:03:Connecting to assign-GPU.stanford.edu:80
11:36:03:Updated GPUs.txt
11:36:03:Read GPUs.txt
11:36:03:Trying to access database...
11:36:03:Successfully acquired database lock
....
11:36:22:FS00:Unpaused
11:36:22:FS01:Unpaused
11:36:23:WU00:FS00:Connecting to 171.67.108.45:80
11:36:23:WU01:FS01:Connecting to 171.67.108.45:80
11:36:23:WU00:FS00:Connecting to 171.67.108.45:80
11:36:23:WU00:FS00:Assigned to work server 171.67.108.157
11:36:23:WU01:FS01:Connecting to 171.67.108.45:80
11:36:23:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:GP102 [GeForce GTX 1080 Ti] 11380 from 171.67.108.157
11:36:23:WU00:FS00:Connecting to 171.67.108.157:8080
11:36:24:WU01:FS01:Assigned to work server 171.67.108.102
11:36:24:WU00:FS00:Downloading 5.16MiB
11:36:24:WU01:FS01:Requesting new work unit for slot 01: READY gpu:1:GP104 [GeForce GTX 1080] 8873 from 171.67.108.102
11:36:24:WU01:FS01:Connecting to 171.67.108.102:8080
11:36:34:WU01:FS01:Downloading 7.06MiB
11:36:40:WU01:FS01:Download 94.77%
11:36:40:WU01:FS01:Download complete
11:36:40:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13204 run:31 clone:16 gen:98 core:0x21 unit:0x00000035ab436c6657894f0c0a705f85

User and team are ****ed out because, aside from being irrelevant, not all rigs are on the same team or under the same user (but all are suffering the same issue).
Downloads just "hang" and there is nothing further in the log about that slot: FS01 got work and FS00 did not.
The "cause" setting doesn't matter (it happens with "any" and with every individual setting).
As you can see, it's not an "internet connection issue" (since 1 of the 2 cards got work). I await further instructions/advice...
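Spotting this pattern by hand across several rigs is tedious. The sketch below (Python, chosen only for illustration; nothing in the thread uses it) scans a FAHClient log for slots that logged "Downloading ..." but never "Download complete". The line format is inferred from the log above, and `find_hung_downloads` is a hypothetical helper. A download that is still legitimately in progress at the end of the log will also be reported, so the result is only meaningful once the slot has been quiet for a while.

```python
import re

# Slot-tagged log lines look like "11:36:24:WU00:FS00:Downloading 5.16MiB"
# (format inferred from the log in this thread).
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):WU(\d{2}):FS(\d{2}):(.*)$")


def find_hung_downloads(lines):
    """Return (time, wu, fs, message) for every download that started but
    never logged "Download complete" for the same WU/slot pair."""
    pending = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # header, config dump, or other untagged line
        time, wu, fs, msg = m.groups()
        key = (wu, fs)
        if msg.startswith("Downloading "):
            pending[key] = (time, wu, fs, msg)
        elif msg == "Download complete":
            pending.pop(key, None)
    return list(pending.values())


log = """\
11:36:23:WU00:FS00:Connecting to 171.67.108.157:8080
11:36:24:WU00:FS00:Downloading 5.16MiB
11:36:34:WU01:FS01:Downloading 7.06MiB
11:36:40:WU01:FS01:Download 94.77%
11:36:40:WU01:FS01:Download complete
""".splitlines()

for time, wu, fs, msg in find_hung_downloads(log):
    print(f"WU{wu}:FS{fs} started '{msg}' at {time} and never finished")
```

Progress lines such as "Download 94.77%" are ignored because they do not start with "Downloading ".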
bollix47
Posts: 2941
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Sporadic issues with 171.67.108.xx servers

Post by bollix47 »

Okay there is at least one obvious problem in your log:

11:36:03: GPUs: 0

That usually indicates your drivers are not new enough to support those GPUs, although it may have other meanings, such as a missing GPUs.txt file, but there's no sign of that in the log.

In this case I would start by setting the client to FINISH; when the WU is finished, close the client and install the latest drivers from nvidia. This should ensure that they support your GPUs.

Then re-install the client by uninstalling it including data (don't save your config.xml), re-run the installer, set up your folding info and configuration, and try to Fold again.
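The condition pointed out above can also be checked mechanically: the header in the log reports "GPUs: 0" while still listing two CUDA and two OpenCL devices. A minimal sketch, assuming the header line format shown in the log; `gpu_header_mismatch` is a hypothetical name, and a mismatch only says the client did not count the cards, not why.

```python
import re


def gpu_header_mismatch(lines):
    """Return (reported, cuda, opencl) when the header's "GPUs:" count is
    lower than the number of CUDA/OpenCL devices it lists, else None."""
    reported = None
    cuda = opencl = 0
    for line in lines:
        # Drop the "HH:MM:SS:" prefix, e.g. "11:36:03:           GPUs: 0"
        body = line.split(":", 3)[-1].strip()
        m = re.match(r"GPUs: (\d+)$", body)
        if m:
            reported = int(m.group(1))
        elif body.startswith("CUDA Device "):
            cuda += 1
        elif body.startswith("OpenCL Device "):
            opencl += 1
    if reported is not None and reported < max(cuda, opencl):
        return reported, cuda, opencl
    return None


header = [
    "11:36:03:           GPUs: 0",
    "11:36:03:  CUDA Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:6.1 Driver:8.0",
    "11:36:03:  CUDA Device 1: Platform:0 Device:1 Bus:1 Slot:0 Compute:6.1 Driver:8.0",
    "11:36:03:OpenCL Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:1.2 Driver:382.53",
    "11:36:03:OpenCL Device 1: Platform:0 Device:1 Bus:1 Slot:0 Compute:1.2 Driver:382.53",
]
print(gpu_header_mismatch(header))
```

Run against the header from the later restart ("GPUs: 2" plus two listed devices), the same function returns None.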
ComputerGenie

Re: Sporadic issues with 171.67.108.xx servers

Post by ComputerGenie »

bollix47 wrote:Okay there is at least one obvious problem in your log:

11:36:03: GPUs: 0

That usually indicates your drivers are not new enough to support those GPUs although it may have other meanings such as no GPUs.txt file but there's no sign of that in the log.

In this case I would start by setting the client to FINISH, when the WU is finished close the client and install the latest drivers from nvidia. This should ensure that they support your GPUs.

Then re-install the client by uninstalling including data, re-run the installer and try to Fold again.
That was just a first-run thing; after a close/restart:

Code:

12:00:42:           GPUs: 2
12:00:42:          GPU 0: Bus:2 Slot:0 Func:0 NVIDIA:7 GP102 [GeForce GTX 1080 Ti] 11380
12:00:42:          GPU 1: Bus:1 Slot:0 Func:0 NVIDIA:7 GP104 [GeForce GTX 1080] 8873
and Driver:382.53 is the newest (only 3 days old; I installed it because nothing else had worked)
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Sporadic issues with 171.67.108.xx servers

Post by bruce »

1) please reset verbosity to the default value.
2) The key piece of information in that log is GPUs: 0. Why do you need to restart?
3) Are you running a copy of 32-bit Windows in a virtual machine? The emulated GPUs created by VMs are typically not supported (just as they are not supported when running in a Windows service). They have to be real GPUs.

Also, I strongly recommend a 64-bit OS.
ComputerGenie

Re: Sporadic issues with 171.67.108.xx servers

Post by ComputerGenie »

bruce wrote:1) please reset verbosity to the default value.
2) The key piece of information in that log is GPUs: 0
3) Are you running a copy of 32-bit Windows in a virtual machine? The emulated GPUs created by VMs are typically not supported (just as they are not supported when running in a Windows service). They have to be real GPUs.

Also, I strongly recommend a 64-bit OS.
That log is from a 64-bit Win 7 physical machine (with real GPUs).
bruce

Re: Sporadic issues with 171.67.108.xx servers

Post by bruce »

Suggestion: There may be a problem with assignments downloaded with no functional GPUs. Set pause-on-start=true and fix whatever problem causes the GPU:0 error before you download a new assignment.
bollix47

Re: Sporadic issues with 171.67.108.xx servers

Post by bollix47 »

ComputerGenie wrote:That was just a first-run thing; after a close/restart:

Code:

12:00:42:           GPUs: 2
12:00:42:          GPU 0: Bus:2 Slot:0 Func:0 NVIDIA:7 GP102 [GeForce GTX 1080 Ti] 11380
12:00:42:          GPU 1: Bus:1 Slot:0 Func:0 NVIDIA:7 GP104 [GeForce GTX 1080] 8873
and Driver:382.53 is the newest (only 3 days old; I installed it because nothing else had worked)
When you installed those 3 day old drivers did you re-install the Folding@home software? If not then try that now as described in my previous post.
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: Sporadic issues with 171.67.108.xx servers

Post by foldy »

Now "GPUs: 2" looks good.

I think the original problem is that the download never finishes:
11:36:24:WU00:FS00:Downloading 5.16MiB

I think that is a known issue where a downloading work unit sometimes gets stuck.
The only workaround I know is to restart the FahClient.

The question is: why do your PCs get this error more often?
ComputerGenie

Re: Sporadic issues with 171.67.108.xx servers

Post by ComputerGenie »

foldy wrote:...The original problem I think is that download never finishes.
11:36:24:WU00:FS00:Downloading 5.16MiB

I think that is a known issue where a downloading work unit gets stuck sometimes...
That's what I've been saying. :(
foldy wrote:...The only workaround I know is to restart the FahClient...
Given that it happened to 3 separate systems more than 60 times over the weekend, there's little chance of my continuing to do that. :(
foldy wrote:...The question is why your PCs get this error more often?
If I knew that, I'd have ~3 million more points. :P
ComputerGenie

Re: Sporadic issues with 171.67.108.xx servers

Post by ComputerGenie »

bruce wrote:Suggestion: There may be a problem with assignments downloaded with no functional GPUs. Set pause-on-start=true and fix whatever problem causes the GPU:0 error before you download a new assignment.
Because of everything that loads on any given rig, that's already set. The "0" issue was specific to that particular start (maybe due to clearing all of the temp files except themes and config).
bollix47 wrote:When you installed those 3 day old drivers did you re-install the Folding@home software? If not then try that now as described in my previous post.
Yeah, I've tried a straight update, an update with a reinstall, and any number of different orderings for the chicken-and-egg problem of drivers and F@H.
rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: Sporadic issues with 171.67.108.xx servers

Post by rwh202 »

That log shows a failed download from 171.67.108.157:8080 and a successful one from 171.67.108.102:8080

Have you ever had success from .157, or a failure from another server that has sometimes worked, e.g. .102?

Just trying to ascertain whether it's just one server or not.

Also, what happens when you try to access those addresses from a browser on your LAN, do you get the WS splash page for both?

Finally, are all your clients at the same location on the same LAN or is this affecting you in multiple locations?
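The first question (is it always the same server?) can be answered from the logs already on disk. This sketch tallies downloads started and completed per work server, attributing each download to the last "Connecting to" address logged for that WU/slot; that attribution is an assumption based on the sequence in the log above, and `tally_by_server` is a hypothetical helper. Feeding it every rotated log (the config above keeps them in `logs`) from every rig would show whether .157, .102, or some other server stands out.

```python
import re
from collections import Counter

# Same slot-tagged line format as in the log, e.g.
# "11:36:23:WU00:FS00:Connecting to 171.67.108.157:8080"
LINE_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:WU(\d{2}):FS(\d{2}):(.*)$")


def tally_by_server(lines):
    """Map each work server to (downloads started, downloads completed)."""
    last_host = {}    # (wu, fs) -> last "Connecting to" address
    downloading = {}  # (wu, fs) -> server of the download in flight
    started, completed = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        wu, fs, msg = m.groups()
        key = (wu, fs)
        if msg.startswith("Connecting to "):
            last_host[key] = msg[len("Connecting to "):]
        elif msg.startswith("Downloading ") and key in last_host:
            downloading[key] = last_host[key]
            started[last_host[key]] += 1
        elif msg == "Download complete" and key in downloading:
            completed[downloading.pop(key)] += 1
    return {host: (started[host], completed[host]) for host in started}


log = """\
11:36:23:WU00:FS00:Connecting to 171.67.108.45:80
11:36:23:WU01:FS01:Connecting to 171.67.108.45:80
11:36:23:WU00:FS00:Connecting to 171.67.108.157:8080
11:36:24:WU00:FS00:Downloading 5.16MiB
11:36:24:WU01:FS01:Connecting to 171.67.108.102:8080
11:36:34:WU01:FS01:Downloading 7.06MiB
11:36:40:WU01:FS01:Download complete
""".splitlines()

for host, (n_started, n_done) in tally_by_server(log).items():
    print(f"{host}: {n_done}/{n_started} downloads completed")
```

Connections to the assignment server (port 80 here) are overwritten by the later work-server connection, so they are not counted as download sources.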
ComputerGenie

Re: Sporadic issues with 171.67.108.xx servers

Post by ComputerGenie »

rwh202 wrote:That log shows a failed download from 171.67.108.157:8080 and a successful one from 171.67.108.102:8080

Have you ever had success from .157 or a failure from another server that has sometimes worked e.g. .102...
It has happened with more than one (ironically, .102 is usually the most afflicted; however, that could just be because it's also the one I'm assigned to most often).
rwh202 wrote:...Also, what happens when you try to access those addresses from a browser on your LAN, do you get the WS splash page for both?...
Yes, the "splash" page shows fine, and pings and tracert look fine too (usually under 10 hops).

rwh202 wrote:...Finally, are all your clients at the same location on the same LAN or is this affecting you in multiple locations?
Same physical location, 2 separate buildings, and 1 rig is on a different LAN (if that's at all possible to follow in a text-based forum).
ComputerGenie

Re: Sporadic issues with 171.67.108.xx servers

Post by ComputerGenie »

Is there a possible chance that it's a time sync issue (either in the software or on my end)?
I'm looking at the most recent WU...
on the "Status" tab, it says "Assigned: 2017-06-12T19:02:46Z"
in the log is: "19:02:22:WU00:FS00:0x21:Completed 187500 out of 6250000 steps (3%)"
Meaning that 24 seconds before status claims it was assigned, it was 3% done. :shock:
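The 24-second figure is the difference between the Status tab's "Assigned" time and the log line's timestamp. This small helper (hypothetical, for repeating the check on other WUs) treats both as UTC, since the log header is stamped 2017-06-12T11:36:03Z and the first log lines read 11:36:03, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime, timezone


def seconds_between(assigned_iso, log_line):
    """Seconds from a log line's HH:MM:SS stamp to the Status tab's
    "Assigned" time; positive means the log line is earlier.
    Assumes both are UTC and on the same calendar day."""
    assigned = datetime.strptime(assigned_iso, "%Y-%m-%dT%H:%M:%SZ")
    assigned = assigned.replace(tzinfo=timezone.utc)
    hh, mm, ss = (int(part) for part in log_line.split(":", 3)[:3])
    logged = assigned.replace(hour=hh, minute=mm, second=ss)
    return (assigned - logged).total_seconds()


print(seconds_between(
    "2017-06-12T19:02:46Z",
    "19:02:22:WU00:FS00:0x21:Completed 187500 out of 6250000 steps (3%)",
))
```

Comparing "Assigned" against that WU's "Downloading" line instead gives the within-a-couple-of-seconds check described in the reply that follows.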
rwh202

Re: Sporadic issues with 171.67.108.xx servers

Post by rwh202 »

ComputerGenie wrote:Is there a possible chance that it's a time sync issue (either in the software or on my end)?
I'm looking at the most recent WU...
on the "Status" tab, it says "Assigned: 2017-06-12T19:02:46Z"
in the log is: "19:02:22:WU00:FS00:0x21:Completed 187500 out of 6250000 steps (3%)"
Meaning that 24 seconds before status claims it was assigned, it was 3% done. :shock:
That's an interesting observation. Not sure I've ever seen that - just checked on 3 slots and the assigned time matches the download entry in the log to within 2 seconds. Can't understand how it could impact things to that extent but running out of other ideas. Is your system time accurate or is it the server that's out?

With the separate LANs is there any shared hardware at all? The only times I've had similar issues, it was an overheating network switch that would drop packets when under load and then a dodgy cable, both causing permanently hung downloads in FAHClient.
ComputerGenie

Re: Sporadic issues with 171.67.108.xx servers

Post by ComputerGenie »

rwh202 wrote:...Can't understand how it could impact things to that extent but running out of other ideas. Is your system time accurate or is it the server that's out?

With the separate LANs is there any shared hardware at all? The only times I've had similar issues, it was an overheating network switch that would drop packets when under load and then a dodgy cable, both causing permanently hung downloads in FAHClient.
All of my systems are synced at least once per day (I have one that is synced 3 times per day with a specific German server, but that's totally unrelated).
The only hardware that all 3 systems share is a single 4-port switch (and technically a modem).