Assignment Server errors [fixed]

The most demanding Projects are only available to a small percentage of very high-end servers.

Moderators: Site Moderators, PandeGroup

Postby bollix47 » Mon May 10, 2010 8:38 pm

From the News blog:

First, we have greatly improved the AS logic so it uses more information about the CPU.


Not sure if there's any connection, but all my bigadv computers have been getting only a3 WUs today, and as of a couple of hours ago they are getting "No appropriate work server was available; will try again in a bit".

Because I also run a GPU2 client on these computers my arguments for the SMP client are "-bigadv -smp 7 -verbosity 9" without the quotes and this has been working fine for months.

Hopefully this is just a temporary WU shortage or the result of a high net load on the assignment server.
bollix47
 
Posts: 3479
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: "No appropriate work server was avail" for bigadv clients

Postby road-runner » Mon May 10, 2010 10:14 pm

Dang, decided to switch back to folding for a while and can't get any work...

[22:10:30] + Attempting to get work packet
[22:10:30] Passkey found
[22:10:30] - Connecting to assignment server
[22:10:31] + No appropriate work server was available; will try again in a bit.
[22:10:31] + Couldn't get work instructions.
[22:10:31] - Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
[22:10:42] + Attempting to get work packet
[22:10:42] Passkey found
[22:10:42] - Connecting to assignment server
[22:10:42] + No appropriate work server was available; will try again in a bit.
[22:10:42] + Couldn't get work instructions.
[22:10:42] - Attempt #2 to get work failed, and no other work to do.
Waiting before retry.
[22:11:00] + Attempting to get work packet
[22:11:00] Passkey found
[22:11:00] - Connecting to assignment server
[22:11:00] + No appropriate work server was available; will try again in a bit.
[22:11:00] + Couldn't get work instructions.
[22:11:00] - Attempt #3 to get work failed, and no other work to do.
Waiting before retry.
[22:11:32] + Attempting to get work packet
[22:11:32] Passkey found
[22:11:32] - Connecting to assignment server
[22:11:33] + No appropriate work server was available; will try again in a bit.
[22:11:33] + Couldn't get work instructions.
[22:11:33] - Attempt #4 to get work failed, and no other work to do.
Waiting before retry.
[22:12:22] + Attempting to get work packet
[22:12:22] Passkey found
[22:12:22] - Connecting to assignment server
[22:12:22] + No appropriate work server was available; will try again in a bit.
[22:12:22] + Couldn't get work instructions.
[22:12:22] - Attempt #5 to get work failed, and no other work to do.
Waiting before retry.
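
For anyone reading the log above, the client appears to wait longer after each failed attempt. A minimal sketch that measures those waits from the timestamps follows; the attempt lines are copied from the log, while the helper names `attempt_times` and `gaps_in_seconds` are made up for illustration and are not part of the client. It assumes all timestamps fall on the same day (no midnight rollover).

```python
from datetime import datetime

# "Attempting to get work packet" lines copied from the log above.
LOG = """\
[22:10:30] + Attempting to get work packet
[22:10:42] + Attempting to get work packet
[22:11:00] + Attempting to get work packet
[22:11:32] + Attempting to get work packet
[22:12:22] + Attempting to get work packet
"""


def attempt_times(log_text):
    """Return the timestamp of every work-request attempt in the log."""
    times = []
    for line in log_text.splitlines():
        if "Attempting to get work packet" in line:
            stamp = line[1:9]  # the HH:MM:SS text between the brackets
            times.append(datetime.strptime(stamp, "%H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Seconds between consecutive attempts (same-day timestamps assumed)."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


gaps = gaps_in_seconds(attempt_times(LOG))
print(gaps)  # [12, 18, 32, 50]
```

The gaps grow from 12 s to 50 s over five attempts, so the retries back off rather than hammering the assignment server. The exact backoff rule is not stated anywhere in this thread.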
road-runner
 
Posts: 466
Joined: Sun Dec 02, 2007 4:01 am
Location: Houston, Texas

Re: "No appropriate work server was avail" for bigadv clients

Postby kasson » Mon May 10, 2010 10:29 pm

There might be something funny going on with the assignment server this afternoon. We're looking into it.
kasson
Pande Group Member
 
Posts: 1906
Joined: Thu Nov 29, 2007 9:37 pm

Re: "No appropriate work server was avail" for bigadv clients

Postby bruce » Mon May 10, 2010 11:23 pm

road-runner wrote: Dang, decided to switch back to folding for a while and can't get any work...


Try again now.

Which client version do you have? If you have not upgraded to v6.29+ you should do that ASAP. (You may or may not see any immediate changes, but do it anyway.)
bruce
 
Posts: 22322
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: "No appropriate work server was avail" for bigadv clients

Postby road-runner » Mon May 10, 2010 11:42 pm

I have been using the 6.29 mpiexec file and the drop-in Linux fah from the bonus points thread. It's working now...
road-runner
 
Posts: 466
Joined: Sun Dec 02, 2007 4:01 am
Location: Houston, Texas

Re: "No appropriate work server was avail" for bigadv clients

Postby bollix47 » Mon May 10, 2010 11:46 pm

All clients are again processing WUs. They are a3 WUs but that's better than nothing.

Thanks for "looking into it".
bollix47
 
Posts: 3479
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: "No appropriate work server was avail" for bigadv clients

Postby kasson » Tue May 11, 2010 12:52 am

We've tracked it down to a problem with some of the new AS logic introduced today. Hopefully we can fix it this afternoon; if not, we'll revert to the old code and then do another debugging session later.
kasson
Pande Group Member
 
Posts: 1906
Joined: Thu Nov 29, 2007 9:37 pm

Re: "No appropriate work server was avail" for bigadv clients

Postby bollix47 » Tue May 11, 2010 1:52 am

FYI

My dual-CPU Xeon box just received a ProtoMol WU, something none of my SMP clients have ever seen before.

OS: Windows 7 Professional
Client: 6.29
Arguments: -verbosity 9 -bigadv -smp 15

Code:
[01:15:58] Project: 6061 (Run 0, Clone 152, Gen 10)


[01:15:58] + Attempting to send results [May 11 01:15:58 UTC]
[01:15:58] - Reading file work/wuresults_02.dat from core
[01:15:58]   (Read 3793318 bytes from disk)
[01:15:58] Connecting to http://171.64.65.54:8080/
[01:17:10] Posted data.
[01:17:10] Initial: 0000; - Uploaded at ~51 kB/s
[01:17:10] - Averaged speed for that direction ~50 kB/s
[01:17:10] + Results successfully sent
[01:17:10] Thank you for your contribution to Folding@Home.
[01:17:10] + Number of Units Completed: 42

[01:17:15] Trying to send all finished work units
[01:17:15] + No unsent completed units remaining.
[01:17:15] - Preparing to get new work unit...
[01:17:15] Cleaning up work directory
[01:17:15] + Attempting to get work packet
[01:17:15] Passkey found
[01:17:15] - Will indicate memory of 12279 MB
[01:17:15] - Connecting to assignment server
[01:17:15] Connecting to http://assign.stanford.edu:8080/
[01:17:16] Posted data.
[01:17:16] Initial: 4A81; - Successful: assigned to (129.74.85.15).
[01:17:16] + News From Folding@Home: Welcome to Folding@Home
[01:17:16] Loaded queue successfully.
[01:17:16] Connecting to http://129.74.85.15:8080/
[01:17:17] Posted data.
[01:17:17] Initial: 0000; - Receiving payload (expected size: 38265)
[01:17:18] - Downloaded at ~37 kB/s
[01:17:18] - Averaged speed for that direction ~328 kB/s
[01:17:18] + Received work.
[01:17:18] Trying to send all finished work units
[01:17:18] + No unsent completed units remaining.
[01:17:18] + Closed connections
[01:17:18]
[01:17:18] + Processing work unit
[01:17:18] Work type b4 not eligible for variable processors
[01:17:18] Core required: FahCore_b4.exe
[01:17:18] Core not found.
[01:17:18] - Core is not present or corrupted.
[01:17:18] - Attempting to download new core...
[01:17:18] + Downloading new core: FahCore_b4.exe
[01:17:18] Downloading core (/~pande/Win32/x86/Core_b4.fah from www.stanford.edu)
[01:17:18] Initial: AFDE; + 10240 bytes downloaded

<SNIP>

[01:17:27] Initial: C685; + 5169901 bytes downloaded
[01:17:27] Verifying core Core_b4.fah...
[01:17:27] Signature is VALID
[01:17:27]
[01:17:27] Trying to unzip core FahCore_b4.exe
[01:17:30] Decompressed FahCore_b4.exe (16564224 bytes) successfully
[01:17:35] + Core successfully engaged
[01:17:41]
[01:17:41] + Processing work unit
[01:17:41] Work type b4 not eligible for variable processors
[01:17:41] Core required: FahCore_b4.exe
[01:17:41] Core found.
[01:17:41] Working on queue slot 03 [May 11 01:17:41 UTC]
[01:17:41] + Working ...
[01:17:41] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_b4.exe -dir work/ -suffix 03 -priority 96 -checkpoint 15 -verbose -lifeline 6880 -version 629'

[01:17:46] *********************** Log Started 11/May/2010 01:17:46 ***********************
[01:17:46] ************************** ProtoMol Folding@Home Core **************************
[01:17:46]   Version: 23
[01:17:46]      Type: 180
[01:17:46]      Core: ProtoMol
[01:17:46]   Website: http://folding.stanford.edu/
[01:17:46] Copyright: (c) 2009 Stanford University
[01:17:46]    Author: Joseph Coffland <joseph@cauldrondevelopment.com>
[01:17:46]      Args: -dir work/ -suffix 03 -priority 96 -checkpoint 15 -verbose -lifeline
[01:17:46]            6880 -version 629
[01:17:46] ************************************ Build *************************************
[01:17:46]      Date: Mar 22 2010
[01:17:46]      Time: 16:55:15
[01:17:46]  Revision: 1789
[01:17:46]  Compiler: Intel(R) C++ MSVC 1500 mode 1110
[01:17:46]   Options: /TP /nologo /EHsc /wd4297 /wd4103 /wd1786 /arch:IA32 /Ox
[01:17:46]            /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
[01:17:46]   Defines: _CRT_SECURE_NO_WARNINGS NDEBUG HAVE_GEEKINFO BOOST_ALL_NO_LIB
[01:17:46]            XML_STATIC HAVE_EXPAT HAVE_OPENSSL HAVE_LIBFAH HAVE_SIMTK_LAPACK
[01:17:46]  Platform: Windows XP
[01:17:46]      Bits: 32
[01:17:46]      Mode: Release
[01:17:46] ************************************ System ************************************
[01:17:46]        OS: Microsoft Windows 7 Professional
[01:17:46]       CPU: Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
[01:17:46]    CPU ID: GenuineIntel Family 6 Model 26 Stepping 5
[01:17:46]      CPUs: 32 Logical, 2 Physical  <===========  WRONG - s/b 16 Logical, 2 Physical *********************
[01:17:46]    Memory: 12.0 GB
[01:17:46]   Threads: Windows
[01:17:46] ********************************************************************************
[01:17:46] Project: 10040 (Run 99, Clone 0, Gen 0)
[01:17:46] Reading tar file par_all27_prot_lipid.inp
[01:17:46] Reading tar file scpismQuartic.inp
[01:17:46] Reading tar file ww_exteq_nowater1.pdb
[01:17:46] Reading tar file ww_exteq_nowater1.psf
[01:17:46] Reading tar file protomol.conf
[01:17:46] Reading tar file core.xml
[01:17:46] Digital signatures verified
[01:17:47] GUI Server started
[01:17:47] Completed 0 out of 2000000 steps (0%)
[01:22:23] Completed 20000 out of 2000000 steps (1%)
[01:27:03] Completed 40000 out of 2000000 steps (2%)
[01:31:42] Completed 60000 out of 2000000 steps (3%)
[01:36:19] Completed 80000 out of 2000000 steps (4%)
[01:40:57] Completed 100000 out of 2000000 steps (5%)
[01:45:39] Completed 120000 out of 2000000 steps (6%)


Went from 19K PPD to 512 PPD. :roll:
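
To put a number on how slowly that ProtoMol WU is moving, here is a rough sketch that works out the average time per 1% from the progress lines in the log above and extrapolates to the full unit. The progress lines are copied from the log; the function names are hypothetical, and the estimate assumes the pace stays constant and that all timestamps fall on the same day.

```python
import re
from datetime import datetime

# Progress lines copied from the log above.
LOG = """\
[01:17:47] Completed 0 out of 2000000 steps (0%)
[01:22:23] Completed 20000 out of 2000000 steps (1%)
[01:27:03] Completed 40000 out of 2000000 steps (2%)
[01:31:42] Completed 60000 out of 2000000 steps (3%)
[01:36:19] Completed 80000 out of 2000000 steps (4%)
[01:40:57] Completed 100000 out of 2000000 steps (5%)
[01:45:39] Completed 120000 out of 2000000 steps (6%)
"""

PROGRESS = re.compile(r"\[(\d\d:\d\d:\d\d)\] Completed (\d+) out of (\d+) steps")


def parse_progress(log_text):
    """Return (timestamp, steps_done, total_steps) for each progress line."""
    points = []
    for match in PROGRESS.finditer(log_text):
        when = datetime.strptime(match.group(1), "%H:%M:%S")
        points.append((when, int(match.group(2)), int(match.group(3))))
    return points


def seconds_per_percent(points):
    """Average seconds per 1% between the first and last progress line."""
    (t0, s0, total), (t1, s1, _) = points[0], points[-1]
    elapsed = (t1 - t0).total_seconds()
    percent_done = 100.0 * (s1 - s0) / total
    return elapsed / percent_done


spp = seconds_per_percent(parse_progress(LOG))
hours_for_wu = spp * 100 / 3600
print(round(spp, 1), "s per 1%")          # 278.7 s per 1%
print(round(hours_for_wu, 1), "h total")  # 7.7 h total
```

That works out to a bit under 8 hours for the whole unit at this pace. Turning it into PPD would need the project's point value, which is not given in this thread.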
bollix47
 
Posts: 3479
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: "No appropriate work server was avail" for bigadv clients

Postby road-runner » Tue May 11, 2010 2:50 am

What the heck, I keep getting 10040 WUs that only load 1 core when I was running bigadv. What happened to folding while I was gone? Damn thing seems to be getting worse instead of better.
road-runner
 
Posts: 466
Joined: Sun Dec 02, 2007 4:01 am
Location: Houston, Texas

Re: "No appropriate work server was avail" for bigadv clients

Postby Punchy » Tue May 11, 2010 2:57 am

All 3 of my SMP systems (using -advmethods at the moment) are now running ProtoMol projects, so I'm using 3 out of 60 available cores. The worst thing is that I can't even start additional clients due to a B4 core bug I read about elsewhere: they simply fail immediately with a socket error.
Punchy
 
Posts: 218
Joined: Fri Feb 19, 2010 1:49 am

What is WU 1796

Postby Grandpa_01 » Tue May 11, 2010 3:37 am

Why would I get this WU on a bigadv machine?

Code:
[19:18:50] Loaded queue successfully.
[19:18:50]
[19:18:50] + Processing work unit
[19:18:50] Work type a0 not eligible for variable processors
[19:18:50] Core required: FahCore_a0.exe
[19:18:50] Core found.
[19:18:50] Working on queue slot 00 [May 10 19:18:50 UTC]
[19:18:50] + Working ...
[19:18:50]
[19:18:50] *------------------------------*
[19:18:50] Folding@Home Gromacs 3.3 Core
[19:18:50] Version 1.92 (April 17. 2007)
[19:18:50]
[19:18:50] Preparing to commence simulation
[19:18:50] - Ensuring status. Please wait.
[19:19:07] - Looking at optimizations...
[19:19:07] - Working with standard loops on this execution.
[19:19:07] - Previous termination of core was improper.
[19:19:07] - Files status OK
[19:19:08] - Expanded 928912 -> 5947613 (decompressed 640.2 percent)
[19:19:08]
[19:19:08] Project: 1796 (Run 5, Clone 188, Gen 0)
[19:19:08]
[19:19:08] Entering M.D.
[19:19:28] (Starting from checkpoint)
[19:19:28] Protein: 41330
[19:19:28] Writing local files
[19:19:29] Writing local files
[19:19:29] Completed 0 out of 500000 steps  (0 percent)
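
When checking a lot of machines for stray assignments like this one, it can help to pull the core name and project numbers out of each log automatically. A small sketch, assuming the log wording matches the two lines copied from the log above; the `summarize` helper is hypothetical and not part of the client.

```python
import re

# Two lines copied from the log above.
LOG = """\
[19:18:50] Core required: FahCore_a0.exe
[19:19:08] Project: 1796 (Run 5, Clone 188, Gen 0)
"""

CORE = re.compile(r"Core required: FahCore_(\w+)\.exe")
PROJECT = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def summarize(log_text):
    """Return the first core and project/run/clone/gen found in a log."""
    core = CORE.search(log_text)
    proj = PROJECT.search(log_text)
    summary = {"core": core.group(1) if core else None}
    # Fill the numeric fields only when a Project line was found.
    for i, key in enumerate(("project", "run", "clone", "gen"), start=1):
        summary[key] = int(proj.group(i)) if proj else None
    return summary


print(summarize(LOG))
# {'core': 'a0', 'project': 1796, 'run': 5, 'clone': 188, 'gen': 0}
```

Comparing the `core` field against whatever core you expect on that machine makes an unexpected a0 or b4 assignment easy to spot.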

2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
Grandpa_01
 
Posts: 1757
Joined: Wed Mar 04, 2009 7:36 am

Re: "No appropriate work server was avail" for bigadv clients

Postby rexrzer » Tue May 11, 2010 3:48 am

Punchy wrote: All 3 of my SMP systems (using -advmethods at the moment) are now running ProtoMol projects, so I'm using 3 out of 60 available cores. The worst thing is that I can't even start additional clients due to a B4 core bug I read about elsewhere: they simply fail immediately with a socket error.


I figured something BAD was going on when that started up at my place, too!

So I have simply shut things down until Kasson and company fix the Assignment Servers, because they are TOAST right now! I mean, I am either getting A0 WUs on one machine or these ProtoMol work units on the other, and nothing to stop it! CRAZY! This is a total chaotic mess, like the worst thing I've ever seen here. Hah hah!

WOW! Poor Kasson and company...! Like NUTZ!

I wish them the best of luck, and us too, so we can get back to our bigadv units and A3 units... Right now it's best to just shut things down until further notice, I think?

You all tell me what you are doing about this scene!
i7 970 HexCore @ 4.3Ghz/24GB RAM; i7 920 @ 4.2Ghz/6GB RAM; Asus G73SW-3DE laptop/Core i7 2630QM @ 2.5 Ghz/16GB RAM; i7 920 @ 4.2Ghz/6GB RAM+GPU Clients: 2 EVGA GTX-560 Ti SC's-SLI+2 EVGA GTX-560 Ti SC's, all 'clocked 980/1960/2170
rexrzer
 
Posts: 122
Joined: Sat Dec 08, 2007 10:45 am

Re: What is WU 1796

Postby Zagen30 » Tue May 11, 2010 4:02 am

Zagen30
 
Posts: 1589
Joined: Tue Mar 25, 2008 12:45 am

Re: What is WU 1796

Postby P5-133XL » Tue May 11, 2010 4:15 am

viewtopic.php?f=55&t=14519&p=142734#p142734


kasson wrote:We've tracked it down to a problem with some of the new AS logic introduced today. Hopefully we can fix it this afternoon; if not, we'll revert to the old code and then do another debugging session later.
P5-133XL
 
Posts: 4034
Joined: Sun Dec 02, 2007 4:36 am
Location: Salem. OR USA

Re: "No appropriate work server was avail" for bigadv clients

Postby glussier » Tue May 11, 2010 4:25 am

All work units have to be processed, so why be selective? Just let them run, and soon, you'll be back folding your A3 or bigadv units.
glussier
 
Posts: 16
Joined: Wed Nov 18, 2009 3:57 am
