128.143.231.201 / 202 in reject

Moderators: Site Moderators, FAHC Science Team

Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

128.143.231.201 / 202 in reject

Post by Grandpa_01 »

Just a heads up: 128.143.231.201 / 202 are in reject, along with several other servers.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
bollix47
Posts: 2941
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: 128.143.231.201 / 202 in reject

Post by bollix47 »

Thank you for the heads up ... PG notified.
EXT64
Posts: 323
Joined: Mon Apr 09, 2012 11:54 pm

Re: 128.143.231.201 / 202 in reject

Post by EXT64 »

bollix47 - this is probably the wrong place to put this, but since it started with the same problem Grandpa_01 had, I'll add it here. One of my bigadv rigs, of course (due to the problem above), reverted to SMP. However, now it can no longer even validate the assignment servers (assign or assign2). I've pulled it off for now (to do updates, maybe a little BOINC; no use in sitting idle).
bollix47
Posts: 2941
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: 128.143.231.201 / 202 in reject

Post by bollix47 »

Mine downloaded an SMP work unit (P7809) within the last 10 minutes and appears to be working okay ... there were a few failed attempts before the AS (assignment server) finally sent the client to a WS (work server) with work; that took a little over 3 minutes. The previous WU was a bigadv and was successfully returned to the CS (collection server, 128.143.199.97).

Code:

14:15:41:WU01:FS00:0xa5:Completed 250000 out of 250000 steps  (100%)
13:28:32:WU01:FS00:Connecting to 171.67.108.200:8080
13:28:33:WU01:FS00:Assigned to work server 143.89.28.72
13:28:33:WU01:FS00:Requesting new work unit for slot 00: RUNNING cpu:64 from 143.89.28.72
13:28:33:WU01:FS00:Connecting to 143.89.28.72:8080
13:28:33:ERROR:WU01:FS00:Exception: Server did not assign work unit
13:28:34:WU01:FS00:Connecting to 171.67.108.200:8080
13:28:34:WU01:FS00:Assigned to work server 143.89.28.72
13:28:34:WU01:FS00:Requesting new work unit for slot 00: RUNNING cpu:64 from 143.89.28.72
13:28:34:WU01:FS00:Connecting to 143.89.28.72:8080
13:28:35:ERROR:WU01:FS00:Exception: Server did not assign work unit
13:28:48:WU00:FS00:0xa5:DynamicWrapper: Finished Work Unit: sleep=10000
13:28:58:WU00:FS00:0xa5:
13:28:58:WU00:FS00:0xa5:Finished Work Unit:
13:28:58:WU00:FS00:0xa5:- Reading up to 64340496 from "00/wudata_01.trr": Read 64340496
13:28:58:WU00:FS00:0xa5:trr file hash check passed.
13:28:58:WU00:FS00:0xa5:- Reading up to 31615820 from "00/wudata_01.xtc": Read 31615820
13:28:59:WU00:FS00:0xa5:xtc file hash check passed.
13:28:59:WU00:FS00:0xa5:edr file hash check passed.
13:28:59:WU00:FS00:0xa5:logfile size: 206581
13:28:59:WU00:FS00:0xa5:Leaving Run
13:29:02:WU00:FS00:0xa5:- Writing 96323773 bytes of core data to disk...
13:29:33:WU00:FS00:0xa5:Done: 96323261 -> 91567211 (compressed to 5.8 percent)
13:29:33:WU00:FS00:0xa5:  ... Done.
13:29:34:WU01:FS00:Connecting to 171.67.108.200:8080
13:29:34:WARNING:WU01:FS00:Failed to get assignment from '171.67.108.200:8080': Empty work server assignment
13:29:34:WU01:FS00:Connecting to 171.64.65.121:80
13:30:04:WU00:FS00:0xa5:- Shutting down core
13:30:04:WU00:FS00:0xa5:
13:30:04:WU00:FS00:0xa5:Folding@home Core Shutdown: FINISHED_UNIT
13:30:09:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
13:30:09:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:8101 run:6 clone:9 gen:447 core:0xa5 unit:0x00000282088988e14f296a5448a68f5a
13:30:09:WU00:FS00:Uploading 87.33MiB to 128.143.231.201
13:30:09:WU00:FS00:Connecting to 128.143.231.201:8080
13:31:42:WARNING:WU01:FS00:Failed to get assignment from '171.64.65.121:80': Failed to connect to 171.64.65.121:80: Connection timed out
13:31:42:ERROR:WU01:FS00:Exception: Could not get an assignment
13:31:42:WU01:FS00:Connecting to 171.67.108.200:8080
13:31:42:WU01:FS00:Assigned to work server 171.64.65.99
13:31:42:WU01:FS00:Requesting new work unit for slot 00: READY cpu:64 from 171.64.65.99
13:31:42:WU01:FS00:Connecting to 171.64.65.99:8080
13:31:44:WU01:FS00:Downloading 1.98MiB
13:31:46:WU01:FS00:Download complete
13:31:46:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:7809 run:4 clone:226 gen:67 core:0xa4 unit:0x000000560a3b1e874e310e298d7351d7
13:31:46:WU01:FS00:Starting
13:31:46:WU01:FS00:Running FahCore: /home/bollix/fah/FAHCoreWrapper /home/bollix/fah/cores/web.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 2839 -checkpoint 30 -np 64
13:31:46:WU01:FS00:Started FahCore on PID 23250
13:31:46:WU01:FS00:Core PID:23254
13:31:46:WU01:FS00:FahCore 0xa4 started
13:31:47:WU01:FS00:0xa4:
13:31:47:WU01:FS00:0xa4:*------------------------------*
13:31:47:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
13:31:47:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
13:31:47:WU01:FS00:0xa4:
13:31:47:WU01:FS00:0xa4:Preparing to commence simulation
13:31:47:WU01:FS00:0xa4:- Looking at optimizations...
13:31:47:WU01:FS00:0xa4:- Created dyn
13:31:47:WU01:FS00:0xa4:- Files status OK
13:31:47:WU01:FS00:0xa4:- Expanded 2079310 -> 5386224 (decompressed 259.0 percent)
13:31:47:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=2079310 data_size=5386224, decompressed_data_size=5386224 diff=0
13:31:47:WU01:FS00:0xa4:- Digital signature verified
13:31:47:WU01:FS00:0xa4:
13:31:47:WU01:FS00:0xa4:Project: 7809 (Run 4, Clone 226, Gen 67)
13:31:47:WU01:FS00:0xa4:
13:31:47:WU01:FS00:0xa4:Assembly optimizations on if available.
13:31:47:WU01:FS00:0xa4:Entering M.D.
13:31:55:WU01:FS00:0xa4:Completed 0 out of 1500000 steps  (0%)
13:32:16:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
13:32:16:WU00:FS00:Connecting to 128.143.231.201:80
13:34:23:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.143.231.201:80: Connection timed out
13:34:23:WU00:FS00:Trying to send results to collection server
13:34:23:WU00:FS00:Uploading 87.33MiB to 128.143.199.97
13:34:23:WU00:FS00:Connecting to 128.143.199.97:8080
13:34:29:WU00:FS00:Upload 3.08%
13:34:35:WU00:FS00:Upload 6.08%
13:34:41:WU00:FS00:Upload 9.38%
13:34:47:WU00:FS00:Upload 11.74%
13:34:53:WU00:FS00:Upload 14.17%
13:35:00:WU00:FS00:Upload 17.39%
13:35:06:WU00:FS00:Upload 19.75%
13:35:09:WU01:FS00:0xa4:Completed 15000 out of 1500000 steps  (1%)
13:35:12:WU00:FS00:Upload 22.33%
13:35:19:WU00:FS00:Upload 25.62%
13:35:25:WU00:FS00:Upload 27.98%
13:35:31:WU00:FS00:Upload 30.42%
13:35:38:WU00:FS00:Upload 33.57%
13:35:44:WU00:FS00:Upload 36.00%
13:35:50:WU00:FS00:Upload 38.58%
13:35:56:WU00:FS00:Upload 41.15%
13:36:04:WU00:FS00:Upload 44.30%
13:36:10:WU00:FS00:Upload 46.74%
13:36:16:WU00:FS00:Upload 49.96%
13:36:23:WU00:FS00:Upload 52.32%
13:36:30:WU00:FS00:Upload 55.90%
13:36:36:WU00:FS00:Upload 58.26%
13:36:43:WU00:FS00:Upload 61.48%
13:36:49:WU00:FS00:Upload 63.91%
13:36:56:WU00:FS00:Upload 67.06%
13:37:02:WU00:FS00:Upload 69.50%
13:37:09:WU00:FS00:Upload 72.64%
13:37:15:WU00:FS00:Upload 75.08%
13:37:23:WU00:FS00:Upload 78.30%
13:37:30:WU00:FS00:Upload 81.52%
13:37:36:WU00:FS00:Upload 83.88%
13:37:42:WU00:FS00:Upload 86.31%
13:37:49:WU00:FS00:Upload 89.68%
13:37:55:WU00:FS00:Upload 92.11%
13:38:01:WU00:FS00:Upload 94.76%
13:38:07:WU00:FS00:Upload 97.19%
13:38:22:WU01:FS00:0xa4:Completed 30000 out of 1500000 steps  (2%)
13:38:32:WU00:FS00:Upload complete
13:38:32:WU00:FS00:Server responded WORK_ACK (400)
13:38:32:WU00:FS00:Final credit estimate, 318564.00 points
13:38:32:WU00:FS00:Cleaning up
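
For anyone who wants to pull the failures out of a log like the one above without scrolling through the upload progress, here is a rough Python sketch. It is not part of the client; the `summarize` function, the regular expressions, and the `SAMPLE` excerpt are assumptions based only on the line shape visible in this log (`HH:MM:SS:[WARNING|ERROR:]WUnn:FSnn:message`), so other client versions may need adjustments.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log pasted above:
#   HH:MM:SS:[WARNING:|ERROR:]WUnn:FSnn:message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):"
    r"(?:(?P<level>WARNING|ERROR):)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):(?P<msg>.*)$"
)
# Matches the "Failed to connect to <ip>:<port>" part of a message.
FAILED_RE = re.compile(r"Failed to connect to (\d+\.\d+\.\d+\.\d+):(\d+)")

# A few lines copied from the log above, used as a small test input.
SAMPLE = """\
13:28:33:ERROR:WU01:FS00:Exception: Server did not assign work unit
13:31:42:WARNING:WU01:FS00:Failed to get assignment from '171.64.65.121:80': Failed to connect to 171.64.65.121:80: Connection timed out
13:34:23:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.143.231.201:80: Connection timed out
13:34:23:WU00:FS00:Uploading 87.33MiB to 128.143.199.97
13:38:32:WU00:FS00:Upload complete
"""


def summarize(log_text):
    """Count WARNING/ERROR lines and collect endpoints that refused a connection."""
    levels = Counter()
    failed = set()
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip anything that does not look like a client log line
        if match.group("level"):
            levels[match.group("level")] += 1
        hit = FAILED_RE.search(match.group("msg"))
        if hit:
            failed.add(f"{hit.group(1)}:{hit.group(2)}")
    return levels, sorted(failed)


if __name__ == "__main__":
    levels, failed = summarize(SAMPLE)
    print(dict(levels))
    print(failed)
```

To run it against a whole log, read the file into a string and pass that to `summarize` instead of `SAMPLE`.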
bollix47
Posts: 2941
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: 128.143.231.201 / 202 in reject

Post by bollix47 »

Response from kasson posted at his request:
kasson wrote: Yes, all our machines in that data center got knocked offline. I can't reach any of those machines at the moment. I sent our networking people email, but I can't provide an ETA at the moment.
EXT64
Posts: 323
Joined: Mon Apr 09, 2012 11:54 pm

Re: 128.143.231.201 / 202 in reject

Post by EXT64 »

Great - thanks for the update!