171.64.65.56 ???

Moderators: Site Moderators, FAHC Science Team

Re: 171.64.65.56 ???

Postby lobuxracer » Tue Oct 05, 2010 5:42 am

You are not alone. HTTP 503 here too.
lobuxracer
 
Posts: 18
Joined: Mon Aug 18, 2008 6:49 am

Re: 171.64.65.56 ???

Postby susato » Tue Oct 05, 2010 7:22 am

Netload's back at 200. Every time Dr. K. bumps it, it heads right back to 200 connections and trouble. Good to hear about the upcoming equipment upgrade.
susato
Site Moderator
 
Posts: 511
Joined: Fri Nov 30, 2007 5:57 am
Location: Team MacOSX

Re: 171.64.65.56 ???

Postby ThunderRd » Tue Oct 05, 2010 7:48 am

susato wrote:Netload's back at 200. Every time Dr. K. bumps it, it heads right back to 200 connections and trouble. Good to hear about the upcoming equipment upgrade.


Actually it's at 202 ATM.

An additional server would be welcome. I've got 5 machines waiting for work now, 4 of them for over 12 hours.
ThunderRd
 
Posts: 78
Joined: Sun Dec 02, 2007 6:30 am
Location: Nong Khai, Thailand

Re: 171.64.65.56 ???

Postby snapshot » Tue Oct 05, 2010 12:11 pm

I've got two WUs waiting for upload with their bonus points frittering away.....
snapshot
 
Posts: 132
Joined: Thu Apr 09, 2009 8:25 pm
Location: Wiltshire, UK

Re: 171.64.65.56 ???

Postby Tobit » Tue Oct 05, 2010 1:34 pm

65.56 needs another kick please. At the time of this post, net load is over 200 again. :(
Tobit
 
Posts: 342
Joined: Thu Apr 17, 2008 3:35 pm
Location: Manchester, NH USA

Re: 171.64.65.56 ???

Postby Datsun 1600 » Tue Oct 05, 2010 2:20 pm

How can I get permanently assigned to Server 171.64.65.54? I have no problems with that one, but as soon as I get assigned a WU from Server 171.64.65.56 and go to return it, the server is usually overloaded. GRRRRRRRRRRR
Datsun 1600
 
Posts: 33
Joined: Mon May 05, 2008 3:42 am

Re: 171.64.65.56 ???

Postby VijayPande » Tue Oct 05, 2010 3:12 pm

I'm sorry about this. We're monitoring the situation and have a longer-term solution (new servers, with work distributed amongst them), but that will take some time to get online (sorry it's taking so long).
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
VijayPande
Pande Group Member
 
Posts: 2058
Joined: Fri Nov 30, 2007 7:25 am
Location: Stanford

Re: 171.64.65.56 ???

Postby 7up1n3 » Tue Oct 05, 2010 4:55 pm

Getting 503s here too now.

Code:
[13:11:12] Completed 2000000 out of 2000000 steps  (100%)
[13:11:12] DynamicWrapper: Finished Work Unit: sleep=10000
[13:11:23]
[13:11:23] Finished Work Unit:
[13:11:23] - Reading up to 687408 from "work/wudata_05.trr": Read 687408
[13:11:23] trr file hash check passed.
[13:11:23] - Reading up to 42672364 from "work/wudata_05.xtc": Read 42672364
[13:11:23] xtc file hash check passed.
[13:11:23] edr file hash check passed.
[13:11:23] logfile size: 279301
[13:11:23] Leaving Run
[13:11:23] - Writing 43641409 bytes of core data to disk...
[13:11:24]   ... Done.
[13:11:25] - Shutting down core
[13:11:25]
[13:11:25] Folding@home Core Shutdown: FINISHED_UNIT
[13:11:29] CoreStatus = 64 (100)
[13:11:29] Unit 5 finished with 91 percent of time to deadline remaining.
[13:11:29] Updated performance fraction: 0.673357
[13:11:29] Sending work to server
[13:11:29] Project: 6701 (Run 56, Clone 11, Gen 58)


[13:11:29] + Attempting to send results [October 5 13:11:29 UTC]
[13:11:29] - Reading file work/wuresults_05.dat from core
[13:11:29]   (Read 43641409 bytes from disk)
[13:11:29] Connecting to http://171.64.65.56:8080/
[13:11:29] - Couldn't send HTTP request to server
[13:11:29]   (Got status 503)
[13:11:29] + Could not connect to Work Server (results)
[13:11:29]     (171.64.65.56:8080)
[13:11:29] + Retrying using alternative port
[13:11:29] Connecting to http://171.64.65.56:80/
[13:11:29] - Couldn't send HTTP request to server
[13:11:29]   (Got status 503)
[13:11:29] + Could not connect to Work Server (results)
[13:11:29]     (171.64.65.56:80)
[13:11:29] - Error: Could not transmit unit 05 (completed October 5) to work server.
[13:11:29] - 1 failed uploads of this unit.
[13:11:29]   Keeping unit 05 in queue.
[13:11:29] Trying to send all finished work units
[13:11:29] Project: 6701 (Run 56, Clone 11, Gen 58)


[13:11:29] + Attempting to send results [October 5 13:11:29 UTC]
[13:11:29] - Reading file work/wuresults_05.dat from core
[13:11:29]   (Read 43641409 bytes from disk)
[13:11:29] Connecting to http://171.64.65.56:8080/
[13:11:29] - Couldn't send HTTP request to server
[13:11:29]   (Got status 503)
[13:11:29] + Could not connect to Work Server (results)
[13:11:29]     (171.64.65.56:8080)
[13:11:29] + Retrying using alternative port
[13:11:29] Connecting to http://171.64.65.56:80/
[13:11:29] - Couldn't send HTTP request to server
[13:11:29]   (Got status 503)
[13:11:29] + Could not connect to Work Server (results)
[13:11:29]     (171.64.65.56:80)
[13:11:29] - Error: Could not transmit unit 05 (completed October 5) to work server.
[13:11:29] - 2 failed uploads of this unit.


[13:11:29] + Attempting to send results [October 5 13:11:29 UTC]
[13:11:29] - Reading file work/wuresults_05.dat from core
[13:11:29]   (Read 43641409 bytes from disk)
[13:11:29] Connecting to http://171.67.108.25:8080/
[13:11:29] - Couldn't send HTTP request to server
[13:11:29]   (Got status 503)
[13:11:29] + Could not connect to Work Server (results)
[13:11:29]     (171.67.108.25:8080)
[13:11:29] + Retrying using alternative port
[13:11:29] Connecting to http://171.67.108.25:80/
[13:11:29] - Couldn't send HTTP request to server
[13:11:29]   (Got status 503)
[13:11:29] + Could not connect to Work Server (results)
[13:11:29]     (171.67.108.25:80)
[13:11:29]   Could not transmit unit 05 to Collection server; keeping in queue.
[13:11:29] + Sent 0 of 1 completed units to the server
[13:11:29] - Preparing to get new work unit...
[13:11:29] Cleaning up work directory
[13:11:31] + Attempting to get work packet
[13:11:31] Passkey found
[13:11:31] - Will indicate memory of 12285 MB
[13:11:31] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 10
[13:11:31] - Connecting to assignment server
[13:11:31] Connecting to http://assign.stanford.edu:8080/
[13:11:32] Posted data.
[13:11:32] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[13:11:32] + News From Folding@Home: Welcome to Folding@Home
[13:11:32] Loaded queue successfully.
[13:11:32] Sent data
[13:11:32] Connecting to http://171.64.65.54:8080/
[13:11:33] Posted data.
[13:11:33] Initial: 0000; - Receiving payload (expected size: 1765609)
[13:11:35] - Downloaded at ~862 kB/s
[13:11:35] - Averaged speed for that direction ~615 kB/s
[13:11:35] + Received work.
[13:11:35] Trying to send all finished work units
[13:11:35] Project: 6701 (Run 56, Clone 11, Gen 58)


[13:11:35] + Attempting to send results [October 5 13:11:35 UTC]
[13:11:35] - Reading file work/wuresults_05.dat from core
[13:11:35]   (Read 43641409 bytes from disk)
[13:11:35] Connecting to http://171.64.65.56:8080/
[13:11:36] - Couldn't send HTTP request to server
[13:11:36]   (Got status 503)
[13:11:36] + Could not connect to Work Server (results)
[13:11:36]     (171.64.65.56:8080)
[13:11:36] + Retrying using alternative port
[13:11:36] Connecting to http://171.64.65.56:80/
[13:11:36] - Couldn't send HTTP request to server
[13:11:36]   (Got status 503)
[13:11:36] + Could not connect to Work Server (results)
[13:11:36]     (171.64.65.56:80)
[13:11:36] - Error: Could not transmit unit 05 (completed October 5) to work server.
[13:11:36] - 3 failed uploads of this unit.


[13:11:36] + Attempting to send results [October 5 13:11:36 UTC]
[13:11:36] - Reading file work/wuresults_05.dat from core
[13:11:36]   (Read 43641409 bytes from disk)
[13:11:36] Connecting to http://171.67.108.25:8080/
[13:11:36] - Couldn't send HTTP request to server
[13:11:36]   (Got status 503)
[13:11:36] + Could not connect to Work Server (results)
[13:11:36]     (171.67.108.25:8080)
[13:11:36] + Retrying using alternative port
[13:11:36] Connecting to http://171.67.108.25:80/
[13:11:36] - Couldn't send HTTP request to server
[13:11:36]   (Got status 503)
[13:11:36] + Could not connect to Work Server (results)
[13:11:36]     (171.67.108.25:80)
[13:11:36]   Could not transmit unit 05 to Collection server; keeping in queue.
[13:11:36] + Sent 0 of 1 completed units to the server
[13:11:36] + Closed connections
[13:11:36]
[13:11:36] + Processing work unit
[13:11:36] Core required: FahCore_a3.exe
[13:11:36] Core found.
[13:11:36] Working on queue slot 06 [October 5 13:11:36 UTC]
[13:11:36] + Working ...
[13:11:36] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 06 -np 8 -checkpoint 15 -verbose -lifeline 2668 -version 630'

[13:11:36]
[13:11:36] *------------------------------*
[13:11:36] Folding@Home Gromacs SMP Core
[13:11:36] Version 2.22 (Mar 12, 2010)
[13:11:36]
[13:11:36] Preparing to commence simulation
[13:11:36] - Looking at optimizations...
[13:11:36] - Created dyn
[13:11:36] - Files status OK
[13:11:36] - Expanded 1765097 -> 2251569 (decompressed 127.5 percent)
[13:11:36] Called DecompressByteArray: compressed_data_size=1765097 data_size=2251569, decompressed_data_size=2251569 diff=0
[13:11:36] - Digital signature verified
[13:11:36]
[13:11:36] Project: 6055 (Run 1, Clone 160, Gen 45)
[13:11:36]
[13:11:36] Assembly optimizations on if available.
[13:11:36] Entering M.D.
[13:11:43] Completed 0 out of 500000 steps  (0%)
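The log above shows a fixed fallback order for each upload attempt: the work server on port 8080, the same host on port 80, then the collection server on 8080 and 80. Below is a minimal Python sketch of that ordering only. The function names, the injected `send` callback, and the choice of status 200 as "success" are assumptions for illustration; this is not the client's actual code.

```python
from typing import Callable, List, Optional, Tuple

Endpoint = Tuple[str, int]


def endpoints_in_log_order(work_server: str, collection_server: str) -> List[Endpoint]:
    """Build the order seen in the log: each host on port 8080, then port 80."""
    return [
        (host, port)
        for host in (work_server, collection_server)
        for port in (8080, 80)
    ]


def try_upload(endpoints: List[Endpoint],
               send: Callable[[str, int], int]) -> Optional[Endpoint]:
    """Try each endpoint in turn; return the first that accepts the upload.

    `send` is a hypothetical callback returning an HTTP status code.
    Treating 200 as success is an assumption of this sketch.
    """
    for host, port in endpoints:
        if send(host, port) == 200:
            return (host, port)
    return None  # nothing accepted the unit; the caller keeps it queued


if __name__ == "__main__":
    def always_busy(host: str, port: int) -> int:
        return 503  # mimic the overloaded servers in the log

    targets = endpoints_in_log_order("171.64.65.56", "171.67.108.25")
    print(try_upload(targets, always_busy))  # None
```

With every endpoint answering 503, all four attempts fail and the unit stays in the queue, which matches the "Sent 0 of 1 completed units" line in the log.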


What happens to WUs that fail to upload like this? Do they expire when the deadline passes? Is the bonus frittered away?
Rage3D Admin ~ The Fighting 300 ~ Team Rage3D Folding
7up1n3
 
Posts: 68
Joined: Sun Dec 02, 2007 3:55 am

Re: 171.64.65.56 ???

Postby PantherX » Tue Oct 05, 2010 5:48 pm

7up1n3 wrote:...What happens to WUs that fail to upload like this? Do they expire when the deadline passes? Is the bonus frittered away?
In this case, your bonus points will be reduced according to the delay caused by the Server. If you cross the Preferred Deadline, you will be assigned Base Credits only. If you cross the Final Deadline, you will not get any credit, and the Client will delete the WU and move on. Sorry, but that is the way it currently works. Hopefully, when new SMP Servers are added, this will be history and we will all be happy.
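The three tiers described above (bonus that shrinks with delay, base credit after the Preferred Deadline, nothing after the Final Deadline) can be sketched as follows. All names are hypothetical, and `example_bonus` with its constant `k` is only a placeholder decay curve chosen for illustration; the post does not give the project's actual bonus formula.

```python
import math
from datetime import datetime, timedelta
from typing import Callable


def example_bonus(issued: datetime, returned: datetime,
                  preferred_deadline: datetime, k: float = 1.0) -> float:
    """Placeholder multiplier that shrinks as the return is delayed.

    Assumption for illustration only, not the real formula.
    """
    elapsed = max(returned - issued, timedelta(seconds=1))  # avoid divide-by-zero
    allowed = preferred_deadline - issued
    return max(1.0, math.sqrt(k * (allowed / elapsed)))


def credit_for(base_points: float, issued: datetime, returned: datetime,
               preferred_deadline: datetime, final_deadline: datetime,
               bonus: Callable[[datetime, datetime, datetime], float] = example_bonus) -> float:
    """Apply the three tiers described in the post."""
    if returned > final_deadline:
        return 0.0                      # past Final Deadline: no credit
    if returned > preferred_deadline:
        return float(base_points)       # past Preferred Deadline: base only
    return base_points * bonus(issued, returned, preferred_deadline)


if __name__ == "__main__":
    t0 = datetime(2010, 10, 5)
    preferred, final = t0 + timedelta(days=4), t0 + timedelta(days=6)
    for days in (1, 2, 5, 7):
        print(days, credit_for(100, t0, t0 + timedelta(days=days), preferred, final))
```

Under these made-up numbers, a result held up from day 1 to day 2 by a busy server still earns a bonus, but a smaller one, which is the effect people in this thread are describing.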
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
PantherX
Site Moderator
 
Posts: 6315
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 171.64.65.56 ???

Postby codysluder » Tue Oct 05, 2010 7:01 pm

VijayPande wrote:I'm sorry about this. We're monitoring the situation and have a longer-term solution (new servers, with work distributed amongst them), but that will take some time to get online (sorry it's taking so long).


I understand how long it takes to get new servers on-line but there's a real conflict between how long that's going to take and how long it takes for our QRBonus to decay. Are there any short-term solutions that can help? Why can't at least some of the work be distributed as has been suggested here?
Datsun 1600 wrote:How can I get permanently assigned to Server 171.64.65.54? I have no problems with that one, but as soon as I get assigned a WU from Server 171.64.65.56 and go to return it, the server is usually overloaded. GRRRRRRRRRRR
codysluder
 
Posts: 1022
Joined: Sun Dec 02, 2007 1:43 pm

Re: 171.64.65.56 ???

Postby 7up1n3 » Tue Oct 05, 2010 7:08 pm

PantherX wrote:
7up1n3 wrote:...What happens to WUs that fail to upload like this? Do they expire when the deadline passes? Is the bonus frittered away?
In this case, your bonus points will be reduced according to the delay caused by the Server. If you cross the Preferred Deadline, you will be assigned Base Credits only. If you cross the Final Deadline, you will not get any credit, and the Client will delete the WU and move on. Sorry, but that is the way it currently works. Hopefully, when new SMP Servers are added, this will be history and we will all be happy.

Wow. It shouldn't be that difficult to read the code detailing the beginning and end of the processing time, and to reward the contribution accordingly, so that users aren't penalized for server-side issues.
7up1n3
 
Posts: 68
Joined: Sun Dec 02, 2007 3:55 am

Re: 171.64.65.56 ???

Postby codysluder » Tue Oct 05, 2010 8:08 pm

You're assuming that all failure-to-upload problems are server-side issues. That may be true much of the time, and it's certainly true right now, but I'll bet that folks would find a way to cheat. I have not tried it, but what happens if you adjust your clock to show that the WU finished one minute after you downloaded it, then correct the clock before you upload the WU, blaming all of your processing time on a server outage?

The points cannot be based on any clock other than the server clock.
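To make the point concrete, here is a minimal sketch of timing a work unit using server timestamps only. The `Assignment` record and `elapsed_for_credit` function are hypothetical names for illustration, not the project's server code. Whatever finish time the client reports is ignored, so winding the local clock back changes nothing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Assignment:
    # Stamped by the server's own clock when the WU is handed out.
    issued_at: datetime


def elapsed_for_credit(assignment: Assignment,
                       server_received_at: datetime,
                       client_claimed_finish: Optional[datetime] = None) -> timedelta:
    """Elapsed time used for scoring, measured entirely on the server.

    `client_claimed_finish` is accepted but deliberately unused: the
    client's clock can be set to anything, so it cannot be trusted.
    """
    return server_received_at - assignment.issued_at


if __name__ == "__main__":
    issued = datetime(2010, 10, 5, 8, 0)
    received = datetime(2010, 10, 6, 8, 0)
    honest = elapsed_for_credit(Assignment(issued), received, datetime(2010, 10, 6, 7, 0))
    forged = elapsed_for_credit(Assignment(issued), received, datetime(2010, 10, 5, 8, 1))
    print(honest == forged)  # True
```

The cost of this design is the one raised earlier in the thread: time spent waiting on an overloaded server is indistinguishable from time spent computing.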

Stanford said that they recognized the possibility of server problems and that they'd do their best to maintain the servers but it was a risk that you'd have to accept. Are you saying that you don't believe that they are making a sincere effort to correct the problems? If so, I disagree with you.
codysluder
 
Posts: 1022
Joined: Sun Dec 02, 2007 1:43 pm

Re: 171.64.65.56 ???

Postby codysluder » Tue Oct 05, 2010 8:14 pm

Assuming that the problems with this server will continue until a new server comes on-line, some folks might actually benefit by temporarily suspending SMP. Multiple uniprocessor clients normally earn less PPD than SMP, but you don't have to reduce the PPD for SMP very much before multiple clients win out because of their reliability. At least it's an alternative to consider.
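The trade-off above comes down to a break-even fraction: how much of its nominal PPD the SMP client must keep before it stops beating several uniprocessor clients. A small sketch follows, with a hypothetical function name and made-up PPD figures purely for illustration.

```python
def break_even_fraction(smp_ppd: float, uni_ppd_each: float, n_clients: int) -> float:
    """Fraction of nominal SMP PPD at which n uniprocessor clients match it.

    If upload delays leave SMP with less than this fraction of its
    nominal PPD, the uniprocessor clients come out ahead.
    """
    if smp_ppd <= 0:
        raise ValueError("smp_ppd must be positive")
    return (uni_ppd_each * n_clients) / smp_ppd


if __name__ == "__main__":
    # Made-up numbers: one SMP client at 10000 PPD vs. 8 clients at 1000 PPD each.
    print(break_even_fraction(10000, 1000, 8))
```

With those example figures, SMP only has to lose a little over a fifth of its nominal PPD to upload delays before the eight single-core clients win. Real PPD values vary by machine and project.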
codysluder
 
Posts: 1022
Joined: Sun Dec 02, 2007 1:43 pm

Re: 171.64.65.56 ???

Postby Tobit » Wed Oct 06, 2010 1:27 pm

Please kick it again, net load is over 200 yet again.
Tobit
 
Posts: 342
Joined: Thu Apr 17, 2008 3:35 pm
Location: Manchester, NH USA

Re: 171.64.65.56 ???

Postby 7up1n3 » Wed Oct 06, 2010 6:46 pm

codysluder wrote:You're assuming that all failure to upload problems are server-side issues.

I'm not assuming that at all. But I am assuming that, when server issues do arise, a system could be implemented to address them. This has been done in the past, with mass point credits as users are "caught up", and it would simply need to be adjusted to accommodate the bonus system.
7up1n3
 
Posts: 68
Joined: Sun Dec 02, 2007 3:55 am
