Re: 130.237.232.237

Posted: Tue Jan 17, 2012 4:01 pm
by kasson
Thanks for the heads-up. The server should be running again.

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 4:43 pm
by bollix47
Thank you Peter.

The WU has uploaded. :D

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 6:17 pm
by tear
Edit by Bruce: Moved this post to the proper forum.
tear wrote:There's still something funky w/that server.

One of my machines picked a unit up w/o issue.

Another one picked a small WU which was tagged as P6903 (FahCore, instead of starting, leaks memory
until all memory is exhausted). . . .
Seems to be same unit as reported here -- viewtopic.php?f=18&t=20523#p204530
I'll remove machinedependent.dat. Probably a "BAD WU" ?

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 6:37 pm
by tear
Another machine, different issue:

[18:33:23] Connecting to http://130.237.232.237:8080/
[18:33:24] Posted data.
[18:33:24] Initial: 0000; - Receiving payload (expected size: 512)
[18:33:24] Conversation time very short, giving reduced weight in bandwidth avg
[18:33:24] - Downloaded at ~1 kB/s
[18:33:24] - Averaged speed for that direction ~389 kB/s
[18:33:24] + Received work.
[18:33:24] + Closed connections
[18:33:29] 
[18:33:29] + Processing work unit
[18:33:29] Core required: FahCore_a5.exe
[18:33:29] Core found.
[18:33:29] Working on queue slot 07 [January 17 18:33:29 UTC]
[18:33:29] + Working ...
[18:33:29] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 07 -np 64 -checkpoint 15 -forceasm -verbose -lifeline 46262 -version 634'
[18:33:29] 
[18:33:29] *------------------------------*
[18:33:29] Folding@Home Gromacs SMP Core
[18:33:29] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[18:33:29] 
[18:33:29] Preparing to commence simulation
[18:33:29] - Assembly optimizations manually forced on.
[18:33:29] - Not checking prior termination.
[18:33:29] Couldn't Decompress
[18:33:29] Called DecompressByteArray: compressed_data_size=0 data_size=0, decompressed_data_size=0 diff=0
[18:33:29] -Error: Couldn't update checksum variables
[18:33:29] Error: Could not open work file
[18:33:29] 
[18:33:29] Folding@home Core Shutdown: FILE_IO_ERROR
[18:33:29] CoreStatus = 75 (117)
[18:33:29] Error opening or reading from a file.
[18:33:29] Deleting current work unit & continuing...
Removal of machinedependent.dat helped. Looks like some per-machine server DB entries got corrupted?
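
The symptoms in the log above (a 512-byte payload, zero-length compressed data, then a FILE_IO_ERROR shutdown) can be spotted automatically. The following is a minimal sketch, not an official tool: the function name `summarize_assignment` and the `min_payload` threshold are made up for illustration, and the patterns are copied from the log lines quoted here, so other client versions may word them differently.

```python
import re

# Patterns taken from the log lines quoted in this post (assumption: other
# client versions may phrase these messages differently).
PAYLOAD_RE = re.compile(r"Receiving payload \(expected size: (\d+)\)")
DECOMP_RE = re.compile(r"\bcompressed_data_size=(\d+)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def summarize_assignment(log_text, min_payload=1024):
    """Collect signs of a truncated work-unit download from a client log.

    min_payload is an arbitrary illustrative threshold, not an official limit.
    """
    findings = []
    for line in log_text.splitlines():
        m = PAYLOAD_RE.search(line)
        if m and int(m.group(1)) < min_payload:
            findings.append(f"tiny payload: {m.group(1)} bytes")
        m = DECOMP_RE.search(line)
        if m and int(m.group(1)) == 0:
            findings.append("empty compressed data")
        m = SHUTDOWN_RE.search(line)
        if m:
            findings.append(f"core shutdown: {m.group(1)}")
    return findings
```

Passing the text of the client log to `summarize_assignment` returns a list of short strings, one per suspicious line, which makes it easier to compare several machines at a glance.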

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 7:51 pm
by noname2
bollix47 wrote:Thank you Peter.

The WU has uploaded. :D
I have the same. Nice job.

But I got a 6903 again! With the switch to issuing bigadv jobs only to machines with 16 cores or more (it is already January 17), will there be problems uploading this one later? Or is it better to delete this job and fold regular jobs?

Thanks a lot! Sorry for my English.

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 8:03 pm
by 7im
The change is for new projects. I don't expect the existing projects to change as they finish their normal run to completion.

Dr. Kasson can correct me if I'm wrong. ;)

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 8:03 pm
by bollix47
The deadlines on the Project Summary have not changed, so you should be fine. :wink:

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 8:27 pm
by noname2
bollix47 wrote:The deadlines on the Project Summary have not changed, so you should be fine.
So the problem with uploading to 130.237.232.237 is not associated with this?

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 8:36 pm
by bruce
tear wrote:Seems to be same unit as reported here -- viewtopic.php?f=18&t=20523#p204530
I'll remove machinedependent.dat. Probably a "BAD WU" ?
We have a forum where you can report a potentially bad WU and there are no reports in that forum. You've now reported it twice but the reports are buried within a topic associated with a bad server and the bad server problem has been fixed.

Reporting the same WU in two different posts won't help with the possibility that it's a BAD WU that needs to be dealt with. I've moved one of your posts to the proper forum where it can get the attention it deserves. See viewtopic.php?f=19&t=20534&p=204710#p204710
tear wrote:There's still something funky w/that server.

One of my machines picked a unit up w/o issue.

Another one picked a small WU which was tagged as P6903 (FahCore, instead of starting, leaks memory
until all memory is exhausted). . . .
No, nothing funky with that server ... but something clearly funky with that WU.

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 8:39 pm
by -alias-
After standing still for 6 hours, everything is now worked out and the rig is folding again. Everything in the queue was also uploaded.

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 8:41 pm
by bollix47
noname2 wrote:
bollix47 wrote:The deadlines on the Project Summary have not changed, so you should be fine.
So the problem with uploading to 130.237.232.237 is not associated with this?

I don't know the reason(s) behind the problems with that server, but I'm fairly certain that the deadlines displayed in the Project Summary (link at top of page) at the time of downloading are the ones used to calculate your bonus and determine whether you're late or not. So even if the deadlines change tomorrow the WUs downloaded before that change should not be affected.

In case you missed it see 7im's post above mine.

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 8:53 pm
by bruce
noname2 wrote:
bollix47 wrote:The deadlines on the Project Summary have not changed, so you should be fine.
So the problem with uploading to 130.237.232.237 is not associated with this?
No. Servers do occasionally have problems and Kasson has fixed whatever happened to this server.

As far as anybody has seen, there have been no changes associated with that announcement yet. Please read it carefully. It says that the changes are to happen no earlier than Monday, January 16, 2012.

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 9:02 pm
by noname2
bollix47
I have no problem with the deadlines. I have a 40% margin to spare on 6903 (2 of 5 days). You misunderstood me, but I did get the information I needed.

bruce
Yes, I saw it. Two months have passed. Can you give a more exact time? :wink:

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 9:20 pm
by tear
bruce wrote:
tear wrote:Seems to be same unit as reported here -- viewtopic.php?f=18&t=20523#p204530
I'll remove machinedependent.dat. Probably a "BAD WU" ?
We have a forum where you can report a potentially bad WU and there are no reports in that forum. You've now reported it twice but the reports are buried within a topic associated with a bad server and the bad server problem has been fixed.

Reporting the same WU in two different posts won't help with the possibility that it's a BAD WU that needs to be dealt with. I've moved one of your posts to the proper forum where it can get the attention it deserves. See viewtopic.php?f=19&t=20534&p=204710#p204710
Oh. Been helping the project for several years but I had no idea of the existence of that forum. Can you believe that?
You've been very helpful and are of extreme value to this community!
bruce wrote:
tear wrote:There's still something funky w/that server.

One of my machines picked a unit up w/o issue.

Another one picked a small WU which was tagged as P6903 (FahCore, instead of starting, leaks memory
until all memory is exhausted). . . .
No, nothing funky with that server ... but something clearly funky with that WU.
Thank you for the very definitive answer. It's always good to hear from a solid and unquestionable source.
Either way, I'm glad the problem is not on the user's end. Makes me feel very good about myself.

Care to comment on the other machine that gets short HTTP responses? I'd appreciate that a lot.

Re: 130.237.232.237

Posted: Tue Jan 17, 2012 11:17 pm
by bruce
tear wrote:Care to comment on the other machine that gets short HTTP responses? I'd appreciate that a lot.
The HTTP protocol has failed. I can only guess why. :!:

Is there anything between the client and the server that might be changing the HTTP datastream?

One possibility is that it's similar to the proxy issue where one of the FAH developers offered an interesting comment:
The F@H WS and CS (servers) only understand a very basic HTTP protocol. If the proxy were to use methods not supported by the WS/CS but that were part of the HTTP 1.1/1.0 standard it would fail.
Early versions of FAH were designed around a (very) limited subset of HTTP standards. That support was sufficient to get the project started without the overhead of support for "unnecessary" HTTP functionality. Much of that old code is still in place and it functions fine as long as proxy developers or 3rd party developers don't try to interject or modify protocol messages that are not part of the "very basic" HTTP protocol that he mentioned.

Note this clause of the EULA:
. . . You may only use unmodified versions of Folding@home obtained through authorized distributors to connect to the Folding@Home servers. Use of other software to connect to the Folding@home servers is strictly prohibited. . . .
In one sense, I'd say that that clause prohibits a proxy ... yet FAH does provide (limited) support for a proxy ... so somebody besides myself will have to figure out the implications.
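
To make the question of an intermediary "changing the HTTP datastream" concrete, here is a small offline sketch that compares a request as sent with the same request as seen after a proxy. It assumes you have captured both raw messages yourself (for example with a packet capture). The function names and the sample messages are hypothetical, are not real FAH traffic, and say nothing about which headers the FAH servers actually accept.

```python
def parse_headers(raw_message):
    """Split a raw HTTP message into its start line and a dict of headers."""
    head = raw_message.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def diff_messages(sent, received):
    """Report how `received` differs from `sent` at the header level."""
    sent_line, sent_headers = parse_headers(sent)
    recv_line, recv_headers = parse_headers(received)
    changes = []
    if sent_line != recv_line:
        changes.append(f"start line: {sent_line!r} -> {recv_line!r}")
    for name in sorted(set(sent_headers) | set(recv_headers)):
        before, after = sent_headers.get(name), recv_headers.get(name)
        if before is None:
            changes.append(f"added {name}: {after}")
        elif after is None:
            changes.append(f"removed {name}")
        elif before != after:
            changes.append(f"changed {name}: {before} -> {after}")
    return changes


# Hypothetical messages for illustration only; not real FAH traffic.
sent = "POST / HTTP/1.0\r\nContent-Length: 512\r\n\r\n"
received = ("POST / HTTP/1.1\r\nContent-Length: 512\r\n"
            "Via: 1.1 proxy.example\r\nConnection: keep-alive\r\n\r\n")
for change in diff_messages(sent, received):
    print(change)
```

In this made-up example the intermediary upgrades the protocol version and adds two headers, which is the kind of modification a server that understands only a very basic HTTP subset might reject.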