new 6.23 public beta client

Postby kasson » Fri Oct 10, 2008 3:18 pm

See the announcement in the PandeGroup News forum. This client provides improved EUE handling. A drop-in binary is available at: http://www.stanford.edu/~kasson/folding ... 32-x86.exe

Installers to follow. Please post feedback and bug reports here.
kasson
Pande Group Member
 
Posts: 2087
Joined: Thu Nov 29, 2007 10:37 pm

Re: new 6.23 public beta client

Postby mikeb12 » Fri Oct 10, 2008 5:16 pm

I've been running this [mod edit] client for a couple of weeks, and it does handle EUEs much better. When the client EUEs, it sends partial results, then picks up a new WU. No more churning the same old bad WU 3 times in a row like 6.22 did. I definitely recommend upgrading to 6.23 if you run SMP. It doesn't fix the EUE problems we're having with 2665s, but it does improve how the client handles them.

I have had one bug: it sometimes hangs at "finished unit". The only way to get it to send is to stop the client, kill all the fahcore_a1 processes, run qfix, and restart the client. It will report missing work files, but then send the WU that hung at 100% for full credit and pick up a new WU. That has only happened about 5 times in the past 2 weeks across 7 SMP clients running 24/7 for me. (Vista64 mpich 6.23)

Other than that, this edition is better than any of the 6.22 releases IMO, because it sends EUE'd WUs for partial credit. The 6.22 clients would reprocess the same EUE'd WU 3 times before picking up a new WU, and then wouldn't give credit unless you qfixed it during one of those 3 attempts.

The only time I have to babysit this one is for units that hang at "finished unit", and you see those right away in FahMon, since the client hangs and turns red. With the old 6.22 we had to search the logs to find EUEs, because the client would always stay green in FahMon while it churned through a bad WU 3 times and wasted tons of CPU cycles. The new 6.23 sends the WU for partial credit on the first EUE and picks up a new WU to work on with no intervention from the user. A definite improvement in how much babysitting SMP needs.

First rule of fight club! -7im
mikeb12
 
Posts: 261
Joined: Tue Feb 12, 2008 12:51 pm
Location: South Carolina USA

Re: new 6.23 public beta client

Postby ChrisDTC » Fri Oct 10, 2008 5:51 pm

will give this a try
ChrisDTC
 
Posts: 86
Joined: Thu Dec 06, 2007 6:45 am

Re: new 6.23 public beta client

Postby theo343 » Fri Oct 10, 2008 8:28 pm

I don't like the hang at "finished unit" bug, since I'm not able to babysit all the time and could just as easily lose days of work. Will give it a try later.
theo343
 
Posts: 514
Joined: Thu Jul 03, 2008 1:43 pm
Location: Norway

Re: new 6.23 public beta client

Postby sdack » Fri Oct 10, 2008 9:41 pm

Nope. Reverting back to 6.22r3 ...

6.23 cannot connect under Vista x64.
sdack
 

Re: new 6.23 public beta client

Postby kasson » Fri Oct 10, 2008 11:37 pm

For us to effectively identify and fix bugs, it's great to have full bug reporting. Please identify your system configuration and post a log snippet showing the error you encounter. Thanks!

Re: new 6.23 public beta client

Postby mikeb12 » Sat Oct 11, 2008 12:05 am

Another user and I have been trying to figure this little hang bug out, and I suspect it has to do with a network/SMP instability.
I've had a couple of router reboots in the past 2 weeks, and even though the PC gets the same IP back, the NIC goes through its little recycle...
which may explain why I've only gotten this hang error sporadically over the last 2 weeks. That's how long I've had a new DSL modem and have had to reboot the router at various times due to some abnormal behavior with the modem... which magically coincides with this hang error and with when I started using this client.

If you do encounter the hang at "Folding@home Core Shutdown: FINISHED_UNIT",
it's an easy fix and you'll get full points credit for the WU. Here's how.
This example uses Vista64 6.23 mpich.
--It hangs and goes red in FahMon:
Code:
[15:49:56] Completed 495000 out of 500000 steps  (99 percent)
[16:03:08] Writing local files
[16:03:08] Completed 500000 out of 500000 steps  (100 percent)
[16:03:08] Writing final coordinates.
[16:03:09] Past main M.D. loop
[16:03:10] Will end MPI now
[16:04:10]
[16:04:10] Finished Work Unit:
[16:04:10] - Reading up to 3724560 from "work/wudata_06.arc": Read 3724560
[16:04:10] - Reading up to 1779536 from "work/wudata_06.xtc": Read 1779536
[16:04:10] goefile size: 0
[16:04:10] logfile size: 17309
[16:04:10] Leaving Run
[16:04:10] - Writing 5525805 bytes of core data to disk...
[16:04:10]   ... Done.
[16:04:10] - Failed to delete work/wudata_06.sas
[16:04:10] - Failed to delete work/wudata_06.goe
[16:04:10] Warning:  check for stray files
[16:04:10] - Shutting down core
[16:06:10]
[16:06:10] Folding@home Core Shutdown: FINISHED_UNIT
[16:06:10]
[16:06:10] Folding@home Core Shutdown: FINISHED_UNIT

Folding@Home Client Shutdown at user request.

Folding@Home Client Shutdown.
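
For anyone who wants to spot this hang without watching FahMon, here is a minimal Python sketch (it is not part of the client). It assumes that a hung client's FAHlog.txt ends with a FINISHED_UNIT core shutdown that is never followed by "Sending work to server", as in the log above. The function name is made up for illustration.

```python
import re

# Matches the core shutdown lines seen in the logs in this thread.
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def looks_hung_at_finished_unit(log_text):
    """Return True if the last core shutdown in the log is FINISHED_UNIT
    and no 'Sending work to server' line appears after it."""
    lines = log_text.splitlines()
    last_idx = None
    last_reason = None
    for i, line in enumerate(lines):
        match = SHUTDOWN_RE.search(line)
        if match:
            last_idx = i
            last_reason = match.group(1)
    if last_idx is None or last_reason != "FINISHED_UNIT":
        return False
    # A healthy client goes on to send the results after the shutdown line.
    return not any("Sending work to server" in l for l in lines[last_idx + 1:])
```

A healthy client also sits in this state for a few seconds before it sends, so the result only means something if the log has been idle for several minutes.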


--Stop the client and make sure all fahcore_a1 processes are gone in task manager.
--Run qfix.
--Then restart client normally.
It will error with missing work files, but send the finished unit for full credit and pick up a fresh wu.
Code:
--- Opening Log file [October 10 16:52:37 UTC]


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23 Beta R1

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: D:\Software\fah\fah1
Executable: D:\Software\fah\fah1\Folding@home-Win32-x86.exe
Arguments: -smp -local

[16:52:37] - Ask before connecting: No
[16:52:37] - User name: mikeb12 (Team 36362)
[16:52:37] - User ID: 4649F33816C9AA36
[16:52:37] - Machine ID: 1
[16:52:37]
[16:52:38] Loaded queue successfully.
[16:52:38]
[16:52:38] + Processing work unit
[16:52:38] Work type a1 not eligible for variable processors
[16:52:38] Core required: FahCore_a1.exe
[16:52:38] Core found.
[16:52:38] Using generic mpiexec calls
[16:52:38] Working on queue slot 06 [October 10 16:52:38 UTC]
[16:52:38] + Working ...
[16:52:38]
[16:52:38] *------------------------------*
[16:52:38] Folding@Home Gromacs SMP Core
[16:52:38] Version 1.74 (March 10, 2007)
[16:52:38]
[16:52:38] Preparing to commence simulation
[16:52:38] - Looking at optimizations...
[16:52:38] - Created dyn
[16:52:38] - Files status OK
[16:52:38]
[16:52:38] Folding@home Core Shutdown: MISSING_WORK_FILES
[16:52:38] Finalizing output
[16:54:41] CoreStatus = 1 (1)
[16:54:41] Sending work to server
[16:54:41] Project: 2653 (Run 18, Clone 26, Gen 85)


[16:54:41] + Attempting to send results [October 10 16:54:41 UTC]
[16:56:10] + Results successfully sent
[16:56:10] Thank you for your contribution to Folding@Home.
[16:56:30] - Preparing to get new work unit...
[16:56:30] + Attempting to get work packet
[16:56:30] - Connecting to assignment server
[16:56:31] - Successful: assigned to (171.64.65.64).
[16:56:31] + News From Folding@Home: Welcome to Folding@Home
[16:56:31] Loaded queue successfully.
[16:57:06] + Closed connections
[16:57:11]
[16:57:11] + Processing work unit
[16:57:11] Work type a1 not eligible for variable processors
[16:57:11] Core required: FahCore_a1.exe
[16:57:11] Core found.
[16:57:11] Using generic mpiexec calls
[16:57:11] Working on queue slot 07 [October 10 16:57:11 UTC]
[16:57:11] + Working ...
[16:57:11]
[16:57:11] *------------------------------*
[16:57:11] Folding@Home Gromacs SMP Core
[16:57:11] Version 1.74 (March 10, 2007)
[16:57:11]
[16:57:11] Preparing to commence simulation
[16:57:11] - Ensuring status. Please wait.
[16:57:28] - Looking at optimizations...
[16:57:28] - Working with standard loops on this execution.
[16:57:28] - Previous termination of core was improper.
[16:57:28] - Going to use standard loops.
[16:57:28] - Files status OK
[16:57:40] (decompressed 514.9 percent)
[16:57:40] - Starting from initial work packet
[16:57:40]
[16:57:40] Project: 2665 (Run 3, Clone 967, Gen 56)
[16:57:40]
[16:57:40] 65 (Run 3, Clone 967, Gen 56)
[16:57:40]
[16:57:44] Entering M.D.
[16:57:52] Rejecting checkpoint
[16:57:54]
[16:57:54] Writing local files
[16:57:57]
[16:57:57] Writing local files
[16:58:06] Extra SSE boost OK.
[16:58:07] Writing local files
[16:58:07] Completed 0 out of 250000 steps  (0 percent)
[17:16:12] Writing local files
[17:16:13] Completed 2500 out of 250000 steps  (1 percent)
[17:33:26] Writing local files
[17:33:26] Completed 5000 out of 250000 steps  (2 percent)
[17:50:38] Writing local files
[17:50:38] Completed 7500 out of 250000 steps  (3 percent)
[18:07:49] Writing local files
[18:07:49] Completed 10000 out of 250000 steps  (4 percent)
[18:25:01] Writing local files
[18:25:01] Completed 12500 out of 250000 steps  (5 percent)
[18:42:11] Writing local files
[18:42:11] Completed 15000 out of 250000 steps  (6 percent)
[18:59:22] Writing local files
[18:59:22] Completed 17500 out of 250000 steps  (7 percent)
[19:16:33] Writing local files
[19:16:33] Completed 20000 out of 250000 steps  (8 percent)
[19:33:45] Writing local files
[19:33:45] Completed 22500 out of 250000 steps  (9 percent)
[19:50:58] Writing local files
[19:50:58] Completed 25000 out of 250000 steps  (10 percent)
[20:08:22] Writing local files
[20:08:22] Completed 27500 out of 250000 steps  (11 percent)
[20:25:48] Writing local files
[20:25:48] Completed 30000 out of 250000 steps  (12 percent)
[20:43:01] Writing local files
[20:43:01] Completed 32500 out of 250000 steps  (13 percent)
[21:00:31] Writing local files
[21:00:31] Completed 35000 out of 250000 steps  (14 percent)
[21:18:46] Writing local files
[21:18:46] Completed 37500 out of 250000 steps  (15 percent)
[21:37:12] Writing local files
[21:37:12] Completed 40000 out of 250000 steps  (16 percent)
[21:54:36] Writing local files
[21:54:36] Completed 42500 out of 250000 steps  (17 percent)
[22:11:50] Writing local files
[22:11:50] Completed 45000 out of 250000 steps  (18 percent)
[22:29:05] Writing local files
[22:29:05] Completed 47500 out of 250000 steps  (19 percent)
[22:46:19] Writing local files
[22:46:19] Completed 50000 out of 250000 steps  (20 percent)
Last edited by mikeb12 on Sat Oct 11, 2008 9:30 am, edited 1 time in total.
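
The "Completed N out of M steps" lines in a log like the one above also let you estimate how fast a WU is progressing. The following is a rough Python sketch, not anything the client provides. It assumes you pass in the lines for a single WU only, and that the run does not span more than one midnight, since the timestamps carry no date. The function name is hypothetical.

```python
import re

# Timestamp plus step counts, as printed by the core in the logs above.
PROGRESS_RE = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps"
)


def seconds_per_percent(log_text):
    """Average seconds per percent between the first and last progress
    line, or None if there is not enough progress to measure."""
    points = []
    for match in PROGRESS_RE.finditer(log_text):
        hours, minutes, seconds, done, total = map(int, match.groups())
        clock = hours * 3600 + minutes * 60 + seconds
        points.append((clock, 100.0 * done / total))
    if len(points) < 2:
        return None
    (t0, p0), (t1, p1) = points[0], points[-1]
    if p1 <= p0:
        return None
    elapsed = (t1 - t0) % 86400  # tolerate one rollover past midnight
    return elapsed / (p1 - p0)
```

Fed the first and last progress lines of the log above (0 percent at 16:58:07, 20 percent at 22:46:19), this works out to roughly 17 minutes per percent.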

Re: new 6.23 public beta client

Postby mikeb12 » Sat Oct 11, 2008 12:12 am

The big improvement I've seen with this client is how it handles bad WU EUEs...
Vista 64 6.23 mpich
Notice in this example:
--it finishes and sends a successful unit
--then picks up a bad WU and EUEs 2% in...
--immediately sends it for partial credit...
--and picks up a brand new WU to work on..

Code:
[07:34:47] Completed 250000 out of 250000 steps  (100 percent)
[07:34:47] Writing final coordinates.
[07:34:49] Past main M.D. loop
[07:34:49] Will end MPI now
[07:35:49]
[07:35:49] Finished Work Unit:
[07:35:49] - Reading up to 21310704 from "work/wudata_07.arc": Read 21310704
[07:35:49] - Reading up to 560352 from "work/wudata_07.xtc": Read 560352
[07:35:49] goefile size: 0
[07:35:49] logfile size: 221659
[07:35:49] Leaving Run
[07:35:52] - Writing 22099087 bytes of core data to disk...
[07:35:52]   ... Done.
[07:35:52] - Failed to delete work/wudata_07.sas
[07:35:52] - Failed to delete work/wudata_07.goe
[07:35:52] Warning:  check for stray files
[07:35:52] - Shutting down core
[07:37:52]
[07:37:52] Folding@home Core Shutdown: FINISHED_UNIT
[07:37:52]
[07:37:52] Folding@home Core Shutdown: FINISHED_UNIT
[07:37:57] CoreStatus = 64 (100)
[07:37:57] Sending work to server
[07:37:57] Project: 2665 (Run 3, Clone 152, Gen 54)


[07:37:57] + Attempting to send results [October 6 07:37:57 UTC]
[07:43:45] + Results successfully sent
[07:43:45] Thank you for your contribution to Folding@Home.
[07:43:45] + Number of Units Completed: 40

[07:43:49] - Preparing to get new work unit...
[07:43:49] + Attempting to get work packet
[07:43:49] - Connecting to assignment server
[07:43:50] - Successful: assigned to (171.64.65.64).
[07:43:50] + News From Folding@Home: Welcome to Folding@Home
[07:43:50] Loaded queue successfully.
[07:44:38] + Closed connections
[07:44:38]
[07:44:38] + Processing work unit
[07:44:38] Work type a1 not eligible for variable processors
[07:44:38] Core required: FahCore_a1.exe
[07:44:38] Core found.
[07:44:38] Using generic mpiexec calls
[07:44:38] Working on queue slot 08 [October 6 07:44:38 UTC]
[07:44:38] + Working ...
[07:44:38]
[07:44:38] *------------------------------*
[07:44:38] Folding@Home Gromacs SMP Core
[07:44:38] Version 1.74 (March 10, 2007)
[07:44:38]
[07:44:38] Preparing to commence simulation
[07:44:38] - Ensuring status. Please wait.
[07:44:43] - Starting from initial work packet
[07:44:43]
[07:44:43] Project: 2665 (Run 0, Clone 810, Gen 35)
[07:44:43]
[07:44:43] Assembly optimizations on if available.
[07:44:43] Entering M.D.
[07:45:03] al work packet
[07:45:03]
[07:45:03] Project: 2665 (Run 0, Clone 810, Gen 35)
[07:45:03]
[07:45:04] 65 (Run 0, Clone 810, Gen 35)
[07:45:04]
[07:45:04] Entering M.D.
[07:45:12] Rejecting checkpoint
[07:45:14] Protein: HGG in water
[07:45:14] Writing local files
[07:45:21] Extra SSE boost OK.
[07:45:21] Writing local files
[07:45:22] Completed 0 out of 250000 steps  (0 percent)
[08:03:48] Writing local files
[08:03:48] Completed 2500 out of 250000 steps  (1 percent)
[08:20:46] Writing local files
[08:20:46] Completed 5000 out of 250000 steps  (2 percent)
[08:25:36]  often see other project units terminating early like this
[08:25:36]   too, you may wish to check the stability of youlogfile size: 12967
[08:25:36] - Writing 13517 bytes of core data to disk...
[08:25:36]   ... Done.
[08:25:36] e size: 12967
[08:25:36] - Writing 13517 bytes of core data to disk...
[08:25:36]   ... Done.
[08:25:36] re data to disk...
[08:25:36]   ... Done.
[08:25:36] - Failed to delete work/wudata_08.bed
[08:25:36] Warning:  check for stray files
[08:25:36]
[08:25:36] Folding@home Core Shutdown: EARLY_UNIT_END
[08:25:36] Finalizing output
[08:25:36] , etc.).
[08:25:36] Going to send back what have done.
[08:25:36] logfile size: 9422
[08:25:36] - Writing 9972 bytes of core data to disk...
[08:25:36]   ... Done.
[08:27:36]
[08:27:36] Folding@home Core Shutdown: EARLY_UNIT_END
[08:27:36]
[08:27:36] Folding@home Core Shutdown: EARLY_UNIT_END
[08:27:40] CoreStatus = 7B (123)
[08:27:40] Sending work to server
[08:27:40] Project: 2665 (Run 0, Clone 810, Gen 35)


[08:27:40] + Attempting to send results [October 6 08:27:40 UTC]
[08:27:41] + Results successfully sent
[08:27:41] Thank you for your contribution to Folding@Home.
[08:27:45] - Preparing to get new work unit...
[08:27:45] + Attempting to get work packet
[08:27:45] - Connecting to assignment server
[08:27:46] - Successful: assigned to (171.64.65.64).
[08:27:46] + News From Folding@Home: Welcome to Folding@Home
[08:27:46] Loaded queue successfully.
[08:28:20] + Closed connections
[08:28:25]
[08:28:25] + Processing work unit
[08:28:25] Work type a1 not eligible for variable processors
[08:28:25] Core required: FahCore_a1.exe
[08:28:25] Core found.
[08:28:25] Using generic mpiexec calls
[08:28:25] Working on queue slot 09 [October 6 08:28:25 UTC]
[08:28:25] + Working ...
[08:28:25]
[08:28:25] *------------------------------*
[08:28:25] Folding@Home Gromacs SMP Core
[08:28:25] Version 1.74 (March 10, 2007)
[08:28:25]
[08:28:25] Preparing to commence simulation
[08:28:25] - Ensuring status. Please wait.
[08:28:30] - Starting from initial work packet
[08:28:30]
[08:28:30] Project: 2665 (Run 2, Clone 750, Gen 55)
[08:28:30]
[08:28:30] Assembly optimizations on if available.
[08:28:30] Entering M.D.
[08:28:49] al work packet
[08:28:49]
[08:28:49] Project: 2665 (Run 2, Clone 750, Gen 55)
[08:28:49]
[08:28:50] Entering M.D.
[08:28:51] ne 750, Gen 55)
[08:28:51]
[08:28:52] Entering M.D.
[08:28:59] Rejecting checkpoint
[08:29:02] Protein: HGG with glycosylations
[08:29:02] Writing local files
[08:29:11] Extra SSE boost OK.
[08:29:11] Writing local files
[08:29:12] Completed 0 out of 250000 steps  (0 percent)
[08:47:21] Writing local files
[08:47:21] Completed 2500 out of 250000 steps  (1 percent)
[09:04:42] Writing local files
[09:04:42] Completed 5000 out of 250000 steps  (2 percent)
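
To see at a glance how each WU in a log like this ended, the shutdown reason can be paired with the CoreStatus line that follows it. This is a minimal Python sketch under a couple of assumptions: that each CoreStatus line belongs to the most recent "Core Shutdown" line, and that the client prints the same status value in hex and then in decimal, which is what the logs in this thread appear to show (64 (100), 7B (123)). The function name is made up for illustration.

```python
import re

SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def summarize_core_exits(log_text):
    """Return a list of (shutdown_reason, status_code) pairs, one per
    CoreStatus line, in the order they appear in the log."""
    exits = []
    reason = None
    for line in log_text.splitlines():
        match = SHUTDOWN_RE.search(line)
        if match:
            reason = match.group(1)  # remember the latest shutdown reason
            continue
        match = STATUS_RE.search(line)
        if match:
            code = int(match.group(2))
            # Sanity check on the assumed hex/decimal pairing.
            if int(match.group(1), 16) != code:
                raise ValueError("hex and decimal status disagree: " + line)
            exits.append((reason, code))
            reason = None
    return exits
```

Counting EUEs is then a one-liner: `sum(1 for reason, _ in summarize_core_exits(text) if reason == "EARLY_UNIT_END")`.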

Re: new 6.23 public beta client

Postby mikeb12 » Sat Oct 11, 2008 12:15 am

My 2 posts above aren't really bug reports; I just thought I'd post some relevant info for other users to see.
1. How to handle the hang bug.
2. What's so neat about this release (no more babysitting and qfixing bad WU EUEs).

Re: new 6.23 public beta client

Postby sdack » Sat Oct 11, 2008 1:50 am

kasson wrote:For us to effectively identify and fix bugs, it's great to have full bug reporting. Please identify your system configuration and post a log snippet showing the error you encounter. Thanks!

Well, the error I encounter is that it does not work. It says the connection is being refused. The system is Vista x64. If I "drop in" 6.22r3 it works fine again but the log is now gone. What is it that you have changed in 6.23?

Perhaps wait and see if more people have this problem.

Re: new 6.23 public beta client

Postby mikeb12 » Sat Oct 11, 2008 9:03 am

sdack,
look in your install folder and you'll find 2 files:
FAHlog.txt
FAHlog-Prev.txt

That last one stores all client logs for about 4-5 days on mine. So even if you don't see what you need when viewing the log in FahMon, open the install dir and open FAHlog-Prev.txt in Notepad and it should have it. That's where the older log entries get rolled into in case you need them.

I know it works on Vista64, I have 7 going right now.
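
If digging through both files by hand gets tedious, a small script can search them together. This is only a sketch in Python, assuming the two file names mentioned above sit directly in the client's install folder. The function name and the search text are placeholders, so substitute whatever error message you are hunting for.

```python
from pathlib import Path

# Older log first, so hits come back in roughly chronological order.
LOG_NAMES = ("FAHlog-Prev.txt", "FAHlog.txt")


def find_log_lines(install_dir, needle):
    """Return (file_name, line_number, line) for every line in the two
    client logs that contains `needle`, ignoring case."""
    hits = []
    wanted = needle.lower()
    for name in LOG_NAMES:
        path = Path(install_dir) / name
        if not path.is_file():
            continue  # e.g. no previous log has been written yet
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if wanted in line.lower():
                    hits.append((name, lineno, line.rstrip("\r\n")))
    return hits
```

For example, `find_log_lines(r"D:\Software\fah\fah1", "Core Shutdown")` would list every core shutdown recorded in either file for a client installed in that folder.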

Re: new 6.23 public beta client

Postby Zagen30 » Sat Oct 11, 2008 11:23 pm

I've had the 100% hang bug happen to me a few times, but I've never had to qfix it. I've just shut down and immediately restarted the client, and it's completed and sent the finished WU perfectly fine.
Zagen30
 
Posts: 904
Joined: Tue Mar 25, 2008 1:45 am

Re: new 6.23 public beta client

Postby jrweiss » Sun Oct 12, 2008 2:16 am

I tried "dropping in" the 6.23 to replace my 5.91 client (XP Pro SP3). I ran -configonly, then started the client. It fired up OK, but got a Client/Core Communications error a few minutes later. I dropped in the 5.91 again, deleted client.cfg, reran -configonly, and restarted the client. It is running fine.

Is there a known problem trying to upgrade from 5.91 to 6.2x in the middle of a WU?

Is there some incompatibility between MPI or other pieces of 5.91 and 6.2x?
Q9650/HD4670 (SMP+GPU), Q9450 (4xCPU+GPU), Lenovo T9400 laptop (2xCPU [Win7] or SMP [Ubuntu]), IBM PM745 laptop (CPU), and 2 old Dells Folding@home and elsewhere.
jrweiss
 
Posts: 920
Joined: Tue Dec 04, 2007 7:56 am
Location: Gotta guess!

Re: new 6.23 public beta client

Postby ChrisDTC » Sun Oct 12, 2008 3:23 am

jrweiss wrote:
Is there some incompatibility between MPI or other pieces of 5.91 and 6.2x?

Yes, you should upgrade to 6.xx to run 6.23

Re: new 6.23 public beta client

Postby Sahkuhnder » Sun Oct 12, 2008 7:05 am

jrweiss wrote:I tried "dropping in" the 6.23 to replace my 5.91 client (XP Pro SP3). I ran -configonly, then started the client. It fired up OK, but got a Client/Core Communications error a few minutes later. I dropped in the 5.91 again, deleted client.cfg, reran -configonly, and restarted the client. It is running fine.

Is there a known problem trying to upgrade from 5.91 to 6.2x in the middle of a WU?

Is there some incompatibility between MPI or other pieces of 5.91 and 6.2x?



I dropped in the 6.23 to replace the 5.9x in both XP and Vista and had no problems. No -configonly, the new client just restarts the WU right where the old one left off.

That doesn't count the one time I forgot to add the -smp flag to the new 6.23 shortcut. That one gave an error and crashed. Added the -smp and it restarted the same WU, only back at the beginning at 0 again. :oops:

Code:
[04:40:12] Entering M.D.
[04:40:19] Rejecting checkpoint
[04:40:21] Gromacs error.
[04:40:21]
[04:40:21] Folding@home Core Shutdown: UNKNOWN_ERROR
[04:40:21]
[04:40:21] Folding@home Core Shutdown: UNKNOWN_ERROR
[04:40:25] CoreStatus = 79 (121)
[04:40:25] Client-core communications error: ERROR 0x79
[04:40:25] This is a sign of more serious problems, shutting down.
Sahkuhnder
 
Posts: 249
Joined: Sun Dec 02, 2007 6:28 am
Location: Vegas Baby! Yeah!
