problem with wu 11292 3,72,15


rav4gema
Posts: 1
Joined: Sun Jan 20, 2013 10:59 am

problem with wu 11292 3,72,15

Post by rav4gema »

My ATI 6850 keeps stopping on this WU with 2 seconds remaining. What should I do? Should I just stop the program and restart it, or just not use the GPU at all? This is the third time this has happened. Normally I just remove the slot the WU is on and then re-add it to get a new WU; it downloads a new work unit, and then the same thing happens over and over with projects 11292 and 11293.
bollix47
Posts: 2942
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: problem with wu 11292 3,72,15

Post by bollix47 »

Welcome to the folding@home support forum rav4gema.

Please post all three sections of your log as explained here.

Also, what version of graphics drivers are you using?
art_l_j_PlanetAMD64
Posts: 472
Joined: Sun May 30, 2010 2:28 pm

Re: problem with wu 11292 3,72,15

Post by art_l_j_PlanetAMD64 »

Here are the instructions on how to post information contained in your log file(s). You can find the log file(s) by going here:
Start -> All Programs -> FAHClient -> Data Directory
The log in v7 is called log.txt and is located in the Data Directory. In that same directory you will see a sub-directory called 'logs'. There you will find a number of previous logs depending on how long you've been folding and which options have been set.

As bollix47 said above, you should include the information at the top of the log file (System Info and Configuration) as described here, as well as the part of the log file starting just before where the problem appeared.

When you are posting a log file extract, please enclose the log file extract in a 'code' window. You can do that by clicking on the 'Code' button above the editing window when you are typing a message. It makes the message much easier to read and analyze. Thanks!
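If you want to trim a long log.txt down to the parts asked for above, the sketch below shows one way to do it. It assumes the System Info and Configuration sections sit within the first `header_lines` lines (40 is an illustrative guess, not a documented size), and it wraps the result in the phpBB `[code]...[/code]` tags that the forum's 'Code' button inserts. The function name and parameters are hypothetical, not part of any FAH tool.

```python
def make_forum_excerpt(log_text: str, marker: str,
                       header_lines: int = 40, context: int = 20) -> str:
    """Build a log excerpt for a forum post.

    Keeps the first `header_lines` lines (assumed to hold the System Info
    and Configuration sections) plus up to `context` lines ending at the
    last line that contains `marker`, wrapped in phpBB [code] tags.
    """
    lines = log_text.splitlines()
    parts = lines[:header_lines]
    # Indices of every line mentioning the marker, e.g. "WU01" or "FINISHED_UNIT".
    hits = [i for i, line in enumerate(lines) if marker in line]
    if hits:
        end = hits[-1] + 1                        # include the matching line itself
        start = max(header_lines, end - context)  # never repeat header lines
        if start < end:
            if start > header_lines:
                parts.append("...")               # mark the skipped middle section
            parts.extend(lines[start:end])
    return "[code]\n" + "\n".join(parts) + "\n[/code]"
```

Usage: read the log from the Data Directory, for example `Path(data_dir, "log.txt").read_text(errors="replace")` with `from pathlib import Path`, and pass a marker such as the WU slot id near where the problem appeared.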
art_l_j_PlanetAMD64
Over 1.04 Billion Total Points
Over 185,000 Work Units
Over 3,800,000 PPD
Overall rank (if points are combined) 20 of 1721690
In memory of my Mother May 12th 1923 - February 10th 2012
art_l_j_PlanetAMD64

Re: problem with wu 11292 3,72,15

Post by art_l_j_PlanetAMD64 »

rav4gema wrote:My ATI 6850 keeps stopping on this WU with 2 seconds remaining. What should I do? Should I just stop the program and restart it, or just not use the GPU at all? This is the third time this has happened. Normally I just remove the slot the WU is on and then re-add it to get a new WU; it downloads a new work unit, and then the same thing happens over and over with projects 11292 and 11293.
I have sometimes seen a WU appear to stop when it is 99.99% completed (just a few seconds remaining). The FahCore seems to be doing some processing of the completed WU, as I see constant activity on the HDD LED. Just waiting a while longer (up to 4 or 5 minutes) is all you have to do; the completed WU will then be sent and a new WU will be downloaded. Please see this log file extract:


16:43:06:WU01:FS00:0xa4:logfile size: 39169
16:43:06:WU01:FS00:0xa4:Leaving Run
16:43:08:WU01:FS00:0xa4:- Writing 4419281 bytes of core data to disk...
16:43:10:WU01:FS00:0xa4:Done: 4418769 -> 4245423 (compressed to 96.0 percent)
16:43:10:WU01:FS00:0xa4:  ... Done.
16:47:46:WU01:FS00:0xa4:- Shutting down core
16:47:46:WU01:FS00:0xa4:
16:47:46:WU01:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
16:48:20:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
16:48:20:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:7808 run:5 clone:201 gen:18 core:0xa4 unit:0x0000001b0a3b1e874e30fa9eb9a519a5
Note that it took 4 minutes and 36 seconds to go from this:
16:43:10:WU01:FS00:0xa4: ... Done.
to this:
16:47:46:WU01:FS00:0xa4:- Shutting down core
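To measure that gap on your own system, subtract the HH:MM:SS timestamps at the start of the two log lines. This is a small sketch; it assumes each line begins with the time of day, as in the extract above, and that a negative difference means the log crossed midnight.

```python
from datetime import datetime, timedelta

def log_gap(earlier: str, later: str) -> timedelta:
    """Elapsed time between two FAHClient v7 log lines.

    Each line starts with an HH:MM:SS timestamp. Only the time of day is
    present, so a negative difference is assumed to mean the log crossed
    midnight (gaps of 24 hours or more cannot be detected this way).
    """
    t1 = datetime.strptime(earlier[:8], "%H:%M:%S")
    t2 = datetime.strptime(later[:8], "%H:%M:%S")
    gap = t2 - t1
    if gap < timedelta(0):
        gap += timedelta(days=1)
    return gap
```

With the two lines quoted above, `str(log_gap(...))` gives `0:04:36`.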

So the next time this happens, just wait some more time, and the WU will be sent and a new WU will be downloaded and started.

EDIT:
It might be a good idea for FAHControl to display something like this:
Processing the completed WU, please wait ...
while this processing of the completed WU is being done. The first time this happened to me, I almost 'Quit' FAHControl and dumped the WU, thinking it was bad. Fortunately, I waited the 4:36 and then the slot sent the completed WU and downloaded a new WU. I came This Close (thumb and forefinger 1/4 inch apart) to doing that! :D
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: problem with wu 11292 3,72,15

Post by bruce »

There has always been a relatively long pause for disk activity after processing has been completed. Presumably the various files that have been created between 0% and 99.9% must be combined, compressed, and prepared for uploading. The time does seem to be somewhat related to the size of the upload package. It has also been reported to be made worse by an ext4 filesystem (not your problem), but in the case of GPU projects, the servicing of those files will depend on the availability of CPU resources. It's probably worse if you're running SMP:-1, because the compression step (Done: 4418769 -> 4245423 (compressed to 96.0 percent)) is going to interrupt SMP quite a few times.

Do note that in your case, it took more than 5 minutes (from 16:43:06 to 16:48:20) before the WU was actually safely written to disk. If you had interrupted the process before the FINISHED_UNIT (100 = 0x64) message, you would most likely have lost all of your work. There have been reports of a restarted WU discarding everything and starting over from 0%, since the checkpoint files from 99.9% are no longer present. In other words, DO NOT GET IMPATIENT AT THAT POINT.
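The danger window described above can be checked from the log: it opens when the core logs "Leaving Run" and closes when FAHClient logs "FahCore returned: FINISHED_UNIT". The sketch below assumes those exact strings, which are taken from the core 0xa4 extract above; other FahCores may word their log lines differently, and the function name is hypothetical.

```python
def wu_still_finishing(log_lines: list[str], wu: str = "WU01") -> bool:
    """True if the core has logged 'Leaving Run' for this WU but FAHClient
    has not yet logged 'FahCore returned: FINISHED_UNIT' after it.

    Interrupting the slot while this returns True risks losing the WU.
    """
    leaving = returned = None
    for i, line in enumerate(log_lines):
        if f":{wu}:" not in line:
            continue  # line belongs to another WU or is not WU-specific
        if "Leaving Run" in line:
            leaving = i
        elif "FahCore returned: FINISHED_UNIT" in line:
            returned = i
    return leaving is not None and (returned is None or returned < leaving)
```

Run against the first lines of the extract (up to "Shutting down core") it returns True; once the FINISHED_UNIT line is present it returns False.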
art_l_j_PlanetAMD64 wrote:It might be a good idea for FAHControl to display something like this:
Processing the completed WU, please wait ...
while this processing of the completed WU is being done. The first time this happened to me, I almost 'Quit' FAHControl and dumped the WU, thinking it was bad. Fortunately, I waited the 4:36 and then the slot sent the completed WU and downloaded a new WU. I came This Close (thumb and forefinger 1/4 inch apart) to doing that! :D
While that would be a nice feature, it's not likely to happen. FAHControl can't report anything that FAHClient doesn't know. Each of the FahCores is developed separately, so there are significant differences in how they operate. FAHControl knows it started a FahCore_xx quite some time ago, and the FahCore_xx is mostly a black box that is still working on the WU until you see the message FINISHED_UNIT (100 = 0x64). FahCores do make some progress reports that can be picked up by FAHClient, but it's unlikely that any (let alone "all") FahCores report "I finished computing and now I'm working on the files," which is exactly the message you'd like to see. Since FAHClient isn't notified, it can't pass that information to you.

Proteneer seems to be developing some kind of new GPU core. We might get an enhancement added to it while it's being developed, but all existing projects are being run on other FahCores, which are NOT being developed. Getting any one of them changed is extremely unlikely.
art_l_j_PlanetAMD64

Re: problem with wu 11292 3,72,15

Post by art_l_j_PlanetAMD64 »

bruce wrote:While that would be a nice feature, it's not likely to happen. FAHControl can't report anything that FAHClient doesn't know. Each of the FahCores is developed separately, so there are significant differences in how they operate. FAHControl knows it started a FahCore_xx quite some time ago, and the FahCore_xx is mostly a black box that is still working on the WU until you see the message FINISHED_UNIT (100 = 0x64). FahCores do make some progress reports that can be picked up by FAHClient, but it's unlikely that any (let alone "all") FahCores report "I finished computing and now I'm working on the files," which is exactly the message you'd like to see. Since FAHClient isn't notified, it can't pass that information to you.

Proteneer seems to be developing some kind of new GPU core. We might get an enhancement added to it while it's being developed, but all existing projects are being run on other FahCores, which are NOT being developed. Getting any one of them changed is extremely unlikely.
OK, but couldn't FAHControl still display something when (for example) the ETA is down to just a couple of seconds? At that point FahCore_xx is working on creating the upload data file, even though neither FAHControl nor FAHClient is getting reports about its progress. I was just trying to think of something so that folders like rav4gema and (almost, but not quite) me won't mistakenly dump fully completed WUs when we don't see anything happening for several minutes. And for every rav4gema and (almost) me, how many other folders out there are also dumping fully completed WUs, and maybe giving up on FAH altogether, thinking "This junk doesn't work!"? I'm not sure that everyone who has a problem will come here to the forum and post a message about it.
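The suggestion amounts to a simple rule that needs no new reports from the FahCore: if progress or ETA says the unit is essentially done but the core has not yet returned, show a "please wait" message. Below is a hypothetical sketch of that rule; FAHControl has no such function or setting, and the 99.99% and 5-second thresholds are illustrative assumptions.

```python
def slot_status_text(percent_done: float, eta_seconds: float,
                     core_returned: bool, eta_threshold: float = 5.0) -> str:
    """Hypothetical status text a GUI could show for one slot.

    Uses only information the client already has (progress, ETA, and
    whether the core process has exited); thresholds are illustrative.
    """
    if core_returned:
        return "Sending results / downloading next WU"
    if percent_done >= 99.99 or eta_seconds <= eta_threshold:
        # Computation looks finished but the core is still packaging results.
        return "Processing the completed WU, please wait ..."
    return f"Running ({percent_done:.2f}%, ETA {eta_seconds:.0f} s)"
```

Because it keys off values the client already tracks, a rule like this would not depend on any particular FahCore reporting its post-processing step.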

The system that produced the log file in this example is a Debian Linux system with an ext4 filesystem. But it is only running smp:2 WUs on its AMD Athlon X2 7550 2.5GHz CPU. This system is folding-only; I don't use it for anything else, so 100% of the CPU time would be dedicated to creating the upload file once the WU has been completed.
bollix47

Re: problem with wu 11292 3,72,15

Post by bollix47 »

Art, have a look at this thread if you want to reduce the time between "Done" and "Shutting down core".
bruce

Re: problem with wu 11292 3,72,15

Post by bruce »

art_l_j_PlanetAMD64 wrote:The system that produced the log file in this example is a Debian Linux system with an ext4 filesystem. But it is only running smp:2 WUs on its AMD Athlon X2 7550 2.5GHz CPU. This system is folding-only; I don't use it for anything else, so 100% of the CPU time would be dedicated to creating the upload file once the WU has been completed.
The point here is that each slot runs one and only one WU at a time. Whether the FahCore is busy waiting on the GPU to finish, waiting on the PCIe bus to transfer data, or waiting on the ext4 filesystem to write and sync the data to disk, the FahCore is still working on that WU. FAHClient is waiting on the FahCore to say "I'm done (0x64)", and until that happens, the new WU won't start. CPU utilization is actually pretty low during that step.
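The one-WU-per-slot behaviour can be pictured as a strictly sequential loop: the call that runs the core blocks until the core exits, including the post-processing and disk-sync phase, and only a return of 100 (0x64, as in the log) lets the slot send results and move on. This is a simplified, hypothetical model; `download`, `run_core`, and `send` are stand-ins, not FAHClient internals.

```python
from typing import Callable

FINISHED_UNIT = 0x64  # 100, logged as "FINISHED_UNIT (100 = 0x64)"

def run_slot(download: Callable[[], str],
             run_core: Callable[[str], int],
             send: Callable[[str], None],
             units: int) -> list[str]:
    """Hypothetical model of one slot processing WUs strictly in sequence."""
    completed = []
    for _ in range(units):
        wu = download()
        code = run_core(wu)  # blocks until the core exits, post-processing included
        if code != FINISHED_UNIT:
            break            # anything else ends this simplified model
        send(wu)
        completed.append(wu)
    return completed
```

In this model the second WU cannot start before the first core has returned, no matter whether that core was waiting on the GPU, the bus, or the filesystem.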

The FahCore was developed and tested when ext3 was popular, and nobody noticed a problem until the Linux folks decided to switch to ext4 with barriers enabled by default. If the FahCore were being developed today, they would have changed something to make it quicker. Rattledagger has even suggested how to fix it, but I don't expect any response from the PG until the current FahCore is replaced with a new FahCore.
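To see whether your ext4 filesystems are mounted with write barriers, you can read the mount options the kernel reports in /proc/mounts (device, mount point, type, options, ...). The sketch below only reads; it does not reproduce or apply the fix from the linked thread, which isn't shown here. It assumes barriers show up as disabled via the `nobarrier` or `barrier=0` options. Disabling barriers trades crash safety for speed.

```python
def ext4_barrier_status(mounts_text: str) -> dict[str, str]:
    """Map each ext4 mount point to its barrier state, parsed from
    /proc/mounts-style text. Read-only: nothing is changed."""
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue  # skip malformed lines and non-ext4 filesystems
        mount_point, options = fields[1], fields[3].split(",")
        if "nobarrier" in options or "barrier=0" in options:
            status[mount_point] = "disabled"
        else:
            status[mount_point] = "enabled"  # ext4 default unless turned off
    return status
```

On a Linux box: `ext4_barrier_status(open("/proc/mounts").read())`.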
art_l_j_PlanetAMD64

Re: problem with wu 11292 3,72,15

Post by art_l_j_PlanetAMD64 »

bollix47 wrote:Art, have a look at this thread if you want to reduce the time between done and shutting down core.
Mike, thanks very much; I have just done that. I was logged in as root, so I left out the 'sudo' at the beginning of the line. And I cp'd fstab to fstab_001.bak before running the sed command. :)