Project 8690 bad WU

Postby vincent89147 » Mon Jul 15, 2019 12:03 am

Apparently FAH got a bad WU or ran into another problem. Browsing around, I found that there may be a bug in the A4 core:

23:30:22:WU01:FS00:Starting
23:30:22:WU01:FS00:Running FahCore: FahCore_a4 -dir 01 -suffix 01 -version 705 -lifeline 1049 -checkpoint 15 -np 3
23:30:22:WU01:FS00:Started FahCore on PID 2302
23:30:22:WU01:FS00:Core PID:2306
23:30:22:WU01:FS00:FahCore 0xa4 started
23:30:23:WU01:FS00:0xa4:
23:30:23:WU01:FS00:0xa4:*------------------------------*
23:30:23:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
23:30:23:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
23:30:23:WU01:FS00:0xa4:
23:30:23:WU01:FS00:0xa4:Preparing to commence simulation
23:30:23:WU01:FS00:0xa4:- Ensuring status. Please wait.
23:30:32:WU01:FS00:0xa4:- Looking at optimizations...
23:30:32:WU01:FS00:0xa4:- Working with standard loops on this execution.
23:30:32:WU01:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
23:30:32:WU01:FS00:0xa4:- Expanded 127421 -> 264024 (decompressed 207.2 percent)
23:30:32:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=127421 data_size=264024, decompressed_data_size=264024 diff=0
23:30:32:WU01:FS00:0xa4:- Digital signature verified
23:30:32:WU01:FS00:0xa4:
23:30:32:WU01:FS00:0xa4:Project: 8690 (Run 0, Clone 459, Gen 21)
23:30:32:WU01:FS00:0xa4:
23:30:32:WU01:FS00:0xa4:Entering M.D.
23:30:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

Grepping through the log showed:
23:12:22:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:12:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:13:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:14:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:15:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:16:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:17:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:18:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:19:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:20:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:21:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:22:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:23:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:24:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:25:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:26:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:27:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:28:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:29:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:30:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:31:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23:32:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

This PC is not overclocked. I dropped the above unit and started a new one that is running fine, so it seems to be a bad WU.
vincent89147
 
Posts: 16
Joined: Sun Jul 14, 2019 11:46 pm

Re: Project 8690 bad WU

Postby Joe_H » Mon Jul 15, 2019 12:25 am

Welcome to the folding support forum.

So far the failure report from this WU being assigned to you is the only one in the database. The WU could be bad or be the rare one that does not decompose into 3 threads successfully.

There is a known bug in which the A4 core does not stop properly after repeated attempts at processing a WU that errors out like this. The client should dump the WU after 8 attempts, as indicated by this message in the log:
23:30:32:WU01:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.

It happens so infrequently that the bug never was identified in the code and fixed.
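To catch this case in a log, one can look for a unit that is started again after the core has already reported reaching that limit. Below is a minimal Python sketch, assuming every line starts with HH:MM:SS:WUnn:FSnn: as in the excerpts in this thread and taking the limit of 8 from the message quoted above; `stuck_units` is a hypothetical helper, not part of FAHClient. Because queue IDs such as WU01 are reused for later units, run it on the part of the log that covers a single unit.

```python
import re

# Assumed line layout, based on the excerpts in this thread:
#   HH:MM:SS:WUnn:FSnn:message
LINE_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):(.*)$")
IMPROPER_RE = re.compile(r"indicates (\d+) consecutive improper terminations")

# Limit taken from the quoted log message; treat it as an assumption.
DUMP_AFTER = 8


def stuck_units(lines, limit=DUMP_AFTER):
    """Return {(wu, slot): restarts} for units that were started again
    after the core had already reported `limit` or more consecutive
    improper terminations, i.e. units the client apparently kept retrying.
    """
    over_limit = set()
    restarts = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        key, msg = (m.group(1), m.group(2)), m.group(3)
        hit = IMPROPER_RE.search(msg)
        if hit and int(hit.group(1)) >= limit:
            over_limit.add(key)
        elif msg == "Starting" and key in over_limit:
            # The client launched the core again despite the limit.
            restarts[key] = restarts.get(key, 0) + 1
    return restarts
```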

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 4635
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: Project 8690 bad WU

Postby vincent89147 » Mon Jul 15, 2019 12:37 am

Thanks for the explanation, Joe. Judging by the number of INTERRUPTED lines in the log, it doesn't seem to have discarded the WU, so I used the telnet interface to delete the slot and then re-add it. I'll be happy to send the full log if needed, but I just wanted to make somebody aware of this project's possibly bad WU.

Re: Project 8690 bad WU

Postby vincent89147 » Mon Jul 22, 2019 2:22 pm

Another bad result for the same project:

13:52:40:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
13:53:24:WU00:FS00:Starting
13:53:24:WU00:FS00:Removing old file './work/00/logfile_01-20190722-132123.txt'
13:53:24:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 705 -lifeline 994 -checkpoint 15 -np 3
13:53:24:WU00:FS00:Started FahCore on PID 31980
13:53:24:WU00:FS00:Core PID:31984
13:53:24:WU00:FS00:FahCore 0xa4 started
13:53:24:WU00:FS00:0xa4:
13:53:24:WU00:FS00:0xa4:*------------------------------*
13:53:24:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
13:53:24:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
13:53:24:WU00:FS00:0xa4:
13:53:24:WU00:FS00:0xa4:Preparing to commence simulation
13:53:24:WU00:FS00:0xa4:- Ensuring status. Please wait.
13:53:33:WU00:FS00:0xa4:- Looking at optimizations...
13:53:33:WU00:FS00:0xa4:- Working with standard loops on this execution.
13:53:33:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
13:53:33:WU00:FS00:0xa4:- Expanded 127235 -> 264024 (decompressed 207.5 percent)
13:53:33:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=127235 data_size=264024, decompressed_data_size=264024 diff=0
13:53:33:WU00:FS00:0xa4:- Digital signature verified
13:53:33:WU00:FS00:0xa4:
13:53:33:WU00:FS00:0xa4:Project: 8690 (Run 0, Clone 283, Gen 18)
13:53:33:WU00:FS00:0xa4:
13:53:33:WU00:FS00:0xa4:Entering M.D.
13:53:40:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

Let me know if you need more information :)

Thanks!

Re: Project 8690 bad WU

Postby Joe_H » Mon Jul 22, 2019 7:57 pm

Both WUs you have reported went on to be processed successfully by others.

Besides the rare possibility of a WU not decomposing into 3 threads for processing, the WUs could have been corrupted somehow. One possible cause is an anti-virus scan getting a false match when some random binary data in a WU happens to match a virus signature it looks for. We do recommend exempting the F@h data directory from such scanning.

Re: Project 8690 bad WU

Postby bruce » Mon Jul 22, 2019 8:14 pm

Nevertheless, Project: 8690 (Run 0, Clone 283, Gen 18) was reported as "dumped" by you and was successfully completed by the next assignee. The bug will never be fixed because when you dump a WU, the information needed to diagnose the problem is not uploaded. The developers have no alternative but to assume the problem was due to a hardware issue (perhaps overclocking) in your system. When they receive no other information, they can't diagnose it except by making that assumption.

The same is true for Project: 8690 (Run 0, Clone 459, Gen 21)
bruce
 
Posts: 22998
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 8690 bad WU

Postby vincent89147 » Tue Jul 23, 2019 1:46 am

Bruce, it's my understanding that after a certain number of INTERRUPTED messages the WU is automatically skipped, but that doesn't seem to happen here. Is there any way to diagnose the problem before dumping the WU?

There is no overclocking and there's also no antivirus software. It is older hardware (AMD CPU on an nVidia board), but other projects have no problems completing. I don't need to get to the bottom of this because most stuff finishes, but I'm enough of a geek to poke around and see what I can find if you let me. :)

Thanks!

Re: Project 8690 bad WU

Postby bruce » Tue Jul 23, 2019 3:35 am

That's why I asked you to find the FIRST time the WU was interrupted. Under normal circumstances, whenever the software detects an error, it issues an error message and then chooses which action to take: Resume from the previous checkpoint, or Abort and upload the remnants of the WU (together with the log containing the error message(s)). If the choice is Resume, it will try a limited number of times and then Abort.

This is not a "normal circumstances" case. INTERRUPTED is not an error, so unless there's another message, it is issued as part of a normal stop (such as when you are logging out or shutting down). The question is which other process decided to tell FAHClient that it needed to shut down folding, and why.
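The decision described above can be modelled as a toy Python sketch, which also shows why a stream of INTERRUPTED exits never reaches Abort. It is not the FAHClient implementation: the outcome strings, the `Action` values and the limit of 3 resumes are assumptions chosen only for illustration.

```python
from enum import Enum


class Action(Enum):
    RESTART = "restart the core later; nothing is counted against the WU"
    RESUME = "resume from the previous checkpoint"
    ABORT = "abort and upload the remnants of the WU with its log"


# Hypothetical retry budget; the thread does not say what the real limit is.
MAX_RESUMES = 3


def replay(outcomes, max_resumes=MAX_RESUMES):
    """Map each core exit outcome to the action this toy model would take.

    'ERROR' spends the resume budget and eventually aborts the WU.
    'INTERRUPTED' is treated as a normal stop, so it never spends the
    budget and can repeat indefinitely.
    """
    actions = []
    resumes = 0
    for outcome in outcomes:
        if outcome == "INTERRUPTED":
            actions.append(Action.RESTART)
        elif outcome == "ERROR":
            if resumes < max_resumes:
                resumes += 1
                actions.append(Action.RESUME)
            else:
                actions.append(Action.ABORT)
                break  # the WU is gone; nothing left to replay
        else:
            raise ValueError("unknown outcome: {!r}".format(outcome))
    return actions
```

For example, `replay(["ERROR"] * 5)` resumes three times and then aborts, while `replay(["INTERRUPTED"] * 3000)` restarts 3000 times without ever aborting, which resembles the pattern in the logs in this thread.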

Is your computer configured to hibernate/sleep/shut down after a certain amount of time ... or when a battery is fully discharged ... or something like that? Was your computer disconnected from power, causing an abrupt shutdown without going through the normal shutdown process? The only thing that matters is the first time this happened to this WU.

Once a WU is abruptly terminated like that, it's possible that the checkpoint file was corrupted, and once that happens, instead of resuming from the checkpoint, it might get an INTERRUPTED. (That's not too likely since other people reported errors.)

I'll ask the project people to isolate this WU and test it, but unfortunately I can't predict if they'll find anything. We have so little to go on.

Re: Project 8690 bad WU

Postby vincent89147 » Tue Jul 23, 2019 4:19 am

I'm sorry Bruce, I did not see you ask that here.

This computer was on all weekend. It's an ASUS M3N78 with an AMD Phenom 9950, so it's entirely possible that it's starting to show its age. However, this project (8690) is the first and (thus far) only project that I'm having trouble with. It's running Debian 9 and will only shut down when I tell it to. It ran over the weekend while I was away, and the latest log contains over 3000 mentions of "INTERRUPTED", starting on 7/20 and ending on 7/22:

vv@nestor:/var/lib/fahclient/logs$ grep -c INTERRUPTED log-20190723-035359.txt
3289
vv@nestor:/var/lib/fahclient/logs$ grep INTERRUPTED log-20190723-035359.txt | head -5
07:08:32:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
07:08:49:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
07:09:49:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
07:10:49:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
07:11:49:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
vv@nestor:/var/lib/fahclient/logs$ grep INTERRUPTED log-20190723-035359.txt | tail -6
13:51:40:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
13:52:40:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
13:53:40:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
13:54:40:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
13:55:40:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
14:24:53:WU00:FS00:0xa7:Folding@home Core Shutdown: INTERRUPTED

(Last interruption is a clean shutdown)
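The same counts can be pulled out per WU and slot, together with the typical gap between restarts, with a short Python sketch. It assumes the HH:MM:SS:WUnn:FSnn: line layout seen in this thread and only counts "FahCore returned: INTERRUPTED" lines, so the core's own clean-shutdown line (the 0xa7 one above) is left out. The function name is hypothetical.

```python
import re
import sys
from collections import defaultdict
from statistics import median

# Assumed layout, based only on the excerpts in this thread:
#   HH:MM:SS:WUnn:FSnn:message
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}):(WU\d+):(FS\d+):(.*)$")


def summarize_interrupted(lines):
    """Summarize 'FahCore returned: INTERRUPTED' lines per (WU, slot).

    Returns {(wu, slot): (count, first_hhmmss, last_hhmmss, median_gap_s)}.
    The core's own 'Core Shutdown: INTERRUPTED' line is not counted.
    """
    hits = defaultdict(list)  # (wu, slot) -> [(hhmmss, seconds), ...]
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(6).startswith("FahCore returned: INTERRUPTED"):
            h, mi, s = m.group(1, 2, 3)
            secs = int(h) * 3600 + int(mi) * 60 + int(s)
            hits[(m.group(4), m.group(5))].append((":".join((h, mi, s)), secs))

    summary = {}
    for key, events in hits.items():
        # A negative difference is assumed to be a midnight roll-over.
        gaps = [(b - a) % 86400 for (_, a), (_, b) in zip(events, events[1:])]
        summary[key] = (
            len(events),
            events[0][0],
            events[-1][0],
            median(gaps) if gaps else None,
        )
    return summary


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (wu, slot), (n, first, last, gap) in summarize_interrupted(f).items():
            print("{}:{} {} INTERRUPTED, first {}, last {}, median gap {}s".format(
                wu, slot, n, first, last, gap))
```

Pass the log file as the only argument, e.g. log-20190723-035359.txt. For the excerpts above the median gap comes out at 60 seconds, i.e. the client relaunches the core roughly once a minute.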

When I look in the log the only change I see is the PID. Other than that, the information between two "Starting" lines is identical, so it doesn't look like it ever dropped it on its own. When I got home, queue-info reported that it took ~50 minutes per frame and that it was still at 0% completion. That's when I deleted the slot and re-added it. If there is a way to pull some more information (would it help to attach gdb? If so, what could I inspect?) then I'll be happy to do that, but if there is a configuration option for me to refuse project 8690, then that will work too.
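Whether the restarts really are identical apart from the PID can be checked mechanically. The Python sketch below, assuming the same HH:MM:SS:WUnn:FSnn: layout, cuts one unit's lines into blocks at each "Starting", replaces timestamps and PIDs with placeholders, and counts the distinct blocks that remain. `restart_variants` is a hypothetical helper. Fields that legitimately change, such as the dated logfile name in a "Removing old file" line, will show up as separate variants.

```python
import re
from collections import Counter

STAMP_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:")  # leading HH:MM:SS:
PID_RE = re.compile(r"(PID:?\s*)\d+")          # "PID 2043" and "PID:2047"


def restart_variants(lines, wu, slot):
    """Split the (wu, slot) lines into blocks starting at 'Starting',
    blank out timestamps and PIDs, and count the distinct blocks.

    Returns [(occurrences, block_lines), ...], most common first.
    """
    prefix = ":{}:{}:".format(wu, slot)
    blocks, current = [], None
    for raw in lines:
        line = raw.rstrip()
        if prefix not in line:
            continue  # other slots and unrelated messages
        body = PID_RE.sub(r"\1<pid>", STAMP_RE.sub("", line))
        if line.endswith(prefix + "Starting"):
            if current is not None:
                blocks.append(tuple(current))
            current = []
        if current is not None:
            current.append(body)
    if current is not None:
        blocks.append(tuple(current))
    return [(n, list(block)) for block, n in Counter(blocks).most_common()]
```

In the excerpt posted later in this thread, the first restart and the later ones already differ by a "Going to use standard loops." line, so a check like this would report more than one variant.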

Thanks!

Re: Project 8690 bad WU

Postby Joe_H » Tue Jul 23, 2019 1:35 pm

The "INTERRUPTED" messages are a sign of the problem, but not the actual problem. A variation of that message is also entered into the log file during a normal stop of the folding core. The "improper termination" message is more significant. The important information is what happened during the first processing period of the WU, before the core started restarting repeatedly and before that message first appeared in the log.
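The first processing period described here can be cut out of a long log with a small Python sketch, assuming the HH:MM:SS:WUnn:FSnn: layout used throughout this thread. `first_period` is a hypothetical helper: it starts at the unit's first "Starting" line and stops at the first line for that unit mentioning "improper", keeping lines from other slots in between. If the unit was first started before the current log was rotated, run it on the older file in the logs directory instead.

```python
def first_period(lines, wu, slot):
    """Return log lines from the first 'Starting' of (wu, slot) up to and
    including the first line for that unit that mentions 'improper'.

    Lines belonging to other WUs inside that window are kept on purpose,
    because activity in other slots may be relevant. If no such line
    exists, everything from the first start onward is returned.
    """
    prefix = ":{}:{}:".format(wu, slot)
    out = []
    for raw in lines:
        line = raw.rstrip()
        if not out and not line.endswith(prefix + "Starting"):
            continue  # still before the unit's first start
        out.append(line)
        if prefix in line and "improper" in line:
            break
    return out
```

Usage: `with open("log-20190723-035359.txt") as f: print("\n".join(first_period(f, "WU00", "FS00")))`.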

Re: Project 8690 bad WU

Postby vincent89147 » Tue Jul 23, 2019 1:46 pm

Thanks Joe. Here is the start of this run before the first INTERRUPTED message.

07:08:26:WU00:FS00:Starting
07:08:26:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 705 -lifeline 994 -checkpoint 15 -np 3
07:08:26:WU00:FS00:Started FahCore on PID 2035
07:08:26:WU00:FS00:Core PID:2039
07:08:26:WU00:FS00:FahCore 0xa4 started
07:08:26:WU00:FS00:0xa4:
07:08:26:WU00:FS00:0xa4:*------------------------------*
07:08:26:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
07:08:26:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
07:08:26:WU00:FS00:0xa4:
07:08:26:WU00:FS00:0xa4:Preparing to commence simulation
07:08:26:WU00:FS00:0xa4:- Looking at optimizations...
07:08:26:WU00:FS00:0xa4:- Created dyn
07:08:26:WU00:FS00:0xa4:- Files status OK
07:08:26:WU00:FS00:0xa4:- Expanded 127235 -> 264024 (decompressed 207.5 percent)
07:08:26:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=127235 data_size=264024, decompressed_data_size=264024 diff=0
07:08:26:WU00:FS00:0xa4:- Digital signature verified
07:08:26:WU00:FS00:0xa4:
07:08:26:WU00:FS00:0xa4:Project: 8690 (Run 0, Clone 283, Gen 18)
07:08:26:WU00:FS00:0xa4:
07:08:26:WU00:FS00:0xa4:Assembly optimizations on if available.
07:08:26:WU00:FS00:0xa4:Entering M.D.
07:08:30:WU01:FS00:Upload complete
07:08:31:WU01:FS00:Server responded WORK_ACK (400)
07:08:31:WU01:FS00:Final credit estimate, 1364.00 points
07:08:31:WU01:FS00:Cleaning up
07:08:32:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
07:08:33:WU00:FS00:Starting
07:08:33:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 705 -lifeline 994 -checkpoint 15 -np 3
07:08:33:WU00:FS00:Started FahCore on PID 2043
07:08:33:WU00:FS00:Core PID:2047
07:08:33:WU00:FS00:FahCore 0xa4 started
07:08:33:WU00:FS00:0xa4:
07:08:33:WU00:FS00:0xa4:*------------------------------*
07:08:33:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
07:08:33:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
07:08:33:WU00:FS00:0xa4:
07:08:33:WU00:FS00:0xa4:Preparing to commence simulation
07:08:33:WU00:FS00:0xa4:- Ensuring status. Please wait.
07:08:42:WU00:FS00:0xa4:- Looking at optimizations...
07:08:42:WU00:FS00:0xa4:- Working with standard loops on this execution.
07:08:42:WU00:FS00:0xa4:- Previous termination of core was improper.
07:08:42:WU00:FS00:0xa4:- Files status OK
07:08:42:WU00:FS00:0xa4:- Expanded 127235 -> 264024 (decompressed 207.5 percent)
07:08:42:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=127235 data_size=264024, decompressed_data_size=264024 diff=0
07:08:42:WU00:FS00:0xa4:- Digital signature verified
07:08:42:WU00:FS00:0xa4:
07:08:42:WU00:FS00:0xa4:Project: 8690 (Run 0, Clone 283, Gen 18)
07:08:42:WU00:FS00:0xa4:
07:08:42:WU00:FS00:0xa4:Entering M.D.
07:08:49:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
07:09:33:WU00:FS00:Starting
07:09:33:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 705 -lifeline 994 -checkpoint 15 -np 3
07:09:33:WU00:FS00:Started FahCore on PID 2054
07:09:33:WU00:FS00:Core PID:2058
07:09:33:WU00:FS00:FahCore 0xa4 started
07:09:33:WU00:FS00:0xa4:
07:09:33:WU00:FS00:0xa4:*------------------------------*
07:09:33:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
07:09:33:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
07:09:33:WU00:FS00:0xa4:
07:09:33:WU00:FS00:0xa4:Preparing to commence simulation
07:09:33:WU00:FS00:0xa4:- Ensuring status. Please wait.
07:09:42:WU00:FS00:0xa4:- Looking at optimizations...
07:09:42:WU00:FS00:0xa4:- Working with standard loops on this execution.
07:09:42:WU00:FS00:0xa4:- Previous termination of core was improper.
07:09:42:WU00:FS00:0xa4:- Going to use standard loops.
07:09:42:WU00:FS00:0xa4:- Files status OK
07:09:42:WU00:FS00:0xa4:- Expanded 127235 -> 264024 (decompressed 207.5 percent)
07:09:42:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=127235 data_size=264024, decompressed_data_size=264024 diff=0
07:09:42:WU00:FS00:0xa4:- Digital signature verified
07:09:42:WU00:FS00:0xa4:
07:09:42:WU00:FS00:0xa4:Project: 8690 (Run 0, Clone 283, Gen 18)
07:09:42:WU00:FS00:0xa4:
07:09:42:WU00:FS00:0xa4:Entering M.D.
07:09:49:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
07:10:33:WU00:FS00:Starting
07:10:33:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 705 -lifeline 994 -checkpoint 15 -np 3
07:10:33:WU00:FS00:Started FahCore on PID 2062
07:10:33:WU00:FS00:Core PID:2066
07:10:33:WU00:FS00:FahCore 0xa4 started


From there on, it's just this same sequence thousands of times, differing only in the PID numbers. If there is another log where I might find more information, could you let me know where it would be?

Thanks! :)

Re: Project 8690 bad WU

Postby Joe_H » Tue Jul 23, 2019 4:59 pm

I am at work, so I can't look at what work files are present for an A4 WU being processed. If I recall right, within the work folder there would be a full log of the messages that get passed on to the FAHClient process, and an MD.log file. They go away at some point when processing restarts, but they can contain useful information.

One other thought: do your logs show successful completion of A4 WUs from other projects using that folding core? I have a vague recollection of A4 using an older library that might not be present on some Linux installations. That may depend on what version you are running and how it was installed: upgraded from an older version, or installed fresh from the distro.
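One way to test the missing-library theory on Linux is to ask `ldd` which shared libraries the core binary cannot resolve. The Python sketch below runs `ldd` on the core path that appears in the logs in this thread and prints any entries reported as "not found". It is an illustration, not an official check: a statically linked core also produces an empty list, and the ldd manual page cautions against running it on untrusted executables.

```python
import subprocess
import sys

# Core path as it appears in the logs in this thread; adjust for your install.
CORE = "/var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/Core_a4.fah/FahCore_a4"


def missing_libs(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Typical line: "libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x...)"
        name, sep, target = line.strip().partition("=>")
        if sep and target.strip() == "not found":
            missing.append(name.strip())
    return missing


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else CORE
    result = subprocess.run(["ldd", path], stdout=subprocess.PIPE,
                            universal_newlines=True)
    print("\n".join(missing_libs(result.stdout)) or "ldd reported no missing libraries")
```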

Re: Project 8690 bad WU

Postby vincent89147 » Tue Jul 23, 2019 7:12 pm

Right now the work folder looks like this:

vv@nestor:/var/lib/fahclient/work$ ls -l
total 68
drwxrwxrwx 3 fahclient root  4096 Jul 23 12:00 00
-rw-rw-r-- 1 fahclient root 45056 Jul 23 12:00 client.db
-rw-rw-r-- 1 fahclient root 16928 Jul 23 12:00 client.db-journal
vv@nestor:/var/lib/fahclient/work$ ls -l 00
total 360
drwxrwxrwx 2 fahclient root   4096 Jul 23 07:39 01
-rw-r--r-- 1 fahclient root   2272 Jul 22 07:24 logfile_01-20190723-035359.txt
-rw-r--r-- 1 fahclient root   2204 Jul 22 21:22 logfile_01-20190723-132253.txt
-rw-r--r-- 1 fahclient root   2870 Jul 23 07:39 logfile_01-20190723-190023.txt
-rw-r--r-- 1 fahclient root   1662 Jul 23 12:00 logfile_01.txt
-rw-r--r-- 1 fahclient root   2215 Jul 23 12:00 viewerFrame0.json
-rw-r--r-- 1 fahclient root   2215 Jul 22 20:55 viewerFrame10.json
-rw-r--r-- 1 fahclient root   2221 Jul 22 20:58 viewerFrame11.json
-rw-r--r-- 1 fahclient root   2213 Jul 22 21:01 viewerFrame12.json
-rw-r--r-- 1 fahclient root   2232 Jul 22 21:04 viewerFrame13.json
-rw-r--r-- 1 fahclient root   2222 Jul 22 21:07 viewerFrame14.json
-rw-r--r-- 1 fahclient root   2220 Jul 22 21:10 viewerFrame15.json
-rw-r--r-- 1 fahclient root   2224 Jul 22 21:13 viewerFrame16.json
-rw-r--r-- 1 fahclient root   2213 Jul 22 21:16 viewerFrame17.json
-rw-r--r-- 1 fahclient root   2211 Jul 22 21:18 viewerFrame18.json
-rw-r--r-- 1 fahclient root   2223 Jul 23 06:22 viewerFrame19.json
-rw-r--r-- 1 fahclient root   2207 Jul 22 06:59 viewerFrame1.json
-rw-r--r-- 1 fahclient root   2218 Jul 23 06:25 viewerFrame20.json
-rw-r--r-- 1 fahclient root   2228 Jul 23 06:28 viewerFrame21.json
-rw-r--r-- 1 fahclient root   2198 Jul 23 06:31 viewerFrame22.json
-rw-r--r-- 1 fahclient root   2205 Jul 23 06:34 viewerFrame23.json
-rw-r--r-- 1 fahclient root   2209 Jul 23 06:37 viewerFrame24.json
-rw-r--r-- 1 fahclient root   2200 Jul 23 06:40 viewerFrame25.json
-rw-r--r-- 1 fahclient root   2229 Jul 23 06:43 viewerFrame26.json
-rw-r--r-- 1 fahclient root   2219 Jul 23 06:46 viewerFrame27.json
-rw-r--r-- 1 fahclient root   2224 Jul 23 06:49 viewerFrame28.json
-rw-r--r-- 1 fahclient root   2207 Jul 23 06:52 viewerFrame29.json
-rw-r--r-- 1 fahclient root   2230 Jul 22 07:02 viewerFrame2.json
-rw-r--r-- 1 fahclient root   2229 Jul 23 06:55 viewerFrame30.json
-rw-r--r-- 1 fahclient root   2219 Jul 23 06:58 viewerFrame31.json
-rw-r--r-- 1 fahclient root   2224 Jul 23 07:01 viewerFrame32.json
-rw-r--r-- 1 fahclient root   2219 Jul 23 07:04 viewerFrame33.json
-rw-r--r-- 1 fahclient root   2220 Jul 23 07:07 viewerFrame34.json
-rw-r--r-- 1 fahclient root   2210 Jul 23 07:10 viewerFrame35.json
-rw-r--r-- 1 fahclient root   2208 Jul 23 07:13 viewerFrame36.json
-rw-r--r-- 1 fahclient root   2225 Jul 23 07:16 viewerFrame37.json
-rw-r--r-- 1 fahclient root   2223 Jul 23 07:19 viewerFrame38.json
-rw-r--r-- 1 fahclient root   2209 Jul 23 07:22 viewerFrame39.json
-rw-r--r-- 1 fahclient root   2228 Jul 22 07:05 viewerFrame3.json
-rw-r--r-- 1 fahclient root   2207 Jul 23 07:25 viewerFrame40.json
-rw-r--r-- 1 fahclient root   2230 Jul 23 07:29 viewerFrame41.json
-rw-r--r-- 1 fahclient root   2226 Jul 23 07:32 viewerFrame42.json
-rw-r--r-- 1 fahclient root   2222 Jul 23 07:35 viewerFrame43.json
-rw-r--r-- 1 fahclient root   2215 Jul 23 12:00 viewerFrame44.json
-rw-r--r-- 1 fahclient root   2227 Jul 22 07:08 viewerFrame4.json
-rw-r--r-- 1 fahclient root   2225 Jul 22 07:11 viewerFrame5.json
-rw-r--r-- 1 fahclient root   2234 Jul 22 07:14 viewerFrame6.json
-rw-r--r-- 1 fahclient root   2236 Jul 22 07:17 viewerFrame7.json
-rw-r--r-- 1 fahclient root   2234 Jul 22 07:20 viewerFrame8.json
-rw-r--r-- 1 fahclient root   2225 Jul 22 20:54 viewerFrame9.json
-rw-r--r-- 1 fahclient root   3455 Jul 23 12:00 viewerTop.json
-rw-r--r-- 1 fahclient root 149756 Jul 22 06:56 wudata_01.dat
-rw-rw-rw- 1 fahclient root      5 Jul 23 12:00 wudata_01.lock
-rw-r--r-- 1 fahclient root    512 Jul 23 12:00 wuinfo_01.dat
vv@nestor:/var/lib/fahclient/work$

I can't find a reference to anything containing or related to MD. If I get a broken WU again I will look in the work directory for more information.

And it looks like other a4 WUs complete successfully (9039 here):

04:01:12:WU02:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 02 -suffix 01 -version 705 -lifeline 1063 -checkpoint 15 -np 1
04:01:12:WU02:FS02:FahCore 0xa4 started
04:01:12:WU02:FS02:0xa4:
04:01:12:WU02:FS02:0xa4:*------------------------------*
04:01:12:WU02:FS02:0xa4:Folding@Home Gromacs GB Core
04:01:12:WU02:FS02:0xa4:Version 2.27 (Dec. 15, 2010)
04:01:12:WU02:FS02:0xa4:
04:01:12:WU02:FS02:0xa4:Preparing to commence simulation
04:01:12:WU02:FS02:0xa4:- Looking at optimizations...
04:01:12:WU02:FS02:0xa4:- Files status OK
04:01:12:WU02:FS02:0xa4:- Expanded 825431 -> 1398040 (decompressed 169.3 percent)
04:01:12:WU02:FS02:0xa4:Called DecompressByteArray: compressed_data_size=825431 data_size=1398040, decompressed_data_size=1398040 diff=0
04:01:12:WU02:FS02:0xa4:- Digital signature verified
04:01:12:WU02:FS02:0xa4:
04:01:12:WU02:FS02:0xa4:Project: 9039 (Run 828, Clone 0, Gen 1239)
04:01:12:WU02:FS02:0xa4:
04:01:12:WU02:FS02:0xa4:Assembly optimizations on if available.
04:01:12:WU02:FS02:0xa4:Entering M.D.
04:01:18:WU02:FS02:0xa4:Using Gromacs checkpoints
04:01:19:WU02:FS02:0xa4:Resuming from checkpoint
04:01:19:WU02:FS02:0xa4:Verified 02/wudata_01.log
04:01:19:WU02:FS02:0xa4:Verified 02/wudata_01.trr
04:01:19:WU02:FS02:0xa4:Verified 02/wudata_01.xtc
04:01:19:WU02:FS02:0xa4:Verified 02/wudata_01.edr
04:01:19:WU02:FS02:0xa4:Completed 48220 out of 250000 steps  (19%)
04:08:50:WU02:FS02:0xa4:Completed 50000 out of 250000 steps  (20%)
04:20:24:WU02:FS02:0xa4:Completed 52500 out of 250000 steps  (21%)
04:30:57:WU02:FS02:0xa4:Completed 55000 out of 250000 steps  (22%)
04:42:31:WU02:FS02:0xa4:Completed 57500 out of 250000 steps  (23%)
04:53:35:WU02:FS02:0xa4:Completed 60000 out of 250000 steps  (24%)
05:04:39:WU02:FS02:0xa4:Completed 62500 out of 250000 steps  (25%)
05:15:42:WU02:FS02:0xa4:Completed 65000 out of 250000 steps  (26%)
05:27:16:WU02:FS02:0xa4:Completed 67500 out of 250000 steps  (27%)
05:38:22:WU02:FS02:0xa4:Completed 70000 out of 250000 steps  (28%)
05:49:25:WU02:FS02:0xa4:Completed 72500 out of 250000 steps  (29%)
06:01:03:WU02:FS02:0xa4:Completed 75000 out of 250000 steps  (30%)
06:11:34:WU02:FS02:0xa4:Completed 77500 out of 250000 steps  (31%)
06:23:05:WU02:FS02:0xa4:Completed 80000 out of 250000 steps  (32%)
06:34:09:WU02:FS02:0xa4:Completed 82500 out of 250000 steps  (33%)
06:45:41:WU02:FS02:0xa4:Completed 85000 out of 250000 steps  (34%)
06:56:12:WU02:FS02:0xa4:Completed 87500 out of 250000 steps  (35%)
07:06:41:WU02:FS02:0xa4:Completed 90000 out of 250000 steps  (36%)
07:17:02:WU02:FS02:0xa4:Completed 92500 out of 250000 steps  (37%)
07:27:24:WU02:FS02:0xa4:Completed 95000 out of 250000 steps  (38%)
07:37:49:WU02:FS02:0xa4:Completed 97500 out of 250000 steps  (39%)
07:48:18:WU02:FS02:0xa4:Completed 100000 out of 250000 steps  (40%)
07:58:46:WU02:FS02:0xa4:Completed 102500 out of 250000 steps  (41%)
08:09:15:WU02:FS02:0xa4:Completed 105000 out of 250000 steps  (42%)
08:19:44:WU02:FS02:0xa4:Completed 107500 out of 250000 steps  (43%)
08:30:13:WU02:FS02:0xa4:Completed 110000 out of 250000 steps  (44%)
08:40:42:WU02:FS02:0xa4:Completed 112500 out of 250000 steps  (45%)
08:51:11:WU02:FS02:0xa4:Completed 115000 out of 250000 steps  (46%)
09:01:40:WU02:FS02:0xa4:Completed 117500 out of 250000 steps  (47%)
09:12:09:WU02:FS02:0xa4:Completed 120000 out of 250000 steps  (48%)
09:22:38:WU02:FS02:0xa4:Completed 122500 out of 250000 steps  (49%)
09:33:08:WU02:FS02:0xa4:Completed 125000 out of 250000 steps  (50%)
09:43:37:WU02:FS02:0xa4:Completed 127500 out of 250000 steps  (51%)
09:54:06:WU02:FS02:0xa4:Completed 130000 out of 250000 steps  (52%)
10:04:35:WU02:FS02:0xa4:Completed 132500 out of 250000 steps  (53%)
10:15:05:WU02:FS02:0xa4:Completed 135000 out of 250000 steps  (54%)
10:25:34:WU02:FS02:0xa4:Completed 137500 out of 250000 steps  (55%)
10:36:03:WU02:FS02:0xa4:Completed 140000 out of 250000 steps  (56%)
10:46:33:WU02:FS02:0xa4:Completed 142500 out of 250000 steps  (57%)
10:57:02:WU02:FS02:0xa4:Completed 145000 out of 250000 steps  (58%)
11:07:30:WU02:FS02:0xa4:Completed 147500 out of 250000 steps  (59%)
11:18:00:WU02:FS02:0xa4:Completed 150000 out of 250000 steps  (60%)
11:28:29:WU02:FS02:0xa4:Completed 152500 out of 250000 steps  (61%)
11:38:58:WU02:FS02:0xa4:Completed 155000 out of 250000 steps  (62%)
11:49:28:WU02:FS02:0xa4:Completed 157500 out of 250000 steps  (63%)
11:59:58:WU02:FS02:0xa4:Completed 160000 out of 250000 steps  (64%)
12:10:28:WU02:FS02:0xa4:Completed 162500 out of 250000 steps  (65%)
12:20:58:WU02:FS02:0xa4:Completed 165000 out of 250000 steps  (66%)
12:31:27:WU02:FS02:0xa4:Completed 167500 out of 250000 steps  (67%)
12:41:55:WU02:FS02:0xa4:Completed 170000 out of 250000 steps  (68%)
12:52:23:WU02:FS02:0xa4:Completed 172500 out of 250000 steps  (69%)
13:02:53:WU02:FS02:0xa4:Completed 175000 out of 250000 steps  (70%)
13:13:22:WU02:FS02:0xa4:Completed 177500 out of 250000 steps  (71%)
13:23:52:WU02:FS02:0xa4:Completed 180000 out of 250000 steps  (72%)
13:34:21:WU02:FS02:0xa4:Completed 182500 out of 250000 steps  (73%)
13:44:49:WU02:FS02:0xa4:Completed 185000 out of 250000 steps  (74%)
13:55:17:WU02:FS02:0xa4:Completed 187500 out of 250000 steps  (75%)
14:05:45:WU02:FS02:0xa4:Completed 190000 out of 250000 steps  (76%)
14:16:13:WU02:FS02:0xa4:Completed 192500 out of 250000 steps  (77%)
14:26:40:WU02:FS02:0xa4:Completed 195000 out of 250000 steps  (78%)
14:37:09:WU02:FS02:0xa4:Completed 197500 out of 250000 steps  (79%)
14:47:37:WU02:FS02:0xa4:Completed 200000 out of 250000 steps  (80%)
14:58:07:WU02:FS02:0xa4:Completed 202500 out of 250000 steps  (81%)
15:08:34:WU02:FS02:0xa4:Completed 205000 out of 250000 steps  (82%)
15:19:05:WU02:FS02:0xa4:Completed 207500 out of 250000 steps  (83%)
15:29:33:WU02:FS02:0xa4:Completed 210000 out of 250000 steps  (84%)
15:40:02:WU02:FS02:0xa4:Completed 212500 out of 250000 steps  (85%)
15:50:31:WU02:FS02:0xa4:Completed 215000 out of 250000 steps  (86%)
16:00:59:WU02:FS02:0xa4:Completed 217500 out of 250000 steps  (87%)
16:11:28:WU02:FS02:0xa4:Completed 220000 out of 250000 steps  (88%)
16:21:56:WU02:FS02:0xa4:Completed 222500 out of 250000 steps  (89%)
16:32:23:WU02:FS02:0xa4:Completed 225000 out of 250000 steps  (90%)
16:42:50:WU02:FS02:0xa4:Completed 227500 out of 250000 steps  (91%)
16:53:18:WU02:FS02:0xa4:Completed 230000 out of 250000 steps  (92%)
17:03:47:WU02:FS02:0xa4:Completed 232500 out of 250000 steps  (93%)
17:14:16:WU02:FS02:0xa4:Completed 235000 out of 250000 steps  (94%)
17:24:45:WU02:FS02:0xa4:Completed 237500 out of 250000 steps  (95%)
17:35:15:WU02:FS02:0xa4:Completed 240000 out of 250000 steps  (96%)
17:45:45:WU02:FS02:0xa4:Completed 242500 out of 250000 steps  (97%)
17:56:14:WU02:FS02:0xa4:Completed 245000 out of 250000 steps  (98%)
18:06:40:WU02:FS02:0xa4:Completed 247500 out of 250000 steps  (99%)
18:17:04:WU02:FS02:0xa4:Completed 250000 out of 250000 steps  (100%)
18:17:05:WU02:FS02:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
18:17:15:WU02:FS02:0xa4:
18:17:15:WU02:FS02:0xa4:Finished Work Unit:
18:17:15:WU02:FS02:0xa4:- Reading up to 811440 from "02/wudata_01.trr": Read 811440
18:17:15:WU02:FS02:0xa4:trr file hash check passed.
18:17:15:WU02:FS02:0xa4:- Reading up to 747708 from "02/wudata_01.xtc": Read 747708
18:17:15:WU02:FS02:0xa4:xtc file hash check passed.
18:17:15:WU02:FS02:0xa4:edr file hash check passed.
18:17:15:WU02:FS02:0xa4:logfile size: 24474
18:17:15:WU02:FS02:0xa4:Leaving Run
18:17:19:WU02:FS02:0xa4:- Writing 1586110 bytes of core data to disk...
18:17:19:WU02:FS02:0xa4:Done: 1585598 -> 1539236 (compressed to 97.0 percent)
18:17:19:WU02:FS02:0xa4:  ... Done.
18:17:28:WU02:FS02:0xa4:- Shutting down core
18:17:28:WU02:FS02:0xa4:
18:17:28:WU02:FS02:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
18:17:29:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:9039 run:828 clone:0 gen:1239 core:0xa4 unit:0x0000057dab436c9e5698275dbcbb54de
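The progress lines above also give a simple time-per-frame figure. Here is a minimal Python sketch, assuming the "Completed N out of M steps" format shown here and one progress line per frame. It takes the median gap between consecutive progress lines, so the partial first frame after resuming from a checkpoint does not skew the result. `time_per_frame` is a hypothetical name.

```python
import re
from datetime import timedelta
from statistics import median

# Matches progress lines such as
# 04:08:50:WU02:FS02:0xa4:Completed 50000 out of 250000 steps  (20%)
PROGRESS_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x[0-9a-f]+:Completed \d+ out of \d+ steps"
)


def time_per_frame(lines):
    """Median time between consecutive 'Completed ... steps' lines, or None.

    A negative difference between two timestamps is treated as a
    midnight roll-over.
    """
    stamps = []
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if m:
            h, mi, s = map(int, m.group(1, 2, 3))
            stamps.append(h * 3600 + mi * 60 + s)
    gaps = [(b - a) % 86400 for a, b in zip(stamps, stamps[1:])]
    return timedelta(seconds=median(gaps)) if gaps else None
```

Applied to the first four progress lines of this log, it returns 0:10:33, in line with the roughly 10 to 11 minutes per 1% frame visible throughout.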


I'm not in a hurry with this, so when you're at work you're at work (I am, too :) ), but if I can help solve a longstanding problem, however minor, I'll be happy to do so.

Thanks!

Re: Project 8690 bad WU

Postby Joe_H » Tue Jul 23, 2019 7:43 pm

That looks like the work folder for an A7 WU; it uses different files than A4. The .json files are used by FAHViewer in version 7.5.1 of the client. Version 7.4.4 used a different method of displaying the protein in FAHViewer for A4 WUs.

Re: Project 8690 bad WU

Postby vincent89147 » Tue Jul 23, 2019 9:04 pm

This is the only work folder currently on the system. If and when I get another 8690 WU, I will dig a little deeper. If you have some sort of checklist of what I should look for, I'll be happy to hear it.

Thanks!
