75XX Project issues (crashes, too many steps etc.)

ThunderRd
Posts: 78
Joined: Sun Dec 02, 2007 5:30 am
Location: Nong Khai, Thailand

Re: 7520 (R119 C4 G264) Could not get length of results file

Post by ThunderRd »

Thanks for that, u_f.
folding_hoomer
Posts: 349
Joined: Sun Feb 10, 2013 6:06 pm
Hardware configuration: Sys 1: I7 2700K@4,4GHz with NH-C14
8GB G.Skill Sniper DDR3 1866MHz CL 9-10-9-28
MSI Z68A-GD65 (G3), various operating systems (WinXP, Ubuntu: 10.4.3 LTS, 12.04.2 LTS)
Optional: GTX560TI 448@stock/OC´d

Sys 2: I7 3930K@4,4GHz with Corsair H110
16GB G.Skill Ripjaws X DDR3 1866MHz CL 9-10-9-28
ASUS Rampage IV Formula, Ubuntu 10.10

Sys 3 i7 875K@3,826 GHz with Scythe Mine2
8GB G.Skill Sniper DDR3 1866MHz CL 9-10-9-28
MSI P55-GD80, Win7 64Bit Pro
Sapphire Radeon HD5870@1,163V 900/1250MHz
Sapphire Radeon HD7870@1,218V 1200/1300MHz

Sys 4 i7 2600K@4,4GHz with Scythe Mine2
8GB G.Skill Sniper DDR3 1866MHz CL 9-10-9-28
MSI Z68A-GD65 (G3), various operating systems (WinXP, Ubuntu: 10.4.3 LTS, 12.04.2 LTS)
Optional: GTX560TI 448@stock/OC´d

Optional:
ASUS P5Q Pro with Q9550
ASUS P5Q Pro with Q6300
Location: Bavaria, Germany

WUs with wrong number of steps

Post by folding_hoomer »

Got two work units with a wrong number of steps:
Project: 7522 (Run 0, Clone 96, Gen 50) with 25,500,000 steps:

Code:

01:53:44:WU00:FS00:0xa3:Project: 7522 (Run 0, Clone 96, Gen 50)
01:53:44:WU00:FS00:0xa3:
01:53:44:WU00:FS00:0xa3:Assembly optimizations on if available.
01:53:44:WU00:FS00:0xa3:Entering M.D.
01:53:50:WU00:FS00:0xa3:Mapping NT from 8 to 8 
01:53:50:WU01:FS00:Upload 3.83%
01:53:50:WU00:FS00:0xa3:Completed 0 out of 25500000 steps  (0%)
01:53:56:WU01:FS00:Upload 30.63%
01:54:02:WU01:FS00:Upload 95.73%
01:54:10:WU01:FS00:Upload complete
01:54:10:WU01:FS00:Server responded WORK_ACK (400)
01:54:10:WU01:FS00:Final credit estimate, 2066.00 points
01:54:10:WU01:FS00:Cleaning up
******************************* Date: 2015-03-11 *******************************
04:24:00:WU00:FS00:0xa3:Completed 255000 out of 25500000 steps  (1%)
06:52:50:WU00:FS00:0xa3:Completed 510000 out of 25500000 steps  (2%)
09:21:41:WU00:FS00:0xa3:Completed 765000 out of 25500000 steps  (3%)
09:53:48:FS00:Paused
09:53:48:FS00:Shutting core down
09:53:53:WU00:FS00:0xa3:Client no longer detected. Shutting down core.
09:53:53:WU00:FS00:0xa3:
09:53:53:WU00:FS00:0xa3:Folding@home Core Shutdown: CLIENT_DIED
09:53:53:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
Project: 7522 (Run 0, Clone 84, Gen 126) with 63,500,000 steps:

Code:

10:48:33:WU02:FS00:0xa3:Project: 7522 (Run 0, Clone 84, Gen 126)
10:48:33:WU02:FS00:0xa3:
10:48:33:WU02:FS00:0xa3:Assembly optimizations on if available.
10:48:33:WU02:FS00:0xa3:Entering M.D.
10:48:39:WU02:FS00:0xa3:Mapping NT from 10 to 10 
10:48:39:WU01:FS00:Upload 17.04%
10:48:39:WU02:FS00:0xa3:Completed 0 out of 63500000 steps  (0%)
10:48:45:WU01:FS00:Upload 59.63%
10:49:05:WU01:FS00:Upload complete
10:49:05:WU01:FS00:Server responded WORK_ACK (400)
10:49:05:WU01:FS00:Final credit estimate, 2153.00 points
10:49:05:WU01:FS00:Cleaning up
12:04:41:FS00:Paused
12:04:41:FS00:Shutting core down
12:04:44:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
I deleted the CPU slot to dump both work units.
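
To put the oversized step count in perspective, here is a minimal sketch that extrapolates the full run time from the three progress lines in the first log. It assumes a constant step rate and that all lines fall on the same day; the function names are made up for this illustration and are not part of the client.

```python
import re
from datetime import timedelta

# Progress lines copied from the first log (Project 7522, Run 0, Clone 96, Gen 50).
LOG = """\
04:24:00:WU00:FS00:0xa3:Completed 255000 out of 25500000 steps  (1%)
06:52:50:WU00:FS00:0xa3:Completed 510000 out of 25500000 steps  (2%)
09:21:41:WU00:FS00:0xa3:Completed 765000 out of 25500000 steps  (3%)
"""

PROGRESS = re.compile(
    r"^(\d\d):(\d\d):(\d\d):WU\d+:FS\d+:0x\w+:Completed (\d+) out of (\d+) steps"
)


def parse_progress(text):
    """Return (seconds_since_midnight, steps_done, steps_total) per progress line."""
    rows = []
    for line in text.splitlines():
        m = PROGRESS.match(line)
        if m:
            h, mi, s, done, total = map(int, m.groups())
            rows.append((h * 3600 + mi * 60 + s, done, total))
    return rows


def estimate_total_seconds(rows):
    """Extrapolate the whole-WU run time from the first and last progress lines.

    Assumption: same calendar day and a constant step rate.
    """
    (t0, d0, total), (t1, d1, _) = rows[0], rows[-1]
    seconds_per_step = (t1 - t0) / (d1 - d0)
    return seconds_per_step * total


rows = parse_progress(LOG)
estimate = estimate_total_seconds(rows)
print(timedelta(seconds=round(estimate)))  # prints 10 days, 8:04:10
```

At roughly 2.5 hours per percent, this unit would have needed more than ten days on that machine, which is why dumping it was the practical choice.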
folding_hoomer
Posts: 349
Joined: Sun Feb 10, 2013 6:06 pm
Location: Bavaria, Germany

Re: Various CPU projects cause client to crash

Post by folding_hoomer »

Project: 7523 (Run 0, Clone 74, Gen 487) - infinite loop:

Code:

 . . .
09:10:52:WU01:FS00:Starting
09:10:52:WU01:FS00:Removing old file './work/01/logfile_01-20150311-083852.txt'
09:10:52:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a3.fah/FahCore_a3 -dir 01 -suffix 01 -version 703 -lifeline 1190 -checkpoint 30 -np 10
09:10:52:WU01:FS00:Started FahCore on PID 20399
09:10:52:WU01:FS00:Core PID:20403
09:10:52:WU01:FS00:FahCore 0xa3 started
09:10:53:WU01:FS00:0xa3:
09:10:53:WU01:FS00:0xa3:*------------------------------*
09:10:53:WU01:FS00:0xa3:Folding@Home Gromacs SMP Core
09:10:53:WU01:FS00:0xa3:Version 2.27 (Dec. 15, 2010)
09:10:53:WU01:FS00:0xa3:
09:10:53:WU01:FS00:0xa3:Preparing to commence simulation
09:10:53:WU01:FS00:0xa3:- Ensuring status. Please wait.
09:11:02:WU01:FS00:0xa3:- Looking at optimizations...
09:11:02:WU01:FS00:0xa3:- Working with standard loops on this execution.
09:11:02:WU01:FS00:0xa3:Examination of work files indicates 8 consecutive improper terminations of core.
09:11:02:WU01:FS00:0xa3:- Expanded 2568587 -> 3131980 (decompressed 121.9 percent)
09:11:02:WU01:FS00:0xa3:Called DecompressByteArray: compressed_data_size=2568587 data_size=3131980, decompressed_data_size=3131980 diff=0
09:11:02:WU01:FS00:0xa3:- Digital signature verified
09:11:02:WU01:FS00:0xa3:
09:11:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:11:02:WU01:FS00:0xa3:
09:11:02:WU01:FS00:0xa3:Entering M.D.
09:11:08:WU01:FS00:0xa3:Mapping NT from 10 to 10 
09:11:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:11:52:WU01:FS00:Starting
09:11:52:WU01:FS00:Removing old file './work/01/logfile_01-20150311-083952.txt'
09:11:52:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a3.fah/FahCore_a3 -dir 01 -suffix 01 -version 703 -lifeline 1190 -checkpoint 30 -np 10
09:11:52:WU01:FS00:Started FahCore on PID 20408
09:11:52:WU01:FS00:Core PID:20412
09:11:52:WU01:FS00:FahCore 0xa3 started
09:11:53:WU01:FS00:0xa3:
09:11:53:WU01:FS00:0xa3:*------------------------------*
09:11:53:WU01:FS00:0xa3:Folding@Home Gromacs SMP Core
09:11:53:WU01:FS00:0xa3:Version 2.27 (Dec. 15, 2010)
09:11:53:WU01:FS00:0xa3:
09:11:53:WU01:FS00:0xa3:Preparing to commence simulation
09:11:53:WU01:FS00:0xa3:- Ensuring status. Please wait.
09:12:02:WU01:FS00:0xa3:- Looking at optimizations...
09:12:02:WU01:FS00:0xa3:- Working with standard loops on this execution.
09:12:02:WU01:FS00:0xa3:Examination of work files indicates 8 consecutive improper terminations of core.
09:12:02:WU01:FS00:0xa3:- Expanded 2568587 -> 3131980 (decompressed 121.9 percent)
09:12:02:WU01:FS00:0xa3:Called DecompressByteArray: compressed_data_size=2568587 data_size=3131980, decompressed_data_size=3131980 diff=0
09:12:02:WU01:FS00:0xa3:- Digital signature verified
09:12:02:WU01:FS00:0xa3:
09:12:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:12:02:WU01:FS00:0xa3:
09:12:02:WU01:FS00:0xa3:Entering M.D.
09:12:08:WU01:FS00:0xa3:Mapping NT from 10 to 10 
09:12:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:12:52:WU01:FS00:Starting
09:12:52:WU01:FS00:Removing old file './work/01/logfile_01-20150311-084052.txt'
09:12:52:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a3.fah/FahCore_a3 -dir 01 -suffix 01 -version 703 -lifeline 1190 -checkpoint 30 -np 10
09:12:52:WU01:FS00:Started FahCore on PID 20417
09:12:52:WU01:FS00:Core PID:20421
09:12:52:WU01:FS00:FahCore 0xa3 started
09:12:53:WU01:FS00:0xa3:
09:12:53:WU01:FS00:0xa3:*------------------------------*
09:12:53:WU01:FS00:0xa3:Folding@Home Gromacs SMP Core
09:12:53:WU01:FS00:0xa3:Version 2.27 (Dec. 15, 2010)
09:12:53:WU01:FS00:0xa3:
09:12:53:WU01:FS00:0xa3:Preparing to commence simulation
09:12:53:WU01:FS00:0xa3:- Ensuring status. Please wait.
09:13:02:WU01:FS00:0xa3:- Looking at optimizations...
09:13:02:WU01:FS00:0xa3:- Working with standard loops on this execution.
09:13:02:WU01:FS00:0xa3:Examination of work files indicates 8 consecutive improper terminations of core.
09:13:02:WU01:FS00:0xa3:- Expanded 2568587 -> 3131980 (decompressed 121.9 percent)
09:13:02:WU01:FS00:0xa3:Called DecompressByteArray: compressed_data_size=2568587 data_size=3131980, decompressed_data_size=3131980 diff=0
09:13:02:WU01:FS00:0xa3:- Digital signature verified
09:13:02:WU01:FS00:0xa3:
09:13:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:13:02:WU01:FS00:0xa3:
09:13:02:WU01:FS00:0xa3:Entering M.D.
09:13:08:WU01:FS00:0xa3:Mapping NT from 10 to 10 
09:13:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:13:52:WU01:FS00:Starting
09:13:52:WU01:FS00:Removing old file './work/01/logfile_01-20150311-084152.txt'
09:13:52:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a3.fah/FahCore_a3 -dir 01 -suffix 01 -version 703 -lifeline 1190 -checkpoint 30 -np 10
09:13:52:WU01:FS00:Started FahCore on PID 20427
09:13:52:WU01:FS00:Core PID:20431
09:13:52:WU01:FS00:FahCore 0xa3 started
09:13:53:WU01:FS00:0xa3:
09:13:53:WU01:FS00:0xa3:*------------------------------*
09:13:53:WU01:FS00:0xa3:Folding@Home Gromacs SMP Core
09:13:53:WU01:FS00:0xa3:Version 2.27 (Dec. 15, 2010)
09:13:53:WU01:FS00:0xa3:
09:13:53:WU01:FS00:0xa3:Preparing to commence simulation
09:13:53:WU01:FS00:0xa3:- Ensuring status. Please wait.
09:14:02:WU01:FS00:0xa3:- Looking at optimizations...
09:14:02:WU01:FS00:0xa3:- Working with standard loops on this execution.
09:14:02:WU01:FS00:0xa3:Examination of work files indicates 8 consecutive improper terminations of core.
09:14:02:WU01:FS00:0xa3:- Expanded 2568587 -> 3131980 (decompressed 121.9 percent)
09:14:02:WU01:FS00:0xa3:Called DecompressByteArray: compressed_data_size=2568587 data_size=3131980, decompressed_data_size=3131980 diff=0
09:14:02:WU01:FS00:0xa3:- Digital signature verified
09:14:02:WU01:FS00:0xa3:
09:14:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:14:02:WU01:FS00:0xa3:
09:14:02:WU01:FS00:0xa3:Entering M.D.
09:14:08:WU01:FS00:0xa3:Mapping NT from 10 to 10 
09:14:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:14:30:FS00:Paused
I deleted the CPU slot to dump the data.
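
A restart loop like this one can be spotted mechanically by counting how often the core returns the same status for the same work unit. The sketch below does that for the v7 log format shown in the post. The threshold of 3 is an arbitrary assumption, and the function names are hypothetical. A single INTERRUPTED is normal after a manual pause, so only repeats are flagged.

```python
import re
from collections import Counter

# Abridged lines from the log in this post (Project 7523, Run 0, Clone 74, Gen 487).
LOG = """\
09:11:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:11:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:12:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:12:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:13:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:13:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:14:02:WU01:FS00:0xa3:Project: 7523 (Run 0, Clone 74, Gen 487)
09:14:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
"""

PROJECT = re.compile(
    r":(WU\d+):FS\d+:0x\w+:Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)
RETURNED = re.compile(
    r":(WU\d+):FS\d+:FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)"
)


def count_core_returns(text):
    """Count core return statuses per (project, run, clone, gen)."""
    current = {}  # WU slot (e.g. "WU01") -> PRCG tuple last seen in that slot
    counts = Counter()
    for line in text.splitlines():
        m = PROJECT.search(line)
        if m:
            slot, *prcg = m.groups()
            current[slot] = tuple(int(x) for x in prcg)
            continue
        m = RETURNED.search(line)
        if m and m.group(1) in current:
            counts[(current[m.group(1)], m.group(2))] += 1
    return counts


def looping_units(counts, threshold=3):
    """Return the (PRCG, status) pairs seen at least `threshold` times."""
    return [key for key, n in counts.items() if n >= threshold]


counts = count_core_returns(LOG)
print(looping_units(counts))
```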
uncle_fungus
Site Admin
Posts: 1288
Joined: Fri Nov 30, 2007 9:37 am
Location: Oxfordshire, UK

Re: 75XX Project issues (crashes, too many steps etc.)

Post by uncle_fungus »

Merged a couple of posts, renamed thread and moved topic into more appropriate category.
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: 75XX Project issues (crashes, too many steps etc.)

Post by kasson »

Thanks. We've been fixing WUs as we can and have also corrected 30 work units with too many steps.
ThunderRd
Posts: 78
Joined: Sun Dec 02, 2007 5:30 am
Location: Nong Khai, Thailand

Re: 75XX Project issues (crashes, too many steps etc.)

Post by ThunderRd »

The above-mentioned *7520 (R119 C4 G264) Could not get length of results file* is still in the pipe. Today I finished an 8831 and the box immediately downloaded the faulty 7520 yet again. If it's already been checked/fixed, then there's still something wrong with it, and the log is identical to what I posted above.

After a dozen or so attempts to run the WU again with the same failed result, the client abandoned it and got another 8831, so I'm running that now.
autogrog
Posts: 38
Joined: Mon Aug 18, 2008 3:38 pm
Location: Halifax, Nova Scotia

Re: 75XX Project issues (crashes, too many steps etc.)

Post by autogrog »

I have been getting a variation of this problem. It will complete an 8xxx WU and then spend 10-15 minutes trying to process a 7520 before finally getting a good non-75xx WU. Today it has gone through several cycles of good WUs followed by the same bad 7520. It seems that this work unit family is total rubbish and should be regenerated. The problem has been evident for weeks.
Sample log fragment follows (6.34 client on Fedora 18):

Code:

[06:45:14] Project: 7520 (Run 81, Clone 6, Gen 49)
[06:45:14] 
[06:45:14] Entering M.D.
[06:45:20] CoreStatus = 0 (0)
[06:45:20] Sending work to server
[06:45:20] Project: 7520 (Run 81, Clone 6, Gen 49)
[06:45:20] - Error: Could not get length of results file work/wuresults_05.dat
[06:45:20] - Error: Could not read unit 05 file. Removing from queue.
[06:45:20] Trying to send all finished work units
[06:45:20] + No unsent completed units remaining.
[06:45:20] - Preparing to get new work unit...
[06:45:20] Cleaning up work directory
[06:45:22] + Attempting to get work packet
[06:45:22] Passkey found
[06:45:22] - Will indicate memory of 3949 MB
[06:45:22] - Connecting to assignment server
[06:45:22] Connecting to http://assign.stanford.edu:8080/
[06:45:22] Posted data.
[06:45:22] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[06:45:22] + News From Folding@Home: 
[06:45:22] Loaded queue successfully.
[06:45:22] Sent data
[06:45:22] Connecting to http://128.143.199.97:8080/
[06:45:23] Posted data.
[06:45:23] Initial: 0000; - Receiving payload (expected size: 2357744)
[06:45:24] - Downloaded at ~2302 kB/s
Last edited by autogrog on Thu Mar 12, 2015 7:35 pm, edited 1 time in total.
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: 75XX Project issues (crashes, too many steps etc.)

Post by 7im »

Dr. Kasson should be actively looking through the server logs to find bad work units with multiple failures, and trajectories stuck on a specific generation because of these corrupted work units, instead of waiting for us miners' canaries to die and report the failures.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 75XX Project issues (crashes, too many steps etc.)

Post by Joe_H »

People running the version 6 client will not be sending in a failure report when it fails a WU; that is a feature that usually works in the version 7 client. That is one reason to upgrade clients to the current release.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Gary480six
Posts: 91
Joined: Mon Jan 21, 2008 6:42 pm

Re: 75XX Project issues (crashes, too many steps etc.)

Post by Gary480six »

Since people running the version 6 client have no other way of reporting problem work units, here are a few.
These are from a variety of systems. All are running the last SMP version 6 client on Windows 7.
I believe that all the errors are from work units received from the 128.143.199.97 server.
The same systems run fine on other work units, and even on some P7520 work units (presumably ones already 'fixed').

From February 25th to February 28th, one of these systems downloaded and crashed the same P7520 R89 C4 G425 work unit 47 times. Mind you, not 47 in a row: it would crash a few times, complete a different work unit, then go back to this P7520.
Several times the work folder, queue.dat and unitinfo.txt were deleted, but that P7520 kept coming back.

Code:

[18:18:09] Project: 7520 (Run 89, Clone 4, Gen 425)
[18:18:09] 
[18:18:09] Entering M.D.
[18:18:15] Mapping NT from 8 to 8 
[18:18:25] CoreStatus = C0000417 (-1073740777)
[18:18:25] Client-core communications error: ERROR 0xc0000417


[19:11:12] Project: 7520 (Run 106, Clone 8, Gen 2)
[19:11:12] 
[19:11:12] Entering M.D.
[19:11:18] Mapping NT from 8 to 8 
[19:11:28] CoreStatus = C0000417 (-1073740777)
[19:11:28] Client-core communications error: ERROR 0xc0000417


[04:30:01] Project: 7520 (Run 63, Clone 6, Gen 111)
[04:30:01] 
[04:30:01] Assembly optimizations on if available.
[04:30:01] Entering M.D.
[04:30:07] Mapping NT from 4 to 4 
[04:30:30] CoreStatus = C0000417 (-1073740777)
[04:30:30] Client-core communications error: ERROR 0xc0000417


[00:52:44] Project: 7520 (Run 115, Clone 3, Gen 252)
[00:52:44] 
[00:52:44] Assembly optimizations on if available.
[00:52:44] Entering M.D.
[00:52:50] Mapping NT from 4 to 4 
[00:52:50] Gromacs cannot continue further.
[00:52:50] Going to send back what have done -- stepsTotalG=0
[00:52:50] Work fraction=0.0000 steps=0.
[00:52:54] logfile size=2225 infoLength=2225 edr=0 trr=23
[00:52:54] logfile size: 2225 info=2225 bed=0 hdr=23
[00:52:54] - Writing 2761 bytes of core data to disk...
[00:52:54] Done: 2249 -> 1160 (compressed to 51.5 percent)
[00:52:54]   ... Done.
[00:52:55] 
[00:52:55] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:52:58] CoreStatus = 7A (122)
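
The v6 client prints each CoreStatus twice, once in hex and once in decimal. The large negative number is consistent with reading the hex value as a signed 32-bit integer. The sketch below works through that conversion. The helper name is made up, and the 32-bit two's-complement reading is an inference from the logged pairs, not something confirmed from the client source.

```python
def core_status_to_signed(hex_text):
    """Interpret a CoreStatus hex string as a signed 32-bit integer."""
    value = int(hex_text, 16)
    if value >= 0x80000000:      # top bit set: negative in two's complement
        value -= 0x100000000     # subtract 2**32
    return value


for code in ("C0000417", "7A"):
    print(code, core_status_to_signed(code))
# C0000417 -1073740777
# 7A 122
```

Both results match the decimal values in the log lines.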

Code:

[04:28:29] Project: 7520 (Run 65, Clone 6, Gen 116)
[04:28:29] 
[04:28:29] Assembly optimizations on if available.
[04:28:29] Entering M.D.
[04:28:35] Mapping NT from 4 to 4 
[04:28:35] mdrun returned 255
[04:28:35] Going to send back what have done -- stepsTotalG=0
[04:28:35] Work fraction=0.0000 steps=0.
[04:28:39] logfile size=2225 infoLength=2225 edr=0 trr=25
[04:28:39] logfile size: 2225 info=2225 bed=0 hdr=25
[04:28:39] - Writing 2763 bytes of core data to disk...
[04:28:39] Done: 2251 -> 1156 (compressed to 51.3 percent)
[04:28:39]   ... Done.
[04:28:39] 
[04:28:39] Folding@home Core Shutdown: UNSTABLE_MACHINE
[04:28:58] Project: 7520 (Run 88, Clone 6, Gen 114)
[04:28:58] 
[04:28:58] Assembly optimizations on if available.
[04:28:58] Entering M.D.
[04:29:04] Mapping NT from 4 to 4 
[04:29:04] Gromacs cannot continue further.
[04:29:04] Going to send back what have done -- stepsTotalG=500000
[04:29:04] Work fraction=0.0000 steps=500000.
[04:29:08] logfile size=6817 infoLength=6817 edr=0 trr=23
[04:29:08] logfile size: 6817 info=6817 bed=0 hdr=23
[04:29:08] - Writing 7353 bytes of core data to disk...
[04:29:08] Done: 6841 -> 2423 (compressed to 35.4 percent)
[04:29:08]   ... Done.
[04:29:08] 
[04:29:08] Folding@home Core Shutdown: EARLY_UNIT_END
[04:29:11] CoreStatus = 72 (114)


[04:29:26] Project: 7520 (Run 104, Clone 7, Gen 119)
[04:29:26] 
[04:29:26] Assembly optimizations on if available.
[04:29:26] Entering M.D.
[04:29:32] Mapping NT from 4 to 4 
[04:29:32] mdrun returned 255
[04:29:32] Going to send back what have done -- stepsTotalG=0
[04:29:32] Work fraction=0.0000 steps=0.
[04:29:36] logfile size=2225 infoLength=2225 edr=0 trr=25
[04:29:36] logfile size: 2225 info=2225 bed=0 hdr=25
[04:29:36] - Writing 2763 bytes of core data to disk...
[04:29:36] Done: 2251 -> 1157 (compressed to 51.3 percent)
[04:29:36]   ... Done.
[04:29:37] 
[04:29:37] Folding@home Core Shutdown: UNSTABLE_MACHINE
[04:29:40] CoreStatus = 7A (122)



[18:51:20] Project: 7520 (Run 64, Clone 9, Gen 1)
[18:51:20] 
[18:51:20] Assembly optimizations on if available.
[18:51:20] Entering M.D.
[18:51:26] Mapping NT from 4 to 4 
[18:51:26] mdrun returned 255
[18:51:26] Going to send back what have done -- stepsTotalG=0
[18:51:26] Work fraction=0.0000 steps=0.
[18:51:30] logfile size=2225 infoLength=2225 edr=0 trr=25
[18:51:30] logfile size: 2225 info=2225 bed=0 hdr=25
[18:51:30] - Writing 2763 bytes of core data to disk...
[18:51:30] Done: 2251 -> 1157 (compressed to 51.3 percent)
[18:51:30]   ... Done.
[18:51:30] 
[18:51:30] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:51:34] CoreStatus = 7A (122)


[18:52:29] Project: 7520 (Run 67, Clone 9, Gen 1)
[18:52:29] 
[18:52:29] Assembly optimizations on if available.
[18:52:29] Entering M.D.
[18:52:35] Mapping NT from 4 to 4 
[18:52:35] mdrun returned 255
[18:52:35] Going to send back what have done -- stepsTotalG=0
[18:52:35] Work fraction=0.0000 steps=0.
[18:52:39] logfile size=2225 infoLength=2225 edr=0 trr=25
[18:52:39] logfile size: 2225 info=2225 bed=0 hdr=25
[18:52:39] - Writing 2763 bytes of core data to disk...
[18:52:39] Done: 2251 -> 1162 (compressed to 51.6 percent)
[18:52:39]   ... Done.
[18:52:42] 
[18:52:42] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:52:44] CoreStatus = 7A (122)


[19:58:24] Project: 7520 (Run 41, Clone 5, Gen 200)
[19:58:24] 
[19:58:24] Assembly optimizations on if available.
[19:58:24] Entering M.D.
[19:58:30] Mapping NT from 8 to 8 
[19:58:38] CoreStatus = C0000417 (-1073740777)
[19:58:38] Client-core communications error: ERROR 0xc0000417
[19:58:38] Deleting current work unit & continuing...
[19:58:50] - Preparing to get new work unit...
[19:58:50] Cleaning up work directory



[19:17:58] Project: 7520 (Run 55, Clone 8, Gen 5)
[19:17:58] 
[19:17:58] Assembly optimizations on if available.
[19:17:58] Entering M.D.
[19:18:04] Mapping NT from 8 to 8 
[19:18:14] CoreStatus = C0000417 (-1073740777)
[19:18:14] Client-core communications error: ERROR 0xc0000417
[19:18:14] Deleting current work unit & continuing...
[19:18:26] - Preparing to get new work unit...
[19:18:26] Cleaning up work directory
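
Since the v6 client does not report failures itself, a hand-collected log like the one above can be boiled down to a list of work units and exit codes before posting. The following is a minimal sketch for the v6 log format. It only looks at `Project:` and `CoreStatus =` lines, so a unit whose excerpt was cut off before its CoreStatus line is skipped. The function name is hypothetical.

```python
import re
from collections import Counter

# Abridged from the excerpts above.
LOG = """\
[18:18:09] Project: 7520 (Run 89, Clone 4, Gen 425)
[18:18:09] Entering M.D.
[18:18:25] CoreStatus = C0000417 (-1073740777)
[18:18:25] Client-core communications error: ERROR 0xc0000417
[00:52:44] Project: 7520 (Run 115, Clone 3, Gen 252)
[00:52:50] Gromacs cannot continue further.
[00:52:55] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:52:58] CoreStatus = 7A (122)
[04:28:58] Project: 7520 (Run 88, Clone 6, Gen 114)
[04:29:08] Folding@home Core Shutdown: EARLY_UNIT_END
[04:29:11] CoreStatus = 72 (114)
"""

PROJECT = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STATUS = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((-?\d+)\)")


def core_exits(text):
    """Return [((project, run, clone, gen), status_hex), ...] in log order."""
    current = None
    exits = []
    for line in text.splitlines():
        m = PROJECT.search(line)
        if m:
            current = tuple(int(g) for g in m.groups())
            continue
        m = STATUS.search(line)
        if m and current is not None:
            exits.append((current, m.group(1).upper()))
            current = None  # wait for the next Project line
    return exits


exits = core_exits(LOG)
by_status = Counter(status for _, status in exits)
for prcg, status in exits:
    print("P%d R%d C%d G%d" % prcg, "CoreStatus", status)
```

Grouping the same output by work unit instead of by status would show repeats such as the P7520 R89 C4 G425 unit that came back 47 times.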
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: 75XX Project issues (crashes, too many steps etc.)

Post by 7im »

You've been running v6 too long to remember to change the Machine ID value in the config after deleting the WU info to force a new WU in Windows, or to delete the ID .dat file in Linux to effect the same change.
autogrog
Posts: 38
Joined: Mon Aug 18, 2008 3:38 pm
Location: Halifax, Nova Scotia

Re: 75XX Project issues (crashes, too many steps etc.)

Post by autogrog »

Since 6.34 apparently does not report bad WUs, here is a current list of failed WUs:

7520 (Run 123, Clone 2, Gen 492)
7520 (Run 20, Clone 6, Gen 98)
7520 (Run 30, Clone 7, Gen 57)
7520 (Run 123, Clone 2, Gen 492)

Error: Could not get length of results file
7520 (Run 102, Clone 2, Gen 496) x many consecutive attempts, 1.5 hours retrying before I dumped it
7520 (Run 81, Clone 6, Gen 49) 14 minutes of retries before a different (successful) wu
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 75XX Project issues (crashes, too many steps etc.)

Post by toTOW »

Joe_H wrote: People running the version 6 client will not be sending in a failure report when it fails a WU; that is a feature that usually works in the version 7 client. That is one reason to upgrade clients to the current release.
People with the v7 client might not either, if these WUs crash their clients like mine ... :?

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
uncle_fungus
Site Admin
Posts: 1288
Joined: Fri Nov 30, 2007 9:37 am
Location: Oxfordshire, UK

Re: 75XX Project issues (crashes, too many steps etc.)

Post by uncle_fungus »

toTOW wrote: People with the v7 client might not either, if these WUs crash their clients like mine ... :?
Agreed, there are actually 2 issues here:
  1. There are broken work units which show a number of different failure modes
  2. The client crashes when the core dies in one of those failure modes
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: 75XX Project issues (crashes, too many steps etc.)

Post by 7im »

toTOW wrote:
Joe_H wrote: People running the version 6 client will not be sending in a failure report when it fails a WU; that is a feature that usually works in the version 7 client. That is one reason to upgrade clients to the current release.
People with the v7 client might not either, if these WUs crash their clients like mine ... :?
Which is better? A v6 client stuck in an endless loop, crashing and downloading the same WU over and over, which might not get noticed for a few days, or a v7 client with a crash notification on the screen as soon as it happens?

And while the v6 client doesn't feed into the newer server analytics like v7 does, the researcher can still dig a little, find which runs or clones are stuck at a very low generation number (like in the old days), and regenerate them to restart them. Dr. Kasson has been around long enough to know how to do that; it's just tedious. ;) Plus, if you wait long enough, the WUs that fail and get dumped from a v6 client eventually get assigned to a v7 client, and then the analytics work. It just takes longer to fix all the reported bad WUs. :?