Project: 10493 (Run 5, Clone 38, Gen 285) - UNKNOWN_ENUM

Nicolas_orleans
Posts: 106
Joined: Wed Aug 08, 2012 3:08 am

Post by Nicolas_orleans »

Hello
Strange error on this WU: I got an UNKNOWN_ENUM (127 = 0x7f), then FAHClient restarted with a TPF of 15 minutes on a 980 Ti.
After the computer rebooted, FAHClient fails to start, even from the command line. I am looking for a way to reinstall it (uninstalling and reinstalling with GDebi does not work).

Code:

13:20:35:WU03:FS00:0x21:Project: 10493 (Run 5, Clone 38, Gen 285)
13:20:35:WU03:FS00:0x21:Unit: 0x0000018f8ca304f555d616a56df00c60
13:20:35:WU03:FS00:0x21:CPU: 0x00000000000000000000000000000000
13:20:35:WU03:FS00:0x21:Machine: 0
13:20:35:WU03:FS00:0x21:Reading tar file core.xml
13:20:35:WU03:FS00:0x21:Reading tar file system.xml
13:20:35:WU03:FS00:0x21:Reading tar file integrator.xml
13:20:35:WU03:FS00:0x21:Reading tar file state.xml
13:20:36:WU03:FS00:0x21:Digital signatures verified
13:20:36:WU03:FS00:0x21:Folding@home GPU Core21 Folding@home Core
13:20:36:WU03:FS00:0x21:Version 0.0.17
13:20:36:WU02:FS00:Upload 24.54%
13:20:42:WU03:FS00:0x21:Completed 0 out of 5000000 steps (0%)
13:20:42:WU03:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
13:23:44:WU03:FS00:0x21:Completed 50000 out of 5000000 steps (1%)
13:26:45:WU03:FS00:0x21:Completed 100000 out of 5000000 steps (2%)
13:29:48:WU03:FS00:0x21:Completed 150000 out of 5000000 steps (3%)
13:32:49:WU03:FS00:0x21:Completed 200000 out of 5000000 steps (4%)
13:35:50:WU03:FS00:0x21:Completed 250000 out of 5000000 steps (5%)
13:38:53:WU03:FS00:0x21:Completed 300000 out of 5000000 steps (6%)
13:41:55:WU03:FS00:0x21:Completed 350000 out of 5000000 steps (7%)
13:52:40:WARNING:WU03:FS00:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)
13:52:40:WU03:FS00:Starting
13:52:40:WU03:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/beta/Core_21.fah/FahCore_21 -dir 03 -suffix 01 -version 704 -lifeline 3499 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
13:52:40:WU03:FS00:Started FahCore on PID 25376
13:52:40:WU03:FS00:Core PID:25380
13:52:40:WU03:FS00:FahCore 0x21 started
13:52:41:WU03:FS00:0x21:*********************** Log Started 2016-12-12T13:52:41Z ***********************
13:52:41:WU03:FS00:0x21:Project: 10493 (Run 5, Clone 38, Gen 285)
13:52:41:WU03:FS00:0x21:Unit: 0x0000018f8ca304f555d616a56df00c60
13:52:41:WU03:FS00:0x21:CPU: 0x00000000000000000000000000000000
13:52:41:WU03:FS00:0x21:Machine: 0
13:52:41:WU03:FS00:0x21:Digital signatures verified
13:52:41:WU03:FS00:0x21:Folding@home GPU Core21 Folding@home Core
13:52:41:WU03:FS00:0x21:Version 0.0.17
13:52:41:WU03:FS00:0x21:  Found a checkpoint file
13:52:47:WU03:FS00:0x21:Completed 250000 out of 5000000 steps (5%)
13:52:47:WU03:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
14:17:06:WU03:FS00:0x21:Completed 300000 out of 5000000 steps (6%)
14:41:24:WU03:FS00:0x21:Completed 350000 out of 5000000 steps (7%)
15:05:43:WU03:FS00:0x21:Completed 400000 out of 5000000 steps (8%)
15:29:59:WU03:FS00:0x21:Completed 450000 out of 5000000 steps (9%)
15:54:18:WU03:FS00:0x21:Completed 500000 out of 5000000 steps (10%)
16:18:36:WU03:FS00:0x21:Completed 550000 out of 5000000 steps (11%)
16:42:55:WU03:FS00:0x21:Completed 600000 out of 5000000 steps (12%)
17:07:14:WU03:FS00:0x21:Completed 650000 out of 5000000 steps (13%)
17:31:30:WU03:FS00:0x21:Completed 700000 out of 5000000 steps (14%)
17:55:49:WU03:FS00:0x21:Completed 750000 out of 5000000 steps (15%)
******************************* Date: 2016-12-12 *******************************
18:20:08:WU03:FS00:0x21:Completed 800000 out of 5000000 steps (16%)
18:44:24:WU03:FS00:0x21:Completed 850000 out of 5000000 steps (17%)
MSI Z77A-GD55 - i5-3550 - 16 Go RAM - GTX 980 Ti Hybrid @1461 MHz + GTX 770 @ 1124 MHz + GTX 750 Ti @ 1306 MHz - Ubuntu 16.10
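
For anyone who wants to check the time per frame (TPF) in a log like the one above: the progress lines show roughly 3 minutes per 1% before the UNKNOWN_ENUM and roughly 24 minutes per 1% after the core resumed from its checkpoint. Below is a minimal sketch that extracts those intervals. The regular expression is only inferred from the log lines quoted in this thread, and `frame_times` is a hypothetical helper name, not part of FAHClient.

```python
import re

# Pattern inferred from the progress lines in this thread (an assumption,
# not an official FAHClient log specification), e.g.
# "13:23:44:WU03:FS00:0x21:Completed 50000 out of 5000000 steps (1%)"
PROGRESS = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
    r"Completed (\d+) out of (\d+) steps \((\d+)%\)"
)


def frame_times(lines):
    """Return (percent, seconds since the previous frame) pairs.

    A pair is produced only when the percentage advanced by exactly one,
    so a resume from a checkpoint (percentage jumps backwards) does not
    yield a bogus interval.
    """
    results = []
    prev = None  # (second of day, percent) of the last progress line seen
    for line in lines:
        m = PROGRESS.match(line)
        if not m:
            continue
        h, mi, s, _done, _total, pct = (int(g) for g in m.groups())
        now = h * 3600 + mi * 60 + s
        if prev is not None and pct == prev[1] + 1:
            delta = (now - prev[0]) % 86400  # tolerate a midnight rollover
            results.append((pct, delta))
        prev = (now, pct)
    return results


# A few lines copied from the log above, before and after the restart.
sample = [
    "13:20:42:WU03:FS00:0x21:Completed 0 out of 5000000 steps (0%)",
    "13:23:44:WU03:FS00:0x21:Completed 50000 out of 5000000 steps (1%)",
    "13:26:45:WU03:FS00:0x21:Completed 100000 out of 5000000 steps (2%)",
    "13:52:47:WU03:FS00:0x21:Completed 250000 out of 5000000 steps (5%)",
    "14:17:06:WU03:FS00:0x21:Completed 300000 out of 5000000 steps (6%)",
    "14:41:24:WU03:FS00:0x21:Completed 350000 out of 5000000 steps (7%)",
]

for pct, secs in frame_times(sample):
    print(f"{pct:3d}%  {secs // 60}m{secs % 60:02d}s")
```

Feeding it a whole log file (for example `frame_times(open(path))`, with `path` being wherever your client log lives) makes a sudden TPF jump like this one easy to spot.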
Nicolas_orleans
Posts: 106
Joined: Wed Aug 08, 2012 3:08 am

Post by Nicolas_orleans »

I think the cause is a hardware failure. I reinstalled the system, and the drivers appear to reset randomly, even though I folded 24/7 for months with them. My best guess is that one of the cards is failing and resets the driver for all cards. I will need to run each card separately to check this assumption.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

That's a reasonable assumption, and you have a good plan to isolate it. A message saying UNKNOWN* means that something happened which is probably "impossible", and hardware failures are known to be a chief cause of such messages.