GPU EUEs

Post by UncleWilley » Fri Sep 30, 2011 7:18 pm

I'm getting EUEs on a long-running GPU (see the log file below).
Two GPUs are installed: an EVGA 896-P3-1255-AR GeForce GTX 260 Core 216 and an MSI (Micro-Star International) N260GTX T2D896 OCv2 GTX 260. The MSI GPU seems to be the one failing. Note that I have been running with two GPUs for a couple of years without any problems!

Windows 7, Core i7-930
GPUs are not overclocked
FAH GPU client version 6.30r2, build May 20, 2010 (I've been running this client on both GPUs for a long time; is it the right client?)

What I've tried/checked:
- Updated to the latest NVIDIA drivers (280.26) again, with no change.
- Deleted the WU/work folder etc. and reran multiple times.
- Temperature at idle: 46C
- Temperature while running the client, before the EUE: 61C

Any ideas?

Is there a test program for GPUs?

Should I reinstall the FAH GPU software or upgrade to the 6.3 beta?


--- Opening Log file [September 30 19:08:15 UTC]


# Windows GPU Systray Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30r2

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\users\bvoss\appdata\roaming\folding@home-gpu 2
Arguments: -gpu 1

[19:08:15] - Ask before connecting: No
[19:08:15] - User name: UncleWilley (Team 80856)
[19:08:15] - User ID: 59B8E9DB330F9FC6
[19:08:15] - Machine ID: 3
[19:08:15]
[19:08:15] Gpu type=2 species=30.
[19:08:15] Work directory not found. Creating...
[19:08:15] Could not open work queue, generating new queue...
[19:08:15] Initialization complete
[19:08:15] - Preparing to get new work unit...
[19:08:15] Cleaning up work directory
[19:08:15] + Attempting to get work packet
[19:08:15] Passkey found
[19:08:15] Gpu type=2 species=30.
[19:08:15] - Connecting to assignment server
[19:08:16] - Successful: assigned to (171.67.108.31).
[19:08:16] + News From Folding@Home: Welcome to Folding@Home
[19:08:16] Loaded queue successfully.
[19:08:16] Gpu type=2 species=30.
[19:08:17] + Closed connections
[19:08:17]
[19:08:17] + Processing work unit
[19:08:17] Core required: FahCore_15.exe
[19:08:17] Core found.
[19:08:17] Working on queue slot 01 [September 30 19:08:17 UTC]
[19:08:17] + Working ...
[19:08:17]
[19:08:17] *------------------------------*
[19:08:17] Folding@Home GPU Core
[19:08:17] Version                2.20 (Tue Aug 2 15:33:05 PDT 2011)
[19:08:17] Build host             amoeba
[19:08:17] Board Type             NVIDIA/CUDA
[19:08:17] Core                   15
[19:08:17]
[19:08:17] Window's signal control handler registered.
[19:08:17] Preparing to commence simulation
[19:08:17] - Looking at optimizations...
[19:08:17] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[19:08:17] - Created dyn
[19:08:17] - Files status OK
[19:08:17] sizeof(CORE_PACKET_HDR) = 512 file=<>
[19:08:17] - Expanded 38128 -> 167707 (decompressed 439.8 percent)
[19:08:17] Called DecompressByteArray: compressed_data_size=38128 data_size=167707, decompressed_data_size=167707 diff=0
[19:08:17] - Digital signature verified
[19:08:17]
[19:08:17] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:08:17]
[19:08:17] Assembly optimizations on if available.
[19:08:17] Entering M.D.
[19:08:19] Tpr hash work/wudata_01.tpr:  3019175444 496459859 426088452 4121980785 1664664548
[19:08:19] calling fah_main gpuDeviceId=0
[19:08:19] Working on ALZHEIMER'S DISEASE AMYLOID
[19:08:19] Client config found, loading data.
[19:08:20] Starting GUI Server
[19:09:20] Setting checkpoint frequency: 500000
[19:09:20] Completed         0 out of 50000000 steps (0%).
[19:09:20] mdrun_gpu returned 53
[19:09:20] Calculated & specified T inconsisitent
[19:09:20]
[19:09:20] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:09:23] CoreStatus = 7A (122)
[19:09:23] Sending work to server
[19:09:23] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:09:23] - Read packet limit of 540015616... Set to 524286976.
[19:09:23] - Error: Could not get length of results file work/wuresults_01.dat
[19:09:23] - Error: Could not read unit 01 file. Removing from queue.
[19:09:23] - Preparing to get new work unit...
[19:09:23] Cleaning up work directory
[19:09:23] + Attempting to get work packet
[19:09:23] Passkey found
[19:09:23] Gpu type=2 species=30.
[19:09:23] - Connecting to assignment server
[19:09:24] - Successful: assigned to (171.67.108.31).
[19:09:24] + News From Folding@Home: Welcome to Folding@Home
[19:09:24] Loaded queue successfully.
[19:09:24] Gpu type=2 species=30.
[19:09:25] + Closed connections
[19:09:30]
[19:09:30] + Processing work unit
[19:09:30] Core required: FahCore_15.exe
[19:09:30] Core found.
[19:09:30] Working on queue slot 02 [September 30 19:09:30 UTC]
[19:09:30] + Working ...
[19:09:30]
[19:09:30] *------------------------------*
[19:09:30] Folding@Home GPU Core
[19:09:30] Version                2.20 (Tue Aug 2 15:33:05 PDT 2011)
[19:09:30] Build host             amoeba
[19:09:30] Board Type             NVIDIA/CUDA
[19:09:30] Core                   15
[19:09:30]
[19:09:30] Window's signal control handler registered.
[19:09:30] Preparing to commence simulation
[19:09:30] - Looking at optimizations...
[19:09:30] DeleteFrameFiles: successfully deleted file=work/wudata_02.ckp
[19:09:30] - Created dyn
[19:09:30] - Files status OK
[19:09:30] sizeof(CORE_PACKET_HDR) = 512 file=<>
[19:09:30] - Expanded 38128 -> 167707 (decompressed 439.8 percent)
[19:09:30] Called DecompressByteArray: compressed_data_size=38128 data_size=167707, decompressed_data_size=167707 diff=0
[19:09:30] - Digital signature verified
[19:09:30]
[19:09:30] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:09:30]
[19:09:30] Assembly optimizations on if available.
[19:09:30] Entering M.D.
[19:09:32] Tpr hash work/wudata_02.tpr:  3019175444 496459859 426088452 4121980785 1664664548
[19:09:32] calling fah_main gpuDeviceId=0
[19:09:32] Working on ALZHEIMER'S DISEASE AMYLOID
[19:09:32] Client config found, loading data.
[19:09:33] Starting GUI Server
[19:10:33] Setting checkpoint frequency: 500000
[19:10:33] Completed         0 out of 50000000 steps (0%).
[19:10:33] mdrun_gpu returned 53
[19:10:33] Calculated & specified T inconsisitent
[19:10:33]
[19:10:33] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:10:37] CoreStatus = 7A (122)
[19:10:37] Sending work to server
[19:10:37] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:10:37] - Read packet limit of 540015616... Set to 524286976.
[19:10:37] - Error: Could not get length of results file work/wuresults_02.dat
[19:10:37] - Error: Could not read unit 02 file. Removing from queue.
[19:10:37] - Preparing to get new work unit...
[19:10:37] Cleaning up work directory
[19:10:37] + Attempting to get work packet
[19:10:37] Passkey found
[19:10:37] Gpu type=2 species=30.
[19:10:37] - Connecting to assignment server
[19:10:37] - Successful: assigned to (171.67.108.31).
[19:10:37] + News From Folding@Home: Welcome to Folding@Home
[19:10:37] Loaded queue successfully.
[19:10:37] Gpu type=2 species=30.
[19:10:38] + Closed connections
[19:10:43]
[19:10:43] + Processing work unit
[19:10:43] Core required: FahCore_15.exe
[19:10:43] Core found.
[19:10:43] Working on queue slot 03 [September 30 19:10:43 UTC]
[19:10:43] + Working ...
[19:10:43]
[19:10:43] *------------------------------*
[19:10:43] Folding@Home GPU Core
[19:10:43] Version                2.20 (Tue Aug 2 15:33:05 PDT 2011)
[19:10:43] Build host             amoeba
[19:10:43] Board Type             NVIDIA/CUDA
[19:10:43] Core                   15
[19:10:43]
[19:10:43] Window's signal control handler registered.
[19:10:43] Preparing to commence simulation
[19:10:43] - Looking at optimizations...
[19:10:43] DeleteFrameFiles: successfully deleted file=work/wudata_03.ckp
[19:10:43] - Created dyn
[19:10:43] - Files status OK
[19:10:43] sizeof(CORE_PACKET_HDR) = 512 file=<>
[19:10:43] - Expanded 38128 -> 167707 (decompressed 439.8 percent)
[19:10:43] Called DecompressByteArray: compressed_data_size=38128 data_size=167707, decompressed_data_size=167707 diff=0
[19:10:43] - Digital signature verified
[19:10:43]
[19:10:43] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:10:43]
[19:10:43] Assembly optimizations on if available.
[19:10:43] Entering M.D.
[19:10:45] Tpr hash work/wudata_03.tpr:  3019175444 496459859 426088452 4121980785 1664664548
[19:10:45] calling fah_main gpuDeviceId=0
[19:10:45] Working on ALZHEIMER'S DISEASE AMYLOID
[19:10:45] Client config found, loading data.
[19:10:46] Starting GUI Server
[19:11:48] Setting checkpoint frequency: 500000
[19:11:48] Completed         0 out of 50000000 steps (0%).
[19:11:48] mdrun_gpu returned 53
[19:11:48] Calculated & specified T inconsisitent
[19:11:48]
[19:11:48] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:11:52] CoreStatus = 7A (122)
[19:11:52] Sending work to server
[19:11:52] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:11:52] - Read packet limit of 540015616... Set to 524286976.
[19:11:52] - Error: Could not get length of results file work/wuresults_03.dat
[19:11:52] - Error: Could not read unit 03 file. Removing from queue.
[19:11:52] - Preparing to get new work unit...
[19:11:52] Cleaning up work directory
[19:11:52] + Attempting to get work packet
[19:11:52] Passkey found
[19:11:52] Gpu type=2 species=30.
[19:11:52] - Connecting to assignment server
[19:11:52] - Successful: assigned to (171.67.108.31).
[19:11:52] + News From Folding@Home: Welcome to Folding@Home
[19:11:52] Loaded queue successfully.
[19:11:52] Gpu type=2 species=30.
[19:11:53] + Closed connections
[19:11:58]
[19:11:58] + Processing work unit
[19:11:58] Core required: FahCore_15.exe
[19:11:58] Core found.
[19:11:58] Working on queue slot 04 [September 30 19:11:58 UTC]
[19:11:58] + Working ...
[19:11:59]
[19:11:59] *------------------------------*
[19:11:59] Folding@Home GPU Core
[19:11:59] Version                2.20 (Tue Aug 2 15:33:05 PDT 2011)
[19:11:59] Build host             amoeba
[19:11:59] Board Type             NVIDIA/CUDA
[19:11:59] Core                   15
[19:11:59]
[19:11:59] Window's signal control handler registered.
[19:11:59] Preparing to commence simulation
[19:11:59] - Looking at optimizations...
[19:11:59] DeleteFrameFiles: successfully deleted file=work/wudata_04.ckp
[19:11:59] - Created dyn
[19:11:59] - Files status OK
[19:11:59] sizeof(CORE_PACKET_HDR) = 512 file=<>
[19:11:59] - Expanded 38128 -> 167707 (decompressed 439.8 percent)
[19:11:59] Called DecompressByteArray: compressed_data_size=38128 data_size=167707, decompressed_data_size=167707 diff=0
[19:11:59] - Digital signature verified
[19:11:59]
[19:11:59] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:11:59]
[19:11:59] Assembly optimizations on if available.
[19:11:59] Entering M.D.
[19:12:01] Tpr hash work/wudata_04.tpr:  3019175444 496459859 426088452 4121980785 1664664548
[19:12:01] calling fah_main gpuDeviceId=0
[19:12:01] Working on ALZHEIMER'S DISEASE AMYLOID
[19:12:01] Client config found, loading data.
[19:12:01] Starting GUI Server
[19:13:03] Setting checkpoint frequency: 500000
[19:13:03] Completed         0 out of 50000000 steps (0%).
[19:13:03] mdrun_gpu returned 53
[19:13:03] Calculated & specified T inconsisitent
[19:13:03]
[19:13:03] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:13:07] CoreStatus = 7A (122)
[19:13:07] Sending work to server
[19:13:07] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:13:07] - Read packet limit of 540015616... Set to 524286976.
[19:13:07] - Error: Could not get length of results file work/wuresults_04.dat
[19:13:07] - Error: Could not read unit 04 file. Removing from queue.
[19:13:07] - Preparing to get new work unit...
[19:13:07] Cleaning up work directory
[19:13:07] + Attempting to get work packet
[19:13:07] Passkey found
[19:13:07] Gpu type=2 species=30.
[19:13:07] - Connecting to assignment server
[19:13:07] - Successful: assigned to (171.67.108.31).
[19:13:07] + News From Folding@Home: Welcome to Folding@Home
[19:13:08] Loaded queue successfully.
[19:13:08] Gpu type=2 species=30.
[19:13:09] + Closed connections
[19:13:14]
[19:13:14] + Processing work unit
[19:13:14] Core required: FahCore_15.exe
[19:13:14] Core found.
[19:13:14] Working on queue slot 05 [September 30 19:13:14 UTC]
[19:13:14] + Working ...
[19:13:14]
[19:13:14] *------------------------------*
[19:13:14] Folding@Home GPU Core
[19:13:14] Version                2.20 (Tue Aug 2 15:33:05 PDT 2011)
[19:13:14] Build host             amoeba
[19:13:14] Board Type             NVIDIA/CUDA
[19:13:14] Core                   15
[19:13:14]
[19:13:14] Window's signal control handler registered.
[19:13:14] Preparing to commence simulation
[19:13:14] - Looking at optimizations...
[19:13:14] DeleteFrameFiles: successfully deleted file=work/wudata_05.ckp
[19:13:14] - Created dyn
[19:13:14] - Files status OK
[19:13:14] sizeof(CORE_PACKET_HDR) = 512 file=<>
[19:13:14] - Expanded 38128 -> 167707 (decompressed 439.8 percent)
[19:13:14] Called DecompressByteArray: compressed_data_size=38128 data_size=167707, decompressed_data_size=167707 diff=0
[19:13:14] - Digital signature verified
[19:13:14]
[19:13:14] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:13:14]
[19:13:14] Assembly optimizations on if available.
[19:13:14] Entering M.D.
[19:13:16] Tpr hash work/wudata_05.tpr:  3019175444 496459859 426088452 4121980785 1664664548
[19:13:16] calling fah_main gpuDeviceId=0
[19:13:16] Working on ALZHEIMER'S DISEASE AMYLOID
[19:13:16] Client config found, loading data.
[19:13:16] Starting GUI Server
[19:14:19] Setting checkpoint frequency: 500000
[19:14:19] Completed         0 out of 50000000 steps (0%).
[19:14:21] mdrun_gpu returned 53
[19:14:22] Calculated & specified T inconsisitent
[19:14:22]
[19:14:22] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:14:28] CoreStatus = 7A (122)
[19:14:28] Sending work to server
[19:14:28] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:14:28] - Read packet limit of 540015616... Set to 524286976.
[19:14:28] - Error: Could not get length of results file work/wuresults_05.dat
[19:14:28] - Error: Could not read unit 05 file. Removing from queue.
[19:14:28] EUE limit exceeded. Pausing 24 hours.
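
A quick way to see the pattern in a log like the one above is to count core shutdowns per work unit. The following is only a minimal sketch: the function name `summarize_failures` and the regular expressions are assumptions based on the log lines shown in this post, not part of any FAH tool, and the sample text is a shortened excerpt of that log.

```python
import re
from collections import Counter

# Patterns modelled on the log lines in this thread (assumed format).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Core Shutdown: (\w+)")


def summarize_failures(log_text):
    """Count core shutdowns per ((project, run, clone, gen), status)."""
    current_wu = None  # most recent work unit seen in the log
    counts = Counter()
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            current_wu = tuple(int(g) for g in match.groups())
            continue
        match = SHUTDOWN_RE.search(line)
        if match:
            # Attribute the shutdown to the last work unit that was announced.
            counts[(current_wu, match.group(1))] += 1
    return counts


# Shortened excerpt of the log above (two of the five attempts).
sample = """\
[19:08:17] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:09:20] Completed         0 out of 50000000 steps (0%).
[19:09:20] mdrun_gpu returned 53
[19:09:20] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:09:23] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:09:30] Project: 11177 (Run 12, Clone 170, Gen 9)
[19:10:33] Folding@home Core Shutdown: UNSTABLE_MACHINE
"""

result = summarize_failures(sample)
for (wu, status), n in result.items():
    print(wu, status, n)
```

Run against the full log, this kind of tally makes it obvious that all five UNSTABLE_MACHINE shutdowns hit the same work unit (Project 11177, Run 12, Clone 170, Gen 9), each at 0 of 50000000 steps, before the client paused for 24 hours.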

Re: GPU EUEs

Post by 7im » Sun Oct 02, 2011 11:17 pm

Upgrade to the newest v6 client.

Run the MemtestG80 memory tester, available on the FAH website.

Re: GPU EUEs

Post by UncleWilley » Mon Oct 03, 2011 4:22 am

Upgrading to GPU3 resolved the problem. Thanks for the recommendation.
