GPU clients crash and burn under a v7.1.52 service

Moderators: Site Moderators, FAHC Science Team

rab38505
Posts: 6
Joined: Fri Apr 11, 2008 11:22 pm

GPU clients crash and burn under a v7.1.52 service

Post by rab38505 »

Hi, I just reloaded the OS on my computer and loaded the new v7 client from the Windows installer. I set it to use SMP and GPUs. The SMP client seems to be working fine, but both GPU clients are crashing. GPU 0 is a GeForce GTX 275 and GPU 1 is a GeForce GTX 460. Both have been running for over a year on the previous GPU client with no problems.

Here's the log for the GTX 275:

08:44:28:WU01:FS00:Connecting to 171.67.108.21:8080
08:44:28:WU03:FS00:Connecting to assign-GPU.stanford.edu:80
08:44:29:WU03:FS00:News: Welcome to Folding@Home
08:44:29:WU03:FS00:Assigned to work server 171.67.108.21
08:44:29:WU03:FS00:Requesting new work unit for slot 00: READY gpu:0:"GT200b [GeForce GTX 275]" from 171.67.108.21
08:44:29:WU03:FS00:Connecting to 171.67.108.21:8080
08:44:29:WU01:FS00:Upload complete
08:44:29:WU01:FS00:Server responded WORK_ACK (400)
08:44:29:WU01:FS00:Cleaning up
08:44:29:WU03:FS00:Downloading 61.86KiB
08:44:30:WU03:FS00:Download complete
08:44:30:WU03:FS00:Received Unit: id:03 state:DOWNLOAD error:OK project:10504 run:331 clone:0 gen:287 core:0x11 unit:0x0000030d6652eda54b75b1b600008d90
08:44:30:WU03:FS00:Starting
08:44:30:WU03:FS00:Running FahCore: d:\programs\FAHClient7/FAHCoreWrapper.exe d:/FAHClient7Data/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/G80/Core_11.fah/FahCore_11.exe -dir 03 -suffix 01 -version 701 -lifeline 1784 -checkpoint 15 -gpu 0 -service
08:44:30:WU03:FS00:Started FahCore on PID 3048
08:44:30:WU03:FS00:Core PID:3128
08:44:30:WU03:FS00:FahCore 0x11 started
08:44:31:WU03:FS00:0x11:
08:44:31:WU03:FS00:0x11:*------------------------------*
08:44:31:WU03:FS00:0x11:Folding@Home GPU Core
08:44:31:WU03:FS00:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
08:44:31:WU03:FS00:0x11:
08:44:31:WU03:FS00:0x11:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
08:44:31:WU03:FS00:0x11:Build host: amoeba
08:44:31:WU03:FS00:0x11:Board Type: Nvidia
08:44:31:WU03:FS00:0x11:Core      : 
08:44:31:WU03:FS00:0x11:Preparing to commence simulation
08:44:31:WU03:FS00:0x11:- Looking at optimizations...
08:44:31:WU03:FS00:0x11:DeleteFrameFiles: successfully deleted file=03/wudata_01.ckp
08:44:31:WU03:FS00:0x11:- Created dyn
08:44:31:WU03:FS00:0x11:- Files status OK
08:44:31:WU03:FS00:0x11:- Expanded 62828 -> 336799 (decompressed 536.0 percent)
08:44:31:WU03:FS00:0x11:Called DecompressByteArray: compressed_data_size=62828 data_size=336799, decompressed_data_size=336799 diff=0
08:44:31:WU03:FS00:0x11:- Digital signature verified
08:44:31:WU03:FS00:0x11:
08:44:31:WU03:FS00:0x11:Project: 10504 (Run 331, Clone 0, Gen 287)
08:44:31:WU03:FS00:0x11:
08:44:31:WU03:FS00:0x11:Assembly optimizations on if available.
08:44:31:WU03:FS00:0x11:Entering M.D.
08:44:37:WU03:FS00:0x11:Tpr hash 03/wudata_01.tpr:  1279372263 2054781382 2694799241 3133399893 1444846642
08:44:37:WU03:FS00:0x11:
08:44:37:WU03:FS00:0x11:Calling fah_main args: 14 usage=100
08:44:37:WU03:FS00:0x11:
08:44:37:WU03:FS00:0x11:mdrun_gpu returned 
08:44:37:WU03:FS00:0x11:Going to send back what have done -- stepsTotalG=0
08:44:37:WU03:FS00:0x11:Work fraction=0.0000 steps=0.
08:44:41:WU03:FS00:0x11:logfile size=0 infoLength=0 edr=0 trr=25
08:44:41:WU03:FS00:0x11:+ Opened results file
08:44:41:WU03:FS00:0x11:- Writing 635 bytes of core data to disk...
08:44:41:WU03:FS00:0x11:Done: 123 -> 123 (compressed to 100.0 percent)
08:44:41:WU03:FS00:0x11:  ... Done.
08:44:41:WU03:FS00:0x11:DeleteFrameFiles: successfully deleted file=03/wudata_01.ckp
08:44:41:WU03:FS00:0x11:
08:44:41:WU03:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE
08:44:41:WU03:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
08:44:41:WU03:FS00:Sending unit results: id:03 state:SEND error:FAULTY project:10504 run:331 clone:0 gen:287 core:0x11 unit:0x0000030d6652eda54b75b1b600008d90
08:44:41:WU03:FS00:Uploading 635B to 171.67.108.21
08:44:41:WU03:FS00:Connecting to 171.67.108.21:8080
08:44:42:WU03:FS00:Upload complete
08:44:42:WU03:FS00:Server responded WORK_ACK (400)
08:44:42:WU03:FS00:Cleaning up

Here's the log for the GTX 460:

08:58:13:WU00:FS01:Starting
08:58:13:WU00:FS01:Running FahCore: d:\programs\FAHClient7/FAHCoreWrapper.exe d:/FAHClient7Data/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 00 -suffix 01 -version 701 -lifeline 1784 -checkpoint 15 -gpu 1 -service
08:58:13:WU00:FS01:Started FahCore on PID 1444
08:58:13:WU00:FS01:Core PID:1256
08:58:13:WU00:FS01:FahCore 0x15 started
08:58:14:WU00:FS01:0x15:
08:58:14:WU00:FS01:0x15:*------------------------------*
08:58:14:WU00:FS01:0x15:Folding@Home GPU Core
08:58:14:WU00:FS01:0x15:Version                2.22 (Thu Dec 8 17:08:05 PST 2011)
08:58:14:WU00:FS01:0x15:Build host             SimbiosNvdWin7
08:58:14:WU00:FS01:0x15:Board Type             NVIDIA/CUDA
08:58:14:WU00:FS01:0x15:Core                   15
08:58:14:WU00:FS01:0x15:GPU device info vendor=0 device=0 name=NA match=0 deviceId=1
08:58:14:WU00:FS01:0x15:
08:58:14:WU00:FS01:0x15:Window's signal control handler registered.
08:58:14:WU00:FS01:0x15:Preparing to commence simulation
08:58:14:WU00:FS01:0x15:- Ensuring status. Please wait.
08:58:23:WU00:FS01:0x15:- Looking at optimizations...
08:58:23:WU00:FS01:0x15:- Working with standard loops on this execution.
08:58:23:WU00:FS01:0x15:- Previous termination of core was improper.
08:58:23:WU00:FS01:0x15:- Going to use standard loops.
08:58:23:WU00:FS01:0x15:- Files status OK
08:58:23:WU00:FS01:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
08:58:23:WU00:FS01:0x15:- Expanded 145445 -> 660994 (decompressed 454.4 percent)
08:58:23:WU00:FS01:0x15:Called DecompressByteArray: compressed_data_size=145445 data_size=660994, decompressed_data_size=660994 diff=0
08:58:23:WU00:FS01:0x15:- Digital signature verified
08:58:23:WU00:FS01:0x15:
08:58:23:WU00:FS01:0x15:Project: 8020 (Run 5, Clone 269, Gen 57)
08:58:23:WU00:FS01:0x15:
08:58:23:WU00:FS01:0x15:Entering M.D.
08:58:25:WU00:FS01:0x15:Tpr hash 00/wudata_01.tpr:  205948020 226739098 28531194 2781083651 111202163
08:58:25:WU00:FS01:0x15:GPU device info: vendor=0 device=0 name=<NA> match=0
08:58:25:WU00:FS01:0x15:Working on Gromacs Runs On Most of All Computer Systems
08:58:25:WU00:FS01:0x15:Client config unavailable.
08:58:25:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1 = 0xffffffff)
08:58:25:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed

Are there any extra flags that I need to set for the GPU clients to make them run correctly? I remember with the previous clients that I had to set a flag for the GPU microarchitecture (g80 for the 275 and fermi for the 460). Is this still required? Also, is there still an analogue for the -bigadv flag from previous clients? I'm afraid I've been out of the loop on the state of F@H for a while, so I'm not familiar with any tweaks that may be required to get things running optimally (or running at all) with the new client. If there's an FAQ somewhere that answers these questions, please let me know.

Thanks,
rab38505
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU clients crash and burn under v 7.1.52

Post by bruce »

An UNSTABLE_MACHINE error occurs in the FahCore, not in the client, so V7.1.52 has nothing to do with it. It's most likely caused by overclocking, overheating, or some other actual hardware fault.
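
To see at a glance which exit status each FahCore reported, the "FahCore returned" lines can be pulled out of the log. The following is a minimal Python sketch, not part of the FAH client; the regular expression is an assumption based only on the two such lines quoted in this thread, and the function name `fahcore_exits` is made up for illustration.

```python
import re

# Assumed pattern, modelled on the two "FahCore returned" lines in this thread.
RETURN_RE = re.compile(
    r"WU(?P<wu>\d+):FS(?P<slot>\d+):FahCore returned: "
    r"(?P<name>\w+) \((?P<code>-?\d+) = 0x[0-9a-fA-F]+\)"
)


def fahcore_exits(log_text):
    """Return (slot, work unit, status name, numeric code) per FahCore exit."""
    exits = []
    for line in log_text.splitlines():
        match = RETURN_RE.search(line)
        if match:
            exits.append((
                int(match["slot"]),
                int(match["wu"]),
                match["name"],
                int(match["code"]),
            ))
    return exits


# The two exit lines copied from the logs in the first post.
sample = """\
08:44:41:WU03:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
08:58:25:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1 = 0xffffffff)
"""

for slot, wu, name, code in fahcore_exits(sample):
    print(f"slot {slot:02d} WU{wu:02d}: {name} ({code})")
```

A named status such as UNSTABLE_MACHINE means the core shut itself down and reported why, while UNKNOWN_ENUM (-1) is what the client logs when the core exits with a code it does not recognise.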

I looked up Project: 10504 (Run 331, Clone 0, Gen 287) and there's a record of the error you had (earning 0 points) but the WU was reissued and someone else finished it successfully (for 587 points) so it's not a bad WU.

Project: 8020 (Run 5, Clone 269, Gen 57) has not been returned by anyone, so we can't draw any conclusions, yet.
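
The Project/Run/Clone/Gen numbers used for that kind of lookup can be copied out of the log by hand, or extracted with a few lines of Python. This is only a sketch: the pattern assumes the "Project: P (Run R, Clone C, Gen G)" wording seen in the logs above, and `find_prcg` is a hypothetical helper name.

```python
import re

# Assumed wording, taken from the "Project: ..." lines in the posted logs.
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def find_prcg(log_text):
    """Return each distinct (project, run, clone, gen) in order of appearance."""
    seen = []
    for match in PRCG_RE.finditer(log_text):
        prcg = tuple(int(group) for group in match.groups())
        if prcg not in seen:
            seen.append(prcg)
    return seen


# Lines copied from the two logs in the first post.
log_sample = """\
08:44:31:WU03:FS00:0x11:Project: 10504 (Run 331, Clone 0, Gen 287)
08:58:23:WU00:FS01:0x15:Project: 8020 (Run 5, Clone 269, Gen 57)
"""

for project, run, clone, gen in find_prcg(log_sample):
    print(f"Project: {project} (Run {run}, Clone {clone}, Gen {gen})")
```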

The V6 client had a -forcegpu switch which was needed for certain GPUs. V7 uses an entirely different system called whitelisting which is done centrally on the Stanford servers.
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona
Contact:

Re: GPU clients crash and burn under v 7.1.52

Post by 7im »

What driver version are you using? GPU functionality is highly dependent on the driver.

Also, in V7, the default settings are the recommended settings for most people. It was designed to need little or no tweaking after installation.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.

Re: GPU clients crash and burn under v 7.1.52

Post by rab38505 »

GPU's are currently not OC'd at all, as I just reinstalled Windows and haven't installed anything to tweak the clocks yet. They had both previously been running fine with moderate OC's under the v6 clients. It did not appear that they ever even began to execute anything on the GPU's and both are quite cool, so heat is definitely not an issue.

I'm using the newest CUDA x64 driver (8.17.13.132.) I downloaded it from the CUDA developer tools section of the NVidia website about 2 days ago.

I remember with the old clients, they had to run the GPU client under the user's session (because Windows would only allow the active session to access the GPU(s).) Is this still the case? When I set up the client, I did set it to run as a service, so that could be the problem.

I noticed just now that FAHControl says "CUDA: Not Detected" under the "System Info" tab, though I have verified that CUDA works just fine with other applications (including the samples included with the CUDA SDK.) I have CUDA 4.2 drivers installed, along with the CUDA 4.2 SDK. I verified that I can run CUDA apps successfully on both cards.
Joe_H
Site Admin
Posts: 7870
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GPU clients crash and burn under v 7.1.52

Post by Joe_H »

rab38505 wrote:I remember with the old clients, they had to run the GPU client under the user's session (because Windows would only allow the active session to access the GPU(s).) Is this still the case? When I set up the client, I did set it to run as a service, so that could be the problem.
Yes, that is the problem. The limitation is not in the folding client. Just as you had to fold under the user session with V6, you still need to do so with V7. The limitation was imposed by MS starting with Windows Vista and still applies to GPU folding on Windows 7. You should be able to fold after reinstalling the V7 client without setting it up as a service.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3

Re: GPU clients crash and burn under v 7.1.52

Post by rab38505 »

Ok, thanks, Joe. I figured that was probably it. While I was aware it was an issue with Windows itself, I wasn't sure if they had changed it in Win 7 or not. It seems like something they'll need to change eventually with the increasing popularity of GPGPU usage. Doing GPGPU stuff from a service doesn't seem like an unreasonable thing for people to want to do, especially on compute nodes where jobs usually aren't run interactively.

Edit: Yep, that fixed it. There should probably be a note on that installer page that the service option cannot be used with GPU's on Win 7 or Vista.

Re: GPU clients crash and burn under v 7.1.52

Post by bruce »

Windows disabled GPUs in a service as a security improvement. Apparently somebody figured out how to bypass normal CPU-based security measures by hiding a program in a service and then using a GPU to do its processing out of sight of the security protocols. (I have heard no facts; that's just a SWAG. If somebody wants to research the issue, I'd be happy to get a real explanation.) Anyway, I don't expect MS will be "fixing it."

Both V6 and V7 are at the mercy of the security restrictions of the OS. They're capable of running on whatever GPUs are supported by CUDA/OpenCL -- or will be whenever Stanford ports the FahCores for the GPUs to native Linux/OS-X.

EDIT: Updated the title to reflect the real issue.

Re: GPU clients crash and burn under a v7.1.52 service

Post by 7im »

Should have taken the crash and burn part out of the title as well. The problem doesn't have anything to do with FAH.

And the Windows install guide does talk about GPUs not running as a service. ;)

Re: GPU clients crash and burn under a v7.1.52 service

Post by rab38505 »

Bruce, IIRC, it had more to do with the desire to isolate services from potentially malicious interactive applications than anything else. In Win XP, the first user who logged on ran in session 0, along with all of the services. This made it easier for a malicious application running in session 0 (which most of them would have been on a typical home PC) to attack a service running as SYSTEM and so run code with SYSTEM-level access. Win Vista kept services in session 0 and made the first interactive session a new, separate session (session 1). IIRC, even back in XP, the graphics hardware was only made available to one session at a time (whichever session was currently open on the console). This meant that services (in session 0) couldn't access the GPU hardware (which was being used by session 1). IIRC, the inability to use CUDA from services was more of a side effect of this than the intended behavior, since GPGPU was still pretty new and not in wide-scale use when Vista was in development. At any rate, NVidia apparently has had a work-around for this in the Tesla drivers since early 2010, but not in the GeForce/Quadro drivers.

7im, yeah, I see that now. When you have an msi installer, though, I'm guessing most users will be like me and run the installer w/o reading the several-page-long guide first, especially when it's not linked by the installer link like it used to be. Adding a brief note that the service option doesn't support GPU clients to the screen where you select the type of installation might help to prevent confusion in the future.

By the way, thanks for making everything go in the correct folders in the v7 client. That makes life easier for those of us who use roaming profiles and such things. Lots of programs still don't bother to correctly support roaming profile setups, especially mandatory roaming profiles. Even Visual Studio itself didn't correctly support mandatory roaming profiles last time I tried to use it with them.
Last edited by rab38505 on Thu Jun 21, 2012 8:24 pm, edited 1 time in total.

Re: GPU clients crash and burn under a v7.1.52 service

Post by 7im »

You recall part of it correctly. But where did that desire to isolate services from the graphics system come from? It was to improve security as bruce said, among other things. Same difference. ;)
In Windows XP®, Windows Server® 2003, and earlier versions of the Windows® operating system, all services run in the same session as the first user who logs on to the console. This session is called Session 0. Running services and user applications together in Session 0 poses a security risk because services run at elevated privilege and therefore are targets for malicious agents that are looking for a means to elevate their own privilege levels.

The Windows Vista® and Windows Server® 2008 operating systems mitigate this security risk by isolating services in Session 0 and making Session 0 non-interactive. In Windows Vista and Windows Server 2008, only system processes and services run in Session 0. The first user logs on to Session 1, and subsequent users log on to subsequent sessions. This approach means that services never run in the same session as users' applications and are therefore protected from attacks that originate in application code.*
*http://msdn.microsoft.com/en-us/library/bb756986.aspx
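
A process can check which session it is running in by calling the Win32 function ProcessIdToSessionId. Below is a rough Python sketch using ctypes; the helper names `current_session_id` and `describe_session` are invented for illustration, and the wording of the description only reflects what is said in this thread and the passage quoted above, not an official diagnostic.

```python
import ctypes
import sys


def describe_session(session_id):
    """Summarise what a session ID implies on Windows Vista and later."""
    if session_id == 0:
        return "session 0 (services, non-interactive)"
    return f"session {session_id} (interactive user session)"


def current_session_id():
    """Return this process's Windows session ID, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    session_id = ctypes.c_ulong()  # DWORD is a 32-bit unsigned long on Windows
    ok = kernel32.ProcessIdToSessionId(
        kernel32.GetCurrentProcessId(), ctypes.byref(session_id)
    )
    if not ok:
        raise ctypes.WinError()
    return session_id.value


if __name__ == "__main__":
    sid = current_session_id()
    print("not running on Windows" if sid is None else describe_session(sid))
```

Run from a logged-in desktop on Vista or later, this should report session 1 or higher; the same script launched by a service should report session 0, which is where, per this thread, the GeForce drivers do not make CUDA available.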

Re: GPU clients crash and burn under a v7.1.52 service

Post by rab38505 »

Whoops, sorry, I didn't see you had already responded before I added to my previous post.

Re: GPU clients crash and burn under a v7.1.52 service

Post by 7im »

rab38505 wrote:...
7im, yeah, I see that now. When you have an msi installer, though, I'm guessing most users will be like me and run the installer w/o reading the several-page-long guide first, especially when it's not linked by the installer link like it used to be. Adding a brief note that the service option doesn't support GPU clients to the screen where you select the type of installation might help to prevent confusion in the future.
Agreed. We have a feature request ticket open to add a note about GPU vs. Service to this installer screen...

Image

We also have a request ticket to improve the feedback in the log file about this problem, in case someone didn't read the warning that we are adding or added the service option manually.