Folding with Tesla K20C

Posted: Thu Jun 13, 2013 7:12 pm
by N0OA
Does anyone have experience folding with the Tesla K20C? How should it be set up, and what experience do folks have with its PPD? I am currently getting a 37101 on a 7626 work unit configured as "client-type=BIGADV". Is it configured right? What can I expect for PPD out of this card?

Thanks in advance for any advice or thoughts...

N0OA

Re: Folding with Tesla K20C

Posted: Thu Jun 13, 2013 7:28 pm
by 7im
bigadv only applies to CPU work units, not GPU.

Not many people have Teslas, and I've seen no PPD numbers posted, but this thread shows FAHBench performance against many other GPUs, so you can see relative performance: http://foldingforum.org/viewtopic.php?f=38&t=23440

Re: Folding with Tesla K20C

Posted: Fri Jun 14, 2013 4:55 pm
by N0OA
Thanks for the client-type correction. I didn't catch that in the V7 documentation. I will take a look at the link to see what to expect. The Tesla isn't the fastest card I have - but it runs very cool and has a nice compact form factor for the performance it seems to return.

-N0OA

Re: Folding with Tesla K20C

Posted: Fri Jun 14, 2013 7:25 pm
by Quisarious
As far as FAH is concerned, a K20C is a detuned GTX 780. There are posted PPD estimates for the 780; just divide its TPFs by ~0.65 (the K20C runs at a ~700 MHz core clock, while the 780 will boost to 1000-1200 MHz) to get a good estimate for the Tesla.
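
The rule of thumb above is plain arithmetic. A rough sketch in Python, assuming time per frame (TPF) scales inversely with core clock, which is only an approximation; the function name and the example 780 TPF are made up for illustration and are not measured values:

```python
def estimate_k20c_tpf(gtx780_tpf_seconds, clock_ratio=0.65):
    """Scale a GTX 780 time-per-frame to a rough K20C estimate.

    clock_ratio is (K20C core clock) / (GTX 780 boost clock), ~0.65 per
    the post above. Assumes TPF is inversely proportional to core clock.
    """
    if clock_ratio <= 0:
        raise ValueError("clock_ratio must be positive")
    # A slower clock means a longer time per frame, hence the division.
    return gtx780_tpf_seconds / clock_ratio


# Hypothetical input: a 780 TPF of 130 seconds (placeholder, not a benchmark).
print(round(estimate_k20c_tpf(130.0), 1))
```

Treat the result as a ballpark figure only; memory clocks, drivers, and the project being folded will all move the real number.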

Re: Folding with Tesla K20C

Posted: Sat Aug 10, 2013 4:52 pm
by jaysenw
Hello;

I have 2 Tesla C1060s. I have not been able to get them to work on FAH. The Tesla card begins folding, then it eventually stops at 99.99%. I have tried removing and replacing the slot, and deleting work units and restarting. The same problem occurs: it always gets to 99.99% in the GUI, but never to the same percentage in the log.

I have attached the most recent snippet of the system log for that slot below. Does anyone know what this issue is and possibly how to fix it?

Code:

07:07:34:WU02:FS01:Cleaning up
07:07:34:WU01:FS01:Connecting to assign-GPU.stanford.edu:80
07:07:34:WU01:FS01:News: Welcome to Folding@Home
07:07:34:WU01:FS01:Assigned to work server 171.67.108.21
07:07:34:WU01:FS01:Requesting new work unit for slot 01: READY gpu:1:GT200 [Tesla C1060] from 171.67.108.21
07:07:34:WU01:FS01:Connecting to 171.67.108.21:8080
07:07:35:WU01:FS01:Downloading 61.92KiB
07:07:35:WU01:FS01:Download complete
07:07:35:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:10501 run:162 clone:1 gen:1340 core:0x11 unit:0x00000b466652eda54b6ea7a700003f4b
07:07:35:WU01:FS01:Starting
07:07:35:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/G80/Core_11.fah/FahCore_11.exe -dir 01 -suffix 01 -version 703 -lifeline 2692 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
07:07:35:WU01:FS01:Started FahCore on PID 1524
07:07:35:WU01:FS01:Core PID:3524
07:07:35:WU01:FS01:FahCore 0x11 started
07:07:36:WU01:FS01:0x11:
07:07:36:WU01:FS01:0x11:*------------------------------*
07:07:36:WU01:FS01:0x11:Folding@Home GPU Core
07:07:36:WU01:FS01:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
07:07:36:WU01:FS01:0x11:
07:07:36:WU01:FS01:0x11:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
07:07:36:WU01:FS01:0x11:Build host: amoeba
07:07:36:WU01:FS01:0x11:Board Type: Nvidia
07:07:36:WU01:FS01:0x11:Core      : 
07:07:36:WU01:FS01:0x11:Preparing to commence simulation
07:07:36:WU01:FS01:0x11:- Looking at optimizations...
07:07:36:WU01:FS01:0x11:DeleteFrameFiles: successfully deleted file=01/wudata_01.ckp
07:07:36:WU01:FS01:0x11:- Created dyn
07:07:36:WU01:FS01:0x11:- Files status OK
07:07:36:WU01:FS01:0x11:- Expanded 62895 -> 336763 (decompressed 535.4 percent)
07:07:36:WU01:FS01:0x11:Called DecompressByteArray: compressed_data_size=62895 data_size=336763, decompressed_data_size=336763 diff=0
07:07:36:WU01:FS01:0x11:- Digital signature verified
07:07:36:WU01:FS01:0x11:
07:07:36:WU01:FS01:0x11:Project: 10501 (Run 162, Clone 1, Gen 1340)
07:07:36:WU01:FS01:0x11:
07:07:36:WU01:FS01:0x11:Assembly optimizations on if available.
07:07:36:WU01:FS01:0x11:Entering M.D.
07:07:41:WU01:FS01:0x11:Tpr hash 01/wudata_01.tpr:  3117068995 2761589855 2641796126 1345936202 2531672404
07:07:41:WU01:FS01:0x11:
07:07:41:WU01:FS01:0x11:Calling fah_main args: 14 usage=100
07:07:41:WU01:FS01:0x11:
07:07:42:WU01:FS01:0x11:Working on Protein
07:07:43:WU01:FS01:0x11:Client config unavailable.
07:07:43:WU01:FS01:0x11:Starting GUI Server
07:08:48:WU01:FS01:0x11:Completed 1%
07:09:53:WU01:FS01:0x11:Completed 2%

Thanks for any input you may have...


Jaysen

Re: Folding with Tesla K20C

Posted: Sat Aug 10, 2013 5:40 pm
by bruce
When FAH's Control application eventually stops at 99.99%, there's a definite problem with your GPU or its drivers. If you look at the log, you will find that folding stopped before reaching 99.99%. You'll also find that there was a driver reset error logged by Windows (if you're running Windows) at that same time, and you may have seen the message. Once a driver reset occurs, FAH cannot continue processing, but FAH's Control application continues to (incorrectly) report progress until it reaches 99.99%.

Driver resets are caused by a GPU that has hung. That hang may be due to overclocking, due to overheating, due to flaky drivers or due to defective hardware. You need to figure out what's wrong with your system.
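
One way to see where folding really stopped is to pull the last progress line for the slot out of the log and compare its timestamp with the Windows driver-reset event. A minimal sketch, assuming progress lines look like the `Completed N%` lines in the log posted above (other cores may log progress in a different format); the function name is hypothetical:

```python
import re

# Matches core progress lines in the format shown in this thread, e.g.
# 07:08:48:WU01:FS01:0x11:Completed 1%
PROGRESS_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS(\d+):0x[0-9a-fA-F]+:Completed (\d+)%"
)


def last_logged_progress(log_lines, slot="01"):
    """Return (timestamp, percent) of the last progress line for a slot, or None."""
    last = None
    for line in log_lines:
        m = PROGRESS_RE.match(line.strip())
        if m and m.group(2) == slot:
            last = (m.group(1), int(m.group(3)))
    return last


sample = [
    "07:07:43:WU01:FS01:0x11:Starting GUI Server",
    "07:08:48:WU01:FS01:0x11:Completed 1%",
    "07:09:53:WU01:FS01:0x11:Completed 2%",
]
print(last_logged_progress(sample))  # -> ('07:09:53', 2)
```

If the last logged percentage is well short of what the GUI shows, the core most likely stopped at that timestamp and the GUI kept extrapolating.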

Re: Folding with Tesla K20C

Posted: Sat Aug 10, 2013 10:10 pm
by GreyWhiskers
If it just stopped because of a GPU/driver reset, the system should be able to recover by rebooting the computer. Upon restart, the FAH software should recover to the last checkpoint. It wouldn't be a bad idea to let the system "rest" for a little time (unspecified duration) until restart if the root cause was thermal.

If this doesn't work, then there was something else going on that caused the client to purge the files.

Was the log snippet posted above the bottom of the log when the FAH software quit? If not, it would be interesting to see from the very end of the log any warnings or errors that had been logged.

07:07:36:WU01:FS01:0x11:Project: 10501 (Run 162, Clone 1, Gen 1340)

Re: Folding with Tesla K20C

Posted: Fri Aug 16, 2013 1:22 am
by jaysenw
Hmmm. I have tried the rebooting thing, but it is still unable to recover. I'll install some updated drivers and see what the dealio is. If it IS flaky hardware, do you guys recommend any software that I can use to load-test the card? I'm thinking of something like a CPU torture test, but for Teslas instead...

I'll post when I find out my next step. Thanks for the recommendations so far.

:)

Jaysen

Re: Folding with Tesla K20C

Posted: Fri Aug 16, 2013 4:53 am
by N0OA
Hi Jaysenw,

It looks from your log like you are downloading Core_11. What client-type do you have defined for the slot with the Tesla in it? I would suggest that you set the client-type to advanced so that you use Core_17, which will run much better on the Tesla cards. I am running Core_17 on my Tesla K20C without any issues at all.

N0OA
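
To confirm which core a slot has been assigned, the `Received Unit` line in the log carries both the project and the core ID. A small sketch that extracts them, assuming the line format matches the log posted earlier in this thread; the function name is hypothetical:

```python
import re

# Pulls the project number and core ID out of a "Received Unit" log line.
UNIT_RE = re.compile(r"Received Unit:.*?project:(\d+).*?core:(0x[0-9a-fA-F]+)")


def assigned_core(log_line):
    """Return (project, core) from a 'Received Unit' log line, or None."""
    m = UNIT_RE.search(log_line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


line = (
    "07:07:35:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR "
    "project:10501 run:162 clone:1 gen:1340 core:0x11 "
    "unit:0x00000b466652eda54b6ea7a700003f4b"
)
print(assigned_core(line))  # -> (10501, '0x11')
```

Here `0x11` corresponds to the Core_11 assignment noted above.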

Re: Folding with Tesla K20C

Posted: Fri Aug 16, 2013 4:27 pm
by AndyE
N0OA,
would you mind sharing some perf numbers of your K20C?

thanks,
Andy

Re: Folding with Tesla K20C

Posted: Sat Aug 17, 2013 5:58 am
by bruce
That GPU uses the GK110, which really shouldn't be getting assignments for FahCore_11. I would have expected assignments for FahCore_15 prior to setting the client-type to advanced, and FahCore_17 after.

You didn't answer my (implied) question: Is this Windows or Linux?

Is the file GPUs.txt present, and if so when was it created?

Re: Folding with Tesla K20C

Posted: Sat Aug 17, 2013 5:20 pm
by Joe_H
@jaysenw - Could you post the beginning of your log that shows the system configuration, etc.?

Your questions are about folding with a Tesla C1060, which is based on a different GPU than the Tesla K20C. From what I read on Wikipedia, it appears to be based on the same GPU as the GTX 285, so the recommended settings and the projects it will fold well will be different.

Re: Folding with Tesla K20C

Posted: Tue Aug 20, 2013 5:15 am
by jaysenw
Sure thing! Once I power down the system and reboot, I'll send the entire log.

With the hardware recommendations made earlier, I custom built a reducer fan and have been able to get the card balanced at 89 degrees under full load. With this, it folds a work unit in about 65 minutes. Is this a reasonable speed given the processor?

I'll get the logs tomorrow and post them when I get a chance after school; no rest for the wicked med students...

Re: Folding with Tesla K20C

Posted: Thu Aug 22, 2013 4:51 am
by Jesse_V
Some projects consist of workunits that take longer than workunits from other projects. This can be due to different protein sizes or complexity, or for other reasons. Points Per Day is often a much more accurate yardstick.
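
Converting a per-work-unit time into PPD is simple arithmetic. A minimal sketch, assuming a fixed point value per work unit and ignoring any bonus points; the 1000-point figure is a placeholder, not the real credit for any project:

```python
MINUTES_PER_DAY = 24 * 60


def points_per_day(points_per_wu, minutes_per_wu):
    """Estimate PPD from a flat per-WU credit and the time to finish one WU."""
    if minutes_per_wu <= 0:
        raise ValueError("minutes_per_wu must be positive")
    return points_per_wu * MINUTES_PER_DAY / minutes_per_wu


# 65 minutes per WU comes from the post above; 1000 points is hypothetical.
print(round(points_per_day(1000.0, 65.0)))
```

Two cards can only be compared this way on the same project, since both the credit and the run time vary from project to project.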

Re: Folding with Tesla K20C

Posted: Thu Aug 22, 2013 7:14 am
by n_w95482
jaysenw wrote:
Sure thing! Once I power down the system and reboot, I'll send the entire log.

With the hardware recommendations made earlier, I custom built a reducer fan and have been able to get the card balanced at 89 degrees under full load. With this, it folds a work unit in about 65 minutes. Is this a reasonable speed given the processor?

I'll get the logs tomorrow and post them when I get a chance after school; no rest for the wicked med students...

Hmm, that's a bit warm for that card to be running. From what I've noticed, Tesla cards are usually underclocked compared to their GeForce equivalents, presumably to maximize stability. That, in turn, should lower temperatures.

How is airflow in the case/around the card? I'd also check to see if the card's heatsink and fan need to be cleaned out, and possibly apply fresh thermal paste.