Guide For GPU3 BETA Client {Windows & Linux}

If you think it might be a driver problem, see viewforum.php?f=79


15 - Troubleshooting The GPU3 BETA Client

Post by PantherX » Sun Sep 26, 2010 1:54 pm

Legend:
Green Text -> Specifically For Nvidia GPUs
Black Text -> Applicable To Both GPU Vendors


  • Troubleshooting The GPU3 BETA Client
If your Client still gives you errors after reading all the notes above, first check the F@h Forum for that WU, as some WUs in a Project can be faulty/bad (Details). If the WU has already been reported, add your report to the existing thread along with your FAHlog by using the Code Button (Details). If it hasn't, you may create a new thread in this sub-forum -> Issues with a specific WU and post your FAHlog/FAHlog-Prev (Details), and an Admin/Mod will look it up in the database. If multiple errors are reported for that WU, it will be marked as bad and you need not worry about your system. If the WU was completed successfully (so the WU itself is not at fault), then it is worth reading these suggestions (some may not apply to you):

1) Have you overclocked your GPUs? If yes, then I suggest that you return them to stock settings and see if the EUE still occurs. Please note that F@h stresses your system differently from other stress-testing software, so a setting that is stable in stress program A might be unstable in the F@h GPU Client, or vice versa.

2) If you are using an Nvidia Fermi GPU and received the "CoreStatus = 63 (99)" message, consider the following:
A) If you were initially tweaking the system for CUDA apps, you may have added the Environment Variable "CUDA_FORCE_PTX_JIT=1". Deleting it will allow the GPU3 Client to work: <Click>
B) Check that the GPU is fully and securely seated in the PCI-E Slot: <Click>
C) Make sure that the installed Drivers work with the F@h GPU Clients.
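
For point A, a quick way to see whether the variable is active is to look it up in the environment. This is only a minimal sketch: the function name `ptx_jit_forced` is made up for illustration, and the script only sees the environment of the process it runs in, so open a fresh command prompt after changing system variables.

```python
import os


def ptx_jit_forced(env=None):
    """Return True if CUDA_FORCE_PTX_JIT is set to "1" in the given environment.

    env defaults to the environment of the current process.
    """
    env = os.environ if env is None else env
    return env.get("CUDA_FORCE_PTX_JIT") == "1"


if __name__ == "__main__":
    if ptx_jit_forced():
        print("CUDA_FORCE_PTX_JIT=1 is set; consider deleting it.")
    else:
        print("CUDA_FORCE_PTX_JIT=1 is not set in this environment.")
```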


3A) [Nvidia] Can you run MemtestG80 without any errors at its default settings? If yes, then repeat the test with more runs and more memory by using this method:
Code:
Step 1: Start up a command prompt (Start -> Run -> cmd OR Win key+R -> type cmd OR Windows 7 users can browse to the directory and Shift+Right Click -> "Open command window here")

Step 2: Change to the directory where the MemtestG80 executable is located

Step 3: Type this:
memtestG80 128 1000

Step 4: The first value is the amount of GPU RAM to test (in MiB) while the second value is the number of times the test will run.

NOTE - If errors occur and the error count is about 4 billion (exact number: 4,294,967,284), then it is a timeout, so make sure that you are running the latest version. If this occurs even with the latest version, please contact ihaque, who is a Member of Pande Group and the developer of this application.
If the error count is any other non-zero value, then the GPU is faulty. Consider returning your GPU frequencies to stock or even lower to see if the errors stop. If they don't, then the GPU isn't producing scientifically valid data, so consider replacing the GPU or stopping the F@h GPU Client on that GPU. (Discussion Thread)
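
The rule above for reading the MemtestG80 result can be written down as a small sketch. The function name `interpret_error_count` is hypothetical, and you still have to read the error count from the program's output yourself; the timeout value is the one quoted in this guide.

```python
# Error count that this guide describes as a timeout rather than a real fault.
TIMEOUT_ERROR_COUNT = 4_294_967_284


def interpret_error_count(count):
    """Classify a MemtestG80 error count following the rule in this guide."""
    if count == 0:
        return "ok"        # no errors reported
    if count == TIMEOUT_ERROR_COUNT:
        return "timeout"   # update to the latest version and retest
    return "faulty"        # lower the clocks, retest, or stop folding on this GPU
```

For example, `interpret_error_count(12)` gives "faulty", which means it is time to return the GPU to stock frequencies and run the test again.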


3B) [ATI: Driver 9.12 or higher + ATI Stream SDK; Nvidia: Driver 195 or higher] Can you run MemtestCL without any errors at its default settings? If yes, then repeat the test with more runs and more memory by using this method:
Code:
Step 1: Start up a command prompt (Start -> Run -> cmd OR Win key+R -> type cmd OR Windows 7 users can browse to the directory and Shift+Right Click -> "Open command window here")

Step 2: Change to the directory where the MemtestCL executable is located

Step 3: Type this:
memtestCL 128 1000

Step 4: The first value is the amount of GPU RAM to test (in MiB) while the second value is the number of times the test will run. Both can be changed to suit your GPU.

Step 5: Once it completes the test, it will show you the final error count. 0 indicates that everything is fine, while a non-zero count may indicate instabilities.
If the error count is not zero (0), then the GPU is faulty. Consider returning your GPU frequencies to stock or even lower to see if the errors stop. If they don't, then the GPU isn't producing scientifically valid data, so consider replacing the GPU or stopping the F@h GPU Client on that GPU. (Discussion Thread)
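
If you rerun the test often with different values, the command from Step 3 can be assembled by a small helper. This is just a sketch: `memtest_command` is a made-up name, and it only builds the two-value command line shown above (memory amount, then number of runs) for either memtestG80 or memtestCL.

```python
def memtest_command(executable, memory_amount=128, runs=1000):
    """Build the argument list for memtestG80 or memtestCL.

    memory_amount is the first value (GPU RAM to test) and runs is the
    second value (number of times the test will run), as in Step 3.
    """
    if memory_amount <= 0 or runs <= 0:
        raise ValueError("memory_amount and runs must be positive")
    return [executable, str(memory_amount), str(runs)]
```

The returned list can be passed to `subprocess.run(...)` from the directory that holds the executable, for example `memtest_command("memtestCL", 256, 2000)` for a longer test on more memory.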

4) The error may be heat related, so I suggest that you monitor the GPU temperature(s) with GPU-Z or HWMonitor to ensure that they stay at a safe level. If they don't, then you have the following options:
Option 1: Manually increase the fan speed of your GPU(s) to ensure that the temperature falls to the safe level. (Please note that the safe level of temperature will vary due to personal choice. Personally, I like my GPU <75C when fully stressed while others prefer <80C and some don't mind <90C. However, if the temperature is too high, thermal protection will be activated and will automatically reduce the GPU frequencies to reduce its temperature in order to avoid hardware damage)
Option 2: Use this Nvidia Environment Variable: FAH_GPU_IDLE with a value between 5 and 10 (Guide 1, Guide 2) which might help in reducing the GPU temperature.
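
Instead of setting FAH_GPU_IDLE system-wide, it can also be set only for the process that starts the Client. The following is a rough sketch under a few assumptions: the helper name `env_with_gpu_idle` is made up, the value is treated as a whole number, and the 5 to 10 range is simply the one suggested in this guide.

```python
import os


def env_with_gpu_idle(value, base_env=None):
    """Return a copy of the environment with FAH_GPU_IDLE set.

    Rejects values outside the 5 to 10 range suggested in this guide.
    """
    if not 5 <= value <= 10:
        raise ValueError("this guide suggests a value between 5 and 10")
    env = dict(os.environ if base_env is None else base_env)
    env["FAH_GPU_IDLE"] = str(value)
    return env
```

The result can be handed to `subprocess.Popen([client_path], env=env_with_gpu_idle(7))`, where `client_path` stands for wherever your GPU3 Client executable is located.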

5) Some users complain about the desktop lagging when performing basic tasks. Possible solutions are:
A) Enable Aero if you are using Windows Vista or Windows 7.
B) Use the Nvidia Environment Variable FAH_GPU_IDLE with a value between 5 and 10 (Guide 1, Guide 2).

6) If you notice low GPU usage and you have a Classic/SMP2 Client installed, make sure that you set the GPU3 Client's priority to low and the Classic/SMP2 Client's priority to idle. With this method, the GPU3 Client will have a constant data stream to process and won't have to wait for the CPU, which keeps the GPU at maximum load.

7) If you are experiencing FILE_IO_ERROR, you should do the following:
A) Run CHKDSK to ensure that the hard disk drive isn't faulty.
B) Make sure that the folder isn't being "shared" by another Client. If you have multiple Clients, please use a separate folder for each.
C) Some Anti-Virus programs can interfere with the folding files. Make sure you add the folding files to the exception list.
D) You may not have write permission for that folder, so check your permission level.
E) The drive may be full, so consider freeing up some space.
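
Points D and E can be checked quickly with a short script. This is a minimal sketch, not an official tool: `check_folding_folder` and the 100 MB free-space threshold are assumptions picked for illustration, and it does not cover points A to C.

```python
import os
import shutil
import tempfile


def check_folding_folder(path, min_free_bytes=100 * 1024 * 1024):
    """Return a list of problems found with the Client folder (empty if none)."""
    if not os.path.isdir(path):
        return ["folder does not exist"]
    problems = []
    # Point D: actually try to create a file, which is the most direct
    # test of write permission.
    try:
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot write to folder: {exc}")
    # Point E: compare the free space on the drive with the threshold.
    if shutil.disk_usage(path).free < min_free_bytes:
        problems.append("low disk space")
    return problems
```

An empty list means that neither problem was detected, so the remaining suspects are the disk itself, a shared folder, or the Anti-Virus program.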

8) Make sure that you are using the correct flags, as incorrect ones may cause unexpected results. Here are some of the commonly used flags:
    -verbosity 9 = Makes the FAHlog more detailed which helps in troubleshooting the Client.
    -advmethods = Grants you access to WUs that are in the late BETA Stage. Your probability of a WU EUEing increases.
    -configonly = The Client will read/write the configurations and exit. No WU processing is done even if it is present in the queue.
    -send all = The Client will try to upload the wuresult file (if found) to the Server. Once it does so, it will exit and won't process any WU even if it is present in the queue.
    -oneunit = The Client will process the current WU and once it uploads the wuresult to the Server, it will exit without downloading a new WU.
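
To catch mistyped flags before starting the Client, the flags above can be checked against a short list. This is a sketch only: `client_command` is a made-up helper, the executable name you pass in is a placeholder for your own Client, and the list contains just the flags described in this guide, so other valid flags would be rejected as well.

```python
# Flags described in this guide; the Client may accept others.
KNOWN_FLAGS = {
    "-verbosity 9": "more detailed FAHlog",
    "-advmethods": "access to late BETA stage WUs",
    "-configonly": "read/write the configuration and exit",
    "-send all": "upload finished results and exit",
    "-oneunit": "finish the current WU, upload it and exit",
}


def client_command(executable, flags):
    """Build an argument list, rejecting flags that are not in KNOWN_FLAGS."""
    unknown = [flag for flag in flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown flag(s): {unknown}")
    args = [executable]
    for flag in flags:
        # "-verbosity 9" and "-send all" are two separate arguments.
        args.extend(flag.split())
    return args
```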
9) If you suspect that your motherboard is at fault, try running the GPU in a different PCI-E Slot and monitor its progress. If it gives errors in all PCI-E Slots, then the motherboard may be the cause (you also need to test with a different GPU to be sure that it is the motherboard's problem and not your GPU's).

10) What Drivers are you using? If you are using BETA Drivers, uninstall them and try the recommended ones, or search the Forum to find out which Drivers are working.

11) Is your Power Supply Unit working properly? A faulty PSU can cause hardware failure, so always use a branded one rather than a nameless one, as F@h stresses your GPU.

12) Have you tweaked unknown variables in your motherboard's BIOS? If yes, then restore everything to default and try again.

13) If you are using Windows Remote Desktop Connection on the system, it will crash the GPU Client. If you have to access the system, you can use these freeware applications, Ultra VNC or TightVNC, since they won't crash the GPU Client.

14) Under some circumstances, you may get an error about cudart.dll. If so, you may want to post your FAHlog here so we can figure out the problem. Copying the file(s) may or may not work.

15) If you are folding in Windows XP and experience slow clocks, please disable the screensaver (Discussion Thread).
Last edited by PantherX on Fri Jan 28, 2011 12:33 am, edited 3 times in total.
PantherX
Super Moderator
 
Posts: 6256
Joined: Wed Dec 23, 2009 9:33 am

16 - Credits

Post by PantherX » Sun Sep 26, 2010 1:55 pm

  • Credits
I would like to extend my thanks to the following users who have helped me in writing this GPU guide: (no particular order)

VijayPande -> Who allowed me to write this GPU Guide & clarified the role of -forcegpu flag
P5-133XL -> Method for systray and multiple instances
stevehat1 -> Upgrading the console version
Sidicas -> GPU3 BETA Client on Linux
Hyperlife -> GPU3 BETA Client on Linux
Shelnutt2 -> GPU3 BETA Client on Linux
grunion -> Using dual GTX 400 Series GPU in one system
uncle_fungus -> making my guide sticky
ikarppin -> Testing with GTX 400 Series GPU and GTX 200 Series GPU in one system
Sidicas -> Provided point A for CoreStatus = 63 (99)
rbpeake -> Provided point B for CoreStatus = 63 (99)
uncle fuzzy -> CPU Usage required for multiple Fermi GPUs & GPU2 and GPU3 for Mixed GPUs
7im -> Representing 6.32 in different ways
Leonardo -> Installation of multiple GPUs in one system (the easiest)
braindancer -> Installation of multiple GPUs in one system (the easiest)
Adam A. Wanderer -> FahCore priority
Speed -> Ultra VNC & TightVNC
Xerxes_Phi -> Some points on changing GPUs from ATI to NV
rhavern -> Suggested that my idea was worth pursuing (Supported Hardware) & solution to downclocking in Windows XP
friedrim -> helped me out in the GPU Messages present in FAHlog
Zagen30 -> Suggested a blurb for the -forcegpu flags under Supported Hardware
MtM -> Help with the blurb for ATI in Supported Hardware and difference between Ctrl + C and X Button
bruce -> Catching a typo in Supported Hardware
HaloJones -> Catching a typo in Supported Hardware
derrickmcc -> Catching a typo in Supported Hardware
BikerBry -> Provided the correct name of the GPU folder in %APPDATA%
sortofageek -> removing non-related posts to a new thread where users can freely discuss my GPU Guide
ihaque -> provided the official stance on GPU overclocking & why using recommended drivers is important
friedrim -> provided information on not using beta drivers
Xilikon -> Provided cause for FILE_IO_ERROR
PeddlerOfFlesh -> Provided cause for FILE_IO_ERROR
Plus anyone else that I may have accidentally forgotten.

17 - Changelog

Post by PantherX » Mon Jan 16, 2012 5:16 pm

  • Changelog
Added new GPUs which can't fold FahCore_15 WUs. EDIT: The GT 420 was mistakenly "blacklisted" since there was a typo on the Nvidia CUDA Page.
Added rhavern's solution to downclocking in Windows XP (viewtopic.php?f=59&t=20160).


If you want to make a post, please do so in this thread -> Suggestions for GPU3 Guide For Windows & Link To Linux. Thanks for your understanding and cooperation.