core 17 temperature control with nvidia under linux

If you think it might be a driver problem, see viewforum.php?f=79

mattifolder

core 17 temperature control with nvidia under linux

Post by mattifolder »

I have configured my folding client under Linux as described at https://folding.stanford.edu/home/chang ... -full-fah/ or at viewtopic.php?f=74&t=26533&p=266854#p266854. The log shows the following configuration:

Code:

...
19:32:19:         OS: Linux 3.19.0-26-generic x86_64
19:32:19:    OS Arch: AMD64
19:32:19:       GPUs: 1
19:32:19:      GPU 0: NVIDIA:5 GM204 [GeForce GTX 970]
19:32:19:       CUDA: 5.2
19:32:19:CUDA Driver: 7050
...
19:48:35:<config>
...
19:48:35:  <!-- Folding Slot Configuration -->
19:48:35:  <extra-core-args v='-forceasm -tmax=70 -twait=900'/>
...
19:48:35:  <!-- Folding Slots -->
...
19:32:19:  <slot id='1' type='GPU'>
...
19:32:19:    <extra-core-args v='-tmax=70 -twait=900'/>
...
19:32:19:  </slot>
19:48:35:</config>
Now with Core 17, 18 or 21 projects it seems that the options are not accepted. The process starts with these options and is listed with them in the process list, but a warning is also logged that temperature control is disabled:

Code:

19:34:08:WU00:FS01:Connecting to 171.67.108.200:80
19:34:08:WU00:FS01:Assigned to work server 171.64.65.98
19:34:08:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM204 [GeForce GTX 970] from 171.64.65.98
19:34:08:WU00:FS01:Connecting to 171.64.65.98:8080
19:34:09:WU00:FS01:Downloading 7.48MiB
19:34:12:WU00:FS01:Download complete
19:34:12:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9712 run:67 clone:32 gen:18 core:0x21 unit:0x00000067ab40416255b9b07824c8d8d6
19:34:12:WU00:FS01:Starting
19:34:12:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 2243 -checkpoint 3 -gpu 0 -gpu-vendor nvidia -tmax=70 -twait=900
19:34:12:WU00:FS01:Started FahCore on PID 3346
19:34:12:WU00:FS01:Core PID:3350
19:34:12:WU00:FS01:FahCore 0x21 started
19:34:12:WU00:FS01:0x21:*********************** Log Started 2015-08-31T19:34:12Z ***********************
19:34:12:WU00:FS01:0x21:Project: 9712 (Run 67, Clone 32, Gen 18)
19:34:12:WU00:FS01:0x21:Unit: 0x00000067ab40416255b9b07824c8d8d6
19:34:12:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
19:34:12:WU00:FS01:0x21:Machine: 1
19:34:12:WU00:FS01:0x21:Reading tar file core.xml
19:34:12:WU00:FS01:0x21:Reading tar file integrator.xml
19:34:12:WU00:FS01:0x21:Reading tar file system.xml
19:34:13:WU00:FS01:0x21:Reading tar file state.xml
19:34:13:WU00:FS01:0x21:Digital signatures verified
19:34:13:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
19:34:13:WU00:FS01:0x21:Version 0.0.11
19:34:35:WU00:FS01:0x21:Completed 0 out of 1280000 steps (0%)
19:34:35:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
...
Is this function not activated under Linux, or is my configuration incorrect?
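The requirements quoted in that warning can be written down as a small check. This is only a sketch of the conditions as they appear in the log line (single Nvidia GPU, tmax < 110, twait >= 900). The function names are made up for illustration, and passing the check says nothing about whether a given core build actually implements the feature.

```python
def parse_core_args(arg_string):
    """Parse '-name=value' options into a dict; bare flags map to None."""
    args = {}
    for token in arg_string.split():
        name, sep, value = token.lstrip("-").partition("=")
        args[name] = value if sep else None
    return args


def temperature_control_requirements_met(arg_string, nvidia_gpu_count):
    """Check only the conditions quoted in the log message (hypothetical helper)."""
    args = parse_core_args(arg_string)
    try:
        tmax = int(args["tmax"])
        twait = int(args["twait"])
    except (KeyError, TypeError, ValueError):
        # tmax or twait is missing or not an integer
        return False
    return nvidia_gpu_count == 1 and tmax < 110 and twait >= 900


# The arguments from the config above, with one Nvidia GPU:
print(temperature_control_requirements_met("-forceasm -tmax=70 -twait=900", 1))  # True
```

With the configuration shown, the check passes, which is why the warning is confusing: the stated requirements are met, yet temperature control is still reported as disabled.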
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona
Contact:

Re: core 17 temperature control with nvidia under linux

Post by 7im »

This is a very crude tool of last resort. You would be better off using Coolbits to manage temperatures at a finer grain.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: core 17 temperature control with nvidia under linux

Post by bruce »

davidcoton wrote: sudo nvidia-xconfig --cool-bits=12
reboot

Then fan control and clocking adjustments are available in nvidia-settings
mattifolder

Re: core 17 temperature control with nvidia under linux

Post by mattifolder »

I know Coolbits, nvidia-settings and nvidia-smi and already use them to adjust GPU clocks. By the way, room temperatures are sometimes temporarily high in ways that cannot be planned for. Scripted automation would add extra CPU overhead and a risk of instability. So I thought tmax and twait might be a simple solution. I only want to know whether they are supposed to work or not.
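For readers who do want to try the scripted route mentioned here, a minimal sketch might look like the following. It polls the temperature through nvidia-smi and uses two thresholds (hysteresis) so the state does not flip back and forth around a single value. The thresholds, the polling interval, and the function names are assumptions for illustration; what to do when the state changes (lower clocks, pause a slot) is left to the caller.

```python
import subprocess
import time


def read_gpu_temperature(gpu_index=0):
    """Return the GPU temperature in degrees C as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        text=True,
    )
    return int(out.strip())


def next_state(throttled, temperature, high=80, low=70):
    """Hysteresis: throttle at or above `high`, release at or below `low`."""
    if temperature >= high:
        return True
    if temperature <= low:
        return False
    return throttled  # between the thresholds: keep the current state


def monitor(read_temperature, on_change, interval_s=30, iterations=None):
    """Poll the temperature and call on_change(throttled) when the state flips."""
    throttled = False
    count = 0
    while iterations is None or count < iterations:
        new_state = next_state(throttled, read_temperature())
        if new_state != throttled:
            throttled = new_state
            on_change(throttled)
        count += 1
        time.sleep(interval_s)
    return throttled
```

A call such as `monitor(read_gpu_temperature, my_handler)` would run until interrupted, where `my_handler` is a hypothetical callback supplied by the user. Polling every 30 seconds or so should cost very little CPU, though that is worth verifying on the actual machine.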
ChristianVirtual
Posts: 1596
Joined: Tue May 28, 2013 12:14 pm
Location: Tokyo

Re: core 17 temperature control with nvidia under linux

Post by ChristianVirtual »

As I understand it, chips don't like frequent jumps in temperature; they cope better with fairly constant temperatures.
And even with a waiting time of however many seconds, the temperature rises again as soon as folding continues.
It is better to ensure suitable ambient temperatures and sufficient airflow or, if you want to get fancy, to replace the fans with a water block.
Please contribute your logs to http://ppd.fahmm.net
mattifolder

Re: core 17 temperature control with nvidia under linux

Post by mattifolder »

Could someone please answer my question: are the extra-core-args tmax and twait functional under Linux?
7im

Re: core 17 temperature control with nvidia under linux

Post by 7im »

From the core 17 thread.

Code:

This is a feature present in FahCore_17 version .52 and higher.
Until they release a newer version of the present Linux core (.46), which is unlikely since they have already moved on to Core 18 and 21, the feature is not supported on Linux. As I posted earlier, Coolbits is a better option, assuming a workable ambient temperature.
mattifolder

Re: core 17 temperature control with nvidia under linux

Post by mattifolder »

OK, thank you. That is finally a clear answer. I thought I had also run Core 17 projects under Linux, but my saved logs show only Core 18 (version 0.0.4) and Core 21 (version 0.0.3). One criticism I have had for a long time: log entries like this are bewildering:

Code:

19:34:35:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
On my system that actually means: temperature control is disabled even though I have an Nvidia GPU and have configured tmax < 110 and twait >= 900.
I develop software professionally myself, and I hate :ewink: programs that seem to do things that they don't do.
7im

Re: core 17 temperature control with nvidia under linux

Post by 7im »

I do not know if this feature works in core 18 or 21, or in what versions of those cores.

But turning off folding for 15 minutes (900 seconds) or more was wasteful, especially when checkpoints are saved only every 5 frames (%). Each 15-minute shutdown could cost several minutes to several hours of folding, depending on average frame times and on when the last checkpoint was saved to disk.
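As a rough illustration of that estimate: 900 seconds is 15 minutes, and if a pause forces the core to resume from the last checkpoint, up to one full checkpoint interval of work may have to be redone on top of the wait. The sketch below assumes exactly that behavior and uses made-up frame times; the function names are hypothetical.

```python
def twait_minutes(twait_seconds):
    """Convert the twait value (given in seconds) to minutes."""
    return twait_seconds / 60


def worst_case_cost_minutes(twait_seconds, frame_time_minutes, frames_per_checkpoint=5):
    """Pause length plus the most work that could need redoing.

    Assumes the pause throws away everything since the last checkpoint,
    and that checkpoints are written every `frames_per_checkpoint` frames.
    """
    redo = frame_time_minutes * frames_per_checkpoint
    return twait_seconds / 60 + redo


print(twait_minutes(900))                # 15.0
print(worst_case_cost_minutes(900, 3))   # 30.0  (3-minute frames, assumed)
print(worst_case_cost_minutes(900, 30))  # 165.0 (30-minute frames, assumed)
```

With short frames the worst case is about half an hour per pause; with long frames it approaches three hours, which matches the "several minutes to several hours" range.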
davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: core 17 temperature control with nvidia under linux

Post by davidcoton »

mattifolder wrote: I hate :ewink: programs that seem to do things that they don't do.
So do most of us. As I found when I was involved in software development, getting it right is not always the management priority :evil: :shock:
In this case software development is overloaded with stuff to benefit the science plus a lot of recent work on the server infrastructure. There are lots of bits left over from the initial v7 development (up to 7.4.4) which are not quite right, but have not been assigned enough priority to get them fixed -- there is no announced intention to release a 7.5.x client. That misleading message is probably not even high amongst the known issues :(
mattifolder

Re: core 17 temperature control with nvidia under linux

Post by mattifolder »

7im wrote: But turning off folding for 15 minutes (900 seconds) or more was wasteful
Oh man, twait is in seconds! :eo Whoever reads carefully really has an advantage. :mrgreen: I really thought it was milliseconds. You can forget (close) the thread. I'll quickly deactivate that configuration option and solve my problem with an appropriately reduced overclock and better system cooling when the summer is too hot. At the moment it is almost enough to reduce the GPU core clock by 5 to 10 MHz and/or suspend CPU folding.
7im

Re: core 17 temperature control with nvidia under linux

Post by 7im »

Yes, fine tuning rather than coarse tuning.