Just found out about a feature that's really helpful for crunching on Linux (works for Folding, crunching, and benchmarks).
I noticed certain Nvidia RTX GPUs would be lost at random, and I couldn't figure out why.
I thought it was power fluctuations, so I power capped the GPU to a very low value (usually 125-127 W). That resolved the problem, but at the cost of lower boost frequencies.
However, I recently found out that a lost card is more than likely caused by the GPU load dropping and the GPU then boosting to too high a frequency.
That's when I found out about the -lgc (lock GPU clocks) option for nvidia-smi.
It allows my GPU to run higher or lower power profiles, without altering the GPU frequency.
Locking the GPU frequency allows me to increase power consumption, and so far I haven't lost a GPU.
Here are a few helpful Nvidia GPU commands in Linux Terminal:
Power cap all Nvidia GPUs (e.g. to 133 W):
sudo nvidia-smi -pl 133
Or power cap only certain GPUs (here GPU 1 and 3, leaving GPU 0, 2 and any others unchanged):
sudo nvidia-smi -i 1,3 -pl 133
Lock the graphics clock so it doesn't exceed 1870 MHz:
sudo nvidia-smi -lgc 1870
Lock the graphics clock on GPU 0 and 3 to a range (the pair is lower limit,upper limit in MHz):
sudo nvidia-smi -i 0,3 -lgc 1870,1935
Release the lock on all GPUs:
sudo nvidia-smi -rgc
Or release it on GPU 2 only:
sudo nvidia-smi -i 2 -rgc
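If you apply the same cap and lock after every reboot, the commands above can be wrapped in a small script. This is only a rough sketch: the names tune_gpus, run and DRY_RUN are made up for illustration, and by default it just prints the commands instead of running them, so you can check them first.

```shell
#!/bin/sh
# Sketch only: tune_gpus, run and DRY_RUN are hypothetical names.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"      # show what would be run
    else
        "$@"           # actually run it
    fi
}

# tune_gpus GPU_LIST POWER_W MIN_MHZ MAX_MHZ
# Applies a power cap and a graphics clock range to the listed GPUs.
tune_gpus() {
    gpus="$1"; watts="$2"; min_mhz="$3"; max_mhz="$4"
    run sudo nvidia-smi -i "$gpus" -pl "$watts"
    run sudo nvidia-smi -i "$gpus" -lgc "$min_mhz,$max_mhz"
}

# Example (values taken from the commands above):
#   tune_gpus 0,3 133 1870 1935
# Run the script with DRY_RUN=0 to really apply the settings.
```

The values are placeholders; use whatever power cap and clock range turned out stable on your own cards.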
- Setting only the upper limit will prevent the GPU from hitting boost frequencies that could damage the card or cause it to disconnect.
- Setting both an upper and a lower limit will help with small WUs (WUs that don't tax the GPU much and cause it to run at lower frequencies, like the famous 1350 MHz lock). Note that you can't set the lower limit (= the lowest GPU frequency you're aiming for) without also setting the upper limit.
- Setting the upper or lower limit too high (e.g. 2175 MHz or 9999 MHz) will in most cases make the GPU act as if there were no limit. It won't hit those frequencies anyway.
But if you occasionally see a lost GPU, or boost frequencies that cause errors, LGC could help increase stability.
- Setting the lower limit too low (e.g. 700 MHz) does no damage. However, once a task completes, there is a chance the GPU will stay boosted at 700 MHz until turned off, wasting electricity between WUs (normally the GPU drops to idle frequencies when the load is below a few percent).
- Setting the upper limit too low just causes the GPU to run slower. There's no good purpose for this other than to stabilize the GPU frequency.
For example, if your frequency is fluctuating between 1865 and 1905 MHz, setting the upper limit to e.g. 1875 MHz (= somewhere in the middle) could stabilize it: the power the GPU would have spent boosting during the peaks is 'saved' and available when the load is higher and more power is needed.
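To pick a value "somewhere in the middle", it can help to sample the clock for a while under load first. Here's a rough sketch: clock_midpoint is a made-up helper name, and it simply reads one clock reading (in MHz) per line and prints the lowest, highest and midpoint values. Whether the exact midpoint is the best lock value is my assumption, not something I've tested.

```shell
# clock_midpoint (hypothetical helper): read one clock reading in MHz
# per line from stdin and print "min max midpoint".
clock_midpoint() {
    awk 'NF {
             v = $1 + 0
             if (n == 0 || v < min) min = v
             if (n == 0 || v > max) max = v
             n++
         }
         END {
             if (n == 0) exit 1      # no samples read
             printf "%d %d %d\n", min, max, int((min + max) / 2)
         }'
}

# Example: sample GPU 0 once per second for about a minute while a WU runs:
#   timeout 60 nvidia-smi -i 0 --query-gpu=clocks.gr \
#       --format=csv,noheader,nounits -l 1 | clock_midpoint
```

With readings between 1865 and 1905 MHz this prints a midpoint of 1885, which you can then round to whatever value you want to try with -lgc.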
A well balanced system will level off the GPU frequency.
Stable GPU frequencies might actually benefit crunching, but I have no documentation or proof that this is true.
All I know is that the GPU doesn't throttle the chip up or down, so that's got to account for some latency reduction?
I'll leave it to a more experienced user to fill in the blanks here...
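It doesn't prove anything about crunching speed, but you can at least check whether the clock really holds still with and without the lock. A minimal sketch, where clock_changes is a made-up helper that counts how often the reading changed between consecutive samples:

```shell
# clock_changes (hypothetical helper): read one clock reading in MHz
# per line from stdin and print how many times the value changed
# from one sample to the next.
clock_changes() {
    awk 'NF {
             if (seen && $1 != prev) changes++
             prev = $1
             seen = 1
         }
         END { print changes + 0 }'
}

# Example: count clock changes on GPU 0 over about a minute:
#   timeout 60 nvidia-smi -i 0 --query-gpu=clocks.gr \
#       --format=csv,noheader,nounits -l 1 | clock_changes
```

Running it once with the lock and once after -rgc on the same kind of WU gives a rough comparison. One sample per second will miss faster changes, so treat the count as a lower bound.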
post edited by ProDigit - 2020/04/11 01:28:57