EVGA

Lock Graphics Card terminal command (linux) for Nvidia GPUs (overclocking)

Author
ProDigit
iCX Member
  • Total Posts : 465
  • Reward points : 0
  • Joined: 2019/02/20 14:04:37
  • Status: offline
  • Ribbons : 4
2020/04/11 00:03:53
Just found out about a feature that's really helpful for crunching in Linux (works for Folding, crunching, benchmarks).

I noticed certain Nvidia RTX GPUs would get lost at random, and I couldn't figure out why.
I thought it was power fluctuations, so I power capped the GPU to a very low value (usually 125-127 W). That resolved the situation, but at the cost of lower boost frequencies.
However, I recently found out that a lost card is more than likely caused by the GPU load decreasing and the overclocked GPU then boosting to too high a frequency.
 
That's when I found out about the LGC (lock GPU clocks) option of nvidia-smi.
It lets me run the GPU at higher or lower power limits without altering the GPU frequency.
With the GPU frequency locked I can increase power consumption, and so far I haven't lost a GPU.
 
Here are a few helpful Nvidia GPU commands in Linux Terminal:
 
Power cap all Nvidia GPUs in Linux (e.g. to 133 W):
sudo nvidia-smi -pl 133

 
Or power cap only certain Nvidia GPUs (here GPU 1 and 3, leaving GPU 0, 2 and the others unchanged):
sudo nvidia-smi -i 1,3 -pl 133
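
A card only accepts a power cap inside the range it reports, so it can help to look that range up before picking a value. The sketch below is not from the original post: clamp_watts is a hypothetical helper that keeps a requested wattage inside a given range, and it assumes the limits are whole watts. The power.min_limit and power.max_limit query fields are standard nvidia-smi query properties.

```shell
# Sketch only: clamp_watts is a hypothetical helper, not part of nvidia-smi.
# It prints REQUESTED clamped to the range MIN..MAX, truncated to whole watts.
clamp_watts() {
    # usage: clamp_watts REQUESTED MIN MAX
    awk -v r="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        if (r < lo) r = lo      # never go below the reported minimum
        if (r > hi) r = hi      # never go above the reported maximum
        printf "%d\n", r
    }'
}

# Possible use on a machine with the Nvidia driver installed (GPU 1 only):
#   limits=$(nvidia-smi -i 1 --query-gpu=power.min_limit,power.max_limit \
#            --format=csv,noheader,nounits)
#   min=${limits%%,*}; max=${limits##*, }
#   sudo nvidia-smi -i 1 -pl "$(clamp_watts 133 "$min" "$max")"
```

The nvidia-smi calls are left as comments so the helper can be tried on a machine without a GPU.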

 
Lock the graphics card frequency so it doesn't exceed 1870 MHz:
sudo nvidia-smi -lgc 1870

 
Set a lower and an upper LGC frequency limit on GPU 0 and 3 (the pair is given as minimum,maximum):
sudo nvidia-smi -i 0,3 -lgc 1870,1935

 
Release the lock on all GPUs, or only on GPU 2:
sudo nvidia-smi -rgc 

sudo nvidia-smi -i 2 -rgc 
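
Since nvidia-smi's help text describes the -lgc argument as a minGpuClock,maxGpuClock pair, it is easy to swap the two values by accident. As a rough sketch (build_lgc is a made-up helper name, not an nvidia-smi feature), the function below checks the order and only prints the command instead of running it:

```shell
# Sketch only: build_lgc is a hypothetical helper that prints the lock
# command after checking that both clocks are whole numbers and min <= max.
build_lgc() {
    # usage: build_lgc GPU_LIST MIN_MHZ MAX_MHZ
    for v in "$2" "$3"; do
        case "$v" in
            ''|*[!0-9]*)
                echo "clocks must be whole MHz values" >&2
                return 1 ;;
        esac
    done
    if [ "$2" -gt "$3" ]; then
        echo "minimum clock $2 exceeds maximum clock $3" >&2
        return 1
    fi
    echo "sudo nvidia-smi -i $1 -lgc $2,$3"
}

# Example: print the command for GPU 0 and 3, then run it by hand:
#   build_lgc 0,3 1870 1935
# The current graphics clock can be checked afterwards with:
#   nvidia-smi --query-gpu=index,clocks.gr --format=csv
```

Printing the command rather than executing it keeps sudo out of the helper and lets you review the values first.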

 
- Setting only the upper limit will prevent the GPU from hitting boost frequencies that could damage the card or cause it to disconnect.
- Setting both the upper and the lower limit will help when running small WUs (WUs that don't tax the GPU much and cause it to run at lower frequencies, like the famous 1350 MHz lock). Note that you can't set the lower limit (= lowest targeted GPU frequency) without also setting the upper limit.
- Setting the upper or lower limit too high (e.g. 2175 MHz or 9999 MHz): in most cases the GPU will act as if there were no limit, since it won't hit those frequencies anyway.
But if you occasionally see a lost GPU, or boost frequencies that cause errors, LGC could help increase stability.
- Setting the lower limit too low (e.g. 700 MHz) does no damage. However, once the task is completed, there is a chance that the GPU will stay in its boost state at 700 MHz until turned off, wasting more electricity between WUs (GPUs usually drop to idle frequencies when the load falls below a few percent).
- Setting the upper limit too low causes the GPU to just run at a slower speed. There's no good purpose for this other than to stabilize the GPU frequency.
For example, if your frequency is fluctuating between 1865 and 1905 MHz, setting the upper limit to e.g. 1875 MHz (= somewhere in the middle) could stabilize the frequency: during peaks, the GPU 'saves' the power it would have used to boost, and can use it when the load is higher and more power is needed.
A well balanced system will level off the GPU frequency.
Stable GPU frequencies might actually benefit crunching, but there's no documentation or proof that this is true.
All I know is that the GPU doesn't throttle the chip up or down, so that's got to account for some latency reduction?
I'll leave it to a more experienced user to fill in the blanks here...
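
To pick a limit "somewhere in the middle" it helps to know how far the clock actually swings under load. The following is a minimal sketch, assuming one graphics clock reading in MHz per input line; clock_stats is a hypothetical helper name, while clocks.gr and the -l loop option are regular nvidia-smi features.

```shell
# Sketch only: clock_stats is a hypothetical helper. It reads one clock
# value (MHz) per line on stdin and prints the minimum, maximum and mean.
clock_stats() {
    awk 'NF {
        v = $1 + 0
        if (n == 0 || v < min) min = v
        if (n == 0 || v > max) max = v
        sum += v
        n++
    }
    END {
        if (n == 0) exit 1          # no samples were read
        printf "min=%d max=%d mean=%d\n", min, max, sum / n
    }'
}

# Possible use: sample GPU 0 once per second for a minute while a WU runs:
#   nvidia-smi -i 0 --query-gpu=clocks.gr --format=csv,noheader,nounits -l 1 \
#       | head -n 60 | clock_stats
```

The mean is truncated to a whole MHz value; whether the mean or something closer to the minimum makes the better upper limit is something to test per card.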
post edited by ProDigit - 2020/04/11 01:28:57
#1


    yodap
    CLASSIFIED Member
    • Total Posts : 4642
    • Reward points : 0
    • Joined: 2011/05/15 06:13:40
    • Location: NY, Upstate
    • Status: offline
    • Ribbons : 8
    Re: Lock Graphics Card terminal command (linux) for Nvidia GPUs (overclocking) 2020/04/11 04:46:50
    Thanks, that is interesting.
    Do you OC your GPUs in Linux? Not sure if I've ever "lost" one.
    #2
    ProDigit
    Re: Lock Graphics Card terminal command (linux) for Nvidia GPUs (overclocking) 2020/04/20 11:26:06
    OK, never mind.
    It does seem to somewhat alleviate the 'lost GPU' situation, but it's still there.
    Only now I can go up to almost 130 W, vs. 127 W without LGC.
     
    A lost GPU is a situation where an undervoltage occurs, caused by PCIe x4 or x1 risers that don't get sufficient voltage.
    This can happen because the 6-pin VGA power cable, the SSD power connector, or the 4-pin HDD cable is shared with another GPU, or carries an additional load, which causes brief voltage drops.
    The best solution would probably be to connect a secondary PCIe lead, or install a capacitor.
    It doesn't do damage; it just costs crunching time.
    A reboot will restore the GPU.
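
Because a lost card stops crunching until the reboot, a small check can flag it early. This is only a sketch: count_lost_gpus is a hypothetical helper, and it assumes the driver reports the condition with a line containing "GPU is lost", which may differ between driver versions.

```shell
# Sketch only: count_lost_gpus is a hypothetical helper. It reads
# nvidia-smi output on stdin and prints how many lines report a lost GPU.
# Assumption: the driver words the error as "... GPU is lost ...".
count_lost_gpus() {
    grep -c 'GPU is lost' || true   # grep exits 1 on zero matches; keep going
}

# Possible use from cron or a loop on the crunching machine:
#   lost=$(nvidia-smi 2>&1 | count_lost_gpus)
#   [ "$lost" -gt 0 ] && echo "$lost GPU(s) lost, reboot needed"
```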
    #3
    ProDigit
    Re: Lock Graphics Card terminal command (linux) for Nvidia GPUs (overclocking) 2020/05/26 09:48:40
    Getting back to this: it definitely can reduce GPU compute instability.
    While newer RTX GPUs have protection built in, older GPUs don't.
    If you max out the watts on a GPU, overclock it, and run a ~100 W compute load on it, it might hit a frequency it can't maintain.
    I've had a GPU go offline a few times, due to bad WUs, but when I saw it boosted to 2080-2085 MHz (or so), I lowered the limit to 2050 MHz, and I haven't seen the error return.
    But that's only on one of the very first RTX GPUs I purchased.
    #4