2021/02/10 02:18:39
Feklar
yaggaz
Rewire92

A firmware update should be able to resolve this.




Wow thanks for figuring all this out.  If they could fix with a firmware update, would that be a case of limiting the card's performance to achieve safety?


It better not be considered a solution if it limits the card's performance. This is not why we buy FTW3 cards at a higher cost. I wouldn't accept that.
2021/02/10 03:17:26
Rewire92
Feklar
yaggaz
Rewire92

A firmware update should be able to resolve this.




Wow thanks for figuring all this out.  If they could fix with a firmware update, would that be a case of limiting the card's performance to achieve safety?


It better not be considered a solution if it limits the card's performance. This is not why we buy FTW3 cards at a higher cost. I wouldn't accept that.


It shouldn't limit the performance at all, as the voltage limits *should* only apply for the lower power clock states.  Remember, the card has issues at LOW power states, not high power states.

An update: no crashes experienced since setting a voltage limit of 1.062V.  Running an overclock of +120 Core/+1250 Memory. Runs Cyberpunk at 2070 MHz/11000 MHz on Ultra without Ray-Tracing at 1440p at 80-95 FPS without DLSS.

Capped my FPS in LoL and Halo and the like at 163 FPS.  Still running CSGO at like 400 FPS.  No issues.

At this point I am confident this workaround fixes the issue that has had FTW3 series cards being RMAed constantly.  Haven't had any more black screens at all, no issues in games, great FPS everywhere.
2021/02/10 13:32:35
neteng101
It's the transient response of the card during voltage shifts.  If you want to know more about transient response and how it affects electronic circuits, I recommend watching some of Buildzoid's videos on YouTube.
 
The really oversimplified version is that changes in state can lead to overshoot, and you can't see this in monitoring software, only on an oscilloscope.  The voltage spikes you can't see can be much higher than the safe limits, e.g. the card tries to ramp up to 1.081V, but overshoot could lead it to run at, say, 1.3V for a brief moment.  At the least it leads to crashes, and at worst it will burn out components.
 
The power delivery of the FTW3 cards can't respond fast enough to regulate the voltage changes safely - my guess is they didn't quite account for something correctly with the 3x8 power inputs or the programming for the VRMs isn't right.  Best case they can reprogram the firmware and a BIOS update can fix this, worst case the card might need a redesign of its power delivery/filtering.  We already know that Nvidia pushed Ampere to the limits so much that early cards were crashing, with some cheaper cards being more prone to crashing.
 
If you're wondering why someone would be interested enough to go listen to Buildzoid's technical ramblings, it taught me a lot about LLC settings and overclocking my CPU.  I can instantly crash my otherwise stable system by forcing an extreme state transition - e.g. launching Intel XTU's benchmark test.  I was able to tweak LLC on my Z370 board enough to deal with a lot of instability, but its VRMs were never designed for the power hungry i9-9900K I upgraded to.  What you're doing in LoL is going from a low load state to max speed/voltage, so it's a big jump in the curve - you can mask the problem by limiting the overshoot, say by setting 1.05V max, so your overshoot is lowered enough that it doesn't crash your system, but it doesn't solve the bad transient response on the card itself.
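
To put rough numbers on the "lower cap means lower overshoot" idea, here is a minimal back-of-envelope sketch in Python. It assumes the overshoot scales linearly with the size of the voltage step, which is a simplification; real behavior depends on the VRM control loop and can only be measured on a scope. The 1.081V and 1.3V figures are the illustrative ones from above, not measurements, and the idle voltage `V_IDLE` and both function names are hypothetical.

```python
def overshoot_fraction(v_start, v_target, v_peak):
    """Overshoot above the target, as a fraction of the voltage step."""
    return (v_peak - v_target) / (v_target - v_start)


def predicted_peak(v_start, v_target, fraction):
    """Peak voltage if the overshoot scales linearly with the step size."""
    return v_target + fraction * (v_target - v_start)


# Hypothetical idle voltage, chosen only for illustration (not measured).
V_IDLE = 0.75

# Illustrative example from the post: target 1.081V, brief spike to 1.3V.
frac = overshoot_fraction(V_IDLE, 1.081, 1.3)

# Same assumed overshoot fraction, but with the curve capped at 1.05V.
peak_capped = predicted_peak(V_IDLE, 1.05, frac)

print(f"overshoot fraction of step: {frac:.2f}")
print(f"predicted peak with 1.05V cap: {peak_capped:.3f}V")
```

Under these assumptions the capped peak comes out lower than the uncapped one, but the overshoot fraction itself is unchanged, which is the point: the cap masks the transient response problem rather than fixing it.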
2021/02/10 16:18:42
f0resight
My original 3090 failed, and I'm not sure what the voltages were on it.  The replacement, a 2114 S/N model, does not go above 1.062V on the GPU under light loads at stock, without any tweaking.  I've tested Halo and FFXIV, not LoL yet.  I've not had any issues in the couple of weeks I've had it.
2021/02/10 17:52:15
ClowReed
neteng101
It's the transient response of the card during voltage shifts.  If you want to know more about transient response and how it affects electronic circuits, I recommend watching some of Buildzoid's videos on YouTube.
 
The really oversimplified version is that changes in state can lead to overshoot, and you can't see this in monitoring software, only on an oscilloscope.  The voltage spikes you can't see can be much higher than the safe limits, e.g. the card tries to ramp up to 1.081V, but overshoot could lead it to run at, say, 1.3V for a brief moment.  At the least it leads to crashes, and at worst it will burn out components.
 
The power delivery of the FTW3 cards can't respond fast enough to regulate the voltage changes safely - my guess is they didn't quite account for something correctly with the 3x8 power inputs or the programming for the VRMs isn't right.  Best case they can reprogram the firmware and a BIOS update can fix this, worst case the card might need a redesign of its power delivery/filtering.  We already know that Nvidia pushed Ampere to the limits so much that early cards were crashing, with some cheaper cards being more prone to crashing.
 
If you're wondering why someone would be interested enough to go listen to Buildzoid's technical ramblings, it taught me a lot about LLC settings and overclocking my CPU.  I can instantly crash my otherwise stable system by forcing an extreme state transition - e.g. launching Intel XTU's benchmark test.  I was able to tweak LLC on my Z370 board enough to deal with a lot of instability, but its VRMs were never designed for the power hungry i9-9900K I upgraded to.  What you're doing in LoL is going from a low load state to max speed/voltage, so it's a big jump in the curve - you can mask the problem by limiting the overshoot, say by setting 1.05V max, so your overshoot is lowered enough that it doesn't crash your system, but it doesn't solve the bad transient response on the card itself.


Nice explanation! I tried to reach out to Buildzoid about this matter, but so far he hasn't answered.
Do you think it's possible for us to limit the overshoot? Like making a custom BIOS or something? Or is it only EVGA that holds this kind of power?
2021/02/10 19:09:05
TechJessica87
Solid info, I'm sure it'll come together!
2021/02/10 19:12:05
theanalyzer
Does mining affect cards? I'll have to go back and check (got 1x3080 FTW3 mining in my spare time)
 
Have only played Dota 2 while mining, and a bit of SOTR.
2021/02/10 20:55:49
Rewire92
theanalyzer
Does mining affect cards? I'll have to go back and check (got 1x3080 FTW3 mining in my spare time)
 
Have only played Dota 2 while mining, and a bit of SOTR.


Funnily enough, playing a game while mining would be a proper mitigation, because the card would be locked at full voltage regardless of rendering load.
2021/02/10 21:23:22
jackychim
Can anyone tell me how to adjust the back end of the MSI Afterburner curve? It keeps jumping back into alignment with the lower voltages.
2021/02/10 21:32:25
jackychim
Rewire92
arestavo
1.1V is what these GPUs are rated for. Whether or not the GPU will be able to hit that due to GPU boost's algorithm is a whole different story.


Well they may be "rated" for 1.1V, but since I did this fix I found, I'm going on 6 hours with no crashing, and no voltage spikes past 1.068V.

It may not be the root of the problem, but it's certainly fixed it.

EDIT:  Also, the crashes were happening in low power states at low wattage and GPU usage.  You're running the highest performance state with full GPU usage, which has no problems, as demonstrated by my 150 hours on Cyberpunk.




Would you kindly be able to show me how to adjust the back curve of the clock speeds like in OP's screenshot?  My MSI Afterburner keeps resetting higher to match the previous power curve pivot point.
