My 3080 is showing implausible, inconsistent voltage readouts and hitting its power limit at idle, all while temperatures stay perfectly normal. I don't think it's actually reaching the power limit; I think it's misreading its own input voltages. The card had this issue when I got it, but after flashing the firmware through Precision X1 and fitting new thermal pads (I could see air above the inductors and MOSFETs), it was fine for around 2 months. Now the issue is back and not going away, and I'd love some help if possible. The warranty is obviously void, and I got the card used. I have access to a full electrical lab and test facility, a thermal camera, a microscope, and about three electrical engineers who are happy to help, but none of them have worked on GPUs, and I'd like to respect their time if someone has an idea what is causing this.
When it runs correctly, this card is fantastic and typically autoclocks to 1965MHz or higher. I do not overclock, and I have not changed any voltage settings from stock since receiving the card.
Under any load, however, it clocks down to 225MHz after about 30 seconds and typically stays there, sometimes clocking back up and running fine for a minute before clocking down again. Whenever it clocks down, GPU-Z shows implausible voltages. With normal performance it reads 12V-12.1V, but when it clocks down it either reads between 4V and 10V on the power connectors, or it reads 12V on the power connectors and 3000V+ on "PWR_SRC Voltage". Whenever it downclocks, Precision X1 reports max power, GPU-Z shows it at the power threshold and throttling, and temperatures drop from ~65C (under load but below max power, running fine) to 30C while it claims to be at the full power threshold. This behavior is consistent between Precision X1 and GPU-Z.
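The pattern above could be flagged in logged sensor data roughly like this. This is only a sketch: the field names are made up for illustration and are not GPU-Z's actual log columns, the +/-5% window is the usual ATX tolerance on the 12V rail, the 300MHz cutoff is an arbitrary value chosen to separate the 225MHz throttle state from normal clocks, and it assumes "PWR_SRC Voltage" should also sit near 12V.

```python
from dataclasses import dataclass

# Assumed thresholds, not values taken from the card or from GPU-Z.
RAIL_NOMINAL_V = 12.0
RAIL_TOLERANCE = 0.05        # +/-5%, the usual ATX tolerance on the 12 V rail
THROTTLE_CLOCK_MHZ = 300.0   # hypothetical cutoff; the card drops to 225 MHz


@dataclass
class Sample:
    """One sensor reading; the field names are hypothetical."""
    clock_mhz: float
    connector_v: float   # voltage reported on a power connector
    pwr_src_v: float     # "PWR_SRC Voltage" as reported (assumed ~12 V nominal)


def rail_plausible(volts: float) -> bool:
    """True if the reading is within the assumed tolerance of 12 V."""
    low = RAIL_NOMINAL_V * (1 - RAIL_TOLERANCE)
    high = RAIL_NOMINAL_V * (1 + RAIL_TOLERANCE)
    return low <= volts <= high


def classify(s: Sample) -> str:
    """Label a sample by clock state and whether its voltages look sane."""
    throttled = s.clock_mhz < THROTTLE_CLOCK_MHZ
    bad_voltage = not (rail_plausible(s.connector_v) and rail_plausible(s.pwr_src_v))
    if throttled and bad_voltage:
        return "throttled, implausible voltage"
    if throttled:
        return "throttled, voltage plausible"
    if bad_voltage:
        return "normal clock, implausible voltage"
    return "normal"
```

If every throttled sample lands in the "implausible voltage" bucket and no normal-clock sample does, that would support the idea that the throttling follows the bad voltage reading rather than a real power limit.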
Things I know about this board:
-It's used, and the previous owner said this was an issue. I get it, buying this was probably dumb.
-I replaced all thermals and did a firmware update immediately when I got it; the card cooled and ran very well after that.
-This card ran Starfield very well for 12 hours at a time on release. It also ran a number of other games as expected for months, until last week.
-There are no visible cracks on the board.
-There is no warp on the board
-This board has likely been quite hot for extended periods in the past. The GPU die was still well cooled, but on receipt there was a visible air gap above the MOSFETs and inductors for the VRMs. All of them. Pictures below if it lets me link them.
-The tape below the wire connector for the LED plate was melted. No components look damaged. Pics also included.
-The firmware updated successfully this morning, but the issue persists.
-The issue persists with different power connectors and in different PCI-e slots; I tested it in two of the four slots I have available.
Things I've tested already:
-There is no short through the VRMs that I can detect through the coils. There wasn't when I got the card, and there isn't as of ten minutes ago.
-There are no blown fuses that I can find, at least none of the ones big enough to have the amperage rating on them
Things I can't really test today:
-I don't have a power supply today to apply a small voltage and look for heat
-I don't have a thermal camera today to detect heat
-I don't have a multimeter today that can test capacitors, though I can test resistance and continuity
Things I can test tomorrow:
-I'll have a power supply that can apply discrete voltage
-I'll have a thermal camera to see heat
-I'll have a microscope
-I'll have solder setup and ESD equipment for replacing components
Pictures of physical card and GPU-Z readouts
Thanks in advance for anything anyone can suggest testing today or tomorrow in the lab. I'm looking to rule out or isolate issues today to save time tomorrow.