Wednesday, April 28, 2021 7:46 PM
Hey,
I recently bought a second-hand PC that unfortunately has a GPU issue. The seller said the PC works perfectly (shame on him), but I quickly found out that it frequently loses signal to the monitor under load: I hear a device-disconnect sound and the screen fades to black. The peripherals still seem to work, but the only thing that resolves the problem is manually resetting the PC. Sometimes I can see Warning and Error messages in the Event Viewer saying:
Warning: "Display driver nvlddmkm stopped responding and has successfully recovered."
Error: "The description for Event ID 13 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer."
However, this is not always the case; sometimes nothing shows up in the Event Viewer at all.
I've definitely narrowed it down to the GPU, because today I ran the video card in a different PC with a different PSU (bigger than the current one), different RAM, storage, motherboard, etc., and the issue still occurred. For quite some time I couldn't reproduce the issue reliably (more than 80% of the time), but I've found that the EVGA Precision X1 tool offers a Test and Scan function in its VF Curve Tuner, which checks whether the GPU works well at its current clock speeds or could run higher. Using the Test function at a 100% Power Target, the GPU crashes almost every time.
I've found two things that seem to reduce or eliminate the number of issues:
-Blowing cold air onto the GPU with my hair dryer (but of course this is not a realistic solution).
-Running the GPU at about a 75% Power Target (but I guess this reduces the performance I can get out of it).
Judging from this, I would say it is a thermal issue. Of course my tests are not conclusive and I've only had time for a few so far, but it's really interesting that 30 minutes of use with the hair dryer didn't make it crash at a 100% Power Target, while 10 minutes of use without it makes it crash almost always. GPU-Z shows the video card running at about 75-80 Celsius under load, with hot spots of 85-90 Celsius. Previously it went up to 83 Celsius with a 93-94 Celsius hot spot, but I've since replaced the thermal paste and thermal pads on the card and flipped one of the fans around in my case so there's better airflow. I also raised the fan curves to make the CPU and GPU fans run faster, though this doesn't really reduce the temps. With a Power Target of 75% I can run at about 73 Celsius.
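For anyone who wants to repeat this kind of test, here is a rough sketch of how the temperature, power draw and core clock could be logged once per second, so that the last readings before a crash survive the reset. It assumes `nvidia-smi` is on the PATH and supports the `temperature.gpu`, `power.draw` and `clocks.sm` query fields; the function names and the `gpu_log.csv` file name are just my own placeholders. As far as I know `nvidia-smi` only reports the core temperature, not the hot spot that GPU-Z shows.

```python
import csv
import os
import subprocess
import time
from datetime import datetime

# Fields requested from nvidia-smi (assumed to be supported by the installed driver).
QUERY = "temperature.gpu,power.draw,clocks.sm"


def to_float(field):
    """Convert one CSV field to float; return None for values like '[N/A]'."""
    try:
        return float(field.strip())
    except ValueError:
        return None


def parse_sample(line):
    """Parse one 'temp, power, clock' line of nvidia-smi CSV output."""
    temp, power, clock = line.split(",")
    return {
        "temp_c": to_float(temp),
        "power_w": to_float(power),
        "sm_mhz": to_float(clock),
    }


def read_sample():
    """Ask nvidia-smi for the current readings of the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])


def log_forever(path="gpu_log.csv", interval_s=1.0):
    """Append one timestamped row per interval until interrupted or the PC is reset."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            s = read_sample()
            stamp = datetime.now().isoformat(timespec="seconds")
            writer.writerow([stamp, s["temp_c"], s["power_w"], s["sm_mhz"]])
            # Force the row to disk, since the machine may need a hard reset.
            f.flush()
            os.fsync(f.fileno())
            time.sleep(interval_s)
```

Calling `log_forever()` before starting the stress test and checking the last rows of the CSV after a crash should show whether the crashes cluster around a particular temperature or power draw.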
Right now I don't know what else to do. It could be a design flaw that makes one of the components overheat, or maybe one of the sensors is bad, or maybe the airflow is bad, but it's hard to pinpoint. Unfortunately, since it's a second-hand product and the support team said there's no warranty, I can't RMA it. I hope my post helps the people who seem to have the same issue, at least in narrowing down the cause of the problem, and I hope someone can help me find a way to deal with it. So if anyone knows the root of the problem, or can help me with some valuable info/data on how the GTX 1080 FTW should theoretically run, then I would be really thankful.