EVGA

Having trouble diagnosing an issue

Author
bg8780
CLASSIFIED Member
  • Total Posts : 2540
  • Reward points : 0
  • Joined: 2008/02/19 14:21:34
  • Status: offline
  • Ribbons : 4
2023/03/28 06:14:15 (permalink)
Hey everyone,
 
I've been having an issue with my PC hard-locking: sound and everything else keep working for about 10 seconds, then it fully locks. All monitors go black and my GPU fans go to 100%. Event Viewer doesn't show me anything other than an unexpected shutdown, but maybe I don't know what to look for and could use some help here. I was on the latest BIOS but have since rolled back to 2.06, which has been the most stable version, hoping that fixes the issue. I have also done a clean OS install to rule out software problems. If the BIOS rollback doesn't resolve this, I'm not sure what else to troubleshoot. Overall, I'm trying to figure out why it looks display-related: all monitors going black and the GPU fans going to 100% are really the only hints I have to go on.
 
Specs
Z690 Classy
13900k
G.Skill 6400 MHz
Gigabyte RTX 4090
Multiple NVMe and SATA SSDs. One HDD
 
Any help diagnosing this will be greatly appreciated, not sure where else to turn. I'm out of things to troubleshoot lol
 
PSA: If you're having the same issue as me and have a CableMod cable with the metal shroud "protector", remove it. My issue was entirely caused by this metal shroud.
post edited by bg8780 - 2023/04/03 13:50:49
#1
B0baganoosh
CLASSIFIED Member
  • Total Posts : 2365
  • Reward points : 0
  • Joined: 2009/08/04 04:27:18
  • Status: offline
  • Ribbons : 39
Re: Having trouble diagnosing an issue 2023/03/28 06:25:17 (permalink)
What power supply do you have? This sounds like the old 3090 bug that was happening when 3090's were starting to fail, but there were a few cases where the GPU was hitting an over-current condition and doing a safety shutdown. Usually that didn't involve the GPU fans ramping up though...
 
Can you use HWiNFO64 to log temperatures to a log file? I'm curious if some temperature on your 4090 is getting too high, which would mean bad thermal paste/pad/contact or something. It would do this if a temperature got high enough to trigger a self-protection shut-down. If you can log temps, maybe one of the sensors will catch the high temperature before it crashes.
 
Also, if you go into Windows System event viewer, does it give you any critical events that have an indication of what is causing the crash?
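
A minimal sketch of how a temperature log could be checked after a crash, assuming the log is a plain CSV whose temperature columns carry "°C" in the header. The column names and sample values below are made up for illustration and are not taken from a real log; check the header row of your own file, and the file encoding, before relying on it.

```python
import csv
import io


def peak_temperatures(csv_text, unit_marker="°C"):
    """Return {column_name: (peak_value, data_row_number)} for temperature columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Assumption: temperature columns carry the unit in the header text.
    temp_cols = [i for i, name in enumerate(header) if unit_marker in name]
    peaks = {}
    for row_number, row in enumerate(reader, start=1):
        for i in temp_cols:
            if i >= len(row):
                continue  # short or truncated row, e.g. written during a crash
            try:
                value = float(row[i])
            except ValueError:
                continue  # skip blanks and non-numeric cells
            name = header[i].strip()
            if name not in peaks or value > peaks[name][0]:
                peaks[name] = (value, row_number)
    return peaks


# Hypothetical sample data, only to show the expected shape of the input.
SAMPLE_LOG = (
    "Date,Time,GPU Temperature [°C],GPU Hot Spot Temperature [°C],GPU Power [W]\n"
    "28.3.2023,06:10:01,61.0,72.5,110.2\n"
    "28.3.2023,06:10:03,63.5,88.0,115.9\n"
    "28.3.2023,06:10:05,62.0,75.1,112.4\n"
)

if __name__ == "__main__":
    for name, (value, row) in peak_temperatures(SAMPLE_LOG).items():
        print(f"{name}: peak {value} at data row {row}")
```

If the peak sits in the last few rows before the log stops, that points toward a thermal trip; if the last rows look normal, temperature is less likely to be the cause.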

#2
Cool GTX
EVGA Forum Moderator
  • Total Posts : 30996
  • Reward points : 0
  • Joined: 2010/12/12 14:22:25
  • Location: Folding for the Greater Good
  • Status: offline
  • Ribbons : 122
Re: Having trouble diagnosing an issue 2023/03/28 06:54:45 (permalink)
Have you checked the GPU firmware for the Nvidia update? See: "For all those experiencing the 4090 no display issue here is the official fix from Nvidia"
 
Is this a New issue, from a previously stable PC?
 
Or is this a new build where you just upgraded the MB BIOS? You mention rolling back the BIOS.
 
Which Nvidia drivers are you using / have tested?
 
Is the CPU, RAM, GPU overclocked?


I am a Volunteer Moderator - not an EVGA employee



#3
bg8780
CLASSIFIED Member
  • Total Posts : 2540
  • Reward points : 0
  • Joined: 2008/02/19 14:21:34
  • Status: offline
  • Ribbons : 4
Re: Having trouble diagnosing an issue 2023/03/28 07:14:53 (permalink)
B0baganoosh
What power supply do you have? This sounds like the old 3090 bug that was happening when 3090's were starting to fail, but there were a few cases where the GPU was hitting an over-current condition and doing a safety shutdown. Usually that didn't involve the GPU fans ramping up though...
 
Can you use HWiNFO64 to log temperatures to a log file? I'm curious if some temperature on your 4090 is getting too high, which would mean bad thermal paste/pad/contact or something. It would do this if a temperature got high enough to trigger a self-protection shut-down. If you can log temps, maybe one of the sensors will catch the high temperature before it crashes.
 
Also, if you go into Windows System event viewer, does it give you any critical events that have an indication of what is causing the crash?




Sorry, I should have included that info. My bad. Temps are fine. This usually happens when just in web browser, working, etc. Daily computing stuff. While gaming and stressing the system it makes it through no issue. It has hardlocked while gaming but I think it's unrelated since it happens outside of high load situations. GPU temp tops out around 72C on very demanding titles. Typically sits around the high 60's.
 
Power supply is an EVGA 1600 P2. I don't suspect it's an over-current situation since the PC stays on indefinitely. I have to do a hard shutdown via the power button.
 
I haven't been able to find anything in Event Viewer related to a bug check (BSOD). Granted, I'm proficient in Event Viewer but no expert, so I could be missing something.
post edited by bg8780 - 2023/03/28 07:32:58
#4
bg8780
CLASSIFIED Member
  • Total Posts : 2540
  • Reward points : 0
  • Joined: 2008/02/19 14:21:34
  • Status: offline
  • Ribbons : 4
Re: Having trouble diagnosing an issue 2023/03/28 07:18:32 (permalink)
Cool GTX
have you checked the GPU firmware for the Nvidia update?  For all those experiencing the 4090 no display issue here is the official fix from Nvidia
 
Is this a New issue, from a previously stable PC?
 
Or is this a new build that you just upgraded the MB BIOS? you mention rolling back BIOS
 
Which Nvidia drivers are you using / have tested?
 
Is the CPU, RAM, GPU overclocked?




Yes, I did apply the firmware update as I suspected that could be the issue but the issue persisted.
 
I believe it to be a new issue that appeared after updating the BIOS a few times (through EVGA's development). When I was on the initial BIOS to support 13th gen everything ran fine, hence my suspicion of a BIOS issue. Also, this board had an i5-12400 installed as a placeholder while waiting for the 13th gen launch, and it was solid.
 
Past 3-4 drivers. I keep my drivers up to date quite frequently. Right now, I am on the latest Nvidia driver.
 
No overclocking outside of the power limit being set to 133 in MSI Afterburner. I even stayed away from XMP for the time being just to rule that out. I did run MemTest86 with XMP set a couple of weeks ago and everything passed.
 
#5
bg8780
CLASSIFIED Member
  • Total Posts : 2540
  • Reward points : 0
  • Joined: 2008/02/19 14:21:34
  • Status: offline
  • Ribbons : 4
Re: Having trouble diagnosing an issue 2023/03/28 10:53:54 (permalink)
Well, I was hoping BIOS 2.06 rollback would fix it but nope. Still hardlocked. I'll try to get a video next time it happens. I don't know what else to troubleshoot. Nothing in event viewer other than "unexpected shutdown"
#6
Mienko
Superclocked Member
  • Total Posts : 170
  • Reward points : 0
  • Joined: 2007/11/28 19:45:09
  • Status: offline
  • Ribbons : 3
Re: Having trouble diagnosing an issue 2023/03/28 13:01:43 (permalink)
bg8780
Well, I was hoping BIOS 2.06 rollback would fix it but nope. Still hardlocked. I'll try to get a video next time it happens. I don't know what else to troubleshoot. Nothing in event viewer other than "unexpected shutdown"


When you say nothing beyond unexpected shutdown, that's a bootup status indicating that it's recovering.  Have you gone back a little further (pre-reboot) to see what was popping up in the event log?  It may not be flagged as red/critical, but there may be something identifiable there.  Have you run DISM or SFC scans?
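
A rough sketch of how the "go back a little further" step could be scripted, assuming Windows with the built-in `wevtutil`, `DISM`, and `sfc` tools available. It builds a query for every System log event in a window before the crash, regardless of severity, plus the usual repair scans. The crash time in the usage note is a hypothetical placeholder, and the helper names are illustrative only.

```python
import platform
import subprocess
from datetime import datetime, timedelta, timezone


def events_before_crash_cmd(crash_time_utc, minutes_before=15, log="System"):
    """Build a wevtutil command listing all events in the window before a crash."""
    start = crash_time_utc - timedelta(minutes=minutes_before)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"  # event timestamps are queried in UTC
    query = (
        "*[System[TimeCreated[@SystemTime>='{}' and @SystemTime<='{}']]]"
    ).format(start.strftime(fmt), crash_time_utc.strftime(fmt))
    return ["wevtutil", "qe", log, "/q:" + query, "/f:text"]


# Standard repair scans; both need an elevated (administrator) prompt.
REPAIR_COMMANDS = [
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
    ["sfc", "/scannow"],
]


def run(cmd):
    """Run a command on Windows; on other systems just print what would run."""
    if platform.system() != "Windows":
        print("skipping (not Windows):", " ".join(cmd))
        return None
    return subprocess.run(cmd, capture_output=True, text=True)
```

For example, `run(events_before_crash_cmd(datetime(2023, 3, 28, 10, 53, tzinfo=timezone.utc)))` would dump the 15 minutes of System events leading up to that (made-up) time. Warnings and informational entries in that window are the ones worth reading, since the "unexpected shutdown" entry itself is only written on the next boot.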
 

 
 
#7
bg8780
CLASSIFIED Member
  • Total Posts : 2540
  • Reward points : 0
  • Joined: 2008/02/19 14:21:34
  • Status: offline
  • Ribbons : 4
Re: Having trouble diagnosing an issue 2023/03/28 13:10:37 (permalink)
Mienko
bg8780
Well, I was hoping BIOS 2.06 rollback would fix it but nope. Still hardlocked. I'll try to get a video next time it happens. I don't know what else to troubleshoot. Nothing in event viewer other than "unexpected shutdown"


When you say nothing beyond unexpected shutdown, that's a bootup status indicating that it's recovering.  Have you gone back a little further (pre-reboot) to see what was popping up in the event log?  It may not be flagged as red/critical, but there may be something identifiable there.  Have you run DISM or SFC scans?
 



The only thing I'm seeing is "The device driver for the Trusted Platform Module (TPM) encountered a non-recoverable error in the TPM hardware, which prevents TPM services (such as data encryption) from being used. For further help, please contact the computer manufacturer."
 
I have not run DISM or SFC because this is a completely fresh Windows 10 install. That was my first go-to since the previous install was pretty old and had carried over through a few upgrades since 2018. It was time for a fresh OS regardless. The OS drive is a 2TB SK Hynix P41 Platinum. It hasn't shown signs of failure typical of a bad disk.
 
I have rolled the BIOS all the way back to 2.01, which was the first to support 13th gen.
 
I guess I can try disabling the TPM as I have no need for it in this use case.
#8
Hoggle
EVGA Forum Moderator
  • Total Posts : 10102
  • Reward points : 0
  • Joined: 2003/10/13 22:10:45
  • Location: Eugene, OR
  • Status: offline
  • Ribbons : 4
Re: Having trouble diagnosing an issue 2023/03/28 17:35:46 (permalink)
Any chance of having another system to check to see if the crashes happen if the card is put into another system or a spare video card to see if the system crashes regardless of what video card is used? You can also try on board video but trying with another NVIDIA card would be better to see if it's the NVIDIA driver crashing.
post edited by Hoggle - 2023/03/28 17:36:53

 
 
#9
bg8780
CLASSIFIED Member
  • Total Posts : 2540
  • Reward points : 0
  • Joined: 2008/02/19 14:21:34
  • Status: offline
  • Ribbons : 4
Re: Having trouble diagnosing an issue 2023/03/28 19:06:01 (permalink)
Hoggle
Any chance of having another system to check to see if the crashes happen if the card is put into another system or a spare video card to see if the system crashes regardless of what video card is used? You can also try on board video but trying with another NVIDIA card would be better to see if it's the NVIDIA driver crashing.




Unfortunately not. I might have to snag an extra GPU and see if the issue persists. My hunch is that it's the GPU since its fans crank to 100% and I lose all display output.
#10
bg8780
CLASSIFIED Member
  • Total Posts : 2540
  • Reward points : 0
  • Joined: 2008/02/19 14:21:34
  • Status: offline
  • Ribbons : 4
Re: Having trouble diagnosing an issue 2023/03/28 19:36:28 (permalink)
https://youtu.be/tqGyOqzog5k
 
This is my exact issue. I have contacted CableMod for an RMA. I’ve found a few reddit and forum posts showing the same thing. I’m hoping this is the easy fix.
#11
Mienko
Superclocked Member
  • Total Posts : 170
  • Reward points : 0
  • Joined: 2007/11/28 19:45:09
  • Status: offline
  • Ribbons : 3
Re: Having trouble diagnosing an issue 2023/03/29 03:07:43 (permalink) ☄ Helpful by rjohnson11 2023/03/29 03:35:42
bg8780
 
 
The only thing I'm seeing is "The device driver for the Trusted Platform Module (TPM) encountered a non-recoverable error in the TPM hardware, which prevents TPM services (such as data encryption) from being used. For further help, please contact the computer manufacturer."

Okay, I actually saw this TPM failure previously (maybe a couple months ago?) and I forgot how I resolved it.  Mine manifested slightly differently, but it still ended in an unscheduled reboot.  I manually updated my ME, ran the newest MEI and chipset drivers, etc.  Think I ended up disabling ME.  Give me a min.  Going to jump into my bios and take a look right quick and will edit this comment in just a few minutes.
 
Edit:  So, looking at my BIOS, I'm currently on 2.09 on my Classy.  Intel ME & TPM are both enabled.  C-States disabled, all core overclock, yadda yadda.  For my Intel MEI driver, I'm running 2251.4.2.0.  For the firmware, 16.1.25.2124.  I'm honestly not sure if my errors were resolved by updating the Intel ME stuff, getting the CPU OC stable (I swapped from a 12900k to a 13900ks around the same time), or tuning my RAM down a bit (memtests came back as stable, but I saw crashes in MW2 if I pushed the RAM too hard), but those are the steps I took that led to resolution on my end.  Haven't seen a single TPM crash since.
 
post edited by Mienko - 2023/03/29 03:21:03

 
 
#12
bg8780
CLASSIFIED Member
  • Total Posts : 2540
  • Reward points : 0
  • Joined: 2008/02/19 14:21:34
  • Status: offline
  • Ribbons : 4
Re: Having trouble diagnosing an issue 2023/03/29 06:48:42 (permalink)
Mienko
bg8780
 
 
The only thing I'm seeing is "The device driver for the Trusted Platform Module (TPM) encountered a non-recoverable error in the TPM hardware, which prevents TPM services (such as data encryption) from being used. For further help, please contact the computer manufacturer."

Okay, I actually saw this TPM failure previously (maybe a couple months ago?) and I forgot how I resolved it.  Mine manifested slightly differently, but it still ended in an unscheduled reboot.  I manually updated my ME, ran the newest MEI and chipset drivers, etc.  Think I ended up disabling ME.  Give me a min.  Going to jump into my bios and take a look right quick and will edit this comment in just a few minutes.
 
Edit:  So, looking at my BIOS, I'm currently on 2.09 on my Classy.  Intel ME & TPM are both enabled.  C-States disabled, all core overclock, yadda yadda.  For my Intel MEI driver, I'm running 2251.4.2.0.  For the firmware, 16.1.25.2124.  I'm honestly not sure if my errors were resolved by updating the Intel ME stuff, getting the CPU OC stable (I swapped from a 12900k to a 13900ks around the same time), or tuning my RAM down a bit (memtests came back as stable, but I saw crashes in MW2 if I pushed the RAM too hard), but those are the steps I took that led to resolution on my end.  Haven't seen a single TPM crash since.
 



Excellent! Thanks for this. I'm pretty confident my issue is with the CableMod cable as there's a couple youtube videos and forum posts matching my issue EXACTLY.
 
I will also disable ME and TPM anyway since I don't need it and it's throwing errors in Windows.
#13
Mienko
Superclocked Member
  • Total Posts : 170
  • Reward points : 0
  • Joined: 2007/11/28 19:45:09
  • Status: offline
  • Ribbons : 3
Re: Having trouble diagnosing an issue 2023/03/29 08:13:15 (permalink)
bg8780
 
Excellent! Thanks for this. I'm pretty confident my issue is with the CableMod cable as there's a couple youtube videos and forum posts matching my issue EXACTLY.
 
I will also disable ME and TPM anyway since I don't need it and it's throwing errors in Windows.




Interesting on the CableMod piece.  I am using CableMod, but no 12pin since I'm using a 3090.  Sorry I couldn't give you a more definitive "root" to the issue, but hope you get it resolved.
 

 
 
#14
bg8780
CLASSIFIED Member
  • Total Posts : 2540
  • Reward points : 0
  • Joined: 2008/02/19 14:21:34
  • Status: offline
  • Ribbons : 4
Re: Having trouble diagnosing an issue 2023/03/29 20:38:22 (permalink)
Mienko
bg8780
 
Excellent! Thanks for this. I'm pretty confident my issue is with the CableMod cable as there's a couple youtube videos and forum posts matching my issue EXACTLY.
 
I will also disable ME and TPM anyway since I don't need it and it's throwing errors in Windows.




Interesting on the CableMod piece.  I am using CableMod, but no 12pin since I'm using a 3090.  Sorry I couldn't give you a more definitive "root" to the issue, but hope you get it resolved.
 

 
It's been a few hours but so far, so good. Some crashes have taken a bit over 24 hours in the past so time will tell.
 
As a side note, does anybody know which drivers will resolve this? I've installed every driver from the download center but these are still missing and Windows can't find them (surprise, surprise)
https://imgur.com/HA9maxk
post edited by bg8780 - 2023/03/29 20:39:34
#15
Mienko
Superclocked Member
  • Total Posts : 170
  • Reward points : 0
  • Joined: 2007/11/28 19:45:09
  • Status: offline
  • Ribbons : 3
Re: Having trouble diagnosing an issue 2023/03/30 03:32:41 (permalink) ☄ Helpful by bg8780 2023/04/01 19:58:23
Personally, I use this to get all my drivers:
https://rog-forum.asus.co...re-threads/td-p/827232

I grab all that are labeled for the Z690/Z790 boards (including the ME firmware).

 
 
#16
bg8780
CLASSIFIED Member
  • Total Posts : 2540
  • Reward points : 0
  • Joined: 2008/02/19 14:21:34
  • Status: offline
  • Ribbons : 4
Re: Having trouble diagnosing an issue 2023/03/30 07:18:17 (permalink)
Mienko
Personally, I use this to get all my drivers:
https://rog-forum.asus.co...re-threads/td-p/827232

I grab all that are labeled for the Z690/Z790 boards (including the ME firmware).



Ok, so those missing drivers are indeed all related to the ME platform?
#17
Mienko
Superclocked Member
  • Total Posts : 170
  • Reward points : 0
  • Joined: 2007/11/28 19:45:09
  • Status: offline
  • Ribbons : 3
Re: Having trouble diagnosing an issue 2023/03/30 08:37:03 (permalink) ☄ Helpful by bg8780 2023/04/01 19:58:16
bg8780
Mienko
Personally, I use this to get all my drivers:
https://rog-forum.asus.co...re-threads/td-p/827232

I grab all that are labeled for the Z690/Z790 boards (including the ME firmware).



Ok, so those missing drivers are indeed all related to the ME platform?


I believe the unknown device may be the GNA.  You could look at the properties for each of the devices and get the hardware IDs for some sweet Google search action to narrow it down, but I'd probably start with the GNA & Serial I/O drivers first, and then install the latest chipset & MEI drivers if you still see a gap.  I don't have VMD enabled, so I don't have that one installed, but not sure what your setup looks like.
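
A small sketch of the hardware ID lookup idea, assuming the IDs are copied from Device Manager (Properties > Details > Hardware Ids) in the usual `BUS\VEN_xxxx&DEV_xxxx...` form. The sample ID in the usage note is a made-up placeholder, not one of the devices from the screenshot, and the vendor table only lists two IDs that are well known (8086 for Intel, 10DE for NVIDIA).

```python
import re

# Matches the leading "BUS\VEN_xxxx&DEV_xxxx" part of a hardware ID string.
HWID_PATTERN = re.compile(
    r"^(?P<bus>[A-Z]+)\\VEN_(?P<vendor>[0-9A-Z]{4})&DEV_(?P<device>[0-9A-Z]{4})",
    re.IGNORECASE,
)

# Deliberately tiny lookup table; anything else is reported as "unknown".
KNOWN_VENDORS = {"8086": "Intel", "10DE": "NVIDIA"}


def parse_hardware_id(hwid):
    """Split a hardware ID into vendor/device parts and a search string."""
    match = HWID_PATTERN.match(hwid.strip())
    if match is None:
        return None  # not in the VEN_/DEV_ form (e.g. some USB or root devices)
    vendor = match.group("vendor").upper()
    device = match.group("device").upper()
    return {
        "bus": match.group("bus").upper(),
        "vendor_id": vendor,
        "device_id": device,
        "vendor_name": KNOWN_VENDORS.get(vendor, "unknown"),
        "search_term": f"VEN_{vendor} DEV_{device}",
    }
```

Calling `parse_hardware_id(r"PCI\VEN_8086&DEV_1234&SUBSYS_00000000&REV_01")` on that placeholder ID yields a dictionary whose `search_term` can be pasted into a search engine; an Intel vendor ID on an unknown device is a hint that a chipset, ME, GNA, or Serial I/O package is the missing piece.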
 

 
 
#18
bg8780
CLASSIFIED Member
  • Total Posts : 2540
  • Reward points : 0
  • Joined: 2008/02/19 14:21:34
  • Status: offline
  • Ribbons : 4
Re: Having trouble diagnosing an issue 2023/04/01 13:26:15 (permalink)
Well everyone, this is very strange but I think I solved the issue. It's been a few days with no crashes. I removed the metal cable shroud included on the CableMod 12VHPWR cable. I'll continue to monitor the situation and report back, but so far I think that was the issue.
#19