I have a new CyberpowerPC with an EVGA RTX 3080 card. Under low load, everything is fine. If I run Unigine Heaven 4.0 for around 10 minutes, the machine eventually crashes, even with the EVGA fan curve set to "Aggressive". GPU temperature never exceeds 60 deg C before the crash, and CPU temperature also stays at moderate levels. I can get the machine to crash in other ways too, e.g. by running EVGA VF Curve Tuner (without any overclocking). I have the latest Nvidia driver for Windows 10 and the latest BIOS for my motherboard (Asus Prime X570-P).

Based on what I'm reading here, in Tom's Hardware ("Nvidia RTX 3080 crashes caused by capacitors, says EVGA") and elsewhere, these graphics cards have a design defect. I've included a crash dump analysis below my signature.
Will EVGA be fixing this, either by replacing the hardware or with a driver update?

Dave Oshinsky
Crash dump analysis for the Unigine Heaven 4.0 crash:
Microsoft (R) Windows Debugger Version 10.0.19041.685 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Windows\MEMORY.DMP]
Kernel Bitmap Dump File: Kernel address space is available, User address space may not be available.
Symbol search path is: srv*
Executable search path is:
Windows 10 Kernel Version 19041 MP (24 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 19041.1.amd64fre.vb_release.191206-1406
Machine Name:
Kernel base = 0xfffff806`69800000 PsLoadedModuleList = 0xfffff806`6a42a390
Debug session time: Thu Mar 4 15:50:46.905 2021 (UTC - 5:00)
System Uptime: 0 days 0:00:21.559
Loading Kernel Symbols
...............................................................
......Page 804e74 not present in the dump file. Type ".hh dbgerr004" for details
..........................................................
........................................................
Loading User Symbols
PEB is paged out (Peb.Ldr = 000000f0`25553018). Type ".hh dbgerr001" for details
Loading unloaded module list
.......
For analysis of this file, run !analyze -v
7: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: ffffc182cfe52028, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000bc800800, High order 32-bits of the MCi_STATUS value.
Arg4: 00000000060c0859, Low order 32-bits of the MCi_STATUS value.
Debugging Details:
------------------
KEY_VALUES_STRING: 1
Key : Analysis.CPU.Sec
Value: 1
Key : Analysis.DebugAnalysisProvider.CPP
Value: Create: 8007007e on OSHWIN2021
Key : Analysis.DebugData
Value: CreateObject
Key : Analysis.DebugModel
Value: CreateObject
Key : Analysis.Elapsed.Sec
Value: 1
Key : Analysis.Memory.CommitPeak.Mb
Value: 73
Key : Analysis.System
Value: CreateObject
BUGCHECK_CODE: 124
BUGCHECK_P1: 0
BUGCHECK_P2: ffffc182cfe52028
BUGCHECK_P3: bc800800
BUGCHECK_P4: 60c0859
BLACKBOXBSD: 1 (!blackboxbsd)
BLACKBOXNTFS: 1 (!blackboxntfs)
BLACKBOXWINLOGON: 1
PROCESS_NAME: YourPhone.exe
STACK_TEXT:
ffffaa81`63ac6938 fffff806`69cb39aa : 00000000`00000124 00000000`00000000 ffffc182`cfe52028 00000000`bc800800 : nt!KeBugCheckEx
ffffaa81`63ac6940 fffff806`655d15b0 : 00000000`00000000 ffffc182`cfe52028 ffffc182`cc9f7e40 ffffc182`cfe52028 : nt!HalBugCheckSystem+0xca
ffffaa81`63ac6980 fffff806`69db552e : 00000000`00000000 ffffaa81`63ac6a29 ffffc182`cfe52028 ffffc182`cc9f7e40 : PSHED!PshedBugCheckSystem+0x10
ffffaa81`63ac69b0 fffff806`69cb52d1 : ffffc182`d63d8c00 ffffc182`d63d8c00 ffffc182`cc9f7e90 ffffc182`cc9f7e40 : nt!WheaReportHwError+0x46e
ffffaa81`63ac6a90 fffff806`69cb5643 : 00000000`00000007 ffffc182`cc9f7e90 ffffc182`cc9f7e40 00000000`00000007 : nt!HalpMcaReportError+0xb1
ffffaa81`63ac6c00 fffff806`69cb5520 : ffffc182`cc904508 00000000`00000000 ffffaa81`63ac6e00 00000000`00000000 : nt!HalpMceHandlerCore+0xef
ffffaa81`63ac6c50 fffff806`69cb4a65 : ffffc182`cc904508 ffffaa81`63ac6ef0 00000000`00000000 00000000`00000000 : nt!HalpMceHandler+0xe0
ffffaa81`63ac6c90 fffff806`69cb7225 : ffffc182`cc904508 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalpHandleMachineCheck+0xe9
ffffaa81`63ac6cc0 fffff806`69d0c959 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalHandleMcheck+0x35
ffffaa81`63ac6cf0 fffff806`69c04bfa : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiHandleMcheck+0x9
ffffaa81`63ac6d20 fffff806`69c048b7 : 00000000`00000000 00000000`00000000 000000f0`25efe458 00000000`00000000 : nt!KxMcheckAbort+0x7a
ffffaa81`63ac6e60 00007ff8`40d5e239 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiMcheckAbort+0x277
000000f0`25efe310 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ff8`40d5e239
MODULE_NAME: AuthenticAMD
IMAGE_NAME: AuthenticAMD.sys
STACK_COMMAND: .thread ; .cxr ; kb
FAILURE_BUCKET_ID: 0x124_AuthenticAMD_PROCESSOR_BUS_L1_SRC_IRD_I_NOTIMEOUT
OS_VERSION: 10.0.19041.1
BUILDLAB_STR: vb_release
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
FAILURE_ID_HASH: {a44f54e5-033f-6603-b3e9-4b506acba551}
Followup: MachineOwner
---------
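A note for anyone reading the dump above: Arg3 and Arg4 are the high and low halves of the 64-bit MCi_STATUS machine-check register. The Python sketch below joins them and decodes the fields, assuming the architectural layout described in the Intel SDM and the AMD64 APM (bits 63..57 as status flags, the low 16 bits as a compound error code in the bus/interconnect form 0000 1PPT RRRR IILL). The helper names (combine_status, status_flags, decode_bus_error) are hypothetical, and the model-specific AMD bits in between are deliberately not interpreted.

```python
# Sketch: decode MCi_STATUS from the WHEA_UNCORRECTABLE_ERROR (0x124) arguments.
# Assumption: architectural MCA layout (Intel SDM / AMD64 APM) for bits 63..57
# and for the low 16-bit compound error code. Model-specific bits are ignored.

ARG3_HIGH = 0xBC800800  # Arg3: high-order 32 bits of MCi_STATUS
ARG4_LOW = 0x060C0859   # Arg4: low-order 32 bits of MCi_STATUS

# Architectural status flag bits, highest first.
FLAG_BITS = {
    63: "VAL",    # register holds valid error information
    62: "OVER",   # an earlier error was overwritten (overflow)
    61: "UC",     # error was not corrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC register is valid
    58: "ADDRV",  # MCi_ADDR register is valid
    57: "PCC",    # processor context corrupt
}

# Field encodings for bus/interconnect errors: 0000 1PPT RRRR IILL.
PARTICIPATION = {0b00: "SRC", 0b01: "RES", 0b10: "OBS", 0b11: "GEN"}
REQUEST = {
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD", 0b0100: "DWR",
    0b0101: "IRD", 0b0110: "PREFETCH", 0b0111: "EVICT", 0b1000: "SNOOP",
}
MEM_OR_IO = {0b00: "M", 0b01: "RESERVED", 0b10: "IO", 0b11: "GEN"}
LEVEL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}


def combine_status(high: int, low: int) -> int:
    """Join the two 32-bit halves from the bugcheck into one 64-bit value."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def status_flags(status: int) -> list[str]:
    """Names of the architectural flag bits (63..57) that are set."""
    return [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]


def decode_bus_error(status: int) -> dict[str, str]:
    """Split the low 16-bit error code into bus/interconnect fields."""
    code = status & 0xFFFF
    if (code & 0xF800) != 0x0800:
        raise ValueError(f"error code {code:#06x} is not a bus/interconnect error")
    return {
        "level": LEVEL[code & 0b11],
        "participation": PARTICIPATION[(code >> 9) & 0b11],
        "request": REQUEST.get((code >> 4) & 0xF, "RESERVED"),
        "mem_or_io": MEM_OR_IO[(code >> 2) & 0b11],
        "timeout": "TIMEOUT" if (code >> 8) & 1 else "NOTIMEOUT",
    }


if __name__ == "__main__":
    status = combine_status(ARG3_HIGH, ARG4_LOW)
    print(f"MCi_STATUS = {status:#018x}")
    print("flags:", " ".join(status_flags(status)))
    print("error code fields:", decode_bus_error(status))
```

With the arguments from this dump, it should report MCi_STATUS = 0xbc800800060c0859 with VAL, UC, EN, MISCV and ADDRV set (PCC clear), and error-code fields L1, SRC, IRD, IO and NOTIMEOUT, which appear to correspond to the PROCESSOR_BUS_L1_SRC_IRD_I_NOTIMEOUT bucket that WinDbg reports.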
post edited by daveoshinsky - 2021/03/05 06:06:31