I’ve been having a very odd hardware issue with my PC lately. Specs:
- Windows 10 (version 1909, build 18363)
- Intel Core i7-4790 (2014 vintage)
- 16GB RAM
- "EVGA Nvidia GTX 980 FTW" graphics card
- Gigabyte "H81M-DS2V" motherboard
- Samsung 3440×1440 monitor @ 60 Hz via either HDMI or DisplayPort
I’ve been using this PC for casual gaming for over a year now. It runs Doom 2016 (for example) stably at high graphics settings. Recently I’ve been playing another game (Snowrunner), and every so often the monitor goes completely black, as if the video cable had been unplugged (the cables are fine and secure). Replugging the HDMI cable does nothing (Windows doesn’t even make its usual "device detected" sound), which leads me to believe that the GPU itself has entered some kind of failure state. The card seems to go idle, as if it had dropped off the PCIe bus entirely (I can’t prove that because, frustratingly, there’s no screen to look at).
However, Windows does not crash and remains responsive: I can still play/pause the media player (which is usually running in the background) using the hotkeys on my keyboard, and I can still use [WIN+X], [U], [U] to shut the PC down safely. Oddly, power-cycling the PC does not bring the display back on its own; I also have to replug either the HDMI or the DisplayPort cable afterwards.
I’ve removed the Nvidia drivers with the DDU tool, physically removed the graphics card, carefully cleaned the PCIe contacts with isopropyl alcohol, and blown any dust out of the cable sockets with an air duster. Then I reinstalled the latest Nvidia drivers. The problem remains.
The Windows "View Reliability History" tool usually reports the following (code 187):
Description
A problem with your hardware caused Windows to stop working correctly.
Problem signature
Problem Event Name: LiveKernelEvent
Code: 187
Parameter 1: 1
Parameter 2: 0
Parameter 3: 0
Parameter 4: 0
OS version: 10_0_18363
Service Pack: 0_0
Product: 256_1
OS Version: 10.0.18363.2.0.0.256.48
Locale ID: 2057
The code varies from event to event; sometimes it’s a 141 report:
Description
A problem with your hardware caused Windows to stop working correctly.
Problem signature
Problem Event Name: LiveKernelEvent
Code: 141
Parameter 1: ffff9b0f493c7010
Parameter 2: fffff8003f567650
Parameter 3: 0
Parameter 4: 51e8
OS version: 10_0_18363
Service Pack: 0_0
Product: 256_1
OS Version: 10.0.18363.2.0.0.256.48
Locale ID: 2057
The other problem is that this is intermittent. I can go all week without the error and then get it twice in the same day. It doesn’t seem to happen more often when the CPU/GPU is under heavy load: Snowrunner is a very graphics-intensive game, but the problem can just as well happen when the system is cool and the fans are barely spinning. Then again, Oblivion is not a graphically intense game, and that causes it too.
Can anyone tell me what might be causing this and what I can do to fix it?
EDIT: This problem also occurs when playing "TES4 Oblivion" with the same kind of error codes.