EVGA

GTX 980 multi-GPU problems

Pet0r
2015/05/03 05:48:43
Hi,
 
I've also submitted a tech support request about this, as I'm sort of out of things to try. I'm running 2 x GTX 980 ACX 2.0 in an SLI configuration and have been for the last 4 months or so. The PSU is an EVGA 1000W P2. Last night I started getting 0x00000116 BSODs. I was able to boot into Windows with SLI enabled, but as soon as I did anything even close to gaming, it would happen again. My monitor is plugged into Card #1; disabling SLI resolved the issue (except that now, obviously, I'm only using 1 card). Plugging into the second card with SLI disabled resulted in a BSOD. That made it look easy to troubleshoot, since Card #2 is obviously the problem. So this morning I did some further digging to isolate it, but I can never get the crash to happen with just 1 card in the system. Any time I enable SLI and try to game, it always crashes.
 
Here's what I've tried with both cards:
- Replacing SLI bridge
- Swapping which card is in which PCIe slot
- Swapping PCIe power cables between each other
- Clean install of all drivers
 
None of the above worked, so I moved on to finding which specific card was the problem. I took out Card #2 and ran FurMark against Card #1: no problems. I also ran BarsWF (Google it), a fairly old program that can run a heavy CUDA workload on a GPU you specify, and it found no issues either. Then I took out Card #1, put Card #2 in the same slot with the same power cables, and re-ran both tests: no problems at all.

At this point I figured it maybe just needed reseating. I put Card #2 back in its own slot with its own power cables, and all tests passed again. Then I put Card #1 back in and reconnected the SLI bridge: instant BSOD on opening a game.

Next I removed the SLI bridge and booted without it, so both GPUs were connected to the system but SLI was unavailable. BarsWF can still use both cards, since it gives each one its own workload. Passing --gpu_mask 1 (stress Card #1) worked without issues; --gpu_mask 2 gave an instant BSOD. I then swapped Card #1 and Card #2 around, and this time --gpu_mask 1 gave the BSOD, which again points to Card #2 as the problem.
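One way to read the --gpu_mask values above: the flag appears to take a bitmask, where bit 0 selects the first GPU and bit 1 selects the second. That interpretation is an assumption inferred only from this post (mask 1 stressed Card #1, mask 2 stressed Card #2), not from BarsWF documentation. A minimal Python sketch decoding such a mask into GPU indices; the function name gpus_in_mask is hypothetical:

```python
def gpus_in_mask(mask: int) -> list[int]:
    """Return the zero-based GPU indices whose bits are set in `mask`.

    Assumption: bit i selects GPU i (bit 0 = first GPU, bit 1 = second).
    """
    if mask < 0:
        raise ValueError("mask must be non-negative")
    indices = []
    i = 0
    while mask:
        if mask & 1:  # lowest bit set -> GPU i is selected
            indices.append(i)
        mask >>= 1  # move on to the next bit / next GPU
        i += 1
    return indices


if __name__ == "__main__":
    for m in (1, 2, 3):
        # Under this assumption, "Card #N" in the post is index N - 1.
        print(m, [f"Card #{i + 1}" for i in gpus_in_mask(m)])
```

Under this reading, a mask of 3 would load both cards at once, which is why 1 and 2 are the useful values for isolating a single card.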
 
The trouble is, I can never reproduce the problem when just 1 card is in the system! I haven't done any recent BIOS updates or anything of that nature. The NVIDIA drivers I'm using (350.12) have been on my system since their release. I've also tried plugging the cards into several different connectors on the PSU. But I don't think it's a power problem, as it occurs immediately when the BarsWF test runs against 1 GPU with 2 installed in the system (and the 2nd card isn't drawing any load). I also feel like it might be a driver issue, but why would it only start now?
 
Any help is much appreciated.
post edited by Pet0r - 2015/05/03 05:55:22


    Pet0r
    Re: GTX 980 multi-GPU problems 2015/05/03 06:33:10
    Well, turns out the problem was actually unbelievably simple. I replaced one of the PSU power cables (it's fully modular and was an aftermarket set of cables) and the problem is gone. I had been swapping those to rule them out, so I presume I just got really unlucky when I tested it and it happened to work. Problem is resolved now.
    bcavnaugh
    Re: GTX 980 multi-GPU problems 2015/05/03 06:53:17


    Be glad you did not short out your computer or even catch fire.
    Never use aftermarket cables that are not made for the PSU you own, and never mix cables between different brands, or even between different models of the same brand.
    Cable extensions are OK for the most part.
    post edited by bcavnaugh - 2015/05/03 06:57:06

    Pet0r
    Re: GTX 980 multi-GPU problems 2015/05/03 07:28:12
    I was never particularly worried about shorting or catching fire: any decent PSU these days has short-circuit protection. And I didn't mix and match; these were official EVGA individually braided replacement PSU cables from their store.
    bcavnaugh
    Re: GTX 980 multi-GPU problems 2015/05/03 07:40:24
    OK, I would not have called EVGA cables aftermarket had I known.
    It is hard to guess sometimes what users are using.

    With an EVGA SuperNOVA 1000 P2 power supply and 100-CK-1300-B9, 100-CR-1300-B9, 100-CU-1300-B9, or 100-CW-1300-B9 cables, you should have been fine.

    BTW, with a BSOD code of 0x116, I would increase the vCore voltage,
    or reduce the CPU and memory overclocks.
    post edited by bcavnaugh - 2015/05/03 07:46:08
