EVGA

New owner... with some QPI related issues

Author
mathf
New Member
  • Total Posts : 17
  • Reward points : 0
  • Joined: 2015/02/23 16:12:14
  • Status: offline
  • Ribbons : 0
2015/04/30 09:11:54 (permalink)
Hi all,
I recently bought a used SR-2 together with two Xeon X5690s.
I received the board yesterday, and I immediately started playing with it.
 
Here is my complete config:
- SR-2 A2
- 2x Xeon x5690
- 48GB (6*8GB) Hynix DDR3-1333 ECC (HMT31GR7BFR4C-H9)
- a poor GeForce 9800GT just for the installation.
 
First things first: I wanted to try the board at stock frequencies, without any overclocking.
So I cleared the CMOS, loaded optimized defaults from the BIOS, saved and exited...
...however, I was not even able to boot completely into the Windows installer.
 
I have tried isolating each CPU:
- CPU0: ON - CPU1: OFF -> everything works; Windows installed and booted without any issue
- CPU0: OFF - CPU1: ON -> not able to start anything. The board POSTs, but Windows freezes after the logo animation (which is also very slow, btw)
... I don't believe this is normal, and I also don't believe it is coming from the CPUs themselves, because the same thing happens when I swap them.
 
Today I experimented a little more and found that the only way I could boot "safely" with both CPUs installed is to manually reduce the QPI to 5.8. Once again, I don't find this normal; I would expect the Xeons to work at stock frequency without any problem.
 
The question I am asking myself now:
Is the SR-2 damaged... or is this coming from the CPUs?
#1

17 Replies Related Threads

    gordan79
    SSC Member
    • Total Posts : 531
    • Reward points : 0
    • Joined: 2013/01/27 00:17:36
    • Status: offline
    • Ribbons : 3
    Re: New owner... with some QPI related issues 2015/05/02 12:38:44 (permalink)
    Impossible to tell without trying different CPUs or trying the CPUs in a different dual socket motherboard. I can tell you beyond any doubt that this is not normal, though, so you might want to start looking into ways to get a refund before it's too late (if it's not already too late).

    Supermicro X8DTH-6, 2x X5690
    Crucial 12x 8GB x4 DR 1.35V DDR3-1600 ECC RDIMMs (96GB)
    3x GTX 1080Ti
    Triple-Seat Virtualized With VGA Passthrough (KVM)
    #2
    mathf
    New Member
    • Total Posts : 17
    • Reward points : 0
    • Joined: 2015/02/23 16:12:14
    • Status: offline
    • Ribbons : 0
    Re: New owner... with some QPI related issues 2015/05/02 13:06:52 (permalink)
    Hi Gordan,
    A refund is unfortunately not an option.
     
    I will get a pair of X5570 Xeons next week, so I will be able to check with different CPUs.
    That should rule out a bad QPI link in one of the CPUs.
     
    I will also test with different RAM sticks to see what happens.
    As said, the board is "somehow" stable at QPI 4.8 or 5.8: I was able to complete a linpack64 run, with all cores and 95% of RAM in use, for more than 2 hours.
    #3
    gordan79
    SSC Member
    • Total Posts : 531
    • Reward points : 0
    • Joined: 2013/01/27 00:17:36
    • Status: offline
    • Ribbons : 3
    Re: New owner... with some QPI related issues 2015/05/02 23:29:42 (permalink)
    That is very odd indeed. If it works at 5.8 that implies a stability issue, rather than an outright fault. Things I would start with are:
    1) Make sure it's not a RAM compatibility issue. If you have some 4GB ECC RDIMMs, try with those. 8GB DIMMs are unsupported on the SR-2, and although many of us have various 8GB modules working, it is not always a sure thing even when only running 6 of them. Also, what is your RAM voltage set to? The spec sheet for your DIMMs says 1.5V (and lists the range as 1.425 - 1.575V), so you may want to bump the voltage up a bit and see if it helps.
    2) Check your IOH voltage. Both of my SR-2s are stable at 166 bclk and 96GB of RAM at 1.25V, but anything up to 1.35V should be safe, so it might be worth trying to bump that up, unless it is already at 1.35V.
     
    Did you try swapping the RAM between the two CPUs, just to see if the fault follows the RAM?
     
    If none of that makes any difference, there is a good chance you have a duff component on the motherboard somewhere around CPU1. Capacitors these days are quite good (most manufacturers learned the lesson not to use cheap capacitors), but inductors still often fail. On the SR-2 there are 8 for each CPU socket, but for some unfathomable reason the end two in each set are not covered by the heatsinks (the middle 6 are).

    #4
    mathf
    New Member
    • Total Posts : 17
    • Reward points : 0
    • Joined: 2015/02/23 16:12:14
    • Status: offline
    • Ribbons : 0
    Re: New owner... with some QPI related issues 2015/05/03 01:23:35 (permalink)
    1) 8GB DIMMs are unsupported on the SR-2? Are you kidding?
    If the RAM is the root cause of the problem, then the board design is just a joke.
     
    I have left the RAM settings at default (as it is correctly recognized at startup). HWInfo64 tells me it is DDR3-1333 9-9-9-24, and the voltage is correctly set to 1.5V.
    I have also tried lowering the RAM frequency to DDR3-1066, but got the same problem (this time the timings are 7-7-7-something).
    Next I will try forcing lower timings to see if it helps, and also some regular DDR3-1600 from Samsung (this is the only spare I can get ATM).
     
    2) I was able to get the system a little more stable by increasing VTT to 1.35V.
    However, even then, linpack was still freezing the system after 1-2 min.
    #5
    gordan79
    SSC Member
    • Total Posts : 531
    • Reward points : 0
    • Joined: 2013/01/27 00:17:36
    • Status: offline
    • Ribbons : 3
    Re: New owner... with some QPI related issues 2015/05/03 04:59:12 (permalink)
    1) Specifically try 2T command rate.
    And yes, the board is full of design faults. The CPUs (where the MCH is) can handle 192GB of RAM each, yet the board struggles badly with 96GB of RAM between the two CPUs. I think that illustrates the extent of the fail quite well, and that's far from the only problem on boards that are working exactly as designed.
     
    Having said all that, I have not observed the specific issue you seem to be having on my SR-2s. What I have observed is that QPI destabilizes very, very shortly after 6.4GT/s, which means that OC-ing potential hits a wall once you start to exceed 177 bclk (because 4.8GT/s at 177 bclk actually equals 6.4GT/s, since default bclk is 133). And above 180 bclk or so, very weird clock spikes, both high and low, start to occur, all of which snowballs with crashtastic results very quickly.
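To make the arithmetic above concrete, here is a minimal sketch of how the effective QPI rate scales with bclk. It assumes the QPI rate scales linearly with bclk from the nominal BIOS setting; the function and constant names are made up for illustration and are not taken from any tool.

```python
# Sketch: effective QPI rate when bclk is raised above the default.
# Assumption: the BIOS QPI setting is a fixed multiple of bclk, so the
# effective rate scales linearly with bclk. Names are illustrative only.

DEFAULT_BCLK_MHZ = 133.33  # stock base clock for these Xeons


def effective_qpi(nominal_gts: float, bclk_mhz: float) -> float:
    """Return the actual QPI rate (GT/s) for a nominal setting at a given bclk."""
    return nominal_gts * bclk_mhz / DEFAULT_BCLK_MHZ


# The combinations discussed in this thread.
for nominal, bclk in [(4.8, 177), (5.8, 146), (6.4, DEFAULT_BCLK_MHZ)]:
    rate = effective_qpi(nominal, bclk)
    print(f"QPI {nominal} at {bclk} bclk -> {rate:.2f} GT/s")
```

Both overclocked combinations land within a few percent of the stock 6.4GT/s rate, which is why either one is a fair stand-in for stock QPI when testing.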
     
    2) That kind of makes sense, since the CPU's "uncore" runs off VTT voltage, so if there is a marginal/aged component I can see how it might help. Just out of interest, have you checked the state of thermal pads and paste under the heatsinks? If that was the problem I'd expect over-volting to make things worse, but it's worth checking if you run out of options.
     
    If all else fails (or when you decide your time is worth something and you've wasted too much of it on the SR-2), you can always get a similar SuperMicro X8 series motherboard and reuse your CPUs and RAM. The X8 series boards are going for a pittance on ebay most of the time. The X8DTH line is the most similar to the SR-2 (7x PCIe x16 slots, although IIRC each is wired only for x8, not that it makes a damn worth of difference to anything). The X8DAH line has only 3 PCIe x16 slots (of which two are wired for full x16) and a few x8 slots, and a tonne of RAM sockets, so it's worth a look as well (particularly depending on what's going cheap on ebay on the day you're looking). They might not have OC-ing features, but with a pair of X5690s you don't exactly need those desperately like you might with lower end CPUs.
     
    And note that I am saying that as one of a handful of people that have finally defeated the SR-2 and made it work for most of my requirements. The amount of time and effort it has taken to figure out the workarounds (getting 96GB of RAM to work), write the patches (e.g. for Xen HVM loader to avoid the IOMMU/VT-d bugs on NF200 bridges) and figure out how to avoid all the unfixable issues (every SAS controller I tried on my boards at best causes random lock-ups as soon as I start using VT-d with PCI passthrough for VMs, and some end up just not showing any disks attached) to make it all work passably well was so great that if I'd spent the same time working I could have just bought a top of the line off the shelf workstation filled with Nvidia Quadros and still ended up ahead. Remember that the total advantage from the SR-2 in terms of OC-ing the X5690s is going to be about 15% over stock (4GHz vs. 3.46GHz). Even with lesser CPUs, and if you manage to get it stable at 180 bclk (where clock stability falls off a cliff), you are only looking at a maximum 35% OC and that's as good as it gets with any pretence of long term stability. As one of my friends once wisely said, it takes great wisdom to recognize that it would be a good idea to quit while you're behind.
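For anyone who wants to check the two percentages quoted above, a quick sketch follows. The only inputs are the figures already given in this post (3.46GHz stock vs. 4GHz, and 133 vs. 180 bclk); the helper name is hypothetical.

```python
# Sketch: overclock gain as a percentage over the stock figure.
# Inputs are the numbers quoted in the post; the helper name is made up.


def oc_gain_percent(stock: float, overclocked: float) -> float:
    """Percentage increase of `overclocked` over `stock`."""
    return (overclocked / stock - 1.0) * 100.0


x5690_gain = oc_gain_percent(3.46, 4.0)      # X5690: 3.46GHz stock vs. 4GHz
bclk_gain = oc_gain_percent(133.0, 180.0)    # any CPU: 133 bclk vs. 180 bclk

print(f"X5690 at 4GHz: {x5690_gain:.1f}% over stock")
print(f"180 bclk:      {bclk_gain:.1f}% over default bclk")
```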
    post edited by gordan79 - 2015/05/03 05:18:58

    #6
    mathf
    New Member
    • Total Posts : 17
    • Reward points : 0
    • Joined: 2015/02/23 16:12:14
    • Status: offline
    • Ribbons : 0
    Re: New owner... with some QPI related issues 2015/05/04 00:56:05 (permalink)
    OK, I ran several tests this morning with different kinds of RAM (DDR3 ECC and normal, 2 and 4 GB sticks).
    -> No luck, same problem.
     
    The funny part is that
    1. it _seems_ to work well at 4.8 (and 5.8) QPI, so the previous owner is not acknowledging that there is an issue, as he always ran it at 4.8 for overclocking
    2. even at 4.8, the board is not stable when booting from CPU1.
    #7
    gordan79
    SSC Member
    • Total Posts : 531
    • Reward points : 0
    • Joined: 2013/01/27 00:17:36
    • Status: offline
    • Ribbons : 3
    Re: New owner... with some QPI related issues 2015/05/04 03:34:20 (permalink)
    It sounds increasingly like a faulty motherboard that just happened to be running passably stably with the exact configuration and OS the previous owner was running.
     
    When OC-ing:
    QPI to 5.8GT/s will actually be 6.4GT/s at 146MHz bclk.
    QPI to 4.8GT/s will actually be 6.4GT/s at 177MHz bclk (this is the main (but not the only) reason why you generally hit a stability wall at about 180MHz bclk on these).
     
    For the sake of being thorough, you could set QPI to 5.8GT/s and crank the bclk to 146, adjust the CPU multiplier down so that the maximum CPU speed is at or below default, and see if that destabilizes.
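As a rough sketch of that test setup, the snippet below picks the largest CPU multiplier that keeps the core clock at or below stock once bclk is raised to 146. It assumes whole-number multipliers and uses the 3.46GHz stock figure quoted earlier in the thread; the names are illustrative, not BIOS options.

```python
import math

# Sketch: choose a CPU multiplier so that raising bclk does not overclock the cores.
# Assumptions: whole-number multipliers only; stock speed taken as 3460 MHz
# (the 3.46GHz figure quoted in this thread). Names are made up for illustration.

STOCK_MHZ = 3460   # X5690 stock core speed as quoted in the thread
TEST_BCLK = 146    # bclk suggested for the QPI 5.8GT/s test


def max_multiplier(target_mhz: float, bclk_mhz: float) -> int:
    """Largest whole multiplier that keeps core speed at or below target_mhz."""
    return math.floor(target_mhz / bclk_mhz)


mult = max_multiplier(STOCK_MHZ, TEST_BCLK)
print(f"x{mult} at {TEST_BCLK} bclk = {mult * TEST_BCLK} MHz (stock is {STOCK_MHZ} MHz)")
```

With the cores held at or below stock, any instability seen in this configuration points at the QPI/uncore side rather than at a core overclock.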
     
    Just out of interest, what FSB strap do you have set? Auto? Might be worth explicitly setting 1333 or 1600 just to eliminate the possibility of some internal timings being out when set to auto.

    #8
    mathf
    New Member
    • Total Posts : 17
    • Reward points : 0
    • Joined: 2015/02/23 16:12:14
    • Status: offline
    • Ribbons : 0
    Re: New owner... with some QPI related issues 2015/05/04 04:21:40 (permalink)
    Sounds to me like a faulty motherboard too.
    I will still run some additional tests with the X5570s once I get them, just to make sure that this issue does not come from one of the Xeons having a broken QPI link due to overvoltage.
     
    For the RAM, I tested this morning with 4 different sticks (normal and ECC). No luck.
    Also, the timings/frequencies are correctly detected by Windows.
    #9
    gordan79
    SSC Member
    • Total Posts : 531
    • Reward points : 0
    • Joined: 2013/01/27 00:17:36
    • Status: offline
    • Ribbons : 3
    Re: New owner... with some QPI related issues 2015/05/04 06:31:55 (permalink)
    I was talking about internal MCH timings which most tools don't report.
    An over-aged component due to over-volting is plausible. The Westmere Xeons are specced for a maximum of 1.35V core and 1.30V VTT. Any more and you are pretty much guaranteed to be damaging something and I seem to remember reading more than one post implying that equivalent i7s were expiring even without exceeding the spec, although whether that is due to motherboards overvolting and/or sensor under-reading is difficult to say. I'm running at well below those limits at 4GHz with 166 bclk.
     
    Every once in a while I spend a few seconds thinking about trying to bump up the bclk to 177, but I'd end up having to reduce the core and uncore multipliers, re-test the memory timings, and bump the core, uncore and IOH voltages right up to the limit, which seems like a huge waste of time for an extra 2.5% core speed and maybe 6% extra memory I/O at most, and that's assuming I don't have to loosen the memory timings to achieve it. OC-ing is worthwhile with an X5650 and below, but with an X5690 there is no worthwhile benefit. The only reason I got a pair (my other system uses X5650s) was to explore the limits of the Westmere cores and help establish the bclk limits of the SR-2. In the end I stuck with them because they seem to require slightly less voltage than my X5650s and thus run slightly cooler, but that could easily just be due to them having been manufactured later on than my original X5650s. Which all leads back to my earlier point - if you have X5690s, get a decent non-OC-ing SuperMicro board, as they'll boost to 3.60GHz on all cores and 3.73GHz on 2 cores without any OC-ing. That is precisely what I'll be looking into if it turns out that Nvidia Maxwells don't play well with VT-d on my configuration, as similar issues have already been reported.

    #10
    mathf
    New Member
    • Total Posts : 17
    • Reward points : 0
    • Joined: 2015/02/23 16:12:14
    • Status: offline
    • Ribbons : 0
    Re: New owner... with some QPI related issues 2015/05/04 10:09:35 (permalink)
    Just to be crystal clear: Here I am not even thinking about overclocking.
    I am always at stock frequencies, and I am forced to manually reduce the QPI link speed to get some stability!!!!
     
    The previous owner is not really acknowledging that his system has a fault (board or CPUs, I don't care which at this point), and argues that the board is picky with BIOS settings and that he never had an issue... Of course not, as he was overclocking to 4.0 GHz with the QPI frequency set to 4.8.
     
    ...no need to mention that I am really pissed.
    #11
    gordan79
    SSC Member
    • Total Posts : 531
    • Reward points : 0
    • Joined: 2013/01/27 00:17:36
    • Status: offline
    • Ribbons : 3
    Re: New owner... with some QPI related issues 2015/05/04 11:04:00 (permalink)
    The X5690 has an all-core boost multiplier of x27, so you can get to 4GHz at 148 bclk, which means that if you set QPI to 5.8 it will yield 6.46GT/s, which should certainly be within the OC tolerances. If he couldn't get it stable above 4.8 QPI, something's not quite right.
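The numbers above can be reproduced with a short sketch. It assumes core speed is simply multiplier times bclk and that QPI scales linearly with bclk from a default of 133, as stated earlier in the thread; the function names are invented for illustration.

```python
# Sketch: core clock and effective QPI for the x27 / 148 bclk case.
# Assumptions: core MHz = multiplier * bclk; QPI scales linearly with bclk
# from a default of 133 (the figure used earlier in this thread).

DEFAULT_BCLK = 133.0


def core_mhz(multiplier: int, bclk: float) -> float:
    """Core clock in MHz for a given multiplier and bclk."""
    return multiplier * bclk


def qpi_at(nominal_gts: float, bclk: float) -> float:
    """Effective QPI rate in GT/s for a nominal setting at a given bclk."""
    return nominal_gts * bclk / DEFAULT_BCLK


core = core_mhz(27, 148)
qpi = qpi_at(5.8, 148)
print(f"x27 at 148 bclk: {core:.0f} MHz core, QPI 5.8 -> {qpi:.2f} GT/s")
```

The result lands very close to the figure quoted above; the small difference comes down to how the defaults are rounded.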

    #12
    mathf
    New Member
    • Total Posts : 17
    • Reward points : 0
    • Joined: 2015/02/23 16:12:14
    • Status: offline
    • Ribbons : 0
    Re: New owner... with some QPI related issues 2015/05/06 06:31:51 (permalink)
    Some news from my side: the X5570s seem to work like a charm at QPI 6.4.
    The SR-2 also boots fine from CPU1 only.
     
    I am now thinking that it was not the board that was faulty, but at least one of the X5690 Xeons.
    #13
    gordan79
    SSC Member
    • Total Posts : 531
    • Reward points : 0
    • Joined: 2013/01/27 00:17:36
    • Status: offline
    • Ribbons : 3
    Re: New owner... with some QPI related issues 2015/05/06 06:38:18 (permalink)
    Very interesting indeed. But I thought you said you tried both CPUs in socket 1, and they both exhibited the problem in socket 1, but both worked OK in socket 0. Was that a misinterpretation of what you had said on my part?
     
    In case it wasn't a misinterpretation and you can definitely confirm that both X5690s have problems in socket 1, then the thing to consider is that X5570 is a 4-core and X5690 is a 6-core, which means some pins on the socket are not used when X5570 is installed. Which in turn implies that the fault could be dodgy contact or RF noise or a short somewhere relating to one of those pins/traces that don't do anything on an X5570.
     
    So if you haven't tested both of your 6-cores independently in socket 1 with only one CPU installed, now would probably be a good time to try that for cross-check.
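The cross-check logic can be written down as a small sketch. This is a deliberate simplification: it only looks at whether failures line up with one socket or one CPU, and it ignores the 4-core vs. 6-core subtlety mentioned above. The labels and the function name are hypothetical.

```python
# Sketch: decide whether a fault follows the socket or the CPU.
# Simplification: single-CPU test runs only, pass/fail per (cpu, socket) pair.
# Labels such as "X5690-A" and "socket0" are hypothetical.


def classify(results: dict) -> str:
    """results maps (cpu_label, socket_label) -> True if the run was stable."""
    failing = [pair for pair, stable in results.items() if not stable]
    if not failing:
        return "no fault observed"

    all_cpus = {cpu for cpu, _ in results}
    all_sockets = {sock for _, sock in results}
    bad_cpus = {cpu for cpu, _ in failing}
    bad_sockets = {sock for _, sock in failing}

    # Every CPU fails, but only in one socket: suspect the board.
    if len(bad_sockets) == 1 and bad_cpus == all_cpus and len(all_cpus) > 1:
        return "fault follows the socket"
    # One CPU fails in every socket: suspect that CPU.
    if len(bad_cpus) == 1 and bad_sockets == all_sockets and len(all_sockets) > 1:
        return "fault follows the CPU"
    return "inconclusive"


observed = {
    ("X5690-A", "socket0"): True,
    ("X5690-A", "socket1"): False,
    ("X5690-B", "socket0"): True,
    ("X5690-B", "socket1"): False,
}
print(classify(observed))
```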

    #14
    mathf
    New Member
    • Total Posts : 17
    • Reward points : 0
    • Joined: 2015/02/23 16:12:14
    • Status: offline
    • Ribbons : 0
    Re: New owner... with some QPI related issues 2015/05/06 08:00:44 (permalink)
    Yes. I tried both X5690s in CPU0 and CPU1. Both work properly in the CPU0 socket (OCCT Linpack ran for 2h without a hiccup).
    I don't know exactly what is going on when CPU0 is disabled and CPU1 is the only one in place, but I guess that some QPI connections might still be required... So it could also be that the QPI links in (both?) X5690s are damaged, due to overvoltage for example.
     
    Correct me if I am wrong, but when CPU0 is the only one in use, no QPI link is used, right?
    #15
    gordan79
    SSC Member
    • Total Posts : 531
    • Reward points : 0
    • Joined: 2013/01/27 00:17:36
    • Status: offline
    • Ribbons : 3
    Re: New owner... with some QPI related issues 2015/05/06 08:10:20 (permalink)
    That sounds like a marginal pin or related trace on socket 1 to me, on one of the pins not used by the 4-core but used by the 6-core CPUs.
    If you are really that determined to rule out motherboard issues, you could look into getting an X5650. They are going for next to nothing on ebay these days, and it is a 6-core 6.4GT/s QPI chip. If that doesn't work properly in socket 1 either but works in socket 0, then that would remove any trace of remaining doubt, if there still is any, that the fault is with the motherboard rather than the CPUs.
    The QPI connections are from each CPU to the NB (PCIe root hub) and from each CPU to the other CPU.
    That is why W series Xeons are single socket (only one QPI link) and X series are dual socket (two QPIs).
    post edited by gordan79 - 2015/05/06 08:13:25

    #16
    mathf
    New Member
    • Total Posts : 17
    • Reward points : 0
    • Joined: 2015/02/23 16:12:14
    • Status: offline
    • Ribbons : 0
    Re: New owner... with some QPI related issues 2015/06/11 03:11:42 (permalink)
    Ok! Long story short...
    I got in touch with EVGA Europe tech support, and they RMA'ed my board.
    I now have an E1 refurbished SR-2.
     
    With this board, I am facing a very weird issue.
    I am getting an immediate "FF" code when both CPUs are physically installed, with either of two "80 Plus Platinum" PSUs:
     - Corsair AX1200i
     - Seasonic SS-1200XP3
     
    The symptoms are the same if I disable the CPU1 socket via jumpers.
    What is also worth mentioning is that, when this occurs, the southbridge fan completely stops spinning (the CPU fans are still spinning), and it resumes spinning for 3-4 seconds once the power is shut down.
     
    Both PSUs work fine with my other computer (an old Gigabyte X58 motherboard), and I can't believe that I am soooo unlucky as to have bought 2 brand-new, high-end, top-class PSUs that are both defective. Also, I was using the AX1200i with my "old" SR-2 without experiencing this issue... The only difference is that it was connected directly to the wall socket, and now it is connected to a multisocket panel at home.
     
    -> Funny part: if I connect a good old Corsair HX1000W, everything works properly. The only minor point to mention is that I have to manually disable the C6 state when booting from the CPU1 socket only. Otherwise, Windows 7 freezes right after the boot logo animation.
     
     
    I sent a ticket to EVGA techsupport once again, and they are starting to claim that both Platinum PSUs might not be compatible with the SR-2... I kinda find this quite odd.
    #17
    gordan79
    SSC Member
    • Total Posts : 531
    • Reward points : 0
    • Joined: 2013/01/27 00:17:36
    • Status: offline
    • Ribbons : 3
    Re: New owner... with some QPI related issues 2015/06/11 03:24:24 (permalink)
    mathf
    Ok! Long story short... 
    I got in touch with EVGA Europe tech support, and they RMA'ed my board.
    I now have an E1 refurbished SR-2.

     
    Lucky you. They told me they were out of stock, with no expected date for when they'd have more. I think given my RMA history they figured I'd find any faults on it within a day if they exist, and they couldn't afford the couriering costs.
     
    mathf
    With this board, I am facing a very weird issue.
    I am getting an immediate "FF" code when both CPUs are physically installed, with either of two "80 Plus Platinum" PSUs:
     - Corsair AX1200i
     - Seasonic SS-1200XP3
     
    The symptoms are the same if I disable the CPU1 socket via jumpers.
    What is also worth mentioning is that, when this occurs, the southbridge fan completely stops spinning (the CPU fans are still spinning), and it resumes spinning for 3-4 seconds once the power is shut down.
     
    -> Funny part: if I connect a good old Corsair HX1000W, everything works properly. The only minor point to mention is that I have to manually disable the C6 state when booting from the CPU1 socket only. Otherwise, Windows 7 freezes right after the boot logo animation.

     
    Hmm... Dodgy. Sounds like marginal power circuitry on socket #2
     
    mathf
    I sent a ticket to EVGA techsupport once again, and they are starting to claim that both Platinum PSUs might not be compatible with the SR-2... I kinda find this quite odd.

     
    The thing you need to remember is that there is no such thing as a new replacement SR-2 any more, and hasn't been for at least a couple of years. All replacements are refurbs, which means they were faulty and were supposedly repaired. Unfortunately, I have never seen a fully properly working refurb motherboard - once it duffs out, it'll never work properly again. If they offer to replace it with a different new motherboard, I suggest you take them up on it, sell your Xeons on ebay and buy a CPU that fits the new board. In the long term, if your time is worth anything, it will be by far the most sane thing to do.

    #18