EVGA

Low Memory Benchmark Woes

Author
kazu4009
New Member
  • Total Posts : 35
  • Reward points : 0
  • Joined: 2010/04/12 22:18:43
  • Location: Japan
  • Status: offline
  • Ribbons : 1
Re: Low Memory Benchmark Woes 2020/03/31 06:51:45 (permalink)
X299DARK 10980XE finally let me just use my XMP profile on 4x8GB G.Skill F4-3800C14Q-32GTZN
 
XMP 3800 CL14-16-16-36-666 1T
https://i.imgur.com/OQEbHCE.jpg
 
@4000 CL17-18-18-28-320 1T
https://i.imgur.com/qkSFTqv.jpg
#61
ZoranC
FTW Member
  • Total Posts : 1099
  • Reward points : 0
  • Joined: 2011/05/24 17:22:15
  • Status: offline
  • Ribbons : 16
Re: Low Memory Benchmark Woes 2020/03/31 16:29:54 (permalink)
kazu4009
X299DARK 10980XE finally let me just use my XMP profile on 4x8GB G.Skill F4-3800C14Q-32GTZN
 
XMP 3800 CL14-16-16-36-666 1T
https://i.imgur.com/OQEbHCE.jpg
 
@4000 CL17-18-18-28-320 1T
https://i.imgur.com/qkSFTqv.jpg




With all due respect, what does that have to do with the topic?
#62
Tech_JoseC
iCX Member
  • Total Posts : 383
  • Reward points : 0
  • Joined: 2017/06/05 00:00:00
  • Location: EVGA
  • Status: offline
  • Ribbons : 2
Re: Low Memory Benchmark Woes 2020/03/31 16:49:21 (permalink)
For clarification purposes, we are looking into performance difficulties that ZoranC is experiencing. Our motherboard team is looking into this and we hope to get an answer soon. 
 
-Jose C. 
#63
ZoranC
Re: Low Memory Benchmark Woes 2020/03/31 16:55:51 (permalink)
EVGATech_JoseC
For clarification purposes, we are looking into performance difficulties that ZoranC is experiencing. Our motherboard team is looking into this and we hope to get an answer soon. 
 
-Jose C. 



Hi Jose, please note that it is not just me and not just with 32GB DIMMs. Please see the following post by xuqi99, who is experiencing the same issue with various DIMM kits:
 
https://forums.evga.com/FindPost/3042046
#64
EVGA_Lee
Moderator
  • Total Posts : 4247
  • Reward points : 0
  • Joined: 2016/11/04 14:43:35
  • Location: Brea, CA
  • Status: offline
  • Ribbons : 14
Re: Low Memory Benchmark Woes 2020/04/08 12:04:59 (permalink)
Alright guys, bottom line is that it happens when you're using 4 channels.  Period.  The brand doesn't matter, nor does the processor brand.
 
We also asked G.Skill to look into it and they saw similar results in Performance Test to what's been reported here when they tested with a competitor's X299 motherboard and also on a TRX40 motherboard.
 
It's important to note that it's not the overall memory performance that drops when you're using 4 channels; it's just the Memory Read Uncached performance.  Otherwise, there's a notable memory performance increase overall when using the full quad channel.
 
Here is what G.Skill provided with a TRX40 board using 8GB DIMMs at 3600MHz (2 DIMMs, 3 DIMMs, 4 DIMMs):
 
 


 
Overall, this could imply that there are issues with multiple motherboard platforms, or that the testing utility may not be testing 4 channel platforms properly.  Either way, at least you all have some new data to consider.
#65
xuqi99
New Member
  • Total Posts : 41
  • Reward points : 0
  • Joined: 2015/04/30 07:06:29
  • Location: Australia
  • Status: offline
  • Ribbons : 3
Re: Low Memory Benchmark Woes 2020/04/08 18:47:21 (permalink)
Thank you for the info. 
#66
ZoranC
Re: Low Memory Benchmark Woes 2020/04/08 19:33:40 (permalink)
EVGATech_LeeM
Alright guys, bottom line is ...



@EVGATech_LeeM Thank you for the update. I have the following comments and questions, please:
 
Looking at the data you shared, I do not feel that it supports a 100% reliable conclusion that the -SAME- issue happens with other brands and platforms too, nor the statement “… not the overall memory performance that drops when you're using 4 channels; it's just the Memory Read Uncached performance.  Otherwise, there's a notable memory performance increase overall when using the full quad channel”.
 
The reasons why I feel this way can be explained by looking at the figures (please see screenshots below). Let's start with the easy ones first to get them out of the way:
 
1. For memory read cached TRX40 gets 33855 -> 33472 -> 33922 when going from x2 -> x3 -> x4. My X299 Dark gets 30378 -> 30340 -> 30310. Both show practically zero change.
 
In other words: when looking at memory operations that were cached, one wouldn’t immediately see anything off, but caches often hide issues underneath, so this can’t be used to draw any conclusions.
 
2. For memory writes TRX40 gets 11398 -> 11575 -> 14381. X299 Dark gets 13586 -> 13563 -> 15425.
 
So TRX40 gains 26% while X299 Dark gains 13%, about half of what TRX40 gained.
 
Now let's move on to the ones more indicative of the issue:
 
3. For database operations TRX40 gets 7316 -> 7296 -> 10741. X299 Dark gets 6364 -> 6550 -> 6652.
 
In other words, TRX40 database operations gained 47% when going from dual to quad while X299 Dark gained only 5%!
 
4. TRX40 memory read uncached gets 18273 -> 18322 -> 17471. X299 Dark gets 14589 -> 15086 -> 10337.
 
In other words, TRX40 uncached reads lose only 5% while X299 Dark loses 41%!
 
5. TRX40 memory threaded gets 48905 -> 57822 -> 88271. X299 Dark gets 40722 -> 53347 -> 75821. That is an 80% gain for TRX40 and 86% for X299 Dark. Based on comments I found, results from that test can’t be used for any conclusions (the total memory block tested is just 256 MB).
 
In the end I do NOT see “notable memory performance increase overall when using the full quad channel”. To be specific I do see it for TRX40 but not for X299 Dark, at least not when it comes to any operations that involve reads that do not fit in the cache. 5% gain on database operations and 13% gain on writes can’t be, IMHO, called notable. Overall TRX40 seemed to have much bigger gains and much smaller losses than X299 Dark when going from dual to quad memory channel.
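
The percentages in points 1 to 5 can be rechecked with a few lines of Python. This is only a minimal sketch: the scores are the ones quoted in this post, while the names `SCORES`, `pct_change` and `dual_to_quad_table` are made up for illustration. One thing it makes visible is that the result depends on the baseline. Measured against the dual-channel score, uncached reads change by roughly -4.4% on TRX40 and -29.1% on X299 Dark; the 5% and 41% figures correspond to how much higher the dual-channel score is than the quad-channel one.

```python
# Scores quoted in this post, in the order x2 -> x3 -> x4 DIMMs.
SCORES = {
    "Read Cached":   {"TRX40": (33855, 33472, 33922), "X299 Dark": (30378, 30340, 30310)},
    "Write":         {"TRX40": (11398, 11575, 14381), "X299 Dark": (13586, 13563, 15425)},
    "Database Ops":  {"TRX40": (7316, 7296, 10741),   "X299 Dark": (6364, 6550, 6652)},
    "Read Uncached": {"TRX40": (18273, 18322, 17471), "X299 Dark": (14589, 15086, 10337)},
    "Threaded":      {"TRX40": (48905, 57822, 88271), "X299 Dark": (40722, 53347, 75821)},
}


def pct_change(old, new):
    """Percent change from old to new, using old as the baseline."""
    return (new - old) / old * 100.0


def dual_to_quad_table(scores=SCORES):
    """Return {test: {platform: percent change from 2 DIMMs to 4 DIMMs}}."""
    return {
        test: {
            platform: pct_change(runs[0], runs[-1])
            for platform, runs in by_platform.items()
        }
        for test, by_platform in scores.items()
    }


# Print one line per test with the dual -> quad change for each platform.
for test, row in dual_to_quad_table().items():
    cells = ", ".join(f"{platform}: {change:+.1f}%" for platform, change in row.items())
    print(f"{test:<14} {cells}")
```

Calling `pct_change(10337, 14589)` instead (quad score as the baseline) gives the 41% figure, so it is worth stating which baseline is meant when comparing drops between platforms or threads.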
 
This is where I have to wonder whether posting screenshots from TRX40 is helping or muddying the issue at hand. I know very little about memory architectures, but it is my understanding that TRX40 mbs might employ NUMA while X299 does not, which in turn might result in better memory performance from TRX40.
 
If that is correct then TRX40 can’t be used to make any relevant conclusions.
 
This leaves me with an impression that:
 
a)  Write operations on X299 Dark do get some small gains with the number of channels, but not as much as TRX40. This in turn raises the question of how much of a gain there should be on X299.
 
b)  Any score that involves reads, especially ones that are not cached, seems to suffer when going from dual to quad channel on X299 Dark.
 
c)  We/you should try not to introduce new variables, like NUMA, into the equation. I feel it would be much better if we were given apples-to-apples results, like X299 Dark with Cascade Lake-X vs. the same board with Skylake-X, and X299 Dark vs. Asus or whatever.
 
Also, I’ve tried searching whether others too have experienced PassMark score drop when going from dual to quad channel and I’ve come across this thread:
 
https://www.techpowerup.com/forums/threads/workstation-ddr4-memory-benchmarks-ecc-vs-non-ecc-16-gb-vs-32-gb-single-vs-dual-vs-quad-channel-overclocked-vs-default-timings.257565/
 
That thread indicates that yes, others too have experienced that drop, with a different Intel chipset and mb vendor too, -BUT NOT TO THE SAME EXTENT-. That thread’s poster had a 9% drop, which is more in line with TRX40’s 5%, not anywhere near the 41% that X299 Dark shows.
 
So if your effort in the end confirms that an uncached read drop is to be expected, my next question will be: how MUCH of a drop is to be expected as normal? Is 41% on X299 Dark normal or is it excessive?
 
Last, but not least, in my experience there are “lies, damn lies, statistics, and benchmarks” so I do not implicitly trust any benchmarking software. I have now come across statements that PassMark’s read un-cached favors single cores with higher clocks (why they would do that beats me).
 
Have you tried reaching out to PassMark? Better yet, do you know of a better memory benchmark that would either confirm what PassMark is indicating or question it?
 
In the end, what can I expect next from you, and when can I expect it, please?
 
Speaking of which, could you please provide me an update on that thread about throttling I shouldn’t be having yet I do? Last update was that you are looking at it but that was a month ago and no updates since then.
 
Thank you!
 



#67
zGunBLADEz
New Member
  • Total Posts : 58
  • Reward points : 0
  • Joined: 2014/11/26 07:05:08
  • Status: offline
  • Ribbons : 1
Re: Low Memory Benchmark Woes 2020/04/09 03:00:06 (permalink)
EVGATech_LeeM
Alright guys, bottom line is that it happens when you're using 4 channels.  Period.  The brand doesn't matter, nor does the processor brand.
 
We also asked G.Skill to look into it and they saw similar results in Performance Test to what's been reported here when they tested with a competitor's X299 motherboard and also on a TRX40 motherboard.
 
It's important to note that it's not the overall memory performance that drops when you're using 4 channels; it's just the Memory Read Uncached performance.  Otherwise, there's a notable memory performance increase overall when using the full quad channel.
 
Here is what G.Skill provided with a TRX40 board using 8GB DIMMs at 3600MHz (2 DIMMs, 3 DIMMs, 4 DIMMs)
Overall, this could imply that there are issues with multiple motherboard platforms, or that the testing utility may not be testing 4 channel platforms properly.  Either way, at least you all have some new data to consider.


so what exactly im looking at there??

heres my test

btw may i ask why the boards/bios dont have a locked cache multi option for lowest and max? my asrock one has this option...  This board goes up and down like it aint no business in the cache..
post edited by zGunBLADEz - 2020/04/09 03:02:38
#68
ZoranC
Re: Low Memory Benchmark Woes 2020/04/09 09:42:28 (permalink)
zGunBLADEz
so what exactly im looking at there??

heres my test





Thank you! You are supposed to compare quad channel results with two channel ones (taking two memory sticks out).
 
Which mb, CPU and memory kit is that, please?
#69
EVGA_Lee
Re: Low Memory Benchmark Woes 2020/04/09 12:37:30 (permalink)
One issue here is that I'm not nearly as familiar with this benchmark as I have been with others in the past.  I can't say, for example, whether the benchmark may be unduly influenced by factors such as CPU speed, memory latency, memory speed, memory available, or other tweaks that people often make to improve benchmark performance.  Even assuming that the benchmark is fairly balanced, we still have to account for both the programming and OS variables before we get to actual performance values.
 
I preface my post by saying that because although we know that at least one performance metric used by this benchmark is peculiarly affected by adding a 4th DIMM into a quad-channel memory configuration - repeatable among different chipsets and platforms - we may not necessarily know if another factor further influences performance, such as the amount of memory used (e.g. 128GB) or other factors.  Given that we and an independent memory manufacturer have produced similar results, it suggests, as I noted, that there may be either questions about how the benchmark determines performance or that the memory controllers on quad-channel supported Intel and AMD processors appear to suffer a slight performance bottleneck in a specific metric.  For the first, this is a benchmark tool provided by a third-party manufacturer; for the second, that would be far beyond our capabilities to correct.
 
With regards to your post, and edited for responses:
 
ZoranC
 
@EVGATech_LeeM Thank you for the update. I have the following comments and questions, please:
 
Looking at the data you shared, I do not feel that it supports a 100% reliable conclusion that the -SAME- issue happens with other brands and platforms too, nor the statement “… not the overall memory performance that drops when you're using 4 channels; it's just the Memory Read Uncached performance.  Otherwise, there's a notable memory performance increase overall when using the full quad channel”.

Let's be clear, this isn't just our testing, it's also testing by G.Skill.  Although I've provided an abbreviated answer above, that does not imply that limited testing was done.  G.Skill looked at the claims being made in the thread here and noted that this performance is in line with what should be expected.  That does not mean that some people won't see a slight variance in their overall memory performance, but it does mean that this "low performance" is, in fact, standard performance.
 
ZoranC
Reasons why I feel this way can be explained by looking at the figures[...]
 
In the end I do NOT see “notable memory performance increase overall when using the full quad channel”. To be specific I do see it for TRX40 but not for X299 Dark, at least not when it comes to any operations that involve reads that do not fit in the cache. 5% gain on database operations and 13% gain on writes can’t be, IMHO, called notable. Overall TRX40 seemed to have much bigger gains and much smaller losses than X299 Dark when going from dual to quad memory channel.

I think you're getting too far afield of the discussion here.  You're comparing anecdotal evidence and trying to fit the numbers into a theory.  The investigation into the memory performance and its conclusions are less about the specific numbers a person should get with a certain memory speed/latency, and more about whether there is a notable and repeatable memory performance dip when adding 4 DIMMs.  Comparing performance numbers across two different platforms isn't the goal here; the goal is to see whether there's a performance dip on each motherboard platform, compared to its own numbers, when using a quad-channel configuration.  
 
ZoranC 
This is where I have to wonder whether posting screenshots from TRX40 is helping or muddying the issue at hand. I know very little about memory architectures, but it is my understanding that TRX40 mbs might employ NUMA while X299 does not, which in turn might result in better memory performance from TRX40.
 
If that is correct then TRX40 can’t be used to make any relevant conclusions.

Again, that's irrelevant.  The question is whether there is a drop in memory performance in a specific metric (or other metrics) when a 4th DIMM is added.  Per G.Skill's testing, yes, there is.  As such, it certainly shows that this is not an issue related only to the X299 platform, nor one related only to the X299 DARK. 
 
ZoranC
This leaves me with an impression that:
 
[...] 
c)  We/you should try not to introduce new variables, like NUMA, into the equation. I feel it would be much better if we were given apples-to-apples results, like X299 Dark with Cascade Lake-X vs. the same board with Skylake-X, and X299 Dark vs. Asus or whatever.

As I noted in my earlier post, G.Skill also tested on an MSI X299 motherboard and saw a similar drop in performance for this metric.  The addition of the TRX40 board does not introduce new variables.  If anything it supports a conclusion that the "low performance" noted in this thread is not limited to the X299 DARK, or even the X299 platform, but rather any motherboards/processors that utilize a quad-channel memory configuration.
 
ZoranC 
Also, I’ve tried searching whether others too have experienced PassMark score drop when going from dual to quad channel and I’ve come across this thread:
 
https://www.techpowerup.com/forums/threads/workstation-ddr4-memory-benchmarks-ecc-vs-non-ecc-16-gb-vs-32-gb-single-vs-dual-vs-quad-channel-overclocked-vs-default-timings.257565/
 
That thread indicates that yes, others too have experienced that drop, with a different Intel chipset and mb vendor too, -BUT NOT TO THE SAME EXTENT-. That thread’s poster had a 9% drop, which is more in line with TRX40’s 5%, not anywhere near the 41% that X299 Dark shows.

As much as that thread also shows that a quad-channel configuration experiences drops in uncached read performance, I would caution against using benchmark data from a different chipset, with a different software version and OS differences.  It suggests a trend, but it's hard to tell how reliable that data is when put up against our current systems.
 
ZoranC
So if your effort in the end confirms that an uncached read drop is to be expected, my next question will be: how MUCH of a drop is to be expected as normal? Is 41% on X299 Dark normal or is it excessive?

This is a much better question.  Unfortunately, I don't have an answer, and I'm not sure if there is one to find.  As I mentioned at the very top of this post, there could be a lot of factors influencing your scores.  Overclocks, lack of overclocks, etc. could also be in play.  If you have your system overclocked, I'd suggest trying with your CPU at stock, with your memory set to XMP and see if you still have the same disparity in test results.
 
If I have some time this weekend, I may be able to do some PassMark runs and post the benchmarks from my X299 DARK.  At a certain point, this may be anecdotal and could be down to a lot of different factors.  Again, I'll try to look at my benchmarks and see if there's a pattern or if our numbers are way off.  
 
ZoranC 
Last, but not least, in my experience there are “lies, damn lies, statistics, and benchmarks” so I do not implicitly trust any benchmarking software. I have now come across statements that PassMark’s read un-cached favors single cores with higher clocks (why they would do that beats me).
 
Have you tried reaching out to PassMark? Better yet, do you know of a better memory benchmark that would either confirm what PassMark is indicating or question it?

There are a number of benchmarks that prefer single cores with higher clocks.  Generally, GPU benchmarks tend to do better with pure speed (excepting the CPU tests, if included), and other CPU/Memory benchmarks often do better with single cores and higher clocks (PiFast, for example).  I think a lot of it has to do with how Windows processes tasks, and I can't imagine that using multiple cores would reduce memory latency or improve memory processing.  The less busy the CPU is, the faster the memory can handle reads/writes.  But that would be my guess.
 
No, we did not reach out to PassMark.  For benchmarks, I can't vouch for these specifically, but HWBot recognizes these memory benchmarks:  https://hwbot.org/benchmarks/memory
 
ZoranC 
In the end, what can I expect next from you, and when can I expect it, please?

I'm not sure what else we can offer you at this time.  From our end, it seems that this is less of an "issue" and more "working as intended", to a degree.  Now, if for some reason you have unusually low memory performance compared to other X299 boards, then additional troubleshooting may be needed.  That's a separate issue from what this thread originally started as, which was to confirm whether or not a performance drop occurs at a certain point when 4 DIMMs are installed.  We did, in fact, confirm that.  Now, moving to whether the performance drop you're seeing is average or too low is a separate question.  Let's see what my benchmarks show and we'll go from there.
 
ZoranC 
Speaking of which, could you please provide me an update on that thread about throttling I shouldn’t be having yet I do? Last update was that you are looking at it but that was a month ago and no updates since then.

Yes.  Sorry about that.  I do have an update for you, and I'll post it later today.  
#70
zGunBLADEz
Re: Low Memory Benchmark Woes 2020/04/09 19:58:19 (permalink)
ZoranC
zGunBLADEz
so what exactly im looking at there??

heres my test





Thank you! You are supposed to compare quad channel results with two channel ones (taking two memory sticks out).
 
Which mb, CPU and memory kit is that, please?


This is a micro2, gskill, 2 sets of 16gb 3600 b dies.

I'm looking at your previous settings. Just to let you know, I have a 32gb kit as well, 3600 b dies, and they dont go past 4000 stable. As a matter of fact I have the highest stable kit with 16gb per stick on overclock.net, under an 8700k on an itx asus 370.

16gb kits dont overclock as well as 8gb kits, so that goes for 32gb sticks as well. Also your timings are so loose, trfc at 800+? All those timings aint giving you performance at all. Even at 4 chan, not counting that you have 3200...
#71
ZoranC
Re: Low Memory Benchmark Woes 2020/04/09 20:05:50 (permalink)
zGunBLADEz
ZoranC
zGunBLADEz
so what exactly im looking at there??

heres my test





Thank you! You are supposed to compare quad channel results with two channel ones (taking two memory sticks out).
 
Which mb, CPU and memory kit is that, please?


This is a micro2, gskill, 2 sets of 16gb 3600 b dies.

I'm looking at your previous settings. Just to let you know, I have a 32gb kit as well, 3600 b dies, and they dont go past 4000 stable. As a matter of fact I have the highest stable kit with 16gb per stick on overclock.net, under an 8700k on an itx asus 370.

16gb kits dont overclock as well as 8gb kits, so that goes for 32gb sticks as well. Also your timings are so loose, trfc at 800+? All those timings aint giving you performance at all. Even at 4 chan, not counting that you have 3200...



Thank you! Could you please run the test with just two sticks? Also, which CPU do you have in?
#72
ZoranC
Re: Low Memory Benchmark Woes 2020/04/09 23:30:53 (permalink)
@EVGATech_LeeM:
 
Thank you for your thorough reply and additional info. While there are things in it I disagree with, I feel debating them wouldn’t be constructive or help progress toward the main goal, so I will jump over them to what I believe matters (but not necessarily immediately) …
 
EVGATech_LeeM
If you have your system overclocked, I'd suggest trying with your CPU at stock, with your memory set to XMP and see if you still have the same disparity in test results.

 
This is one of the very first things I checked when opening a ticket with EVGA, and I’ve double-checked it today. Results are practically the same when the mb is on default settings (I’ve cleared CMOS to make sure). That has been communicated to EVGA tech during work on the ticket.
 
One interesting thing happened today though: I swapped the Corsair kit back in. The system recognized that the memory had changed; I went back into BIOS, changed the memory profile to ‘default’, rebooted, changed the memory profile to XMP, rebooted, and then benchmarked. Corsair benchmarks lower than G.Skill on uncached reads, but the ratio of dual to quad channel is still practically the same (40-ish percent). Then I swapped G.Skill back in and repeated the memory configuration, but the benchmark didn’t go back to the level G.Skill had before the Corsair kit was put in; it stayed lower than that through repeated runs. It went back to the original level only after I cleared CMOS and redid the settings.
 
It was as if something was left over from the Corsair profile, resulting in a lower result with G.Skill until I cleared CMOS.
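
When comparing kits or settings across repeated runs like this, it can help to summarise each set of runs before comparing them. Below is a minimal Python sketch, assuming the scores are copied in by hand from the benchmark; the function names and the "do the ranges overlap" rule are illustrative assumptions, and this is a crude sanity check rather than a proper statistical test.

```python
from statistics import median


def summarize(runs):
    """Return (min, median, max) for a list of benchmark scores."""
    if not runs:
        raise ValueError("need at least one run")
    return min(runs), median(runs), max(runs)


def ranges_overlap(runs_a, runs_b):
    """True if the min..max ranges of the two sets of runs overlap."""
    lo_a, _, hi_a = summarize(runs_a)
    lo_b, _, hi_b = summarize(runs_b)
    return lo_a <= hi_b and lo_b <= hi_a


def median_ratio(runs_a, runs_b):
    """Median score of runs_a divided by the median score of runs_b."""
    return summarize(runs_a)[1] / summarize(runs_b)[1]
```

If the ranges for, say, the runs before and after a kit swap do not overlap, the difference is probably more than run-to-run noise, and `median_ratio(quad_runs, dual_runs)` gives the quad-to-dual ratio from several runs instead of a single one.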
 
EVGATech_LeeM
ZoranC 
How MUCH of a drop is to be expected as normal? Is 41% on X299 Dark normal or is it excessive?

Have you tried reaching out to PassMark?

In the end, what can I expect next from you, and when can I expect it, please?

This is a much better question.  Unfortunately, I don't have an answer, and I'm not sure if there is one to find.

No, we did not reach out to PassMark.

I'm not sure what else we can offer you at this time.  From our end, it seems that this is less of an "issue" and more "working as intended", to a degree.

 
I think we have reached a point where we both can agree all the data so far indicates there will be a dip (at least in PassMark). However, I believe that doesn’t mean we should/can just leave it at that and move on.
 
I’m not a vendor of computer hardware but I am a vendor of SQL Server support services. If my customer called me saying “I’ve doubled the memory on my SQL Server and my application is telling me writes are now faster but reads are 41% slower”, I wouldn’t stop at just confirming that I can reproduce it elsewhere and that yes, that particular application does show a dip on reads elsewhere too, because that wouldn’t address the customer’s concern.
 
That customer doesn’t know who he should be pointing the finger at, the app vendor or me. To him we are both equally innocent and equally guilty until proven otherwise, and to be honest I can’t assume it is not me until I find out:
 
a)   Does that app show a similar amount of dip everywhere, or does my customer have it much worse than others with a similar setup?
 
b)   Can that app be trusted? Do other apps of the same type report a dip of similar size?
 
And if necessary I would be reaching out to the app vendor: “Hey, help me get to the bottom of this because in the meantime the customer is looking at both of us”.
 
That I don’t leave them without an answer is what my customers love(d) about me.
 
So I am hoping the next step will be to find out whether my 40-ish % dip is “normal” across the board (for those with a similar setup), and I am looking forward to hearing the results of that and of your own benchmark.
 
In the meantime I will keep looking for other memory benchmarking software that would help either confirm or debunk PassMark’s results because having only one data point can send one on a wild goose chase.
 
Thank you!
#73
cdc-951
SSC Member
  • Total Posts : 520
  • Reward points : 0
  • Joined: 2012/04/27 02:26:30
  • Status: offline
  • Ribbons : 1
Re: Low Memory Benchmark Woes 2020/04/11 03:10:10 (permalink)
So is my 3200MHz G.Skill Samsung B-die kit affected with a 9800X in quad channel...? What is the TLDR? There is a lot to read through and it makes me a bit dizzy
#74
ZoranC
Re: Low Memory Benchmark Woes 2020/04/11 10:35:30 (permalink)
cdc-951
So is my 3200MHz G.Skill Samsung B-die kit affected with a 9800X in quad channel...?



You should be able to easily find that out. Run PassMark memory bench on 4 sticks, note numbers, take 2 sticks out, rerun bench, compare values for read uncached before and now and let us know what you found.
#75
ZoranC
Re: Low Memory Benchmark Woes 2020/04/27 11:37:08 (permalink)
EVGATech_LeeM
Now, moving to whether the performance drop you're seeing is average or too low is a separate question.  Let's see what my benchmarks show and we'll go from there.

 
In the absence of any further news/data from EVGA / EVGATech_LeeM, I’ve tried to see what I can figure out by myself. Obviously I am not in a position to find out whether my “losses” are in line with what other X299 setups would have (for that I would have to get and test them, which I don’t have the money and time for), but I could at least try to figure out whether PassMark Read Uncached is misleading or not.
 
That is what I have been working on since the last post. Now the first results are in, and the experience has been interesting/educational. I will try to post it as best I can, but it is going to be a long(ish) writeup, so I will have to break it up across more than one post, and there will be breaks in between as I try to find time to write them.
#76
ZoranC
Re: Low Memory Benchmark Woes 2020/04/27 16:51:04 (permalink)
First, I need to explain how I feel about benchmarks, and why, so one can understand why I approach them the way I do.
 
I don’t trust any benchmark, neither “synthetic” nor “real life”, especially not when the figure it provides is a) not in a standard unit of measure that can be validated with other tools that do the same, and b) a single figure for something that has at least two variables.
 
What do I mean by that? Well, can you imagine how ridiculous a “Car & Driver” review would look if they just said “We rate car X as 1.3 when compared to our reference car”? Even saying “Car X has Y/Z horsepower / torque” means nothing, because those who know better know its power curve can be vastly different from what that one single figure would imply.
 
And those who know even better know that even if you are given figures for every single major sub-component of a car, that still won’t tell you how well that car will perform under the conditions -you- need; what worked great for a quarter-mile straight line will probably be horrible for autocross.
 
So they take it to a “real world” track and measure every single aspect of it during every stage of the run, in figures that can be compared, rather than providing just one at the end. Otherwise, if your car had a lower score at the end, you don’t know whether it was because of acceleration, cornering … and whether it was cornering overall or in one specific corner, or what you need to fix and where to improve that score.
 
Yet many benchmarking tools in the industry do not seem to provide that, even though the “machines” we are trying to benchmark are equally complex.
 
That should explain why I don’t trust them, and my comment about (lack of) units of measure and having just a single figure for what are 2+ variables.
 
That should also explain why I believe one has to look at both synthetic and “real world” benchmarks in each category, and why focusing on a subset leaves you exposed to the risk of not getting the full picture and reaching incorrect conclusions.
 
I believe synthetic benchmarks of a single component are needed because they let you zero in on a certain area without the risk of data getting skewed by the performance of a different area. If a “real world” benchmark mixes workloads of more than one type (like they always do), you risk that good performance in some areas will hide bad performance in others.
 
At the same time, synthetic benchmarks can lead you down a rabbit hole because, for example, you don’t know how much weight each area contributes in final real-world use, or whether real-world apps will even use it in the same manner the benchmark does (there are so many ways code can be written).
 
That is why I use synthetic benchmarks as a starting point, to check whether the general health / performance of every subsystem checks out before moving on to application-based benchmarks.
 
So I started looking for another synthetic memory benchmark that would either confirm PassMark Read Uncached figures or question them …
 
#77
Sultan.of.swing
Superclocked Member
  • Total Posts : 174
  • Reward points : 0
  • Joined: 2012/12/14 20:58:21
  • Status: offline
  • Ribbons : 2
Re: Low Memory Benchmark Woes 2020/04/28 22:11:48 (permalink)
I'm not familiar with how X299 memory channels are laid out: are they daisy-chained, or do they use T-topology?
Is the Dark daisy chain or T-topology?
 
I know daisy-chained DIMM slot boards put way more strain on the IMC than T-topology does.
 
Just spouting off here based on what I know with Z370/90 and how the channels are laid out.
 
Actually, thinking about it, the board should be 1DPC.
 
 
post edited by Sultan.of.swing - 2020/04/28 22:29:18
#78
zGunBLADEz
New Member
  • Total Posts : 58
  • Reward points : 0
  • Joined: 2014/11/26 07:05:08
  • Status: offline
  • Ribbons : 1
Re: Low Memory Benchmark Woes 2020/04/29 04:54:59 (permalink)
Thing is, you won't see much gain with those timings, brother. tRFC is way too high; is it what, 800?? @ 3200
What setting do you have your cache at??
If you don't tweak timings and cache, you won't see much difference.

Stick with AIDA64 for memory benchmarks; there's something wrong with PassMark. There's no way memory latency is that low, especially on AMD. That's my score, 30-32, at 4200MHz on heavily tweaked RAM on my mobo.....
AIDA64 scales linearly as sticks are added and is more consistent.
post edited by zGunBLADEz - 2020/04/29 05:18:41
#79
PINKTULIP
FTW Member
  • Total Posts : 1158
  • Reward points : 0
  • Joined: 2007/06/03 16:01:19
  • Location: EARTH
  • Status: offline
  • Ribbons : 7
Re: Low Memory Benchmark Woes 2020/04/29 09:03:36 (permalink)
The PassMark memory test is not accurate; it is way off.

MOBO: EVGA X299 DARK 151-SX-E299-KR BIOS: 1.29 CPU: Intel Core i9-10900X Skylake-X 10-Core 3.7 GHz LCR: Corsair Hydro Series H80i V2 GPU: SAPPHIRE NITRO+ RX 6900 XT SE MEMORY: CORSAIR Dominator Platinum SE Torque 32GB (4 x 8GB) CMD32GX4M4C3200C14T SSD 01: SAMSUNG 970 PRO M.2 1TB NVMe SSD 02: SAMSUNG 860 PRO 256GB x2 RAID 0 PSU: Seasonic Prime Titanium SSR-1000TR 1000 Watts CASE: Thermaltake (Armor+) VH6000SWA SC: Creative Sound Blaster AE-9 5.1 Channels Monitor: Acer XR382CQK IPS 3840x1600 @ 75Hz BD
#80
ZoranC
FTW Member
  • Total Posts : 1099
  • Reward points : 0
  • Joined: 2011/05/24 17:22:15
  • Status: offline
  • Ribbons : 16
Re: Low Memory Benchmark Woes 2020/04/29 13:54:11 (permalink)
zGunBLADEz
Thing is, you won't see much gain with those timings, brother. tRFC is way too high; is it what, 800?? @ 3200
What setting do you have your cache at??
If you don't tweak timings and cache, you won't see much difference.

Stick with AIDA64 for memory benchmarks; there's something wrong with PassMark. There's no way memory latency is that low, especially on AMD. That's my score, 30-32, at 4200MHz on heavily tweaked RAM on my mobo.....
AIDA64 scales linearly as sticks are added and is more consistent.



This is not about whether the timings are perfect or whether I am getting as much as I might be able to, but about whether there is an impact on performance when going from 2 to 4 channels. It has already been shown in this thread (by different members) that PassMark behaves the same regardless of how good one's memory/timings are.
 
post edited by ZoranC - 2020/04/29 14:10:36
#81
ZoranC
FTW Member
  • Total Posts : 1099
  • Reward points : 0
  • Joined: 2011/05/24 17:22:15
  • Status: offline
  • Ribbons : 16
Re: Low Memory Benchmark Woes 2020/04/29 14:02:09 (permalink)
PINKTULIP
The PassMark memory test is not accurate; it is way off.



I'm slowly getting to that, bear with me ... :)
#82
ZoranC
FTW Member
  • Total Posts : 1099
  • Reward points : 0
  • Joined: 2011/05/24 17:22:15
  • Status: offline
  • Ribbons : 16
Re: Low Memory Benchmark Woes 2020/04/29 14:34:15 (permalink)
The first thing I looked at was SiSoft Sandra. It has more than one memory benchmark, but the two I focused on were ‘Memory Bandwidth’ (for obvious reasons) and ‘Database Transactional’ (because it is less of a synthetic and more of a real-life workload that I can relate to).
 
SiSoft also lets you run them in a multitude of ways (multi-threaded, single-threaded, etc.).
 
Multi-threaded memory bandwidth scaled -UP- when going from 2 channels to 4, just like PassMark Memory Threaded did.
 
-However-, SINGLE-threaded memory bandwidth -DROPPED- when going from 2 to 4 channels, just like PassMark Read Uncached did. The drop wasn’t as large as PassMark’s, though; it was “only” 12%.
 
This is where I need to point out that observing the system indicates PassMark’s Read Uncached is a SINGLE-threaded benchmark that seems to use the last CPU and a relatively small amount of memory.
 
Next I looked at ‘Database Transactional’. Results for both the multi-threaded and multi-core variants of both the regular and the ‘Record Update Only’ test were practically identical when going from 2 to 4 channels.
 
Yes, the figures were much lower for them than for other chipsets (which is likely a different story), but they were not showing any drops induced by the increase in the number of channels.
 
That wrapped up SiSoft.
 
{… to be continued …}
#83
ZoranC
FTW Member
  • Total Posts : 1099
  • Reward points : 0
  • Joined: 2011/05/24 17:22:15
  • Status: offline
  • Ribbons : 16
Re: Low Memory Benchmark Woes 2020/04/30 18:52:35 (permalink)
At this point I started feeling that whatever is happening is exposed when the benchmark is single-threaded. Whether it is exposing a “flaw” in the benchmarks (them not being the right fit for that architecture) or something else is what I want(ed) to find out but couldn’t guess (because I am not qualified to talk about modern memory architectures).
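
One commonly used back-of-the-envelope model for why single-threaded bandwidth can behave differently from multi-threaded bandwidth is Little's law: sustained bandwidth is roughly (requests in flight × bytes per request) / latency. A single core can keep only a limited number of cache-line misses in flight, so in this model its bandwidth is bounded by latency rather than by channel count. The sketch below is only that model. The in-flight count and the latencies are assumed values, not measurements or specifications of the 10900X, and it is not offered as a diagnosis of this system.

```python
CACHE_LINE_BYTES = 64  # cache line size on current x86 CPUs


def latency_bound_bandwidth_gbs(in_flight, latency_ns, line_bytes=CACHE_LINE_BYTES):
    """Little's law estimate: bytes in flight divided by latency.

    Bytes per nanosecond is numerically equal to GB/s (1 GB = 1e9 bytes).
    """
    return in_flight * line_bytes / latency_ns


# Assumed, illustrative values only.
IN_FLIGHT = 10
for latency_ns in (80.0, 84.0):
    bw = latency_bound_bandwidth_gbs(IN_FLIGHT, latency_ns)
    print(f"{latency_ns:.0f} ns -> {bw:.2f} GB/s")
```

Under these assumptions a 5% rise in latency lowers the single-thread ceiling by a little under 5%, no matter how many channels are populated, while a multi-threaded test has enough requests in flight across cores to benefit from the extra channels.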
 
I felt everything so far indicated this might be due to how the benchmark is designed, so I started looking for benchmarks where I would be in charge of the execution parameters, rather than the developer picking one (or even a few) “presets” for me and taking me along for a ride.
 
Findings were slim (it seems everybody loves to benchmark games and CPUs, but memory not so much), and almost everything was unusable to me for at least one of the following reasons:
 
1. Precompiled executables were either not working or not available (and the source code, where available, would require lots of effort to get to the point of compiling)
2. Approaches were outdated (as in not made for scaling)
3. Setup effort was too high
 
That left me with just one candidate. Enter Intel Memory Latency Checker …
 
{… to be continued …}
 
#84
zGunBLADEz
New Member
  • Total Posts : 58
  • Reward points : 0
  • Joined: 2014/11/26 07:05:08
  • Status: offline
  • Ribbons : 1
Re: Low Memory Benchmark Woes 2020/04/30 23:28:27 (permalink)
 
ZoranC
zGunBLADEz
Thing is, you won't see much gain with those timings, brother. tRFC is way too high; is it what, 800?? @ 3200
What setting do you have your cache at??
If you don't tweak timings and cache, you won't see much difference.

Stick with AIDA64 for memory benchmarks; there's something wrong with PassMark. There's no way memory latency is that low, especially on AMD. That's my score, 30-32, at 4200MHz on heavily tweaked RAM on my mobo.....
AIDA64 scales linearly as sticks are added and is more consistent.



This is not about whether the timings are perfect or whether I am getting as much as I might be able to, but about whether there is an impact on performance when going from 2 to 4 channels. It has already been shown in this thread (by different members) that PassMark behaves the same regardless of how good one's memory/timings are.
 



It's been shown on the same app in question. :/ Like I said, if it's not scaling, there's something wrong with it. AIDA64 does scale with 2-3-4 sticks added and with timings.
#85
ZoranC
FTW Member
  • Total Posts : 1099
  • Reward points : 0
  • Joined: 2011/05/24 17:22:15
  • Status: offline
  • Ribbons : 16
Re: Low Memory Benchmark Woes 2020/05/01 19:06:30 (permalink)
Intel’s Memory Latency Checker is not a program where you can just run it, push a button, and be done several seconds (or even minutes) later; you really want to take your time and RTFM.
 
I guess you could always run each of its 7 modules at default settings and call it a day, but I figure that if you are the type of person who is interested in digging deep enough to reach for IMLC, then you are the type of person who will not stop at running it at default values. Otherwise you would’ve just taken what PassMark serves you for granted.
 
IMHO, by the time you are done reading its manual and exploring the power it offers, you will have a bigger appreciation for the complexity of the task at hand and a better understanding of it.
 
It offers you full control not only over the duration of tests and the size of the memory block used for testing, but also over whether hardware prefetchers are enabled or disabled, access patterns, use of large pages … all the way to which CPU(s) each part of the test is pinned to. For me it was enough to cement the belief that something like PassMark is far from sufficient to give a complete and accurate picture.
 
In the end I ran a total of 30 different scenarios. It took a while to run them and compile the findings, but once the dust had settled the results were:
 
The Bandwidth Matrix, Maximum Bandwidth and Peak Injection Bandwidth modules were definitely and noticeably scaling up, Loaded Latency was on average scaling up, and C2C Latency was practically identical when going from 2 to 4 channels.
 
The Idle Latency and Latency Matrix modules showed a drop when going from 2 to 4 channels. The average drop was 5% (with 2% being the lowest and 9% the highest), and its size seemed to depend on which CPUs were engaged in the test.
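
Per-CPU results like the ones above can be summarised along these lines. The figures in the example are placeholders invented so that the output happens to span 2% to 9%; they are not the actual IMLC readings, and “score” here simply means a higher-is-better number.

```python
from statistics import mean


def percent_drop(two_channel, four_channel):
    """Percent by which a higher-is-better score fell going from 2 to 4 channels."""
    return 100.0 * (two_channel - four_channel) / two_channel


def summarize(results):
    """results maps a CPU label to a (2-channel, 4-channel) score pair."""
    drops = {cpu: percent_drop(a, b) for cpu, (a, b) in results.items()}
    values = list(drops.values())
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "per_cpu": drops,
    }


# Placeholder scores, NOT real measurements.
example = {
    "cpu0": (100.0, 98.0),
    "cpu4": (100.0, 96.0),
    "cpu9": (100.0, 91.0),
}
summary = summarize(example)
print(summary)
```

Keeping the per-CPU breakdown next to the average matters here, because the average alone would hide the dependence on which CPU ran the test.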
 
With IMLC I had run out of synthetic tests, so I started looking at “real world” ones …
 
{… to be continued …}
 
#86
TiN_EE
Yes, that TiN
  • Total Posts : 377
  • Reward points : 0
  • Joined: 2010/01/22 21:30:49
  • Location: xDevs.com
  • Status: offline
  • Ribbons : 14
Re: Low Memory Benchmark Woes 2020/05/02 13:16:44 (permalink) Helpful by ZoranC 2020/05/02 15:30:09
T-routing topology or daisy-chain topology does not apply to 1 DPC boards. No pins on the DIMM are shared; each pin goes directly to its own pad on the CPU, so there is only one topology for 1 DPC, which is point-to-point. That is what you get on all 1 DPC boards, be it Z390/Z490, X299 DARK, or boards from other vendors.
 
ZoranC, no benchmark will give you true memory-only speed. Benchmarks can only do so much to "isolate" the performance testing scope; it's always going to be a mix of IMC, DRAM and CPU/mesh components. What you can do is collect relative, abstract figures and see how they change with different settings. And yes, the MB/s numbers reported by some benchmarks are still abstract and do not represent the actual data bitrate in/out of DRAM. As you already figured out, cache plays a big role too.

When you change the number of modules populated in the board, you are not just changing how many memory channels are used, but also how memory is shared inside the CPU. X299 CPUs have two separate IMCs, one on each side. You can do an interesting experiment: remove 2 DIMMs on one side and test the abstract performance numbers. Then instead remove 1 DIMM from each side, leaving 1 DIMM per side in the board. Test again, and you should see different numbers, even though both setups are 2 DIMMs, dual-channel ;). Another thing: disable some cores and do a set of tests, and you will see that cores far from the IMC give you slower memory "speeds" than nearby cores, because of mesh transport delays.

Anyway, lots of things are involved; accurate benchmarking is not even a Pandora's box but a whole Pandora planet ;) And multithreaded benchmarks are a whole different clusterfun, because now you have many cores fighting for access to the same resource (the IMC). The 3rd/secondary timings in BIOS also have a complex relation to what topology you have in the system and how the IMCs access memory in an orderly fashion. However, there are not many exposed controls for fine-tuning what happens before data reaches the IMC; it's all Intel's (AMD's, etc.) secret sauce. To have some control over that, you have to be deep into 0-level programming and know how to talk to the CPU, IMC, mesh controller, etc. You can do a bunch of PhDs on the topic (no joking here).
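
As a reference point for those abstract MB/s figures, the theoretical peak of a DDR4 configuration is simple arithmetic: each channel has a 64-bit (8-byte) data bus, so peak = channels × 8 bytes × transfer rate. The DDR4-3200 speed below is an assumed example, not necessarily the speed used in this thread, and for the reasons given above no benchmark should be expected to reach this ceiling.

```python
BYTES_PER_TRANSFER = 8  # a DDR4 channel has a 64-bit data bus


def peak_bandwidth_gbs(channels, mega_transfers_per_s):
    """Theoretical peak in GB/s (1 GB = 1e9 bytes)."""
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s * 1e6 / 1e9


ASSUMED_SPEED = 3200  # MT/s, illustrative assumption
for channels in (1, 2, 4):
    print(channels, "channel(s):", peak_bandwidth_gbs(channels, ASSUMED_SPEED), "GB/s")
```

The useful part is the ratio rather than the absolute value: doubling the channel count doubles the ceiling, so a result that falls when going from 2 to 4 channels is not explained by raw channel bandwidth.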
 
I get it that it bugs you that there is no way to get actual performance figures, but in the last 10-15 years desktop confusers have gotten a lot more confusing. Count it a funny coincidence when two different benchmarks show somewhat similar numbers/results, rather than the true meaning of what you are thinking. 20 people saying the same thing while one person disagrees does not mean that all 20 are correct. ;)
 
 

If you have question, please post in public forum. I do not reply PMs, so all in community can benefit the answer. 
#87
CraptacularOne
Omnipotent Enthusiast
  • Total Posts : 14533
  • Reward points : 0
  • Joined: 2006/06/12 17:20:44
  • Location: Florida
  • Status: offline
  • Ribbons : 222
Re: Low Memory Benchmark Woes 2020/05/02 17:52:34 (permalink)
TiN, it's always a pleasure reading a clear(ish?) explanation from an actual engineer themselves. Thanks for sharing.

Intel i9 14900K ...............................Ryzen 9 7950X3D
MSI RTX 4090 Gaming Trio................ASRock Phantom RX 7900 XTX
Samsung Odyssey G9.......................PiMax 5K Super/Meta Quest 3
ASUS ROG Strix Z690-F Gaming........ASUS TUF Gaming X670E Plus WiFi
64GB G.Skill Trident Z5 6800Mhz.......64GB Kingston Fury RGB 6000Mhz
MSI MPG A1000G 1000w..................EVGA G3 SuperNova 1000w
#88
ZoranC
FTW Member
  • Total Posts : 1099
  • Reward points : 0
  • Joined: 2011/05/24 17:22:15
  • Status: offline
  • Ribbons : 16
Re: Low Memory Benchmark Woes 2020/05/02 18:33:26 (permalink)
TiN_EE
… no benchmark will give you true memory-only speed. Benchmarks can only do so much to "isolate" the performance testing scope; it's always going to be a mix of IMC, DRAM and CPU/mesh components. What you can do is collect relative, abstract figures and see how they change with different settings.

 
Understood. Like you said, all of them play a role, which could be clearly seen as the figures changed when I changed the CPU ratio and/or mesh ratio (which leaves me with the impression that if I swapped in a 10920X instead of my 10900X and left everything else the same, the final figures would go up but the behavior pattern would not change).
 
Maybe I should’ve used “benchmarking performance of memory related components” as a better term but I keep using “memory” out of habit.
 
TiN_EE
When you change the number of modules populated in the board, you are not just changing how many memory channels are used, but also how memory is shared inside the CPU. X299 CPUs have two separate IMCs, one on each side. You can do an interesting experiment: remove 2 DIMMs on one side and test the abstract performance numbers. Then instead remove 1 DIMM from each side, leaving 1 DIMM per side in the board. Test again, and you should see different numbers, even though both setups are 2 DIMMs, dual-channel ;).

 
That is something I started to realize as I was working on this. In the beginning I did a series of benchmarks with 1-2-3-4 sticks and noticed the results in one series were not behaving in the same manner as in the others. As I was double-checking why, I realized that in one of them I had populated memory in the incorrect order, and thus that populating memory in the incorrect order will not stop things from working but will make a difference in performance.
 
TiN_EE
Another thing: disable some cores and do a set of tests, and you will see that cores far from the IMC give you slower memory "speeds" than nearby cores, because of mesh transport delays.

 
I observed and realized that once I started reading IMLC’s manual, which in turn made me look for and read other articles on the architecture of memory-related components in the recent generation of CPUs. The two latency tests that showed a drop (while all others scaled up) seemed to show an amount of drop that depended on how far the CPU that allocated the memory is from the one requesting it. The lowest drop (2%) seemed to occur when the CPUs are very close, while the highest (9%) seemed to occur when they are farthest apart.
 
That made me check how exactly PassMark Read Uncached runs, and I feel that the fact that it (so it seems) always runs on the very last CPU might be indicating something.
 
Time permitting, I might repeat this part of IMLC because I am curious what the results would look like if I tell it to leave hardware prefetchers at default (it is my understanding that IMLC otherwise turns them off for the duration of the test) and then go through all CPUs.
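
A possible way to script that sweep is sketched below. It only builds and prints the command lines; it does not run anything. The option spellings (`--idle_latency`, `-c<N>` to pin the latency thread to a CPU, `-e` to leave the prefetcher settings untouched) are written from memory of the tool's readme and should be treated as assumptions to verify against the documentation shipped with your version. The executable name `mlc` and the count of 20 logical CPUs are likewise assumptions.

```python
def build_idle_latency_commands(cpu_count, exe="mlc", keep_prefetchers=True):
    """Return one argument list per logical CPU for an idle-latency run.

    Flag names are assumptions; check them against the tool's own readme.
    """
    commands = []
    for cpu in range(cpu_count):
        # Assumed syntax: "-c" immediately followed by the CPU number.
        args = [exe, "--idle_latency", f"-c{cpu}"]
        if keep_prefetchers:
            # Assumed meaning: do not modify hardware prefetcher settings.
            args.append("-e")
        commands.append(args)
    return commands


if __name__ == "__main__":
    # 20 logical CPUs is an assumption; adjust to the machine under test.
    for args in build_idle_latency_commands(cpu_count=20):
        print(" ".join(args))
```

Printing the commands first makes it easy to review the list before running anything, and the same list can later be handed to `subprocess.run` one entry at a time.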
 
TiN_EE
Anyway, lots of things are involved; accurate benchmarking is not even a Pandora's box but a whole Pandora planet ;)

 
That is why I often say that there are “lies, damn lies, statistics and benchmarks”. I believe one needs to understand -both- the nature of what one is benchmarking -and- how the benchmark itself does it before one can trust the results and feel they are being interpreted correctly.
 
TiN_EE
And multithreaded benchmarks are a whole different clusterfun, because now you have many cores fighting for access to the same resource (the IMC).

 
Tell me about it, especially when one can’t tell the benchmark which CPU(s) to use :( However, multi-threading is the direction everything is moving in (which IMHO it has to), so benchmarks should reflect the nature of the environment and how it will be utilized. That is why I am wondering why PassMark chose to make Read Uncached single-threaded and what “uncached” means to them.
 
TiN_EE
I get it that it bugs you that there is no way to get actual performance figures, but in the last 10-15 years desktop confusers have gotten a lot more confusing.

 
Actually, not having “actual performance figures” is not necessarily what bugs me. Not knowing who/what to trust, and not having a reliable reference point when something is in question, is what bugs/bugged me.
 
When one has 300 servers in front of him, checking whether a new one is OK / spotting the odd duck among them is easy, even if all you have is a single tool that neither uses standard units of measure nor is necessarily accurate in how it works. As long as that tool is consistent in its inaccuracy and abstractness and keeps reporting the same value across the board (“this machine has a rating equivalent to 99 secret thingamajigs”), you will say “OK, next”. But if a different model comes in and you see a rating of 66 thingamajigs, you will want to understand why, and thus you will dig into all the parts involved, especially if the value doesn’t seem to make sense at all / sharply contradicts the trend of the other values.
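
The “odd duck” check described above can be sketched as a comparison against the fleet median. The 99 and 66 ratings come from the example in the text; the host names, the rest of the fleet and the 10% tolerance are invented for illustration.

```python
from statistics import median


def odd_ducks(ratings, tolerance=0.10):
    """Return hosts whose rating differs from the fleet median by more than tolerance."""
    mid = median(ratings.values())
    return {
        host: value
        for host, value in ratings.items()
        if abs(value - mid) > tolerance * mid
    }


# Hypothetical fleet: the tool's units are unknown, only consistency matters.
fleet = {f"server{i:03d}": 99 for i in range(1, 300)}
fleet["server300"] = 66  # the new, different model

print(odd_ducks(fleet))
```

The check never needs to know what a “thingamajig” is; it relies only on the tool being consistent across machines, which is why it stops helping once there is just a single machine and no fleet to compare against.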
 
It has been an eternity since I last built my own machine, and I haven’t kept up with the industry in the meantime. The X299 Dark is the only X299/modern motherboard I have, and the 10900X in it is the only modern CPU I have. Thus, when I see something that seems odd, I don’t have any reference point that could tell me in which direction solid ground lies and what I can trust. Thanks to this ticket/thread and the research I have been doing, I believe I am getting a solid idea of what I can trust, what I can’t, and why.
 
TiN_EE
20 people saying the same thing while one person disagrees does not mean that all 20 are correct. ;)

Story of my life LOL
#89
TiN_EE
Yes, that TiN
  • Total Posts : 377
  • Reward points : 0
  • Joined: 2010/01/22 21:30:49
  • Location: xDevs.com
  • Status: offline
  • Ribbons : 14
Re: Low Memory Benchmark Woes 2020/05/02 19:21:33 (permalink)
Can't trust no one. What I can add is that most of this memory access/routing/operation is not specific to the X299 Dark but applies to the platform as a whole. If you look at the performance of one board with chipset X and CPU A, you will see the majority of all other boards with chipset X and CPU A doing the same thing, given everything runs at the same clocks/settings. Better boards give you more settings to tinker with, and better power/signal quality so that bad stuff starts happening later, especially when overclocking, but overall behavior is the same, like you expect.

#90