Let me add to this thread:
RAM size doesn't make much of a difference in my experience. However, I assume that when you tested 8GB, you either compared it with 4GB dual channel in Windows, or you ran Linux with the 8GB in dual channel vs 4GB in single channel!
My previous Xeon setup folded no differently with 2x 2GB RAM or 2x 4GB RAM, under Windows or Linux.
I believe your numbers are the result of running dual channel: the CPU can retrieve data from RAM much faster and send it to the GPU, which reduces latency compared to running single channel. A single memory stick (running single channel) is usually fast enough, but it adds latency.
Likewise, tests have been done where 16GB and higher actually slowed the system down compared to running less memory.
I recommend 2x 2GB for DDR3 systems, and 2x 4GB for DDR4 systems (they don't sell DDR4 sticks below 4GB). This setup is much better than running a single 4GB or 8GB stick of DDR3, or a single 8GB or 16GB stick of DDR4.
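As a rough illustration of why dual channel helps: peak theoretical bandwidth doubles with a second channel. Here's a back-of-the-envelope sketch (DDR3-1600 is just an example figure, not a measurement from my setup):

```python
# Peak theoretical memory bandwidth: transfers/s x 8 bytes per 64-bit channel.
# DDR3-1600 is used as an example; plug in your own module speed.
def mem_bandwidth_gbs(mt_per_s, channels):
    """Peak bandwidth in GB/s for 64-bit (8-byte wide) memory channels."""
    return mt_per_s * 8 * channels / 1000

single = mem_bandwidth_gbs(1600, 1)  # 12.8 GB/s
dual = mem_bandwidth_gbs(1600, 2)    # 25.6 GB/s
```

Same total capacity either way, but dual channel gives the CPU twice the peak bandwidth to work with when shuttling work-unit data to the GPU.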
PCIE 3.0:
- For PCIE speeds, a modern RTX 2060 graphics card can do with a PCIE 3.0 1x to 16x riser, and get nearly 97% of its full-slot performance.
- The RTX 2060 Super, as well as the older RTX 2070, can also work fine on 1x; however, if you have a 2x slot, that would be better.
- The RTX 2070 Super and RTX 2080 need a 2x slot (which, aside from an m.2 adapter converted to a 4x slot, you won't find). So for the 2070 Super, 2080, 2080 Super, and 2080 Ti, you'd probably want a 3.0 4x slot or higher.
For PCIE 2.0:
- RTX 2060 runs best off of a PCIE 2.0 4x slot or greater.
- The RTX 2060 Super or 2070 can run off a 4x slot, but preferably use an 8x slot or greater.
- The RTX 2070 Super, 2080, 2080 Super, and 2080 Ti need a PCIE 2.0 8x slot or greater.
PCIE 2.0 1x slots aren't recommended for GTX or better cards! A GT 1030 is about as fast a card as a 2.0 1x slot can feed. Even a GTX 1050 will run 10-15% slower on a PCIE 2.0 1x slot!
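The slot recommendations above track raw link bandwidth. A quick sketch, using the standard theoretical per-lane figures after encoding overhead (these are spec numbers, not my own measurements):

```python
# Approximate usable bandwidth per PCIe lane, in MB/s:
# PCIe 2.0: 5 GT/s with 8b/10b encoding  -> ~500 MB/s per lane
# PCIe 3.0: 8 GT/s with 128b/130b encoding -> ~985 MB/s per lane
PCIE_LANE_MBS = {"2.0": 500, "3.0": 985}

def slot_bandwidth_mbs(gen, lanes):
    """Theoretical one-way bandwidth of a PCIe link (MB/s)."""
    return PCIE_LANE_MBS[gen] * lanes

# A 3.0 1x riser moves roughly as much data as a 2.0 2x link,
# which is why a 2.0 1x slot starves even a GTX 1050:
pcie3_x1 = slot_bandwidth_mbs("3.0", 1)  # ~985 MB/s
pcie2_x4 = slot_bandwidth_mbs("2.0", 4)  # ~2000 MB/s
```

This is why the minimum lane counts above roughly double when you drop from PCIE 3.0 to 2.0.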
MultiGPUs:
It appears that most modern Intel motherboards offering multiple full-size slots can only drive up to 4 GPUs.
I've tested multiple Asus, ASRock, MSI, and Gigabyte LGA 1151 motherboards in the sub-$200 range, for Intel 6th & 7th gen and 8th & 9th gen CPUs, trying to get more than 4 GPUs to work at a time, without success.
*Please note: I did not use any performance-lowering PCIE 1x to 4x splitters to drive multi-card setups, nor did I run motherboards with many PCIE 1x slots. Most motherboards came with 2 or 3 full-size slots and 3 PCIE 1x slots.*
Some motherboards don't even recognize more than 3 GPUs, and are finicky in that they sometimes do, and sometimes don't, show all the GPUs upon boot!
With the limitation of 1 GPU per core for optimal results, I'd recommend sticking with quad-core CPUs, as most motherboards don't support more than 4 GPUs (2 to 3 in full-size slots, the remaining in a PCIE 1x slot) when running Linux.
Get a 6-core CPU, like the Intel Core i5 9400F (or K), if you're running Windows; or you can disable cores to save power, matching your 1 core per GPU (+1 core for Windows).
There are some (standard ATX or extended ATX) motherboards that support more than 4 GPUs, but the majority won't (not counting mining motherboards filled with PCIE 1x slots). And faster GPUs, like the 2070 Super, 2080, or higher, need PCIE 4x minimum, so those multi PCIE 1x slot boards aren't really a good alternative for these cards.
Many of these cheaper Chinese mining boards also run PCIE 2.0, so not a good option for folding!
The problem with most motherboards lies in their full-size PCIE slot speed configuration (8x, 4x, 4x), often leaving only a single 1x slot available for a 4th GPU before the PCIE lanes are used up.
If they had created a 4x 4x 4x configuration on their full size slots, it would have been possible to drive 4 additional GPUs via 1x to 16x risers.
Sadly, this is not the case.
What's more, CPUs like the i5 9400F have no iGPU. Motherboard manufacturers could have routed an additional 8 PCIE lanes (from the CPU pins originally dedicated to the iGPU) to a PCIE 4x slot on the board, whenever the CPU's iGPU is either not present or bypassed.
But they don't.
CPU:
I've also tested CPU throttling.
- While 6th and 7th gen CPUs from Intel can be throttled down, you'll see a ~10% performance penalty for running the CPU at 10% lower power consumption. In my case, this resulted in 5-6 watts of power savings, which was not worth it. My recommendation is to run the CPU at full speed, though you could disable Turbo Boost if you like, as Turbo Boost is the cause of most CPU power spikes, and folding on Nvidia cards uses the CPU constantly, not in spikes, anyway. As long as the CPU runs fast enough to drive your GPU, you should be OK.
- With Intel 9th gen CPUs, I've noticed that cutting power by 5-10W can result in an infinite boot loop. On both Gigabyte and Asus boards, I messed up the system by giving the CPU less than the recommended power. Normally the CPU would just run at lower speeds, but the BIOS allows CPUs to run at much less than 50W (for a 65W CPU). Once you pass that threshold, the CPU no longer gets enough power to boot the board, and you'll end up with a broken mobo. Not recommended!
- I haven't yet gotten to the performance difference between, e.g., an Intel quad-core with HT vs a true 6- or 8-core CPU.
If you have a quad-core with HT, running only 4 GPUs, it is recommended to turn off HT in Linux; it'll allow for faster performance. It is estimated that you could run 6 GPUs fine on a 4C/8T CPU without speed degradation, but I haven't been able to test this claim. I guess as long as each thread can meet each GPU's demand for processing power, it might work.
But it'd be interesting to see what the numbers look like with HT-enabled CPUs (e.g. cut 3 cores off a Core i7, run single core + HT pushing 2 GPUs, then lower the CPU frequency until one card starts throttling).
- While it is possible to run multiple Nvidia GPUs per CPU core, it's really not recommended in Linux, seeing that a 3.6GHz dual-core CPU could barely push 4x RTX 2060 cards, and with a loss of performance. The 1 core per GPU rule still holds true (if you want to keep that 90-95% efficiency margin). However, if you'd like to save some cost in power, you could reduce the CPU speed in the BIOS to match your most demanding GPU.
For an RTX 2080 Ti @ 2.2M PPD, you need a 3GHz CPU or faster.
For an RTX 2080 @ 1.5M PPD, you need a 2.75GHz CPU or faster.
For an RTX 2060 @ 1M PPD, you need a 2GHz CPU or faster.
For a GTX GPU @ 150-600k PPD, you need a 1.7GHz CPU or faster.
For a GTX GPU @ <150k PPD, you can get by with the currently slowest CPUs (Intel Atom @ 1.6GHz).
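Those rules of thumb can be rolled into a tiny helper. With mixed cards, the idea is to size the all-core cap to your most demanding GPU. (The thresholds are my estimates from above, not hard limits, and `min_cpu_ghz` is just a hypothetical name.)

```python
def min_cpu_ghz(ppd):
    """Rule-of-thumb minimum CPU frequency (GHz) to feed one GPU at the
    given PPD, per the thresholds above (estimates, not hard limits)."""
    if ppd >= 2_200_000:
        return 3.0
    if ppd >= 1_500_000:
        return 2.75
    if ppd >= 1_000_000:
        return 2.0
    if ppd >= 150_000:
        return 1.7
    return 1.6

# With mixed cards, cap all cores to the fastest card's requirement:
gpus_ppd = [2_200_000, 1_000_000, 400_000]  # e.g. a 2080 Ti, a 2060, a GTX
cap = min_cpu_ghz(max(gpus_ppd))  # 3.0 GHz for the 2080 Ti
```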
In Linux, I can see the CPU usage, and I've also measured PPD efficiency dropping drastically when the CPU frequency falls below the standards above.
If you run one RTX 2080 and the rest slower cards, you could cap your CPU to 2.8 or 3GHz on all cores.
While it's not going to save you lots of power, it should come with a fairly unnoticeable performance penalty, and your CPU will run cooler too.
All the above findings were recorded in Lubuntu 18.10 with the GeForce 428.xx drivers. For me, a GPU is running efficiently if it reaches 90-95% of the PPD you'd get running the same GPU by itself in a PCIE 3.0 16x slot in Linux.
If the numbers are more than 10% off, I would not consider using that kind of setup, even if it saves you money on buying hardware.
Think: a 10% performance loss on 8 GPUs that each do 1M PPD equals an 800k PPD loss.
You'll lose the equivalent of a small RTX 2060 in score!
Also, if you're losing 10% of speed from using PCIE 2.0, 10% from using Windows, and 10% from using a PCIE splitter, you're running at only 73% efficiency, since the losses multiply rather than add.
This is not recommended!
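The arithmetic behind those two points, spelled out as a quick sanity check:

```python
# 1) A flat 10% penalty across 8 GPUs at 1M PPD each:
lost_ppd = 8 * 1_000_000 * 0.10  # 800,000 PPD -- a small RTX 2060's worth

# 2) Stacked penalties multiply, they don't add:
#    PCIE 2.0 (-10%), Windows (-10%), PCIE splitter (-10%)
efficiency = 0.9 * 0.9 * 0.9  # ~0.729, i.e. roughly 73%
```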
Since the GTX 1080 Ti is faster than the RTX 2060, I would highly doubt you could run 8 of them through a 3.7GHz dual-core CPU like the G4620 without a serious (like 80%) performance penalty!
At best, in Linux, according to my calculations, you could run 4 GTX 1080 Tis on a dual core of at least 3.7GHz, but that's only theoretical.
I'd like to reopen the conversation about running multiple cards per CPU core, to iron out this inconsistency with my own findings.
post edited by ProDigit - 2019/07/27 00:38:49