TigerDeath
GTX 670 is here... Where do I start? :)
The first thing to keep in mind about Einstein GPU tasks is that the GPU application is also CPU dependent. It generates considerable traffic between the CPU and GPU, so PCI-E bandwidth has a significant impact on performance. For hardware and software configuration, I would suggest the following.
1) Install the 670 in a native x16 slot and make sure the link actually runs at x16. You can confirm this with GPU-Z. If you have a board with PCI-E 3.0 support, you will want to set the registry option that enables PCI-E 3.0 support. The application can take full advantage of PCI-E 3.0 x16 bandwidth when it is available, especially when running multiple tasks at once (see later). Ivy Bridge with Z77 and Sandy Bridge Extreme with X79 support PCI-E 3.0.
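To put rough numbers on the slot advice in 1), here is a small Python sketch that works out the theoretical one-way bandwidth of a PCI-E link from the published transfer rates and encoding schemes. The function and table names are just mine for illustration, and real throughput will be lower than these ceilings.

```python
# Theoretical per-direction PCI-E bandwidth from the published link rates.
# Real-world throughput is lower because of protocol overhead.

# generation -> (transfer rate in GT/s per lane, encoding efficiency)
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
}

def link_bandwidth_gb_s(generation, lanes):
    """Return the theoretical one-way bandwidth of a link in GB/s."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer carries one bit per lane; divide by 8 to get bytes.
    return rate_gt_s * efficiency * lanes / 8

for gen in ("2.0", "3.0"):
    for lanes in (8, 16):
        print(f"PCI-E {gen} x{lanes}: {link_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

The takeaway is that a card dropping to x8 loses half of the ceiling, and PCI-E 3.0 x8 lands at roughly the same ceiling as PCI-E 2.0 x16.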
2) For hyperthreaded CPUs, you will want to set the maximum CPUs used to 50% due to CPU dependence. This will minimize the use of hyperthreading and give the best performance for the GPU tasks. Here is how to set that.
BOINC 6.10.x: Advanced -> Preferences -> processor usage -> "On multiprocessor systems, use at most 50% of the processors."
BOINC 6.12.x+: Tools -> Preferences -> processor usage -> "On multiprocessor systems, use at most 50% of the processors."
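If you want to see what the 50% setting in 2) works out to on your machine, something like the following Python sketch gives an estimate. Note that the rounding rule (round down, never below one CPU) is my assumption about how the percentage is applied, and the function name is made up for this example.

```python
import os

def cpus_boinc_may_use(logical_cpus, max_percent):
    """Estimate the CPU count left after applying the percentage limit.

    Assumption: the result is rounded down and never drops below 1.
    """
    return max(1, int(logical_cpus * max_percent / 100))

logical = os.cpu_count() or 1  # os.cpu_count() counts logical CPUs
print(f"{logical} logical CPUs at 50% -> {cpus_boinc_may_use(logical, 50)} in use")
```

On a hyperthreaded quad-core that reports 8 logical CPUs, 50% leaves 4 in use, which matches the number of physical cores.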
3) If you have an i7 processor, aim for an overclock of 4.0 GHz or greater. Higher CPU frequencies can significantly improve performance with GPU tasks. 4.5 GHz or greater is better still, but keep an eye on your CPU temperatures.
4) After you set up an account by attaching to the project, you can log in to your preferences via the Einstein website.
http://einstein.phys.uwm.edu/home.php Select Einstein@Home preferences from that page. One option there is important:
GPU utilization factor of BRP apps.
By default, this option is set to 1.0, which means that the GPU will run one task at a time. Since the GPU application does not fully load a high-end GPU like the Keplers, the GTX 670 and 680 run best with three tasks at once. Beyond that, the gains are minimal. Each task requires approximately 300-350 MB of GPU memory in Windows and 250-275 MB in Linux. Set the utilization factor based on the following choices:
1.0 - one task
0.5 - two tasks
0.33 - three tasks
0.25 - four tasks
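The factor-to-task mapping and the per-task memory figures above can be combined into a quick estimate. This is only a Python sketch with names I made up; it assumes the number of tasks is simply 1 divided by the factor, rounded down, and it uses the memory ranges quoted above.

```python
import math

# Per-task GPU memory ranges in MB, taken from the figures quoted above.
TASK_MEMORY_MB = {"windows": (300, 350), "linux": (250, 275)}

def concurrent_tasks(utilization_factor):
    """Number of tasks that run at once for a given utilization factor."""
    return math.floor(1 / utilization_factor)

def memory_needed_mb(utilization_factor, platform):
    """Return the (low, high) GPU memory estimate in MB."""
    tasks = concurrent_tasks(utilization_factor)
    low, high = TASK_MEMORY_MB[platform]
    return tasks * low, tasks * high

for factor in (1.0, 0.5, 0.33, 0.25):
    low, high = memory_needed_mb(factor, "windows")
    print(f"factor {factor}: {concurrent_tasks(factor)} task(s), "
          f"about {low}-{high} MB in Windows")
```

For example, a factor of 0.33 gives three tasks and roughly 900-1050 MB in Windows. Compare the result against the memory size GPU-Z reports for your card before going higher.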
5) The CUDA application runs faster in Linux than Windows but still performs very well in Windows.
6) For Windows, you can run Process Lasso to set application priority and CPU affinity. I prefer setting the Einstein BRP4 application to high priority and setting its affinity to physical cores only. Once Einstein is crunching on your GPU, the BRP4 application should show up in the Process Lasso task list, and you can then right-click on it and set these options. You only have to configure one of the running tasks and the others will update automatically. This step takes a bit of work to get going and is completely optional.
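For the "physical cores only" affinity in 6), Process Lasso handles everything through its own interface, so no code is needed. The Python sketch below only illustrates which logical CPUs you would tick. It assumes hyperthreading is on and that logical CPUs are numbered in sibling pairs (0 and 1 on the first core, 2 and 3 on the second, and so on); that numbering is an assumption, so check it on your own system. The function name is hypothetical.

```python
def physical_core_mask(logical_cpus):
    """Build an affinity bitmask that selects one logical CPU per core.

    Assumption: hyperthreading is on and logical CPUs 2k and 2k+1 share
    physical core k, so only the even-numbered CPUs are selected.
    """
    mask = 0
    for cpu in range(0, logical_cpus, 2):
        mask |= 1 << cpu
    return mask

mask = physical_core_mask(8)
selected = [cpu for cpu in range(8) if (mask >> cpu) & 1]
print(f"CPUs to select: {selected}, mask: {mask:#x}")
```

With 8 logical CPUs this picks CPUs 0, 2, 4 and 6, one per physical core under the stated assumption.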
There are some other small tweaks as well but these are the major ones that will help you get the most performance out of the Einstein GPU tasks.
post edited by linuxrouter - 2012/06/18 18:49:47