Re: 355.60 Last Stable Driver?
2015/11/12 08:17:14
Even if the P2 state is corrected for Linux, it only says that the GTX 980 cards are NOT meant to fold, either because they are too powerful for folding or because they are bad or faulty cards. This is based on the fact that most Core 21 projects work fine on older cards (which take 1.5 to 2 times longer to complete) and that I can reduce the memory by 300 MHz on newer cards. Or is it the code in each of the projects, since we are not really seeing this problem on older Core 18 projects that were written for the new GTX 9xx cards? I am back on the current driver, using -300 MHz on the memory and running the GPUs at 1500 MHz. Once I update my X99 rig back to the current driver as well, I will go down to -300 MHz and see. The estimated cost of each failed "BAD_WORK_UNIT" is about $5.00 of wasted electricity under Xcel Energy. Only NVIDIA can explain WHY they do what they do, as can Stanford, and we both know they are not going to do that. Are GTX 9xx cards for gaming and not for folding in future drivers? My testing shows that each driver, new or old, runs fine for some Core 21 projects but not for others. We will have to make the call on allowing failed 2 or 3 hour projects or 10 or 15 hour projects: which would waste less of your money? As before, all we can really do is keep folding, give it up altogether, or move over to BOINC.
post edited by bcavnaugh - 2015/11/12 08:36:55
|
Re: 355.60 Last Stable Driver?
2015/11/12 08:44:08
One thing is for sure: we can no longer let our computers fold on their own. We must watch them very closely, and with Windows 10 updating the video drivers automatically, we could see a lot more problems and issues in the months and years to come.
|
HK-Steve
FTW Member
- Total Posts : 1040
- Reward points : 0
- Joined: 2015/04/06 08:46:57
- Location: Switzerland
- Status: online
- Ribbons : 0
Re: 355.60 Last Stable Driver?
2015/11/12 10:12:11
I am hanging in there until we hit the 3 billion for the month, then I am off crunching... I am not going to put up with these losses of WUs. I have lost over 50 WUs now.
|
Grandpa_01
New Member
- Total Posts : 92
- Reward points : 0
- Joined: 2012/04/28 20:59:00
- Status: offline
- Ribbons : 0
Re: 355.60 Last Stable Driver?
2015/11/12 11:07:30
Yes, it is a problem, and I believe it is a memory controller problem rather than a memory problem, but that is just a guess. I believe EVGA tried to correct it in the 980 Classifieds with a memory change: the early 2998 models had different memory than the later 3998 model, and my early-version 980 Classified is actually better than the 3998s, so they may have gone the wrong way there. You may have to go further than -300 to get them stable; I have one card that is down to 2600, which is -400, to get any semblance of stability. Lowering the memory speed does not appear to affect the frame time, so there is no loss in doing it. I also found that I could OC the cards further when I reduced the memory speed; the 2 better cards will actually fold stable at 1530 MHz with no Bad States. They could not fold any of the WUs at that speed before I reduced the memory speed; 1480 MHz was the best they could do. Anyway, Stanford is aware of the problem, and there are a few WUs that have not been released to Maxwell but have been released to Kepler, Fermi and AMD GPUs. I know they are working on a workaround for the Maxwells and doing what they can. Unfortunately, this is not a Stanford problem this time, so it is a little more difficult to fix. Nvidia could help by enabling P2 memory speed adjustment in xserver for Linux, but so far they have ignored the request.
|
notfordman
CLASSIFIED ULTRA Member
- Total Posts : 7057
- Reward points : 0
- Joined: 2007/08/09 23:52:23
- Location: my imaginary cubicle, makin copies!
- Status: offline
- Ribbons : 14

Re: 355.60 Last Stable Driver?
2015/11/12 13:07:29
Thank you Grandpa_01 for the input, and Bcav. It's definitely frustrating! :) @Scott, I am glad you're doing better. Hope it stays that way!! And congratulations on becoming a Grandpa, it's a wonderful thing. :)
|
Re: 355.60 Last Stable Driver?
2015/11/12 13:21:59
Grandpa_01 Yes, it is a problem, and I believe it is a memory controller problem rather than a memory problem, but that is just a guess. I believe EVGA tried to correct it in the 980 Classifieds with a memory change: the early 2998 models had different memory than the later 3998 model, and my early-version 980 Classified is actually better than the 3998s, so they may have gone the wrong way there. You may have to go further than -300 to get them stable; I have one card that is down to 2600, which is -400, to get any semblance of stability. Lowering the memory speed does not appear to affect the frame time, so there is no loss in doing it.
Slightly off topic, but I thought the 2998 to 3998 was a VRM not VRAM change.
|
z999z3mystorys
CLASSIFIED Member
- Total Posts : 3210
- Reward points : 0
- Joined: 2008/11/29 06:46:22
- Location: at my current location
- Status: offline
- Ribbons : 9

Re: 355.60 Last Stable Driver?
2015/11/12 13:22:27
How are you guys going about lowering the P2 state? I've underclocked the memory of my cards to -501 (the lowest Precision will allow) and still get errors. As for Grandpa's questions, it all seems to be about stable memory. Fermi and Kepler have memory at 6008 MHz, while Maxwell is at 7010 MHz. My best guess is that Nvidia knows 7010 MHz is too high and unstable for compute work and downclocks it to 6008 MHz so that it works, something the more stable memory on Fermi and Kepler doesn't need, as it runs at a lower memory speed to begin with. As for the x21 cores, some seem to override Nvidia's downclocking: I've seen my GPUs report the memory running at 3505 MHz (7010 effective), compared to the downclocked and stable 3004 MHz (6008 effective) that the x18 units run at. All in all, it seems to come down to the stability of the memory at that speed, and to it not downclocking to 6008 MHz like it does for the x18 units (or not needing to, in the case of Fermi/Kepler). Also, as mentioned earlier, the P9704 work does seem to downclock to 6008 MHz, as I have one now. Can anyone with other x21 core units double-check the memory speed they are set to?
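For readers following the clock numbers in this post, here is a minimal Python sketch of the arithmetic. It assumes, based only on the figures quoted in this thread, that the "effective" speed is twice the clock the monitoring tools report, and that a tuning offset is simply added to the reported clock. The function names and the -300 example are illustrative, not taken from any tool.

```python
# Assumption taken from the posts in this thread: the "effective" memory
# speed is twice the clock that monitoring tools report (3505 -> 7010).
EFFECTIVE_MULTIPLIER = 2


def effective_mhz(reported_mhz: int) -> int:
    """Convert a reported memory clock to the effective rate quoted in the thread."""
    return reported_mhz * EFFECTIVE_MULTIPLIER


def apply_offset(reported_mhz: int, offset_mhz: int) -> int:
    """Apply a tuning offset (negative to underclock) to a reported clock.

    Hypothetical helper: assumes the offset is added directly to the
    reported clock, which is how the posts here describe it.
    """
    return reported_mhz + offset_mhz


P0_REPORTED = 3505  # Maxwell full-speed memory clock reported in the thread
P2_REPORTED = 3004  # downclocked compute-state clock reported in the thread

for name, clock in (("P0", P0_REPORTED), ("P2", P2_REPORTED)):
    print(f"{name}: {clock} MHz reported, {effective_mhz(clock)} MHz effective")

# Illustrative only: a -300 offset applied to the P2 clock.
lowered = apply_offset(P2_REPORTED, -300)
print(f"P2 with -300: {lowered} MHz reported, {effective_mhz(lowered)} MHz effective")
```

This only reproduces the numbers people are quoting; it says nothing about which clock a given work unit will actually run at.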
post edited by z999z3mystorys - 2015/11/12 13:33:03

|
Grandpa_01
New Member
- Total Posts : 92
- Reward points : 0
- Joined: 2012/04/28 20:59:00
- Status: offline
- Ribbons : 0
Re: 355.60 Last Stable Driver?
2015/11/12 13:32:06
|
z999z3mystorys
CLASSIFIED Member
- Total Posts : 3210
- Reward points : 0
- Joined: 2008/11/29 06:46:22
- Location: at my current location
- Status: offline
- Ribbons : 9

Re: 355.60 Last Stable Driver?
2015/11/12 13:58:32
Thanks, I'll drop it way down to start and creep it back up a bit if needed. As it doesn't seem to affect the TPF or the P0 state that I'd be running in for games, I may just leave it fairly low, in fact.
|
Re: 355.60 Last Stable Driver?
2015/11/12 18:30:36
With PrecisionX 16 and the current driver 358.91, I set my GPU clock to run @ +1500 MHz and the memory to -300 MHz, and for the last 16 hours I have not had a single BAD_WORK_UNIT or "Bad State detected... attempting to resume from last good checkpoint". So for now I have 6 GTX 980 graphics cards folding without issues, knock on wood.
|
Re: 355.60 Last Stable Driver?
2015/11/12 18:40:35
I spoke too soon (but it is the 96xx projects that are failing, not the 72xx ones). I will change the above from -300 to -320 or max it out.
01:29:47:WU00:FS01:0x21:Completed 780000 out of 2000000 steps (39%)
01:31:11:WU00:FS01:0x21:Completed 800000 out of 2000000 steps (40%)
01:31:18:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
01:32:42:WU00:FS01:0x21:Completed 720000 out of 2000000 steps (36%)
01:34:06:WU00:FS01:0x21:Completed 740000 out of 2000000 steps (37%)
01:35:30:WU00:FS01:0x21:Completed 760000 out of 2000000 steps (38%)
01:36:53:WU00:FS01:0x21:Completed 780000 out of 2000000 steps (39%)
01:38:17:WU00:FS01:0x21:Completed 800000 out of 2000000 steps (40%)
01:38:25:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
01:39:49:WU00:FS01:0x21:Completed 720000 out of 2000000 steps (36%)
01:41:12:WU00:FS01:0x21:Completed 740000 out of 2000000 steps (37%)
01:42:36:WU00:FS01:0x21:Completed 760000 out of 2000000 steps (38%)
01:44:00:WU00:FS01:0x21:Completed 780000 out of 2000000 steps (39%)
01:45:24:WU00:FS01:0x21:Completed 800000 out of 2000000 steps (40%)
01:45:31:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
01:45:31:WU00:FS01:0x21:Max number of retries reached. Aborting.
01:45:31:WU00:FS01:0x21:ERROR:Max Retries Reached
01:45:31:WU00:FS01:0x21:Saving result file logfile_01.txt
01:45:31:WU00:FS01:0x21:Saving result file log.txt
01:45:31:WU00:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
01:45:32:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
01:45:32:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9631 run:1 clone:2 gen:43 core:0x21 unit:0x00000041ab436c9b5609bee2518f85f1
Somehow I am stuck at -501 MHz.
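For anyone who would rather not watch the log by eye, a log like the one above could be scanned with a short Python script. This is only a sketch: the line layout (timestamp, optional WARNING tag, WU id, slot id, message) is inferred from the lines pasted in this post, and the function names are made up for illustration, not part of the Folding@home client.

```python
import re
from collections import Counter

# Layout inferred from the pasted log: time, optional WARNING tag,
# work-unit id, slot id, then the rest of the message. Optional
# whitespace is tolerated because forum pastes sometimes add spaces.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?:WARNING:)?\s*"
    r"(?P<wu>WU\d+):\s*(?P<slot>FS\d+):(?P<msg>.*)$"
)


def count_bad_states(lines):
    """Count 'Bad State detected' messages per (work unit, slot)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "Bad State detected" in match.group("msg"):
            counts[(match.group("wu"), match.group("slot"))] += 1
    return counts


def failed_units(lines):
    """Return the (work unit, slot) pairs whose core returned BAD_WORK_UNIT."""
    failed = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "FahCore returned: BAD_WORK_UNIT" in match.group("msg"):
            failed.add((match.group("wu"), match.group("slot")))
    return failed


# A few lines copied from the log in this post.
sample = [
    "01:31:11:WU00:FS01:0x21:Completed 800000 out of 2000000 steps (40%)",
    "01:31:18:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint",
    "01:38:25:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint",
    "01:45:31:WU00:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint",
    "01:45:32:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]

print(count_bad_states(sample))
print(failed_units(sample))
```

Run against a full log file, the same two functions would show which slots are racking up Bad States before a unit is actually dumped.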
post edited by bcavnaugh - 2015/11/16 10:31:00
|
Grandpa_01
New Member
- Total Posts : 92
- Reward points : 0
- Joined: 2012/04/28 20:59:00
- Status: offline
- Ribbons : 0
Re: 355.60 Last Stable Driver?
2015/11/12 21:37:54
bcavnaugh, I do not think Precision X will change the P2 state memory clock. You need to install Nvidia Inspector and look at the P2 state clocks; I will bet it says 3004, which is 6008 MHz.
|
notfordman
CLASSIFIED ULTRA Member
- Total Posts : 7057
- Reward points : 0
- Joined: 2007/08/09 23:52:23
- Location: my imaginary cubicle, makin copies!
- Status: offline
- Ribbons : 14

Re: 355.60 Last Stable Driver?
2015/11/12 22:48:03
Grandpa_01 bcavnaugh, I do not think Precision X will change the P2 state memory clock. You need to install Nvidia Inspector and look at the P2 state clocks; I will bet it says 3004, which is 6008 MHz.
Is it OK to have Precision running while using Inspector? I didn't know if they might conflict. Thx!
|
Grandpa_01
New Member
- Total Posts : 92
- Reward points : 0
- Joined: 2012/04/28 20:59:00
- Status: offline
- Ribbons : 0
Re: 355.60 Last Stable Driver?
2015/11/12 23:44:34
notfordman
Grandpa_01 bcavnaugh, I do not think Precision X will change the P2 state memory clock. You need to install Nvidia Inspector and look at the P2 state clocks; I will bet it says 3004, which is 6008 MHz.
Is it OK to have Precision running while using Inspector? I didn't know if they might conflict. Thx!
I have used them in tandem before and it appeared to work OK. I did not notice any problems with the setup, but I found it easier to just use Inspector, so I no longer have Precision installed and cannot answer that for the latest drivers.
|
Re: 355.60 Last Stable Driver?
2015/11/13 07:29:56
Grandpa_01 bcavnaugh, I do not think Precision X will change the P2 state memory clock. You need to install Nvidia Inspector and look at the P2 state clocks; I will bet it says 3004, which is 6008 MHz.
Correct, and I use both together as well with no issues.
|
z999z3mystorys
CLASSIFIED Member
- Total Posts : 3210
- Reward points : 0
- Joined: 2008/11/29 06:46:22
- Location: at my current location
- Status: offline
- Ribbons : 9

Re: 355.60 Last Stable Driver?
2015/11/15 19:53:59
I'm at -500 MHz on the P2 memory (2505 MHz, or 5010 MHz effective) and still getting errors, just not quite as often. I guess I could drop it lower, but that seems like something that shouldn't be required. Now, who is to blame: Nvidia hardware, Nvidia drivers, or Stanford software? To be honest, I don't know, but it seems like something I can't fix very well. Should I try dropping the memory even lower? At some point it will become a limiting factor; I just don't know where that point is.
|
Re: 355.60 Last Stable Driver?
2015/11/15 20:19:27
I don't know what happened. I am on 355.xx (can't say which for sure, sorry; I'm not close enough to look yet).
I am on client-type advanced and just skyrocketed from 1.0-1.2m to 1.7m.
I am not using the country to fold, just 4 980's in one system.
|
Re: 355.60 Last Stable Driver?
2015/11/16 07:52:44
Well, another failed Core 21 project, P96xx (P9643):
02:46:28:WU01:FS00:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
02:46:29:WARNING:WU01:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
02:46:29:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:9643 run:1 clone:37 gen:51 core:0x21 unit:0x00000049ab436c9b5609bee43fb900b6
At least this time it failed at 35% with "Bad State detected... attempting to resume from last good checkpoint", and again at 35%, then failed as shown above. I am now running: let's hope going down to 2646 MHz on the memory will work.
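To keep track of which projects are failing, the "Sending unit results" lines quoted in this thread can be broken into fields. The Python sketch below assumes the space-separated key:value layout seen in those two lines; the helper names are hypothetical and nothing here comes from the Folding@home client itself.

```python
from collections import Counter

MARKER = "Sending unit results:"


def parse_result_line(line):
    """Split the key:value tokens after 'Sending unit results:' into a dict.

    Returns an empty dict when the marker is not present in the line.
    """
    _, _, tail = line.partition(MARKER)
    fields = {}
    for token in tail.split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
    return fields


def faulty_by_project(lines):
    """Tally results flagged error:FAULTY per project number."""
    tally = Counter()
    for line in lines:
        fields = parse_result_line(line)
        if fields.get("error") == "FAULTY":
            tally[fields.get("project", "unknown")] += 1
    return tally


# The two result lines posted in this thread.
results = [
    "01:45:32:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY "
    "project:9631 run:1 clone:2 gen:43 core:0x21 "
    "unit:0x00000041ab436c9b5609bee2518f85f1",
    "02:46:29:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY "
    "project:9643 run:1 clone:37 gen:51 core:0x21 "
    "unit:0x00000049ab436c9b5609bee43fb900b6",
]

print(parse_result_line(results[1]))
print(faulty_by_project(results))
```

A tally like this, collected over a few days, would make it easier to report exactly which project, run, clone and gen numbers were returned as faulty.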
post edited by bcavnaugh - 2015/11/16 07:54:42
|
Re: 355.60 Last Stable Driver?
2015/11/16 15:40:02
Questions asked on the Folding Forum: I have GTX 980 graphics cards; what is the correct driver version we should be using? I also have some GTX 780 graphics cards and have not seen any issues on them, but what is the correct driver version to use there as well? Lastly, I have AMD 290X graphics cards and have not seen any issues on them either, but what is the correct driver version to use there as well? Posted here: https://foldingforum.org/viewtopic.php?f=61&t=28284&p=280874#p280874 1100 17 Nov 2015 MST: Still no answer from Stanford.
post edited by bcavnaugh - 2015/11/17 10:02:56
|
cokeman54
SSC Member
- Total Posts : 934
- Reward points : 0
- Joined: 2002/04/02 14:18:49
- Location: Abilene, Texas
- Status: offline
- Ribbons : 3
Re: 355.60 Last Stable Driver?
2015/11/16 19:11:08
I also have a lot of failed Core 21s on my 980. If and when they finish, they have restarted so many times that my PPD is very low. If this keeps up, I will be crunching this winter and not folding.
|
Re: 355.60 Last Stable Driver?
2015/11/17 10:01:14
I am waiting for my rigs to complete their current PrimeGrid tasks, and then I will set two rigs to the BIOS defaults, that is, no overclocking on the CPU or memory. BTW, if your computer can complete LLR tasks for days on end, then you have a super stable BIOS setup. I will then downclock all 6 GTX 980 cards to the lowest setting I can select in PrecisionX 16 and set the P2 state in Nvidia Inspector down to 2000 MHz. Yes, this is all a waste of money and electricity just to wait for failed projects from Core 21. But how else can we show that our hardware IS STABLE and that our GPUs are not BAD? Yes, I am mad at Stanford; who is not? It is said that it is NEVER Stanford's problem. OK, I am done venting.
|
HK-Steve
FTW Member
- Total Posts : 1040
- Reward points : 0
- Joined: 2015/04/06 08:46:57
- Location: Switzerland
- Status: online
- Ribbons : 0
Re: 355.60 Last Stable Driver?
2015/11/17 10:32:40
I hear you. It's a shame we have all wasted so much time and $$ on electricity for these failed WUs, but fold on we must. The cure is out there, and hopefully we can all do our bit to help...
|