EVGA

Titan core17 woes

Author
xfinrodx
iCX Member
  • Total Posts : 415
  • Reward points : 0
  • Joined: 2006/08/07 11:23:46
  • Location: Washington
  • Status: offline
  • Ribbons : 1
2014/10/25 13:16:11 (permalink)
I've been chugging along with some pretty small points for a while, and cruised past my 1m and 2m milestones with one small server running at 50%.  I was just thinking it might be nice to boost production for a while, so I downloaded the new client and revved up the Titan.  It fails each work unit at 2-5% with a BAD_WORK_UNIT message.  Borderlands TPS has also been crashing on me for no good reason.
 
I can fold all day with my CPU without any trouble, and I ran memtestx86 overnight on my system RAM.  I ran MemtestG80 on 6GB for over 1000 iterations, ran ComputeMark while running MemtestG80, FurMark, OC Scanner... no synthetic bench seems to expose a weakness.  I'm not overclocking or anything, and I'm using current drivers and FAHControl 7.4.4 (shouldn't matter, since it's core17 that is actually doing the work).
 
Anyway, any ideas?  I had to reimage my server earlier this month and forgot to put fah back on it.  Now my October production is way low, would be good to burst it back to normal with the titan, then make a really good (for me) November :-).

The chiefest of human design flaws is sleep.
#1


    bcavnaugh
    The Crunchinator
    • Total Posts : 38977
    • Reward points : 0
    • Joined: 2012/09/18 17:31:18
    • Location: USA Affiliate E5L3CTGE12 Associate 9E88QK5L7811G3H
    • Status: offline
    • Ribbons : 282
    Re: Titan core17 woes 2014/10/25 13:47:00 (permalink)
    What OS?
    Are you Overclocking your CPU?
    Are you Overclocking your Memory?
    What Motherboard?
    What Bios?
    What Video Driver?
    Do you have only one card installed, or two? If two, is SLI disabled?
    Have you deleted the files under C:\ProgramData\FAHClient\work
    and the files and folders under C:\ProgramData\FAHClient\cores\web.stanford.edu\~pande\Win32\AMD64\NVIDIA?
    Is your client-type set to advanced?
     
    Never ran this program before.
     
    Test iteration 3 (GPU 0, 128 MiB): 536601 errors so far
            Moving Inversions (ones and zeros): 0 errors (47 ms)
            Memtest86 Walking 8-bit: 0 errors (312 ms)
            True Walking zeros (8-bit): 0 errors (172 ms)
            True Walking ones (8-bit): 0 errors (156 ms)
            Moving Inversions (random): 0 errors (47 ms)
            Memtest86 Walking zeros (32-bit): 0 errors (639 ms)
            Memtest86 Walking ones (32-bit): 0 errors (640 ms)
            Random blocks: 63840 errors (15 ms)
            Memtest86 Modulo-20: 0 errors (796 ms)
            Logic (one iteration): 0 errors (31 ms)
            Logic (4 iterations): 0 errors (31 ms)
            Logic (shared memory, one iteration): 0 errors (16 ms)
            Logic (shared-memory, 4 iterations): 0 errors (31 ms)
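
As a rough aid for reading output like the above, here is a minimal Python sketch that picks out which tests reported errors. The regular expression and the function name `failing_tests` are assumptions based only on the line format shown in this post; they are not part of MemtestG80.

```python
import re

# Matches per-test lines such as
#   "Random blocks: 63840 errors (15 ms)"
# The pattern is an assumption derived from the output pasted above.
TEST_LINE = re.compile(
    r"^\s*(?P<name>.+?):\s+(?P<errors>\d+) errors \((?P<ms>\d+) ms\)\s*$"
)

def failing_tests(output: str) -> dict[str, int]:
    """Return {test name: error count} for tests with a nonzero error count."""
    failures = {}
    for line in output.splitlines():
        m = TEST_LINE.match(line)
        if m and int(m.group("errors")) > 0:
            failures[m.group("name")] = int(m.group("errors"))
    return failures

# A few lines copied from the run above; the "errors so far" header
# does not match the per-test pattern and is skipped.
SAMPLE = """\
Test iteration 3 (GPU 0, 128 MiB): 536601 errors so far
        Moving Inversions (ones and zeros): 0 errors (47 ms)
        Random blocks: 63840 errors (15 ms)
        Memtest86 Modulo-20: 0 errors (796 ms)
"""

print(failing_tests(SAMPLE))
```

On the sample lines, only the "Random blocks" test is reported.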
     
    post edited by bcavnaugh - 2014/10/25 14:14:35

    #2
    xfinrodx
    Re: Titan core17 woes 2014/10/25 14:26:11 (permalink)
    OS: Windows 8.1 Pro x64
    CPU: 4770k at 4.17ghz
    RAM: 890mhz 11-13-13-35 (2xMushkin 992104 933mhz)
    MB: Asus Z87 Pro
    BIOS: 2005 (6/4/14)
    Video Driver: 344.48
    Single GPU.
    Will delete work files and update after attempting another WU.
    My slot has no config other than paused=true; I am not doing any advanced or bigadv folding on any machine.

    #3
    xfinrodx
    Re: Titan core17 woes 2014/10/25 14:31:17 (permalink)
    By the way, regarding MemtestG80: you need a patched version for modern NVIDIA cards.  There's a synchronization bug; see:
    https://devtalk.nvidia.com/default/topic/545320/experiences-with-evga-gtx-titan-superclocked-memtestg80-underclocking-in-linux-/
     
    (grab the binary and try it out if you're not skeered of downloading an exe.  Didn't break my machine, ymmv)
    https://github.com/ihaque/memtestG80
     
     

    #4
    xfinrodx
    Re: Titan core17 woes 2014/10/25 14:47:16 (permalink)
    Well, downloading the core again and clearing the work files didn't help in this case.
     
     
    *********************** Log Started 2014-10-25T21:33:50Z ***********************
    21:33:50:************************* Folding@home Client *************************
    21:33:50:      Website: http://folding.stanford.edu/
    21:33:50:    Copyright: (c) 2009-2014 Stanford University
    21:33:50:       Author: Joseph Coffland < joseph@cauldrondevelopment.com>
    21:33:50:         Args: --open-web-control
    21:33:50:       Config: C:/Program Files (x86)/FAHClient/Data/config.xml
    21:33:50:******************************** Build ********************************
    21:33:50:      Version: 7.4.4
    21:33:50:         Date: Mar 4 2014
    21:33:50:         Time: 20:26:54
    21:33:50:      SVN Rev: 4130
    21:33:50:       Branch: fah/trunk/client
    21:33:50:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
    21:33:50:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
    21:33:50:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
    21:33:50:     Platform: win32 XP
    21:33:50:         Bits: 32
    21:33:50:         Mode: Release
    21:33:50:******************************* System ********************************
    21:33:50:          CPU: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
    21:33:50:       CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
    21:33:50:         CPUs: 8
    21:33:50:       Memory: 15.94GiB
    21:33:50:  Free Memory: 13.48GiB
    21:33:50:      Threads: WINDOWS_THREADS
    21:33:50:   OS Version: 6.2
    21:33:50:  Has Battery: false
    21:33:50:   On Battery: false
    21:33:50:   UTC Offset: -7
    21:33:50:          PID: 3400
    21:33:50:          CWD: C:/Program Files (x86)/FAHClient/Data
    21:33:50:           OS: Windows 8.1 Pro
    21:33:50:      OS Arch: AMD64
    21:33:50:         GPUs: 1
    21:33:50:        GPU 0: NVIDIA:3 GK110 [GeForce GTX Titan]
    21:33:50:         CUDA: 3.5
    21:33:50:  CUDA Driver: 6050
    21:33:50:Win32 Service: false
    21:33:50:***********************************************************************
    21:33:50:<config>
    21:33:50:  <!-- Folding Core -->
    21:33:50:  <core-priority v='low'/>
    21:33:50:
    21:33:50:  <!-- Network -->
    21:33:50:  <proxy v=':8080'/>
    21:33:50:
    21:33:50:  <!-- Slot Control -->
    21:33:50:  <power v='full'/>
    21:33:50:
    21:33:50:  <!-- User Information -->
    21:33:50:  <passkey v='********************************'/>
    21:33:50:  <team v='111065'/>
    21:33:50:  <user v='xfinrodx'/>
    21:33:50:
    21:33:50:  <!-- Folding Slots -->
    21:33:50:  <slot id='0' type='GPU'>
    21:33:50:    <gpu-index v='0'/>
    21:33:50:    <paused v='true'/>
    21:33:50:  </slot>
    21:33:50:</config>
    21:33:50:Trying to access database...
    21:33:50:Successfully acquired database lock
    21:33:50:Enabled folding slot 00: PAUSED gpu:0:GK110 [GeForce GTX Titan] (by user)
    21:33:50:Set client configured
    21:33:54:14:127.0.0.1:New Web connection
    21:34:09:FS00:Unpaused
    21:34:09:WU00:FS00:Connecting to 171.67.108.200:80
    21:34:10:WU00:FS00:Connecting to 171.67.108.200:80
    21:34:10:WU00:FS00:Assigned to work server 140.163.4.233
    21:34:10:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:GK110 [GeForce GTX Titan] from 140.163.4.233
    21:34:10:WU00:FS00:Connecting to 140.163.4.233:8080
    21:34:10:WU00:FS00:Downloading 4.27MiB
    21:34:12:WU00:FS00:Download complete
    21:34:12:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:10467 run:0 clone:400 gen:48 core:0x17 unit:0x0000005a538b3db9538bc3a2faccb460
    21:34:12:WU00:FS00:Downloading core from http://web.stanford.edu/~...IDIA/Fermi/Core_17.fah
    21:34:12:WU00:FS00:Connecting to web.stanford.edu:80
    21:34:12:WU00:FS00:FahCore 17: Downloading 2.55MiB
    21:34:13:WU00:FS00:FahCore 17: Download complete
    21:34:13:WU00:FS00:Valid core signature
    21:34:13:WU00:FS00:Unpacked 8.60MiB to cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe
    21:34:14:WU00:FS00:Starting
    21:34:14:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Program Files (x86)/FAHClient/Data/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 3400 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
    21:34:14:WU00:FS00:Started FahCore on PID 1908
    21:34:14:WU00:FS00:Core PID:5872
    21:34:14:WU00:FS00:FahCore 0x17 started
    21:34:14:WU00:FS00:0x17:*********************** Log Started 2014-10-25T21:34:14Z ***********************
    21:34:14:WU00:FS00:0x17:Project: 10467 (Run 0, Clone 400, Gen 48)
    21:34:14:WU00:FS00:0x17:Unit: 0x0000005a538b3db9538bc3a2faccb460
    21:34:14:WU00:FS00:0x17:CPU: 0x00000000000000000000000000000000
    21:34:14:WU00:FS00:0x17:Machine: 0
    21:34:14:WU00:FS00:0x17:Reading tar file state.xml
    21:34:15:WU00:FS00:0x17:Reading tar file system.xml
    21:34:15:WU00:FS00:0x17:Reading tar file integrator.xml
    21:34:15:WU00:FS00:0x17:Reading tar file core.xml
    21:34:15:WU00:FS00:0x17:Digital signatures verified
    21:34:15:WU00:FS00:0x17:Folding@home GPU core17
    21:34:15:WU00:FS00:0x17:Version 0.0.52
    21:34:16:FS00:Finishing
    21:34:51:Saving configuration to config.xml
    21:34:51:<config>
    21:34:51:  <!-- Folding Core -->
    21:34:51:  <core-priority v='low'/>
    21:34:51:
    21:34:51:  <!-- Network -->
    21:34:51:  <proxy v=':8080'/>
    21:34:51:
    21:34:51:  <!-- Slot Control -->
    21:34:51:  <power v='full'/>
    21:34:51:
    21:34:51:  <!-- User Information -->
    21:34:51:  <passkey v='********************************'/>
    21:34:51:  <team v='111065'/>
    21:34:51:  <user v='xfinrodx'/>
    21:34:51:
    21:34:51:  <!-- Folding Slots -->
    21:34:51:  <slot id='0' type='GPU'>
    21:34:51:    <gpu-index v='0'/>
    21:34:51:  </slot>
    21:34:51:</config>
    21:35:44:WU00:FS00:0x17:Completed 0 out of 5000000 steps (0%)
    21:35:44:WU00:FS00:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
    21:36:06:WU00:FS00:0x17:ERROR:exception: Error invoking kernel execFFT: clEnqueueNDRangeKernel (-5)
    21:36:06:WU00:FS00:0x17:Saving result file logfile_01.txt
    21:36:06:WU00:FS00:0x17:Saving result file log.txt
    21:36:06:WU00:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
    21:36:07:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
    21:36:07:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:10467 run:0 clone:400 gen:48 core:0x17 unit:0x0000005a538b3db9538bc3a2faccb460
    21:36:07:WU00:FS00:Uploading 2.45KiB to 140.163.4.233
    21:36:07:WU00:FS00:Connecting to 140.163.4.233:8080
    21:36:07:WU00:FS00:Upload complete
    21:36:07:WU00:FS00:Server responded WORK_ACK (400)
    21:36:07:WU00:FS00:Cleaning up
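
For anyone skimming a log like this, a minimal Python sketch that pulls out the kernel failure and the core's exit status. The -5 in `clEnqueueNDRangeKernel (-5)` is `CL_OUT_OF_RESOURCES` in the OpenCL headers. The regexes and the function name `summarize` are assumptions based on the lines above, not part of FAHClient.

```python
import re

# A small subset of OpenCL status codes as defined in CL/cl.h.
CL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Patterns are assumptions derived from the log lines in this post.
KERNEL_ERR = re.compile(
    r"ERROR:exception: Error invoking kernel (\w+): (\w+) \((-?\d+)\)"
)
CORE_EXIT = re.compile(r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)")

def summarize(log: str) -> list[str]:
    """Collect one short line per kernel error or core exit found in the log."""
    findings = []
    for line in log.splitlines():
        k = KERNEL_ERR.search(line)
        if k:
            kernel, call, code = k.group(1), k.group(2), int(k.group(3))
            name = CL_ERRORS.get(code, "unknown code")
            findings.append(f"kernel {kernel}: {call} failed with {code} ({name})")
        c = CORE_EXIT.search(line)
        if c:
            findings.append(f"core exit: {c.group(1)} ({c.group(2)})")
    return findings

SAMPLE = """\
21:36:06:WU00:FS00:0x17:ERROR:exception: Error invoking kernel execFFT: clEnqueueNDRangeKernel (-5)
21:36:07:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
"""

for finding in summarize(SAMPLE):
    print(finding)
```

The code by itself does not say why the kernel launch failed; it only names the status the driver returned.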

    #5
    bison88
    Superclocked Member
    • Total Posts : 136
    • Reward points : 0
    • Joined: 2013/10/28 17:44:21
    • Status: offline
    • Ribbons : 0
    Re: Titan core17 woes 2014/10/25 15:27:07 (permalink)
    Most Kepler-based errors I've seen are due to a faulty WU, a driver issue, or an OC.  It seems like the issue only cropped up when you upgraded the client?  You might try doing a full uninstall, verifying all AppData is removed, restarting, re-downloading, and reinstalling if you haven't already.
     
    After that I would probably try downclocking the GPU, then the CPU, then trying different drivers, in that order.
    post edited by bison88 - 2014/10/25 15:29:40
    #6
    xfinrodx
    Re: Titan core17 woes 2014/10/25 16:05:02 (permalink)
    Only thing more aggressive than what I've done as far as wiping is concerned is reformatting the drives and going from scratch.  I'll try downclocking the gpu next.

    #7
    xfinrodx
    Re: Titan core17 woes 2014/10/25 16:31:57 (permalink)
    So I arbitrarily dropped my GPU clock offset slider by 100 MHz.  Results?  Well, it's still folding the first WU, but:
    1. It's folding.
    2. Time per frame went from 8min at factory speed to 3.5min at new slow speed.  (first time an underclock improved my performance so dramatically)
     
    The thing was never even getting hot.  :-/  Oh well.  Thanks for your time and suggestions.  I'll post back if I get stuck.
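
To put those frame times in perspective, a quick back-of-the-envelope in Python. It assumes a "frame" is 1% of the work unit (so 100 frames per WU) and that the frame time stays constant for the whole unit; both are assumptions, and the function name `wu_hours` is made up for illustration.

```python
def wu_hours(minutes_per_frame: float, frames: int = 100) -> float:
    """Estimated wall-clock hours for one work unit at a constant frame time."""
    return minutes_per_frame * frames / 60.0

stock = wu_hours(8.0)         # frame time reported at factory clocks
underclocked = wu_hours(3.5)  # frame time reported after the -100 MHz offset

print(f"stock:        {stock:.1f} h per WU")
print(f"underclocked: {underclocked:.1f} h per WU")
print(f"speedup:      {stock / underclocked:.2f}x")
```

Under those assumptions the underclocked card finishes a unit in well under half the time.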

    #8
    robbysites
    FTW Member
    • Total Posts : 1978
    • Reward points : 0
    • Joined: 2009/01/12 10:29:41
    • Location: 51°10′43.84″N 1°49′34.28″W
    • Status: offline
    • Ribbons : 2
    Re: Titan core17 woes 2014/10/25 18:30:14 (permalink)
    xfinrodx,
     
    You definitely got the right person to troubleshoot your problem. BC obviously knows what to look for. Glad you got your rig folding.
    I got a little too OC happy and just kept pushing the card and got the same error - bad work unit. Anyway, I dialed it back and my GPU clock would NOT go over 750. I thought I blew my card or fragged my PSU. I set it to default, turned it off, let it sit for a while, and now it is buzzing along at 1188. I also just started crunching on the same machine, so I thought that might be part of the problem, but all is well as of now.
    Fold on!


    #9
    kougar
    CLASSIFIED Member
    • Total Posts : 3034
    • Reward points : 0
    • Joined: 2006/05/08 10:11:19
    • Status: offline
    • Ribbons : 22
    Re: Titan core17 woes 2014/10/27 04:33:52 (permalink)
    So Folding@home GPU and the new Borderlands game both crash on your GPU. And underclocking the GPU fixes it... that's fairly definitive that the GPU isn't stable at the stock frequency. FAH and Borderlands are each particularly good at finding GPU instability in their own right. I'd suggest looking into RMA'ing the GPU.
     
    EVGA's OC Scanner X leaves something to be desired. I can set a KNOWN unstable overclock on my Titan Black and the Scanner won't find artifacts in any of the myriad tests it offers. Why? Because the majority of those tests underclock the GPU, so you're not actually testing the OC. That defeats the entire purpose of an OC scanner.
     
    The only tests that did not underclock the GPU either weren't sensitive enough or weren't utilizing all of the GPU, as they couldn't find anything after 20-minute runs. Perhaps the solution would be to run the OC Scanner overnight, but in any case the OC Scanner isn't going to give you an easy, definitive result. In my case I didn't even know said OC was unstable until I got a bad WU error; F@H tends to be a great stability checker.  But if you're sure nothing is OCing your card and only underclocking fixes it, then I'd suggest an RMA.
     
     


    Have water, will cool. 
    #10
    Ranmacanada
    SSC Member
    • Total Posts : 992
    • Reward points : 0
    • Joined: 2011/09/22 10:44:47
    • Status: offline
    • Ribbons : 3
    Re: Titan core17 woes 2014/10/27 10:55:18 (permalink)
    Totally agree with Kougar.  If you have to underclock your card to get it to perform properly, then something is seriously wrong with it.  

     

    ASUS TUF GAMING X570-PLUS (WI-FI)
    AMD Ryzen 2700
    Fold for the CURE!
    EVGA 1080 FTW
    EVGA 1080Ti Hybrid

    #11