EVGA

"How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti

Author
Omoeba
Superclocked Member
  • Total Posts : 134
  • Reward points : 0
  • Joined: 2020/08/19 15:41:31
  • Status: offline
  • Ribbons : 0
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/04 23:30:13 (permalink)
vulcan1978
Omoeba
vulcan1978
To anyone listening to this who intends to overclock their 3000 series card: power delivery is going to be of utmost importance, and I'm not sure a single 12 pin can deliver the 450W the 3090 really needs to maintain higher clocks, especially under water with the thermal limit removed. 
 
Does anyone have any idea how much power a single 12 pin cable is good for? 
 
I'm genuinely curious, 2x8pin is rated for 375w and 12 pin is basically 2x8pin because two of the pins in the 8 pin connector don't do anything. 
 
Hearing that only the FTW3 has triple 8 pin power has me leaning towards FTW3 but that's only because I'm under water. Not sure if the FTW3 air cooler can deal with 450w or so. 
 


A single 12 pin is specced for 600 watts. Although a single 8 pin is only specced for 150 watts, it can deliver up to 300 watts.





Wow, this changes a lot actually, as one of the primary reasons I was considering going with EVGA was a perceived identical limit between 12 pin and 2x8 pin (375w). Now I'm hearing that only the FTW3 and up will have 3x8pin, and the FTW3 is going to be at least $150 more than the Founders Edition. What we learned with the reference 2080 Ti FE was that its power delivery, its VRM, was way over-engineered (per Buildzoid's analysis) and that it was actually engineered to withstand 600W. For example, I'm running the FTW3 vbios on my FE PCB XC2 at 373w with great temps and no issues. The problem is the temperature, and the VRM was designed to withstand high operating temps due to the limitations of air cooling (something they've attempted to address by directing the flow of the heat out of the case, see my point above, whereas with the previous design the fans would just pump the heat off the heatsink locally and in all directions, where it could and would get trapped under the GPU, especially over a hot PSU). 
 
So now I have to consider that the 12 pin of the FE card removes a potential wattage bottleneck that could otherwise only be removed by opting for the FTW3 model, priced considerably higher ($1650-1700). I will be putting whichever I go with under a water block, and currently I have to determine which will have faster water block availability, the FTW3 or the FE 3090. I'm certain that the FE model will outsell all other models because of its unique, beautiful design. 
 
XC3 is almost out because of this, I still need to figure out how much power can be safely conveyed over 2x8 pin. 
 
Oh, and my original math is off because I just learned that the 2080 Ti FE at factory clocks doesn't settle at 1860 MHz; it actually boosts up to 1890 MHz for like 5 minutes before settling down to 1750 MHz because of thermal constraints. 
 
https://youtu.be/PpDG13PrNPg?t=683
 
If I redo that again we have: 
 
13,600 Timespy GPU to 17,800 is a 31% increase. 
 
https://youtu.be/PpDG13PrNPg?t=683
 
 


The FTW3 will also likely have a better VRM than the FE.

AMD Ryzen 7 3800x
EVGA RTX 3080 FTW3 Ultra
Gigabyte X570 Aorus Master
G.Skill Ripjaws V DDR4-3600 CL16 2x16GB
Inland Performance 2TB SSD
EVGA Supernova 850 G+ PSU
Fractal Meshify C Case

#61
CraptacularOne
Omnipotent Enthusiast
  • Total Posts : 14533
  • Reward points : 0
  • Joined: 2006/06/12 17:20:44
  • Location: Florida
  • Status: offline
  • Ribbons : 222
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/04 23:48:48 (permalink)
GTXJackBauer
Sajin
The reason the cuda cores are so high is because the ampere cores can do 2 operations per clock. 10496/2 = 5248




I wonder whether that kind of advertising might create some legal trouble.  So basically they're doing what Intel does, but instead they're advertising them as all physical cores, no?  Why not do something similar to what Intel does and say 5248 Physical Cores and 10496 Virtual Cores?  I'm kind of disappointed they took this route of advertising, making many consumers think it's packed with that many physical cores.  At least they had fooled me at first.


Well it's not "technically" incorrect since they aren't just calling a "hyper threading" virtual core a core per se. What Nvidia has done with Ampere is allow the shader core to execute INT32 and FP32 instructions in parallel with independent thread scheduling, instead of having the shader only be able to execute one or the other at a time. There are 2 separate physical compute units per shader core, and now that the shader cores have independent thread scheduling they can execute instructions in parallel. That's how they doubled their FP32 compute performance. Previously, as seen in Turing, these same compute units are present but they lack independent thread scheduling, so they can only ever be doing one or the other, not both at the same time per shader core as they can with Ampere. 
 
This is quite a bit different than the way hyper threading or simultaneous multithreading works in CPUs. The way CPUs process threads in parallel isn't "really" simultaneous, since the CPU core only works on the other thread's instructions during waiting cycles in the thread being worked on. So for instance a CPU core that has 2 tasks to work on will only work on one at a time, and while waiting for memory calls or instructions to execute after issuing a command to one task it will work on the other one. This is also why 8 physical cores are always faster than 4 physical cores processing 8 threads in a multithreaded workload. 
post edited by CraptacularOne - 2020/09/04 23:52:47

Intel i9 14900K ...............................Ryzen 9 7950X3D
MSI RTX 4090 Gaming Trio................ASRock Phantom RX 7900 XTX
Samsung Odyssey G9.......................PiMax 5K Super/Meta Quest 3
ASUS ROG Strix Z690-F Gaming........ASUS TUF Gaming X670E Plus WiFi
64GB G.Skill Trident Z5 6800Mhz.......64GB Kingston Fury RGB 6000Mhz
MSI MPG A1000G 1000w..................EVGA G3 SuperNova 1000w
#62
vegajf51
SSC Member
  • Total Posts : 561
  • Reward points : 0
  • Joined: 2018/01/07 12:53:12
  • Status: offline
  • Ribbons : 1
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/04 23:51:13 (permalink)
GTXJackBauer
Sajin
The reason the cuda cores are so high is because the ampere cores can do 2 operations per clock. 10496/2 = 5248




I wonder whether that kind of advertising might create some legal trouble.  So basically they're doing what Intel does, but instead they're advertising them as all physical cores, no?  Why not do something similar to what Intel does and say 5248 Physical Cores and 10496 Virtual Cores?  I'm kind of disappointed they took this route of advertising, making many consumers think it's packed with that many physical cores.  At least they had fooled me at first.


You bring up a good point. AMD got sued for their misleading core counts on the FX series, so it's always possible. Nvidia is the master of marketing though!
post edited by vegajf51 - 2020/09/05 01:11:30
#63
Sajin
EVGA Forum Moderator
  • Total Posts : 49168
  • Reward points : 0
  • Joined: 2010/06/07 21:11:51
  • Location: Texas, USA.
  • Status: offline
  • Ribbons : 199
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 00:24:41 (permalink)
After further research it appears that the card does actually have 10496 CUDA cores (FP32 cores), but the INT32 core count doesn't match the FP32 core count. That is why the card is faster overall, but isn't double the speed of the 2080 Ti.
#64
vulcan1978
iCX Member
  • Total Posts : 284
  • Reward points : 0
  • Joined: 2014/05/25 02:18:19
  • Status: offline
  • Ribbons : 0
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 00:33:55 (permalink)
yaggaz
vulcan1978
 
https://youtu.be/KoiFJc1bw1w
 

 
"not enough shooting and too much looting"
 
OMG I feel this way about every modern game lately.  You spend half the game bending down grabbing stuff instead of playing the rest of the game.  And then the economy makes you rich and there's nothing to buy with the trillions you have, which makes all the looting pointless nonsense anyway, but because I'm OCD I just MUST grab it all lol
 
/end derail of current subject




Dude tell me about it, I'm a completionist at heart and am having this problem in so many titles. I'm currently going through Middle Earth: Shadow of War and I was sending orcs death threats until I wiped out the entire cadre of them in Cirith Ungol and had completed every side mission, found every collectible, etc., and I'm at like level 25 before even doing the first siege battle in Nurnen. Like I'm already at the top of the perk list, there's hardly anything left to upgrade in the skill tree and I'm only 30% through the game. I have a really bad habit of doing this and then getting bored with the game and not finishing it because I'm 42 and basically every narrative is laughable at this point. 

8700k @ 5.1 GHz - 0 AVX @ 1.386v Dynamic Offset w/ EK Monoblock + Delid | Gigabyte Z370 Aorus Gaming 7 | EVGA 2080 Ti XC2 Ultra @ 2130 Mhz core, 7950 MHz memory @ 1.063v w/ 375W FTW3 vbios + Phanteks Glacier Block  | EK CE 420 + EK XE 360 | 2x16GB G-Skill Trident Z Royal 3600 MHz 17-20-20-38 | 2 TB Sabrent Rocket | Corsair RM1000x | Thermaltake View 71 | Alienware AW3418DW + Asus ROG Swift PG278Q (for 3D Vision) on Amazon Basics Arms | Win10 Pro 1809
 
philosophersbunker.blogspot.com
#65
vulcan1978
iCX Member
  • Total Posts : 284
  • Reward points : 0
  • Joined: 2014/05/25 02:18:19
  • Status: offline
  • Ribbons : 0
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 00:36:01 (permalink)
Sajin
After further research it appears that the card does actually have 10496 CUDA cores (FP32 cores), but the INT32 core count doesn't match the FP32 core count. That is why the card is faster overall, but isn't double the speed of the 2080 Ti.




Ok I'm still learning here, are the CUDA cores indicative of total core count, including RT cores? Because 3000 series isn't simply 50% faster tier for tier vs their predecessor, they do RT 1.9x faster to top it off. 
 
Edit: 
 
Getting ahead of myself, we still don't know how Ampere overclocks, it may only do another 20% making it only 40% faster. But even only another 10% overclock and Ampere is still 30% faster tier for tier. 
post edited by vulcan1978 - 2020/09/05 00:39:23

#66
Sajin
EVGA Forum Moderator
  • Total Posts : 49168
  • Reward points : 0
  • Joined: 2010/06/07 21:11:51
  • Location: Texas, USA.
  • Status: offline
  • Ribbons : 199
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 00:37:31 (permalink)
vulcan1978
Sajin
After further research it appears that the card does actually have 10496 CUDA cores (FP32 cores), but the INT32 core count doesn't match the FP32 core count. That is why the card is faster overall, but isn't double the speed of the 2080 Ti.




Ok I'm still learning here, are the CUDA cores indicative of total core count, including RT cores? Because 3000 series isn't simply 50% faster tier for tier vs their predecessor, they do RT 1.9x faster to top it off. 


No.
#67
vulcan1978
iCX Member
  • Total Posts : 284
  • Reward points : 0
  • Joined: 2014/05/25 02:18:19
  • Status: offline
  • Ribbons : 0
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 00:41:58 (permalink)
Sajin
vulcan1978
Sajin
After further research it appears that the card does actually have 10496 CUDA cores (FP32 cores), but the INT32 core count doesn't match the FP32 core count. That is why the card is faster overall, but isn't double the speed of the 2080 Ti.




Ok I'm still learning here, are the CUDA cores indicative of total core count, including RT cores? Because 3000 series isn't simply 50% faster tier for tier vs their predecessor, they do RT 1.9x faster to top it off. 


No.




Wow, so what exactly is the difference in cores between Turing and Ampere? I need to watch GN's Ampere break-down. 

#68
yaggaz
FTW Member
  • Total Posts : 1509
  • Reward points : 0
  • Joined: 2007/04/12 19:10:22
  • Status: offline
  • Ribbons : 1
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 01:07:48 (permalink)
vulcan1978
 
 
Dude tell me about it, I'm a completionist at heart and am having this problem in so many titles. I'm currently going through Middle Earth: Shadow of War and I was sending orcs death threats until I wiped out the entire cadre of them in Cirith Ungol and had completed every side mission, found every collectible, etc., and I'm at like level 25 before even doing the first siege battle in Nurnen. Like I'm already at the top of the perk list, there's hardly anything left to upgrade in the skill tree and I'm only 30% through the game. I have a really bad habit of doing this and then getting bored with the game and not finishing it because I'm 42 and basically every narrative is laughable at this point. 




I think a lot of modern devs are scared to make the XP gains and economy challenging because they feel they'll chase off the insta-grat type of player. I waited for six months after Skyrim was released before I played it; based on Oblivion I KNEW how easy it was in Bethsoft games to max level, become a trillionaire and have no challenge. I created my own "Expensive Training" mod where buying skillups costs 10x the normal amount. I also added about 10 other mods that made the merchants rip you off and charge you a fortune, as well as ones that made rare things TRULY rare and hard to find, and really slowed down skillups. Then I actually enjoyed it when I played it.
 
My favourite games of the last 10 years are ones where you can mod the difficulty levels.
 

||  CPU: Intel 10700k   ||  GPU:  evga 3080 XC3 Ultra Hybrid ||  MB: Gigabyte z490 UD AC  || RAM: 2 x 16GB 3000mhz DDR4 SDRAM  || Samsung EVO 970 Plus 2TB   ||    Dell S2417DG Monitor    ||  Soundblaster AE-7  ||  Phanteks p400a Case  ||   be Quiet! Dark Rock Slim CPU Cooler  ||  Corsair AX1600i PSU  ||  9 Fans total in system ||
#69
Hoggle
EVGA Forum Moderator
  • Total Posts : 10102
  • Reward points : 0
  • Joined: 2003/10/13 22:10:45
  • Location: Eugene, OR
  • Status: offline
  • Ribbons : 4
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 02:49:34 (permalink)
The original video is clearly adding in the frame rate counter. You hear 65% to 85% improvement in the video, but the frame counter that is added shows it as being only a 15% improvement. The original video shows only a 2080 Ti frame rate, which seems to be locked at 60FPS to make a smoother video for uploading to Youtube. Even based on just 60FPS for Shadow of the Tomb Raider, the 65% improvement would take you to 99FPS, which is a nice improvement; the 85% improvement, which is the max we saw, would mean going from 60FPS to 111FPS. Below I posted the original video that hasn't been edited so people can see the original post is clickbait.
 
https://www.youtube.com/watch?v=cWD01yUQdVA&feature=emb_logo
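
For anyone who wants to re-run the percentages, here is a minimal sketch of the arithmetic. The 60FPS baseline is only the assumption made in this post, not a measured figure, and the function name is made up for illustration.

```python
def projected_fps(base_fps: int, gain_percent: int) -> float:
    """Frame rate after applying a percentage uplift to a baseline frame rate."""
    # Work in whole percentages so 60 FPS + 65% comes out as exactly 99.0.
    return base_fps * (100 + gain_percent) / 100


ASSUMED_BASELINE_FPS = 60  # assumption from the post above, not a benchmark result

for gain in (15, 65, 85):
    print(f"+{gain}% over {ASSUMED_BASELINE_FPS} FPS -> {projected_fps(ASSUMED_BASELINE_FPS, gain)} FPS")
```

If the real baseline is lower than 60FPS, the projected numbers scale down by the same ratio.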

 
 
#70
IMWork87
New Member
  • Total Posts : 33
  • Reward points : 0
  • Joined: 2009/12/30 06:51:38
  • Status: offline
  • Ribbons : 0
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 05:44:12 (permalink)
Turing: 1 SM -> 64 FP32-ALUs and 64 INT32-ALUs
Ampere: 1 SM -> same as Turing but additional 64 ALUs that can also calculate FPs and INTs but not simultaneously.
 
While Turing is able to handle 64 FP32 and 64 INT32 calculations at the same time, Ampere can either handle 128 FP32 or 64FP32 and 64INT32 calculations at the same time, depending on the workload.
 
So if a game engine uses both FP and INT in equal measure, Ampere has the same performance as Turing (worst case). If an engine uses primarily FP, though, Ampere has twice the compute power compared to Turing.
That's the reason why Nvidia is talking about twice the CUDA cores.
 
I'd assume that the games used in the Digital Foundry video primarily used FP calculations to show Ampere's benefits. Don't expect to see a performance gain like this in all your games.
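
A rough way to see the two extremes described above is a toy per-SM model. This is only a sketch under idealised assumptions (perfect scheduling, no memory stalls, every lane does one operation per clock), and the function names are hypothetical, not anything from Nvidia.

```python
def turing_ops_per_clock(int_fraction: float) -> float:
    """Idealised Turing SM: 64 FP32 lanes plus 64 separate INT32 lanes."""
    fp_fraction = 1.0 - int_fraction
    # Each lane group only handles its own instruction type,
    # so the busier group decides how long the work takes.
    clocks_per_instruction = max(fp_fraction / 64, int_fraction / 64)
    return 1.0 / clocks_per_instruction


def ampere_ops_per_clock(int_fraction: float) -> float:
    """Idealised Ampere SM: 64 FP32-only lanes plus 64 lanes that do FP32 or INT32."""
    # INT32 work can only use the 64 shared lanes; in the best case the
    # whole instruction stream is spread evenly over all 128 lanes.
    clocks_per_instruction = max(int_fraction / 64, 1.0 / 128)
    return 1.0 / clocks_per_instruction


for int_fraction in (0.0, 0.25, 0.5):
    ratio = ampere_ops_per_clock(int_fraction) / turing_ops_per_clock(int_fraction)
    print(f"INT32 share {int_fraction:.0%}: Ampere/Turing per-SM ratio = {ratio:.2f}")
```

With no INT32 work the ratio is 2, and with a 50/50 mix it is 1, which matches the best and worst cases described in this post. Real games land somewhere in between and are limited by plenty of other things besides ALU throughput.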
#71
vulcan1978
iCX Member
  • Total Posts : 284
  • Reward points : 0
  • Joined: 2014/05/25 02:18:19
  • Status: offline
  • Ribbons : 0
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 09:19:58 (permalink)
IMWork87
Turing: 1 SM -> 64 FP32-ALUs and 64 INT32-ALUs
Ampere: 1 SM -> same as Turing but additional 64 ALUs that can also calculate FPs and INTs but not simultaneously.
 
While Turing is able to handle 64 FP32 and 64 INT32 calculations at the same time, Ampere can either handle 128 FP32 or 64FP32 and 64INT32 calculations at the same time, depending on the workload.
 
So if a game engine uses both FP and INT in equal measure, Ampere has the same performance as Turing (worst case). If an engine uses primarily FP, though, Ampere has twice the compute power compared to Turing.
That's the reason why Nvidia is talking about twice the CUDA cores.
 
I'd assume that the games used in the Digital Foundry video primarily used FP calculations to show Ampere's benefits. Don't expect to see a performance gain like this in all your games.




Fantastic reply, thanks for adding to the discussion! I did some research and have mostly concluded the same thing. I also found a thread that really breaks things down and argues that 1 Ampere TFLOP is actually worth about 0.72 of a Turing TFLOP. But reading the technical aspects of this, apparently only 26% of an average game instruction stream is INT32, meaning 74% is actually FP32 and would benefit from the new doubling of FP32 performance. 
 
Ampere also has doubled the shared memory and L1 cache performance per SM:
 
"Doubling math throughput required doubling the data paths supporting it, which is why the Ampere SM also doubled the shared memory and L1 cache performance for the SM. (128 bytes/clock per Ampere SM versus 64 bytes/clock in Turing). Total L1 bandwidth for GeForce RTX 3080 is 219 GB/sec versus 116 GB/sec for GeForce RTX 2080 Super."
 
"
    • Could you elaborate a little on these doubling of CUDA cores?

    [Tony Tamasi] One of the key design goals for the Ampere 30-series SM was to achieve twice the throughput for FP32 operations compared to the Turing SM. To accomplish this goal, the Ampere SM includes new datapath designs for FP32 and INT32 operations. One datapath in each partition consists of 16 FP32 CUDA Cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA Cores and 16 INT32 Cores. As a result of this new design, each Ampere SM partition is capable of executing either 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock.
    Doubling the processing speed for FP32 improves performance for a number of common graphics and compute operations and algorithms. Modern shader workloads typically have a mixture of FP32 arithmetic instructions such as FFMA, floating point additions (FADD), or floating point multiplications (FMUL), combined with simpler instructions such as integer adds for addressing and fetching data, floating point compare, or min/max for processing results, etc. Performance gains will vary at the shader and application level depending on the mix of instructions. Ray tracing denoising shaders are good examples that might benefit greatly from doubling FP32 throughput.
    Doubling math throughput required doubling the data paths supporting it, which is why the Ampere SM also doubled the shared memory and L1 cache performance for the SM. (128 bytes/clock per Ampere SM versus 64 bytes/clock in Turing). Total L1 bandwidth for GeForce RTX 3080 is 219 GB/sec versus 116 GB/sec for GeForce RTX 2080 Super.
    Like prior NVIDIA GPUs, Ampere is composed of Graphics Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), Raster Operators (ROPS), and memory controllers.
    The GPC is the dominant high-level hardware block with all of the key graphics processing units residing inside the GPC. Each GPC includes a dedicated Raster Engine, and now also includes two ROP partitions (each partition containing eight ROP units), which is a new feature for NVIDIA Ampere Architecture GA10x GPUs. More details on the NVIDIA Ampere architecture can be found in NVIDIA’s Ampere Architecture White Paper, which will be published in the coming days.

https://wccftech.com/nvidia-details-geforce-rtx-30-series-graphics-cards-reddit/
 
https://www.nvidia.com/en-us/geforce/news/rtx-30-series-community-qa/
 
Nvidia Ampere teraflops and how you cannot compare them to Turing
 
https://neogaf.com/threads/nvidia-ampere-teraflops-and-how-you-cannot-compare-them-to-turing.1564257/
 
"
    TL;DR: 1 Ampere TF = 0.72 Turing TF, or 30TF (Ampere) = 21.6TF (Turing)

    Reddit Q&A

    To accomplish this goal, the Ampere SM includes new datapath designs for FP32 and INT32 operations. One datapath in each partition consists of 16 FP32 CUDA Cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA Cores and 16 INT32 Cores. As a result of this new design, each Ampere SM partition is capable of executing either 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock.


    A reminder from the Turing whitepaper:
    First, the Turing SM adds a new independent integer datapath that can execute instructions concurrently with the floating-point math datapath. In previous generations, executing these instructions would have blocked floating-point instructions from issuing.


    So, Turing GPU can execute 64INT32 + 64FP32 ops per clock per SM.
    Ampere GPU can either execute 64INT32 + 64FP32 or 128FP32 ops per clock per SM.

    Which means if a game executes 0 (zero) INT32 instructions then Ampere = 2xTuring
    And if game executes 50/50 INT32 and FP32 then Ampere = Turing exactly.

    So how many INT32 are there on average?
    According to Nvidia:

    we typically see about 36 additional integer pipe instructions for every 100 floating point instructions


    Some math: 36 / (100+36) = 26%, i.e. in an average game instruction stream 26% are INT32

    So we can now calculate what will happen to both Ampere and Turing when 26% INT32 + 74% FP32 instruction streams are used.
    I have written a simple software to do that. But you can calculate an analytical upper bound easily: 74%/50% = 1.48 or +48%
    My software shows a slightly smaller number +44% (and that's because of the edge cases where you cannot distribute the last INT32 ops in a batch equally, as only one pipeline can issue INT32 per each block of 16 cores)
    So the theoretical absolute max is +48%, in practice the absolute achievable max is +44%

    Thus each 2TF of Ampere have only 1.44TF of Turing performance.

    Let's check the actual data Nvidia gave us:
    3080 = 30TF (ampere) = 21.6TF (turing) = 2.14x 2080 (10.07TF turing)
    Nvidia is even more conservative than that and gives us: 3080 = 2x2080
    3070 = 20.4TF (ampere) = 14.7TF (turing) = 1.86x 2070 (7.88TF turing)
    Nvidia is massively more conservative here giving us: 3070 = 1.6x2070
    Actually if we average the two max numbers that Nvidia gives us (they explicitly say "up to") we get to even lower theoretical max of 1 Ampere TF = 0.65 Turing TF
    Which suggests that maybe these new FP32/INT32 mixed pipelines cannot execute FP32 at full speed (or cannot execute all the instructions).
    We do know that Turing had reduced register file access in INT32 (64 vs 256 for FP32) if it's the same (and everything suggests that Ampere is just a Turing facelift) then obviously not all FP32 instruction sequences can run on these pipelines.

    Anyway a TF table:

                       Ampere TF                 Turing TF (me)   Turing TF (NV)
    3080 (Ampere)      30                        21.6             19.5
    3070 (Ampere)      20.4                      14.7             13.3
    2080Ti (Turing)    18.75 (me) or 20.7 (NV)   13.5             13.5
    2080 (Turing)      14 (me) or 15.5 (NV)      10.1             10.1
    2070 (Turing)      10.4 (me) or 11.5 (NV)    7.5              7.5

    Bonus round: RDNA1 TF
    RDNA1 has no INT32 pipeline, all the INT32 instructions are handled in the main stream. Thus it's essentially almost exactly the same as Ampere, but it has no skew in the last instruction thus +48% theoretical max applies here (Ampere +2.3%)

                       Ampere TF                 Turing TF (me)   Turing TF (NV)
    5700XT (RDNA1)     10.01                     7.2              ?

    Amusingly enough 5700XT actual performance is pretty similar to 2070 and these adjusted TF numbers show exactly that (10TF vs 10-11TF)"
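
The arithmetic in the quoted NeoGAF post can be replayed roughly as follows. This is only a sketch: the 36-per-100 figure and the +44% "practical" number are taken straight from the quote (the +44% comes from that poster's own simulation, which is not reproduced here), and all variable and function names are made up for illustration.

```python
INT_PER_100_FP = 36          # Nvidia's figure as quoted above
PRACTICAL_SPEEDUP = 1.44     # the quoted poster's simulated result, taken as given

# Share of INT32 instructions in an average stream: 36 / (100 + 36).
int_share = INT_PER_100_FP / (100 + INT_PER_100_FP)
# The quoted post rounds this to 26% before continuing, so do the same here.
rounded_int_share = round(int_share, 2)
fp_share = 1 - rounded_int_share

# Analytical upper bound from the post: 74% / 50%.
upper_bound = fp_share / 0.5

# "Each 2TF of Ampere have only 1.44TF of Turing performance."
AMPERE_TO_TURING = PRACTICAL_SPEEDUP / 2


def ampere_to_turing_tf(ampere_tf: float) -> float:
    """Convert an Ampere TFLOPS figure to the post's 'Turing-equivalent' TFLOPS."""
    return ampere_tf * AMPERE_TO_TURING


print(f"INT32 share: {rounded_int_share:.0%}, upper bound: +{(upper_bound - 1):.0%}")
for name, tf in (("3080", 30.0), ("3070", 20.4)):
    print(f"{name}: {tf} Ampere TF ~ {ampere_to_turing_tf(tf):.1f} Turing TF")
```

Note that without the rounding step the upper bound comes out nearer +47% than +48%, so the headline numbers are sensitive to small rounding choices.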

#72
IMWork87
New Member
  • Total Posts : 33
  • Reward points : 0
  • Joined: 2009/12/30 06:51:38
  • Status: offline
  • Ribbons : 0
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 11:28:01 (permalink)
The figures Nvidia uses are correct from a marketing perspective, but technically it'd make more sense to say you have a base performance plus an additional peak performance when FP instructions kick in. But marketing does not care about the technical perspective; they just wanna make sure the numbers are higher than before.
#73
piiman
New Member
  • Total Posts : 6
  • Reward points : 0
  • Joined: 2020/09/05 11:23:35
  • Status: offline
  • Ribbons : 0
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 11:35:35 (permalink)
He was invited by Nvidia to have the first look and hands on with the card. 
#74
vulcan1978
iCX Member
  • Total Posts : 284
  • Reward points : 0
  • Joined: 2014/05/25 02:18:19
  • Status: offline
  • Ribbons : 0
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 11:39:32 (permalink)
IMWork87
The figures Nvidia uses are correct from a marketing perspective, but technically it'd make more sense to say you have a base performance plus an additional peak performance when FP instructions kick in. But marketing does not care about the technical perspective; they just wanna make sure the numbers are higher than before.




The way I understand it (see previous post here) is that technically yes, there is double FP32 performance per SM but not INT32, BUT only 26% of a game rendering stream is INT32 on average. So these are still very meaningful gains (double FP32 performance): https://neogaf.com/threads/nvidia-ampere-teraflops-and-how-you-cannot-compare-them-to-turing.1564257/
 
 
 
 
 
 

#75
kevinc313
CLASSIFIED ULTRA Member
  • Total Posts : 5004
  • Reward points : 0
  • Joined: 2019/02/28 09:27:55
  • Status: offline
  • Ribbons : 22
Re: "How is everyone so dumb?" Tomb Raider Benchmark comparison with 2080 Ti 2020/09/05 11:47:41 (permalink)
Hoggle
The original video is clearly adding in the frame rate counter. You hear 65% to 85% improvement in the video, but the frame counter that is added shows it as being only a 15% improvement. The original video shows only a 2080 Ti frame rate, which seems to be locked at 60FPS to make a smoother video for uploading to Youtube. Even based on just 60FPS for Shadow of the Tomb Raider, the 65% improvement would take you to 99FPS, which is a nice improvement; the 85% improvement, which is the max we saw, would mean going from 60FPS to 111FPS. Below I posted the original video that hasn't been edited so people can see the original post is clickbait.
 
https://www.youtube.com/watch?v=cWD01yUQdVA&feature=emb_logo




The only things that can do ~60fps in that benchmark, as shown by the partially obscured FPS counter in the upper left of the DF video, are an overclocked 2080 Ti, the 3080 or something comparable.  Not a 2080, stock 2080 Ti or 1080 Ti.  The framerate is not locked.
 
https://www.youtube.com/watch?v=X5CrHwlCItg
https://youtu.be/aznxnYrZxKY
https://youtu.be/MoRAxF06_Ck
 
The FPS is partially obscured in the upper left.  It's not added in, it's part of the benchmark.
 
https://youtu.be/cWD01yUQdVA?t=412
 
The 2080 being compared does 30-45 FPS in the benchmark at those settings.
post edited by kevinc313 - 2020/09/05 11:53:44
#76