EVGA

Gamers Nexus Ampere RT Core Architecture Deep Dive

Author
Intoxicus
iCX Member
  • Total Posts : 406
  • Reward points : 0
  • Joined: 2009/10/23 19:03:35
  • Status: offline
  • Ribbons : 0
2020/09/15 20:43:46 (permalink)
"NVIDIA Ampere (RTX 30) Architecture Deep-Dive: RT Cores, GDDR6X vs. GDDR6, & More"
https://www.youtube.com/watch?v=AmNL2Cg2OO8


A super informative video that helps us understand the architecture behind the Ampere cards. Super excited to see their benchmarks tomorrow also :)
 
 

"Humans are not rational animals, humans are rationalizing animals." -Robert A Heinlein
#1


    Intoxicus
    iCX Member
    • Total Posts : 406
    • Reward points : 0
    • Joined: 2009/10/23 19:03:35
    • Status: offline
    • Ribbons : 0
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/15 21:52:45 (permalink)
    It's interesting to see how the doubled CUDA cores actually work. Some people, though not all, are calling the doubled count misleading, but it is a bit more nuanced than many realize.

    What used to be INT32-only cores can now act as FP32 cores when needed. Because those formerly INT32-only cores can now do FP32 work, there are twice as many potential FP32 CUDA cores. Nvidia could perhaps have tweaked their phrasing to make that a bit clearer. It is accurate enough to state double CUDA cores, especially since, as Gamers Nexus mentioned, gaming tends to use more FP32 than INT32 anyway.
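
    A rough way to picture the shared datapath is the toy model below. It is only a sketch: the 64 lanes per datapath, the perfect scheduling, the function names, and the 100:36 FP-to-INT mix are all assumptions made for illustration, not measured figures.

```python
# Toy model of one SM issuing a batch of instructions (idealized, not measured).
LANES = 64  # assumed lanes per datapath, for illustration only


def cycles_fixed_split(n_fp: int, n_int: int) -> float:
    """One FP32-only datapath plus one INT32-only datapath."""
    return max(n_fp / LANES, n_int / LANES)


def cycles_shared_path(n_fp: int, n_int: int) -> float:
    """One FP32-only datapath plus one datapath that takes FP32 or INT32."""
    # INT32 work must use the shared datapath; FP32 work can spill onto it.
    return max(n_int / LANES, (n_fp + n_int) / (2 * LANES))


def speedup(n_fp: int, n_int: int) -> float:
    """Ratio of cycles needed by the fixed split to cycles on the shared design."""
    return cycles_fixed_split(n_fp, n_int) / cycles_shared_path(n_fp, n_int)


if __name__ == "__main__":
    print(speedup(100, 0))   # pure FP32 work
    print(speedup(100, 36))  # hypothetical mixed workload
```

    In this model the full 2x only shows up when there is no INT32 work at all; any integer work eats into the shared datapath, which would be why "double the cores" does not mean double the framerate.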

    In case some don't know, INT32 is 32-bit integer math and FP32 is 32-bit floating point math. Integers are whole numbers only, no decimals or fractions. Floating point means the decimal point can be anywhere, which adds complexity to the calculation. Floating point is important for precise calculations (your GPU is truly a super complex math & geometry calculating machine) because fixed decimal points are simply too limited.
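
    To make the difference concrete, here is a small sketch in Python. The helper name to_fp32 is made up for this example; it just rounds a value to the nearest 32-bit float using the standard struct module.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


# INT32: whole numbers only, in a fixed range.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

if __name__ == "__main__":
    print(INT32_MIN, INT32_MAX)
    print(to_fp32(0.5))     # a fraction that binary can store exactly
    print(to_fp32(0.1))     # stored as the nearest FP32 value, not exactly 0.1
    print(to_fp32(3.0e38))  # very large magnitudes still fit, at reduced precision
```

    So FP32 trades exactness for range and fractions, while INT32 is exact but limited to whole numbers.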

    Judging by this breakdown of the architecture I would *speculate* that in tomorrow's benchmarks we will see significantly greater performance gains in Ray Tracing and DLSS, while standard rasterization won't see as much of a performance increase. I've been very skeptical about some of the early benchmarks floating around personally. This could explain why some of those benchmarks have shown what appear to be dubious performance increases. 

    If that speculation is correct then for those that are interested in getting above 60 fps using Ray Tracing these will be desirable cards. For those less interested in Ray Tracing the 3000 series might be less desirable. 

    Although we won't know for sure until we see benchmarks from trusted reviewers.

    Looking forward, Ray Tracing appears to be the new PhysX. Physics processing was regarded as a gimmick early on, and is now standard and normalized. It would seem that Ray Tracing will become the new standard, and rasterization will become old tech, especially when both next-gen consoles and AMD's RDNA2 are adopting Ray Tracing technology to keep up with Nvidia.

    "Humans are not rational animals, humans are rationalizing animals." -Robert A Heinlein
    #2
    torick
    Superclocked Member
    • Total Posts : 118
    • Reward points : 0
    • Joined: 2013/04/10 04:10:09
    • Status: offline
    • Ribbons : 0
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/15 23:55:17 (permalink)
    Intoxicus

    In case some don't know, INT32 is 32-bit integer math and FP32 is 32-bit floating point math. Integers are whole numbers only, no decimals or fractions. Floating point means the decimal point can be anywhere, which adds complexity to the calculation. Floating point is important for precise calculations (your GPU is truly a super complex math & geometry calculating machine) because fixed decimal points are simply too limited.





    Why do we use INT32 at all then if FP32 covers all numbers, whether odd, even, or before and after a decimal? Just trying to learn some more.
    #3
    Omoeba
    Superclocked Member
    • Total Posts : 134
    • Reward points : 0
    • Joined: 2020/08/19 15:41:31
    • Status: offline
    • Ribbons : 0
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/15 23:58:09 (permalink)
    torick
    Intoxicus

    In case some don't know, INT32 is 32-bit integer math and FP32 is 32-bit floating point math. Integers are whole numbers only, no decimals or fractions. Floating point means the decimal point can be anywhere, which adds complexity to the calculation. Floating point is important for precise calculations (your GPU is truly a super complex math & geometry calculating machine) because fixed decimal points are simply too limited.





    Why do we use INT32 at all then if FP32 covers all numbers, whether odd, even, or before and after a decimal? Just trying to learn some more.


    Correct me if I'm wrong, but INT32 should be faster than FP32.

    AMD Ryzen 7 3800x
    EVGA RTX 3080 FTW3 Ultra
    Gigabyte X570 Aorus Master
    G.Skill Ripjaws V DDR4-3600 CL16 2x16GB
    Inland Performance 2TB SSD
    EVGA Supernova 850 G+ PSU
    Fractal Meshify C Case

    #4
    Intoxicus
    iCX Member
    • Total Posts : 406
    • Reward points : 0
    • Joined: 2009/10/23 19:03:35
    • Status: offline
    • Ribbons : 0
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/16 00:06:21 (permalink)
    Yes, as I said FP32 is more complicated because it always needs to determine where the decimal point is for the result of a calculation.

    The thing is most of the math needed is FP32 by necessity, so INT32 being faster becomes a moot point ultimately. 
    Floating point is needed for precise calculations in geometry and computing; it's a much bigger deal, and much more important. Integer is more limited and far less precise, because it cannot hold anything behind the decimal point.
    Is INT32 faster? Sure.
    Is that speed useful for what we use the GPUs for? Not really.

    The cores that can be either INT32 or FP32 are a bigger deal than they're getting credit for, in my opinion. Being able to be flexible like that is huge all around. Instead of being stuck with a set amount of one or the other, the GPU can now adapt to tasks as needed on a case-by-case basis.

    Steve from GN points out in his breakdown of the architecture that you will use FP32 more often for gaming than INT32.

    "Humans are not rational animals, humans are rationalizing animals." -Robert A Heinlein
    #5
    torick
    Superclocked Member
    • Total Posts : 118
    • Reward points : 0
    • Joined: 2013/04/10 04:10:09
    • Status: offline
    • Ribbons : 0
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/16 00:31:20 (permalink)
    So does INT32 find the whole number and then FP32 find anything behind the decimal? So if I have a number of 142.6523 triangles the INT32 side finds the 142 triangles and the FP32 will do the .6523 triangle portion? Do they work in conjunction with each other?
     
    #6
    HawkOculus
    iCX Member
    • Total Posts : 456
    • Reward points : 0
    • Joined: 2019/04/10 10:50:51
    • Status: offline
    • Ribbons : 1
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/16 02:58:49 (permalink)
    I’m tempering my expectations to be realistic. However, I do think that we are going to see quite the performance uplift in specific titles. And of course anything with ray tracing and DLSS is going to see a major jump up in performance.

    I’ve also got my popcorn ready for the inevitable cherry picking of certain games with CPU bottlenecks and the like as supposed “evidence” that these new cards aren’t as good as Nvidia claim (kevinc where you at?).
    #7
    mellowfluff
    Superclocked Member
    • Total Posts : 176
    • Reward points : 0
    • Joined: 2006/12/07 11:35:21
    • Status: offline
    • Ribbons : 0
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/16 05:21:33 (permalink)
    Digital Foundry's 4K benchmarks of Doom Eternal should be all we need for now. 120fps and above at 4K .. you can tell Doom Eternal is set to global Ultra Nightmare graphics settings just by looking at it .. I think the perf is in the pudding .. Nvidia will come out smelling like a rose when it's all said and done... just waiting for AMD now.
    #8
    the_Scarlet_one
    formerly Scarlet-tech
    • Total Posts : 24581
    • Reward points : 0
    • Joined: 2013/11/13 02:48:57
    • Location: East Coast
    • Status: offline
    • Ribbons : 79
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/16 05:27:06 (permalink)
    torick
    So does INT32 find the whole number and then FP32 find anything behind the decimal? So if I have a number of 142.6523 triangles the INT32 side finds the 142 triangles and the FP32 will do the .6523 triangle portion? Do they work in conjunction with each other?
     


    My understanding could be incredibly skewed, as I don’t truly understand a whole heck of a lot, but I would assume FP32 would be the primary in games, because geometric shapes make up the game and they are rarely a whole number in size. Each shape would be a different size, even if only slightly, and would need the FP32 precise scaling to get it correct.

    I don’t know if both Int32 and FP32 can coexist on the same geometric shape, but if one shape was a whole number and the shape next to it is calculated using a decimal location, then maybe they can work together.

    I would also like to understand this better, so hopefully we can get someone with more knowledge to help us out :-)
    #9
    torick
    Superclocked Member
    • Total Posts : 118
    • Reward points : 0
    • Joined: 2013/04/10 04:10:09
    • Status: offline
    • Ribbons : 0
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/16 05:39:28 (permalink)
    the_Scarlet_one
    torick
    So does INT32 find the whole number and then FP32 find anything behind the decimal? So if I have a number of 142.6523 triangles the INT32 side finds the 142 triangles and the FP32 will do the .6523 triangle portion? Do they work in conjunction with each other?
     


    My understanding could be incredibly skewed, as I don’t truly understand a whole heck of a lot, but I would assume FP32 would be the primary in games, because geometric shapes make up the game and they are rarely a whole number in size. Each shape would be a different size, even if only slightly, and would need the FP32 precise scaling to get it correct.

    I don’t know if both Int32 and FP32 can coexist on the same geometric shape, but if one shape was a whole number and the shape next to it is calculated using a decimal location, then maybe they can work together.

    I would also like to understand this better, so hopefully we can get someone with more knowledge to help us out :-)

     
    I was doing some digging on Google, and from what I read, both INT32 and FP32 are needed because FP32 is limited, by the power of ten, in the size of calculations it can accomplish.
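
    One concrete limit that may be what that search turned up: FP32 keeps 24 bits of significand, so above 2^24 it can no longer represent every whole number, while INT32 counts exactly all the way up to 2^31 - 1. A minimal sketch (the helper name to_fp32 is made up for this example):

```python
import struct


def to_fp32(x: float) -> float:
    """Round a value to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


LIMIT = 2**24  # 16,777,216: last point where FP32 still holds every integer

if __name__ == "__main__":
    print(to_fp32(LIMIT) == LIMIT)          # exactly representable
    print(to_fp32(LIMIT + 1) == LIMIT + 1)  # not representable, gets rounded
    print(to_fp32(LIMIT + 2) == LIMIT + 2)  # representable again (spacing is now 2)
```

    That is one reason things like array indexes, counters, and memory addresses are handled with integer math rather than floating point.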
    #10
    Intoxicus
    iCX Member
    • Total Posts : 406
    • Reward points : 0
    • Joined: 2009/10/23 19:03:35
    • Status: offline
    • Ribbons : 0
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/16 05:57:53 (permalink)
    torick
    So does INT32 find the whole number and then FP32 find anything behind the decimal? So if I have a number of 142.6523 triangles the INT32 side finds the 142 triangles and the FP32 will do the .6523 triangle portion? Do they work in conjunction with each other?
     


    Yes & No kinda, truly it's one or the other.

    Think about it. You can switch between INT and FP math in code, at the cost of losing all the data behind the decimal if you switch to INT from FP. If you switch to FP from INT, you're starting at a whole number and have no data behind the decimal to work with.

    You can store the number as a variable and then do different operations on that variable while preserving the original number to get around it.

    Say we store 4.2071066623585 as variable "flange" in RAM.

    Take "flange" and run an FP operation and an INT operation on it, but save the results separately as "FPflange" and "INTflange" respectively. But now you're taking up extra cycles and potentially impacting framerate to do so.

    INTflange is always INT until an FP operation runs on that variable.

    The number in INTflange before being operated on is now 4 (which way it rounds depends on the conversion; a plain cast usually just drops the fraction). If you convert an INT 4 to FP it becomes 4.0; you lost the 0.2071066623585 behind the decimal, which changes the result of course.

    You can, but you probably don't want to depending on the math being done.

    And most of the math for games is complex geometry making that granularity behind the decimal very important to preserve.
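
    The "flange" example can be sketched in a few lines of Python. The variable names here are just placeholders for the example; int() drops the fraction the same way a C-style cast does.

```python
flange = 4.2071066623585  # the example value from the post

int_flange = int(flange)        # truncates toward zero, fraction is discarded
back_to_fp = float(int_flange)  # converting back cannot restore the fraction
lost = flange - back_to_fp      # what went missing in the round trip

if __name__ == "__main__":
    print(int_flange)            # 4
    print(back_to_fp)            # 4.0
    print(lost)                  # roughly 0.2071
    print(int(4.7), round(4.7))  # 4 5, so truncating and rounding differ
```

    Keeping the original "flange" around and deriving the INT value from it only when needed avoids the loss, at the price of the extra storage and work mentioned above.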

    "Humans are not rational animals, humans are rationalizing animals." -Robert A Heinlein
    #11
    Intoxicus
    iCX Member
    • Total Posts : 406
    • Reward points : 0
    • Joined: 2009/10/23 19:03:35
    • Status: offline
    • Ribbons : 0
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/16 06:06:56 (permalink)
    torick
    the_Scarlet_one
    torick
    So does INT32 find the whole number and then FP32 find anything behind the decimal? So if I have a number of 142.6523 triangles the INT32 side finds the 142 triangles and the FP32 will do the .6523 triangle portion? Do they work in conjunction with each other?



    My understanding could be incredibly skewed, as I don’t truly understand a whole heck of a lot, but I would assume FP32 would be the primary in games, because geometric shapes make up the game and they are rarely a whole number in size. Each shape would be a different size, even if only slightly, and would need the FP32 precise scaling to get it correct.

    I don’t know if both Int32 and FP32 can coexist on the same geometric shape, but if one shape was a whole number and the shape next to it is calculated using a decimal location, then maybe they can work together.



    Yes, they can coexist but are unlikely to. Remember in the video from GN he specifically states that most gaming GPU calculations are FP32.

    The real limit is 32 bits. The 32 bits (4 bytes) per number is how many ones and zeros the number can use in binary. We use a base 10 system of math, and it's converted to binary and back by the CPU or GPU to do the math. It gets very complicated to think about converting base 10 into binary and back to base 10. Machine language seems like a headache to me. Blessed are those who have coded, and can code, at machine level.
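
    For anyone curious what those 32 ones and zeros look like, here is a small sketch. It assumes the usual IEEE 754 layout for FP32 (1 sign bit, 8 exponent bits, 23 fraction bits) and two's complement for INT32; the function names are made up for this example.

```python
import struct


def fp32_bits(x: float) -> str:
    """Return the 32-bit IEEE 754 pattern of x as 'sign exponent fraction'."""
    (raw,) = struct.unpack("<I", struct.pack("<f", float(x)))
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


def int32_bits(n: int) -> str:
    """Return the 32-bit two's-complement pattern of n."""
    return f"{n & 0xFFFFFFFF:032b}"


if __name__ == "__main__":
    print(int32_bits(4))   # plain binary count
    print(fp32_bits(4.0))  # same value, completely different bit pattern
    print(fp32_bits(-0.5))
```

    The same 32 bits mean different things depending on whether the hardware treats them as INT32 or FP32, which is why the two need different math circuits.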


    Personally I'm kinda curious why GPUs do not use FP64 and INT64 instead (a 64-bit limit). Whatever that reason is, it's beyond my knowledge.
    post edited by the_Scarlet_one - 2020/09/16 06:45:12

    "Humans are not rational animals, humans are rationalizing animals." -Robert A Heinlein
    #12
    torick
    Superclocked Member
    • Total Posts : 118
    • Reward points : 0
    • Joined: 2013/04/10 04:10:09
    • Status: offline
    • Ribbons : 0
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/16 06:36:29 (permalink)
    Intoxicus
    torick
    the_Scarlet_one
    torick
    So does INT32 find the whole number and then FP32 find anything behind the decimal? So if I have a number of 142.6523 triangles the INT32 side finds the 142 triangles and the FP32 will do the .6523 triangle portion? Do they work in conjunction with each other?



    My understanding could be incredibly skewed, as I don’t truly understand a whole heck of a lot, but I would assume FP32 would be the primary in games, because geometric shapes make up the game and they are rarely a whole number in size. Each shape would be a different size, even if only slightly, and would need the FP32 precise scaling to get it correct.

    I don’t know if both Int32 and FP32 can coexist on the same geometric shape, but if one shape was a whole number and the shape next to it is calculated using a decimal location, then maybe they can work together.



    Yes, they can coexist but are unlikely to. Remember in the video from GN he specifically states that most gaming GPU calculations are FP32.

    The real limit is 32 bits. The 32 bits (4 bytes) per number is how many ones and zeros the number can use in binary. We use a base 10 system of math, and it's converted to binary and back by the CPU or GPU to do the math. It gets very complicated to think about converting base 10 into binary and back to base 10. Machine language seems like a headache to me. Blessed are those who have coded, and can code, at machine level.


    Personally I'm kinda curious why GPUs do not use FP64 and INT64 instead (a 64-bit limit). Whatever that reason is, it's beyond my knowledge.

    Thank you for more information on the inner workings of GPU architecture. I think the 64-bit issue might just come down to how big they want the chip to be, but that's just my W.A.G. on it.
    post edited by the_Scarlet_one - 2020/09/16 06:44:53
    #13
    the_Scarlet_one
    formerly Scarlet-tech
    • Total Posts : 24581
    • Reward points : 0
    • Joined: 2013/11/13 02:48:57
    • Location: East Coast
    • Status: offline
    • Ribbons : 79
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/16 06:52:12 (permalink)
    I hope you two don’t mind, I added a quote bracket to each of your posts and one was lost and it was causing a running quote. That was the only edit I made, to clean up your posts :-)
    #14
    Intoxicus
    iCX Member
    • Total Posts : 406
    • Reward points : 0
    • Joined: 2009/10/23 19:03:35
    • Status: offline
    • Ribbons : 0
    Re: Gamers Nexus Ampere RT Core Architecture Deep Dive 2020/09/16 07:09:52 (permalink)
    Thanks for the assist on the quotes :)
    I've noticed sometimes the formatting comes out wacky if you press the quote button and it doesn't quote properly.

    Btw if anyone out there has noticed I made any errors in my explanations please correct me kindly and respectfully. It has been too long since I messed with coding or anything and I could be derping on details potentially.

    Also, if anyone remembers that Tiberian Evolution mod for C&C Renegade, that was my passion project while in high school. I'm super tempted to get back into coding and modding. Currently too busy figuring out how to be a Content Creator to have time to learn code though.

    "Humans are not rational animals, humans are rationalizing animals." -Robert A Heinlein
    #15