A possible miscommunication between AMD and its foundry or board partners saw some, if not all, partners ship AMD Radeon HD 6790 graphics cards with more ROPs enabled than specified. The HD 6790 SKU is specified to have 16 of the 32 ROPs on the 40 nm Barts GPU enabled; however, batches of shipped GPUs have 24 of the 32 ROPs enabled. While this is not a bad thing for the user at all (the GPU ends up with more pixel-processing throughput at its disposal), it has left AMD red-faced, because it disturbs the product lineup.
The issue surfaced when GPU-Z started reading the ROP count of our samples as 24, even though AMD's press deck, and subsequently the product page on AMD's website, listed the HD 6790's ROP count as 16. We initially dismissed it as a GPU-Z bug, but as it turns out, if GPU-Z reads 24, the HD 6790 indeed has 24 ROPs enabled. A ROP (raster operations processor) handles the final stage of the GPU's rendering pipeline: ROPs process the final pixels output by the shaders and write them to memory. In theory, 24 ROPs give these HD 6790 cards 50% more pixel-processing throughput than the specified 16. This isn't the first time AMD has fumbled its specifications. Some of the first AMD Radeon HD 4830 graphics cards shipped with 80 fewer stream processors than specified; more recently, some initial Radeon HD 6850 samples had all 1120 stream processors of Barts enabled, when they are supposed to have 960. AMD said that it is investigating the matter and could make an official statement soon.
We have asked several AIBs (add-in board partners) to test their production boards, and all of them have 24 ROPs so far.
Update: We have discussed this with AMD, and it turns out this was our mistake: the cards really do have 16 ROPs.
The register GPU-Z reads to calculate the number of active ROPs indicates the number of disabled ROP units through set bits. On the HD 6790, two bits are set, which means two disabled units. The Barts GPU has a total of 32 ROPs organized in 8 units of 4, which would leave us with 24 ROPs based on the register data: (8 [total ROP units] - 2 [disabled ROP units]) * 4 [pixels per clock per ROP unit] = 24.
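The initial (mistaken) reading can be sketched in a few lines of Python. This is only an illustration, assuming the register can be modeled as a plain integer bitmask in which each set bit marks one disabled ROP unit; the mask value and bit positions are hypothetical, since the article only says that two bits are set.

```python
# Hypothetical model of the disabled-ROP register: one set bit per disabled
# ROP unit. Real register names and bit positions are not given in the article.

TOTAL_ROP_UNITS = 8            # Barts: 32 ROPs organized as 8 units
PIXELS_PER_CLOCK_PER_UNIT = 4  # each ROP unit outputs 4 pixels per clock


def count_disabled_units(disabled_mask: int) -> int:
    """Count the set bits in the (hypothetical) disabled-unit bitmask."""
    return bin(disabled_mask).count("1")


def naive_rop_count(disabled_mask: int) -> int:
    """First interpretation: the set bits are disabled units across the whole chip."""
    disabled = count_disabled_units(disabled_mask)
    return (TOTAL_ROP_UNITS - disabled) * PIXELS_PER_CLOCK_PER_UNIT


hd6790_mask = 0b00000011  # hypothetical: two set bits, as reported for the HD 6790
print(naive_rop_count(hd6790_mask))  # 24
```

Treating the two set bits as chip-wide produces the 24 ROPs that GPU-Z originally reported.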
If you look at the architecture diagram above (look for the thick black box or the red box), you can see that the shader units of the GPU are split into two shader arrays, and the ROPs (the yellow squares next to "L2 cache") appear independent of them. In reality, the ROPs are located inside these shader arrays as well. As a result, "two deactivated ROPs" really means "two deactivated ROP units per shader array". So the correct ROP count is (4 [ROP units per shader array] - 2 [disabled ROP units per shader array]) * 2 [shader arrays] * 4 [pixels per clock per ROP unit] = 16.
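The corrected interpretation can be sketched in the same way. It assumes the same hypothetical bitmask model, but this time the set bits describe ROP units within a single shader array, and the same number of units is disabled in each of Barts' two arrays. The mask value is again only illustrative.

```python
# Hypothetical model: the set bits count disabled ROP units *per shader array*,
# and the same units are disabled in both arrays of the Barts GPU.

SHADER_ARRAYS = 2
ROP_UNITS_PER_ARRAY = 4        # 8 ROP units in total, split across 2 arrays
PIXELS_PER_CLOCK_PER_UNIT = 4


def corrected_rop_count(disabled_mask: int) -> int:
    """Apply the disabled-unit count to each shader array, then sum over the arrays."""
    disabled_per_array = bin(disabled_mask).count("1")
    active_per_array = ROP_UNITS_PER_ARRAY - disabled_per_array
    return active_per_array * SHADER_ARRAYS * PIXELS_PER_CLOCK_PER_UNIT


hd6790_mask = 0b0011  # hypothetical: two set bits, as reported for the HD 6790
print(corrected_rop_count(hd6790_mask))  # 16
print(corrected_rop_count(0))            # 32 (fully enabled Barts)
```

The same two set bits now remove four units in total (two per array), which gives the specified 16 ROPs.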
Thank you AMD for clearing this up.
Source