Beyond Programmable Shading: Fundamentals
All slides © 2008 Advanced Micro Devices, Inc. Used with permission.
ATI Radeon™ HD 4870 Computation Highlights
• >100 GB/s memory bandwidth
– 256b GDDR5 interface
• Targeted for handling thousands of simultaneous lightweight threads
• 800 (160x5) stream processors
– 640 (160x4) basic units (FMAC, ADD/SUB, etc.)
– ~1.2 TFlops theoretical peak
– 160 enhanced transcendental units (adds COS, LOG, EXP, RSQ, etc.)
– Support for INT/UINT in all units (ADD/SUB, AND, XOR, NOT, OR, etc.)
– 64-bit double precision FP support at 1/5 the single precision rate (~250 GFlops theoretical performance)
• 4 SIMDs -> 10 SIMDs
– 2.5X peak performance increase over ATI Radeon™ HD 3870
– ~1.2 TFlops FP32 theoretical peak
– ~250 GFlops FP64 theoretical peak
• Scratch-pad memories
– 16KB per SIMD (LDS)
– 16KB across SIMDs (GDS)
• Synchronization capabilities
• Compute Shader
– Launch work without rasterization
– “Linear” scheduling
– Faster thread launch
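The peak figures quoted above can be roughly reproduced from the unit counts on the slide. The sketch below is only a back-of-envelope check: the 750 MHz engine clock and the 3.6 Gbit/s per-pin GDDR5 data rate are assumptions that do not appear on the slide, and all function and constant names are illustrative.

```python
# Back-of-envelope check of the peak figures quoted on the slide.
# ASSUMPTIONS (not stated on the slide): a 750 MHz engine clock and a
# 3.6 Gbit/s per-pin GDDR5 data rate. All names here are illustrative.

ENGINE_CLOCK_HZ = 750e6        # assumed engine clock
GDDR5_GBIT_PER_PIN = 3.6       # assumed per-pin data rate

STREAM_PROCESSORS = 800        # slide: 160 x 5
BUS_WIDTH_BITS = 256           # slide: 256b GDDR5 interface
FLOPS_PER_FMAC = 2             # a multiply-add counts as two operations
FP64_RATE_DIVISOR = 5          # slide: FP64 runs at 1/5 the FP32 rate
SIMDS_NEW, SIMDS_OLD = 10, 4   # slide: 4 SIMDs -> 10 SIMDs


def peak_fp32_gflops():
    """Peak FP32 rate if every stream processor issues one FMAC per cycle."""
    return STREAM_PROCESSORS * FLOPS_PER_FMAC * ENGINE_CLOCK_HZ / 1e9


def peak_fp64_gflops():
    """Peak FP64 rate, taken as a fixed fraction of the FP32 rate."""
    return peak_fp32_gflops() / FP64_RATE_DIVISOR


def memory_bandwidth_gbytes():
    """Bus width in bits times per-pin rate in Gbit/s, converted to GB/s."""
    return BUS_WIDTH_BITS * GDDR5_GBIT_PER_PIN / 8


def simd_scaling():
    """Ratio of SIMD counts between the two generations."""
    return SIMDS_NEW / SIMDS_OLD


if __name__ == "__main__":
    print(f"FP32 peak:        {peak_fp32_gflops():.0f} GFlops")
    print(f"FP64 peak:        {peak_fp64_gflops():.0f} GFlops")
    print(f"Memory bandwidth: {memory_bandwidth_gbytes():.1f} GB/s")
    print(f"SIMD scaling:     {simd_scaling():.1f}x")
```

Under these assumptions the numbers come out at 1200 GFlops FP32, 240 GFlops FP64, 115.2 GB/s and a 2.5x SIMD ratio, in line with the rounded "~1.2 TFlops", "~250 GFlops", ">100 GB/s" and "2.5X" on the slide.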