The Register ®

Biting the hand that feeds IT


Original URL: http://www.theregister.co.uk/2005/05/05/review_ati_512mb_radeon/

ATI 512MB Radeon X800 XL

By Hexus.net
Published Thursday 5th May 2005 15:45 GMT

Review ATI launched its first consumer 512MB graphics board this week, and we've been evaluating it for the past few days. Nvidia announced a 512MB part not so long ago, with a 6800 Ultra variant based on the Quadro FX 4400 hardware they've had in the professional space for a wee while. ATI's new product has no immediate pro-level pairing, and with Nvidia looking like it will bring a 512MB 6800 GT to the table soon, we're beginning to see the arrival of true half-a-gig consumer products.

But why haven't we seen this kind of kit before? It's not for any technical reason. I know of engineering hardware at four of the major graphics IHVs with at least that much memory. The memory controller on nearly all of the previous generation of GPUs is able to support 512MB. Going back to the idea that Nvidia's 6800 Ultra is little more than a Quadro FX 4400 with a new BIOS and a couple of minor hardware changes, there's clearly been a need for very large frame buffers on professional-level hardware for quite some time.

Big buffers

The reason is memory pressure. Until PCI Express came along, you couldn't write data to off-card memory in anything other than full frame buffer-sized chunks, and doing so consumed CPU resources. That's a drawback for professional 3D applications, where you want plenty of spare CPU time for geometry processing and the preparation work that the GPU can't do for you. The CPU's time is better spent on those tasks than chucking massive frames of data around. Big on-card frame buffers avoid the problem by reducing the number of times the card needs to use main memory.

Why 512MB? A professional application like medical imaging might need to work with a couple of million vertices per frame, each a defined colour at a location in 3D space. The imaging application likely wants 32-bit spatial precision and 32-bit colour, so that's four bytes (32 bits) for each of the three co-ordinates and another four bytes to define the colour: 16 bytes per vertex.

A couple of million vertices at 16 bytes a vertex requires around 30MB for the vertex data alone, per frame. Since it's a medical imaging application you're running, you might want to view a cross-section of the geometry using a 3D volume texture and do some sampling using a pair of smaller cubemaps. You might also want to shade the geometry, using fragment programs that sample a decent amount of texture data. A 256 x 256 x 256 volume texture at 32 bits per texel is already 64MB, and each power-of-two step up in one dimension doubles it: by 512 x 512 x 256 the volume alone is 256MB, so together with your two million vertices' worth of vertex buffers you've busted past 256MB, leaving you struggling to fit texture data and the cubemaps into memory at the same time. If you're then antialiasing everything at high quality, performance goes out the window.
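
The sizes in the last two paragraphs are simple products of element counts and bytes per element. A minimal Python sketch of that arithmetic follows, assuming the article's 16-byte vertex (three 32-bit co-ordinates plus a 32-bit colour) and 32 bits per volume texel with no mipmaps; the helper names are hypothetical, and sizes are printed in binary megabytes (MiB).

```python
MIB = 1024 * 1024  # bytes in a binary megabyte


def vertex_buffer_bytes(vertex_count: int, bytes_per_vertex: int = 16) -> int:
    """Flat vertex buffer: x, y, z as 32-bit values plus one 32-bit colour."""
    return vertex_count * bytes_per_vertex


def volume_texture_bytes(width: int, height: int, depth: int,
                         bytes_per_texel: int = 4) -> int:
    """Uncompressed 3D texture at 32 bits per texel, assuming no mipmap chain."""
    return width * height * depth * bytes_per_texel


vertices = vertex_buffer_bytes(2_000_000)
volume_256 = volume_texture_bytes(256, 256, 256)
volume_512x512x256 = volume_texture_bytes(512, 512, 256)
total = vertices + volume_512x512x256

print(f"vertex data:            {vertices / MIB:.1f} MiB")
print(f"256 x 256 x 256 volume: {volume_256 / MIB:.0f} MiB")
print(f"512 x 512 x 256 volume: {volume_512x512x256 / MIB:.0f} MiB")
print(f"vertices + larger volume: {total / MIB:.1f} MiB "
      f"(fits in 256MB: {total <= 256 * MIB}, fits in 512MB: {total <= 512 * MIB})")
```

Each doubling of one dimension doubles the volume's footprint, so it only takes a couple of power-of-two steps for the volume plus vertex data to outgrow a 256MB board while still fitting in 512MB.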


So far, that kind of memory pressure hasn't really made itself felt in the consumer space. Any modern triple-A game title can run at very interactive frame rates on the current class of high-end hardware at 1600 x 1200, with a high degree of anisotropic texture filtering and a high sample count for anti-aliasing: at least four Z samples, say, which is around 30MB of AA sample data to store per frame if you don't mask off what you don't need to sample. You're more shader-limited than anything these days; even at that resolution with those settings, memory pressure isn't enough to tax a 256MB board.
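
The "around 30MB of AA sample data" figure follows from pixels x samples x bytes per sample. The sketch below assumes 4 bytes per Z sample (for example a 24-bit depth value packed with 8-bit stencil), which is our assumption rather than a figure from the article, and it ignores colour samples and any compression the hardware applies.

```python
def aa_sample_bytes(width: int, height: int, samples_per_pixel: int,
                    bytes_per_sample: int = 4) -> int:
    """Storage for multisampled Z data with no masking or compression (assumed 4 bytes/sample)."""
    return width * height * samples_per_pixel * bytes_per_sample


z_data = aa_sample_bytes(1600, 1200, 4)
print(f"{z_data:,} bytes = {z_data / 1e6:.1f}MB = {z_data / (1024 * 1024):.1f}MiB")
# prints: 30,720,000 bytes = 30.7MB = 29.3MiB
```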

The only way that's going to happen, unless people start playing games at higher resolutions with the same settings, is if the quality, size and number of in-game art assets increase fairly significantly. Even assuming heavy re-use of texture data from a GPU's texture cache, there's still plenty of scope for a game to need more than today's amount of frame buffer space, per frame. The issue today is that games developers aren't loading up the hardware with everything they possibly can, because 128MB is the most common memory configuration in consumer graphics. But what if they did, and is it starting to happen?

Consumer need

Programmable graphics processors like ATI's R4-series and Nvidia's GeForce 6-series have given developers and 3D graphics researchers the power and capability to come up with innovative new ways to construct a 3D scene in real time at interactive frame rates, especially in terms of lighting and shadowing. There are numerous papers on the web that discuss things like real-time radiosity, real-time global illumination (or decent approximations of it, at least), shadow mapping, shadowing using spherical harmonics and shadowing using precomputed radiance transfer (PRT). All of them require not only significant amounts of high-quality artist-generated data to look good, but all manner of intermediate data storage while you're building the frame to display.

Take perspective shadow mapping, a technique that's gaining favour in many new game engines, where several shadow maps may be combined per frame to light the world. You're creating new shadow map data for every frame displayed, and it's not usually a fixed workload per frame, so it can't easily be optimised. To see why, ask why you'd calculate a light source's contribution to a frame if the light is fully occluded, both by objects in the scene (it's blocked by a wall, say) and by the viewing perspective (you can't physically see the light anyway, so its contribution to the frame's lighting is lessened). You only really want to use high-resolution shadow maps for large area lights, such as the sun, with smaller maps useful for point or directional lights.


High-end techniques

Before we delve into numbers, a little on how perspective shadow mapping works. Using deferred shading, where you accumulate contributions to the final frame in separate render targets before combining them for display, you render the scene from the perspective of the viewer and then again from the perspective of any light you want to calculate a shadow contribution from, into a high-resolution, high-precision render target. For area lights at large view sizes such as 1600 x 1200, targets of 2000 x 2000 or 4000 x 4000 work best, all in 32-bit colour. That's your depth map. You then project the light's shadow map onto the scene as seen from the viewer and sample it where it intersects the geometry; the result tells you whether that point is in shadow. When using a cube map, you render the scene onto the faces of the cube texture instead, to create your depth map.
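
As a rough illustration of the two passes just described, here is a toy, CPU-side Python sketch of the basic shadow-map depth test. It is not perspective shadow mapping itself (there is no perspective warp of the light's view), and it assumes points have already been transformed into the light's normalised texture space, with u, v and depth all in 0..1. The axis-aligned occluder rectangles, the function names and the bias value are hypothetical simplifications; a real engine does all of this on the GPU.

```python
from typing import List, Tuple

DepthMap = List[List[float]]
# Hypothetical occluder: a rectangle in light space (u0, v0, u1, v1) at a single depth.
Occluder = Tuple[float, float, float, float, float]


def render_depth_map(size: int, occluders: List[Occluder]) -> DepthMap:
    """Pass 1: 'render' from the light, keeping the nearest depth per texel."""
    depth_map = [[1.0] * size for _ in range(size)]  # 1.0 = far plane, nothing in the way
    for u0, v0, u1, v1, depth in occluders:
        for y in range(int(v0 * size), int(v1 * size)):
            for x in range(int(u0 * size), int(u1 * size)):
                depth_map[y][x] = min(depth_map[y][x], depth)
    return depth_map


def in_shadow(depth_map: DepthMap, u: float, v: float, depth: float,
              bias: float = 0.005) -> bool:
    """Pass 2: compare a point's light-space depth with the stored depth.

    The small bias stops a surface from shadowing itself ("shadow acne").
    """
    size = len(depth_map)  # the map is square
    x = min(int(u * size), size - 1)
    y = min(int(v * size), size - 1)
    return depth - bias > depth_map[y][x]


# A 'wall' covering the middle of the light's view at depth 0.3.
shadow_map = render_depth_map(8, [(0.25, 0.25, 0.75, 0.75, 0.3)])
print(in_shadow(shadow_map, 0.5, 0.5, 0.8))  # behind the wall: True
print(in_shadow(shadow_map, 0.1, 0.1, 0.8))  # outside the wall: False
print(in_shadow(shadow_map, 0.5, 0.5, 0.3))  # on the wall itself: False
```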

So I set up a simple scene in Direct3D, using the algorithm from a Microsoft sample on shadow mapping, with a 2000 x 2000 shadow map for the sun and, for each of three small fixed point lights, a smaller 512 x 512 x 6 cube map holding its shadow maps (six of them, one per face). Geometry cost is less than 20,000 vertices per frame, and there's geometry that can fully occlude the point lights. So it's just a simple example.

The data costs for the shadow maps are therefore relatively fixed. The 2000 x 2000 shadow map is a 16MB texture, computed once per frame for the area light, and each point light's cube map costs 1.5MB. The engine skips rendering a point light's cube map contribution to shadowing only when it finds that light fully occluded. So the shadow map cost for my example wavers between 16MB and 20.5MB per frame. Add more unoccluded point lights and the data cost is 16MB + (1.5MB x point light count).
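
The per-frame shadow map budget above reduces to a one-line formula. The sketch below encodes it, taking the 16MB sun map (2000 x 2000 texels at 4 bytes each, 16,000,000 bytes) and the 1.5MB per unoccluded point light as the article's figures; the function and parameter names are hypothetical.

```python
def shadow_map_cost_mb(unoccluded_point_lights: int,
                       sun_map_mb: float = 16.0,
                       per_point_light_mb: float = 1.5) -> float:
    """16MB for the sun's shadow map plus 1.5MB per point light whose cube map is rendered.

    Fully occluded point lights are skipped, so they add nothing.
    """
    return sun_map_mb + per_point_light_mb * unoccluded_point_lights


sun_map_bytes = 2000 * 2000 * 4  # 32-bit texels -> 16,000,000 bytes

for lights in range(4):  # the example scene has three point lights
    print(f"{lights} unoccluded point light(s): {shadow_map_cost_mb(lights):.1f}MB")
```

With all three point lights occluded the cost bottoms out at 16.0MB; with none occluded it reaches 20.5MB, matching the range quoted above.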

That might not sound like a lot, but add the cost of material data and anything like a normal map (often a very large cost if you have a lot of normal-mapped geometry on screen), a gloss map, an environment map (which can be a full-screen-sized texture) or similar on top, all of which need to be in card memory for the highest performance, and you find that using shadow mapping to render your scene starts to exert memory pressure on 256MB cards.

Obviously this is a very simple example. It doesn't take into account scaling the shadow map for resolution, re-using the shadow map across frames, using other shadowing algorithms at the same time, the number of passes taken to render (texture re-use) or render targets being combined (often the largest memory cost to factor in, if they're full-screen). But hopefully it shows how just one popular new shadowing method imposes a fairly large fixed cost in on-card data storage (over ten per cent of the frame buffer size for a simple area light and one point light in my small example). And you've got to store visible textures and other art assets on top of that.

Even assuming that, thanks to normal mapping, geometry costs aren't going to increase massively in games in the near future, texture data and the methods used to render with it are going to cause significant memory pressure on a 256MB board, in at least some titles and circumstances.


Today's titles

We're not quite there yet, of course, but there are immediate, detectable benefits to running games on a 512MB board, especially games that don't exert memory pressure on a 256MB board all the time but do so every now and again. Slight overflows of card memory, which force the board to go out to system memory for data, show up as frame rate hitches that pause gameplay for a short time. A larger memory space can lessen that hitching, giving a higher minimum frame rate, a slightly higher average frame rate and a smoother overall game experience. For many gamers that will be the primary benefit, over any ability to increase resolution or image quality.

Surely all that extra memory lets you use higher resolutions in your games, or more AA at the same resolution? Of course. Give one of today's fairly high-end boards 512MB of memory and, especially if you're on the cusp of being shader-limited, you may be able to bump up the resolution of your game while keeping pretty much the same settings and frame rate, or bump up your AA level at the same resolution. However, and this is a huge however, who realistically has a display capable of much more than the 1600 x 1200 that current high-end hardware is happiest with? I used to have a CRT that would cope with greater resolutions, but it had neither the screen real estate nor the image quality to make me want to use them.

With LCD monitors, you're realistically limited to a 1600 x 1200 or 1680 x 1050 resolution, unless your display budget runs into the thousands, rather than the hundreds that most of us spend.

The board

That makes a 512MB board something to ponder rather than jump on as soon as it appears. So, with all that said, would you like a peek at some 512MB high-end 3D hardware? Ah, go on then.

The 512MB Radeon X800 XL doesn't quite use the same PCB as the Radeon X850 Pro, but it's close, and the cooler is identical. Weighing in at 389g, slightly more than the 385g X850 Pro, the 512MB XL remains a single-slot board with almost identical thermal properties to that hardware.

Sixteen 256Mb (megabit) Samsung K4J55323Q DRAMs, 32MB apiece, make up the 512MB frame buffer. Needing more than the 75W the slot can supply, the PCI Express-based X800 XL has a six-pin power connector for the extra power.

Dual-DVI makes a welcome appearance, and the Rage Theater chip on the rear of the board gives you VIVO capabilities via the S-Video port on the backplane. The cooler's fan is an ADDA AD4512HB-E03, a 45mm ball-bearing blower, 12mm thick. It takes a 12V DC supply, the H in 'AD4512HB' denoting high speed. Be thankful it's not a U. Finally, the 3 in 'E03' denotes control by an IC and speed sensor, so you can be sure it's controllable by the driver and other software.

At the default fan speed of 54 per cent of maximum, the blower in the X850 Pro-style cooler assembly is quiet. At 100 per cent it's less irritating than the one in the X800, but still rather loud. The blade design of the cooler seems to be responsible for all the audible noise as the blower shoves air across them. Thankfully, the fan's pitch changes aren't that annoying to my ears.


Performance

Half-Life 2 and Beepa's excellent Fraps 2.5.4 were combined to record framerate over a run of our custom timedemo. To exert maximum memory pressure, 1600 x 1200 (the maximum resolution of the LCD monitor used for benchmarking) with 6x AA and 16x AF was chosen, to see if 512MB of card memory has any effect on performance.

[Graph: Half-Life 2 framerates]

The graph plots frame rate for the duration of the benchmark run, along with minimum frames per second as a constant-value series. The black line, representing the 512MB board, shows clear benefits over an otherwise identical board with half the memory. There's enough going on during an average render of Half-Life 2 at those settings that 512MB can lessen the blow of a frame rate drop, providing both a higher average frame rate and a higher minimum frame rate.

The difference in average frame rates (68.637fps for the 512MB board, 63.765fps for the 256MB version, a gap of about 4.9fps) is smaller than the roughly 8fps difference in minimum frame rate, which shows off one of the main reasons you might choose the larger board over the smaller one.
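
The average frame rate gap can be checked directly from the two published figures. A small sketch with a hypothetical helper name; the minimum frame rates aren't given as absolute numbers in the text, so only the averages are used here.

```python
from typing import Tuple


def frame_rate_gap(fps_a: float, fps_b: float) -> Tuple[float, float]:
    """Absolute (fps) and relative (per cent of fps_b) difference between two frame rates."""
    gap = fps_a - fps_b
    return gap, gap / fps_b * 100


gap, pct = frame_rate_gap(68.637, 63.765)  # 512MB vs 256MB averages from the run
print(f"512MB board is {gap:.3f}fps ({pct:.1f}%) faster on average")
# prints: 512MB board is 4.872fps (7.6%) faster on average
```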

The X800 XL 512MB isn't faster than a 256MB X850 XT PE, however, in either minimum or average frame rate. So, depending on cost, a faster 256MB board is currently going to be the better choice for many.

Verdict

There's obviously a case to be made for 512MB consumer hardware, but I hope you can see that it's limited to certain sections of the graphics market, where the added cost of the board gives you an actual benefit rather than none at all. Shop sensibly when considering one. The advantages will become much clearer in future, as and when game titles start using rendering methods and art assets that make a 512MB board a much more compelling choice.

On a high-end board like the X800 XL, the increased memory size will more often than not give you a smoother gameplay experience if you're running at high resolutions with high levels of anti-aliasing. You therefore have to decide whether the money for the bigger board is better spent there, rather than on something like an X850 XT PE. When you do, consider how long you'll keep the board: the longer you plan to keep it, the more sense the 512MB choice makes.

Overall, and I say this for the average consumer with a decent LCD rather than a very high-end CRT, 512MB hardware is something for the future rather than a purchase for today.

Review by
Hexus.net (http://www.hexus.net/)


© Copyright 2008