
The GTX 960 arrives as the ASUS STRIX OC DirectCU II

Nvidia’s new Maxwell architecture arrived in mid-September with the launch of the GTX 980 and the GTX 970, replacing the Kepler GTX 770, 780 and 780 Ti.  Maxwell brought some amazing performance and energy-saving improvements at 28nm without moving to a smaller process node.  Now we have Nvidia’s third GPU in the Maxwell lineup, the GTX 960, which will replace the GTX 760 with higher performance at a lower $200 price point.

When the GTX 970 launched at $329, the GTX 760 officially dropped to $219, and Nvidia put strong pricing pressure on the competing AMD lineup as well as on its own newly EoL’d video cards last year.  In response, AMD cut prices on its cards and gave away game bundles, so that today the R9 280 and the R9 285 (with a rebate) can each be had for very close to $200, while the R9 280X, a rebadged HD 7970 GHz Edition that formerly sold for $500, now sits close to $250.

And now Nvidia does it again with a faster, more efficient, and less expensive card than the GTX 760: the GTX 960 for $199.99. It is a hard launch with many partner versions, including overclocked ones for about $209 such as the ASUS STRIX GTX 960 OC DirectCU II Edition that we are reviewing today.  It will be interesting to see what AMD does in the short term, since it evidently does not have a new lineup ready in response.

What makes Maxwell especially impressive is that the Kepler GK106 GTX 660 came out over 2 years ago for $250 on a 192-bit bus, and now we have its faster and cheaper GM206 replacement on a narrower 128-bit bus and the same 28nm process, yet showing over 1.5 times the performance!  That means the GTX 960 is almost as fast as GTX 660 SLI, especially when it is overclocked.  Surprisingly, the GTX 960 is rated for only 125W TDP and needs just one PCIe power connector, making Maxwell the most efficient architecture that Nvidia has ever created!  We plan to test it at the ASUS factory clocks, and then overclocked further, as far as it will go using the stock fan profile.  We had originally planned to test GTX 960 performance at stock clocks, but neither the ASUS nor the EVGA overclocking utility allows a negative offset large enough to cancel out ASUS’ approximately 12% factory overclock.

ABT was invited along with the media to Nvidia’s Press Event in Monterey, California, last September for two intensive days of everything Maxwell-related.  Make sure to check out the GTX 980/970 launch article, as this evaluation is only a brief recap of the Maxwell architecture plus a feature summary focusing on the ASUS GTX 960 OC.  As usual at BTR, special emphasis is given to performance, and we have completely updated our platform (to Devil’s Canyon) and our benchmark suite (to 28 games, including 7 new since October).

BTR specializes in bringing our readers the largest and most comprehensive benching suite anywhere, so our focus will be on the GTX 960’s frame rates in 28 modern PC games.  We will compare the GTX 960, both at the ASUS factory overclock and at our own overclock, with the GTX 980, the GTX 970, the GTX 760 and the GTX 660, as well as with the R9 280X (a much faster and more expensive competitor), to see where the new Maxwell card sits in relation to these other cards.  Pictured are the current cards that we have benchmarked for this evaluation.

Left, VisionTek R9 290X; top, GTX 980; middle, GALAX GTX 970 EXOC; bottom, ASUS GTX 960 OC; near right, EVGA GTX 660 SC; far right, GTX 760

 

We use Intel’s Devil’s Canyon Haswell platform so as not to bottleneck our graphics cards: a Core i7 4790K at 4.0GHz (with all cores Turbo synched to 4.4GHz) and 2x8GB of Kingston “Beast” 2133MHz DRAM on an ASUS Z97-E motherboard.  We test primarily at Nvidia’s target resolution of 1920×1080, and also at 2560×1600, which is well beyond what the GTX 960 was intended for but will show the card’s performance under extreme stress.  First, let’s recap what’s new in Maxwell, then unbox the ASUS STRIX GTX 960 OC and summarize its features.

Key Features of the Maxwell GTX 960

The GeForce GTX 980, 970 and GTX 960 GPUs support all-new graphics features currently available only on Maxwell GPUs. Nvidia’s Voxel Global Illumination (VXGI) technology allows the new GPUs to render fully dynamic global illumination at playable frame rates, bringing more realism and immersion to gamers.  It is not real-time ray tracing yet, but it is a good step in that direction.

PC games can also perform and look better with new anti-aliasing modes like Multi-Frame Sampled Anti-Aliasing (MFAA).  MFAA combines multiple AA sample positions to produce a result that looks like higher-quality anti-aliasing but with better performance: an image similar to 4xMSAA at roughly the performance cost of 2xMSAA.  For now, among Nvidia’s desktop GPUs, MFAA is available only on the GTX 980, GTX 970 and GTX 960.

New GeForce Maxwell GPUs also support Dynamic Super Resolution (DSR), which is similar to driver-based supersampling and brings the crisp detail of 4K rendering to 1920×1080 displays. It looks great, but without an FCAT capture it cannot be shown here with Fraps.

These Maxwell GPUs retain and improve on features like ShadowPlay, which now supports recording at resolutions up to 4K at 60 fps. And with the new G-SYNC displays, gamers no longer have to put up with tearing or stutter as part of the common gaming experience.

Key Points of the Maxwell GM206 GTX 960

First, take a look at the block diagram:

Maxwell GPUs feature a new SM design tailored to improve efficiency. Each SM is partitioned into four distinct 32-CUDA-core processing blocks (128 CUDA cores total per SM), each with its own dedicated resources for scheduling and instruction buffering.

To improve the efficiency of the GPU’s onboard caches, each of GM206’s SMM units features its own dedicated 96KB of shared memory, while the L1/texture caching functions are combined into a 24KB pool per pair of processing blocks (48KB per SMM).  The last-generation Kepler GPUs had a smaller 64KB pool that served as both shared memory and L1 cache.

As a result of these changes, each GM206 CUDA core delivers about 1.4 times the performance of a GK106 Kepler CUDA core, with twice the performance per watt.  We will be able to directly compare the performance of the ASUS GTX 960 OC with that of the EVGA GTX 660 SC at stock clocks in this evaluation.

New video engine

Like the GeForce GTX 980, the GeForce GTX 960 has a new display engine capable of supporting resolutions up to 5K with up to four simultaneous displays (including support for up to four 4K MST displays). GeForce GTX 960 also supports HDMI 2.0.

The GTX 960 also ships with an NVENC encoder that adds support for H.265 encoding. H.265 compression offers bandwidth savings versus H.264 at the same quality.  Maxwell’s video encoder is also supposed to improve H.264 encode throughput by 2.5x over Kepler, which benefits ShadowPlay as well.

Because of its low power consumption, some GeForce GTX 960 users may wish to use it in their home theater PCs, and one new addition to GM206 is support for H.265 (HEVC) encoding and decoding. The GTX 980’s NVENC video engine offers native support for H.265 encode only, with no decode, while the GTX 960’s GM206 also adds native support for HDCP 2.2 content protection over HDMI.

How does the ASUS GTX 960 OC compare with its rival, AMD’s R9 280 series?

This evaluation also analyzes and compares GTX 960 and R9 280X performance, and we will announce a performance winner.

We expect that the 280X will be generally faster since it is priced significantly higher.  We will also look at the details to see what the new Nvidia Maxwell GTX 960 GPU brings to the table.

Before we do performance testing, let’s take a look at the GTX 960 and recap its Maxwell DX12 architecture and features.

Specifications

The GeForce GTX 960 ships with 1024 CUDA Cores and 8 SM units. The memory subsystem of the GeForce GTX 960 consists of two 64-bit memory controllers (128-bit) with 2GB of GDDR5 memory.
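As a quick cross-check, multiplying the per-SMM figures from the block-diagram discussion above by the 8 SM units listed here reproduces the 1024 CUDA cores. The sketch below is our own arithmetic in Python, using only numbers quoted in this article; the constant and function names are hypothetical, and the aggregate cache totals are our calculation rather than an official Nvidia figure.

```python
# Per-SMM figures quoted in this article for GM206 (GTX 960).
SMM_COUNT = 8            # SM units on the GTX 960
BLOCKS_PER_SMM = 4       # processing blocks per SMM
CORES_PER_BLOCK = 32     # CUDA cores per processing block
SHARED_KB_PER_SMM = 96   # dedicated shared memory per SMM
L1_TEX_KB_PER_SMM = 48   # 24KB per pair of blocks, two pairs per SMM


def gm206_totals(smm_count: int = SMM_COUNT) -> dict:
    """Scale the per-SMM resources up to the whole GPU."""
    cores_per_smm = BLOCKS_PER_SMM * CORES_PER_BLOCK
    return {
        "cores_per_smm": cores_per_smm,
        "cuda_cores": smm_count * cores_per_smm,
        "shared_memory_kb": smm_count * SHARED_KB_PER_SMM,
        "l1_texture_kb": smm_count * L1_TEX_KB_PER_SMM,
    }


if __name__ == "__main__":
    for name, value in gm206_totals().items():
        print(f"{name}: {value}")
    # cores_per_smm: 128
    # cuda_cores: 1024
    # shared_memory_kb: 768
    # l1_texture_kb: 384
```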

The base clock speed of the GeForce GTX 960 is 1126MHz, and the typical Boost Clock speed is 1178MHz. The Boost Clock speed is based on an average GeForce GTX 960 card running a wide variety of games and applications; the actual Boost clock will vary from game to game depending on conditions. The GTX 960’s memory runs at a 7010MHz data rate, although Nvidia states that Maxwell’s improved memory compression gives it an effective memory speed of 9300MHz.
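The relationship between bus width, data rate and bandwidth is simple arithmetic. This Python sketch (function name hypothetical) computes peak bandwidth from the 128-bit bus and the 7010MHz data rate and, for comparison, what the 9300MHz effective figure would correspond to. Note that the second number is Nvidia’s compression-adjusted equivalent, not a physical transfer rate.

```python
BUS_WIDTH_BITS = 128   # GTX 960 memory interface (two 64-bit controllers)
DATA_RATE_MHZ = 7010   # GDDR5 data rate quoted by Nvidia
EFFECTIVE_MHZ = 9300   # Nvidia's compression-adjusted "effective" figure


def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mhz: float) -> float:
    """Peak DRAM bandwidth in GB/s.

    Each transfer moves bus_width_bits / 8 bytes, and the data rate in MHz is
    millions of transfers per second, so the product is MB/s; divide by 1000
    to get GB/s.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mhz / 1000


if __name__ == "__main__":
    raw = peak_bandwidth_gbs(BUS_WIDTH_BITS, DATA_RATE_MHZ)
    effective = peak_bandwidth_gbs(BUS_WIDTH_BITS, EFFECTIVE_MHZ)
    print(f"physical peak: {raw:.2f} GB/s")                  # 112.16 GB/s
    print(f"compression-equivalent: {effective:.2f} GB/s")   # 148.80 GB/s
```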

The GeForce GTX 960 reference board measures 9.5” in length. Display outputs include one dual-link DVI, one HDMI and three DisplayPort connectors. One 6-pin PCIe power connector is required for operation.

Here are the specifications for the GTX 960:

Now let’s look at the ASUS STRIX GTX 960 OC DirectCU II specifications.  Note that the clocks are raised not only on the core: the memory is also set to 7200MHz, up 190MHz over the 7010MHz of the reference GTX 960.  Increasing the memory clock makes a noticeable performance difference for this card, and we were able to reach 8000MHz!  Here are the ASUS GTX 960 OC features:

We couldn’t wait to test this card out!
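To put the ASUS factory memory clock and our 8000MHz result in perspective, this small Python sketch (names hypothetical) computes the gain over the 7010MHz reference and the resulting peak bandwidth on the 128-bit bus. It says nothing about frame rates, which depend on how bandwidth-bound each game is.

```python
REFERENCE_MHZ = 7010      # reference GTX 960 memory data rate
BYTES_PER_TRANSFER = 16   # 128-bit bus / 8 bits per byte


def memory_overclock(data_rate_mhz: float, reference_mhz: float = REFERENCE_MHZ):
    """Return (MHz gained, percent gained, peak GB/s) for a memory data rate."""
    gain_mhz = data_rate_mhz - reference_mhz
    gain_pct = gain_mhz / reference_mhz * 100
    bandwidth_gbs = BYTES_PER_TRANSFER * data_rate_mhz / 1000
    return gain_mhz, gain_pct, bandwidth_gbs


if __name__ == "__main__":
    for label, rate in (("ASUS factory", 7200), ("our overclock", 8000)):
        gain_mhz, gain_pct, bw = memory_overclock(rate)
        print(f"{label}: +{gain_mhz} MHz (+{gain_pct:.1f}%), {bw:.1f} GB/s")
    # ASUS factory: +190 MHz (+2.7%), 115.2 GB/s
    # our overclock: +990 MHz (+14.1%), 128.0 GB/s
```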

GM206 Memory Subsystem

GM206 has a 128-bit memory interface with 7Gbps GDDR5 memory. In addition, Maxwell (GM204 and now GM206) brings significant enhancements to the memory compression implementation, making it about one third more efficient than Kepler’s.

To reduce DRAM bandwidth demands, Nvidia GPUs use lossless compression techniques as data is written to memory. The bandwidth savings from this compression can be realized more than once, since units such as the texture units later read the data back in compressed form.

The effectiveness of color compression depends on which pixel ordering is chosen for the delta color calculation. Maxwell uses Nvidia’s third generation of delta color compression, which improves effectiveness by offering more calculation choices.
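Nvidia has not published the details of its delta color compression, so the following is only a toy model under our own assumptions: an 8-bit single-channel tile, the first pixel stored raw, every later pixel stored as a delta from its predecessor, and a tile size set by the widest delta. It illustrates why the pixel ordering matters and why offering more ordering choices helps. All names and the encoding format are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Tile = List[List[int]]   # hypothetical 8-bit single-channel tile, tile[row][col]
RAW_BITS = 8             # bits per uncompressed pixel


def row_major(tile: Tile) -> List[int]:
    return [v for row in tile for v in row]


def column_major(tile: Tile) -> List[int]:
    return [tile[r][c] for c in range(len(tile[0])) for r in range(len(tile))]


def serpentine(tile: Tile) -> List[int]:
    # Left to right on even rows, right to left on odd rows.
    out: List[int] = []
    for r, row in enumerate(tile):
        out.extend(row if r % 2 == 0 else reversed(row))
    return out


ORDERINGS: Dict[str, Callable[[Tile], List[int]]] = {
    "row_major": row_major,
    "column_major": column_major,
    "serpentine": serpentine,
}


def delta_bits(deltas: List[int]) -> int:
    """Smallest two's-complement width that holds every delta."""
    bits = 1
    while any(not -(1 << (bits - 1)) <= d < (1 << (bits - 1)) for d in deltas):
        bits += 1
    return bits


def encoded_size(pixels: List[int]) -> int:
    """First pixel stored raw, each later pixel as a delta from its predecessor."""
    deltas = [b - a for a, b in zip(pixels, pixels[1:])]
    return RAW_BITS + len(deltas) * delta_bits(deltas)


def compress(tile: Tile) -> Tuple[str, int]:
    """Try every ordering; keep the smallest, or store raw if nothing helps."""
    best = ("uncompressed", RAW_BITS * len(tile) * len(tile[0]))
    for name, order in ORDERINGS.items():
        size = encoded_size(order(tile))
        if size < best[1]:
            best = (name, size)
    return best


if __name__ == "__main__":
    rows_vary = [[50 * r] * 4 for r in range(4)]                # flat rows
    cols_vary = [[50 * c for c in range(4)] for _ in range(4)]  # flat columns
    noise = [[0, 255, 0, 255] for _ in range(4)]                # high contrast
    print(compress(rows_vary))   # ('row_major', 113)
    print(compress(cols_vary))   # ('column_major', 113)
    print(compress(noise))       # ('uncompressed', 128)
```

A tile whose rows are flat compresses under row-major order, the same data transposed needs column-major order, and a high-contrast pattern does not compress under any of the three orderings, so it is stored raw.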

Overall, the Maxwell GPU fetches about 25% fewer bytes from memory per frame than Kepler.
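Reading the 25% figure as a per-frame reduction in bytes fetched, the same DRAM bandwidth can serve about one third more frame data, which is how it lines up with the “one third more efficient” wording above. Applying that factor to the 7010MHz data rate lands close to the 9300MHz effective figure quoted in the specifications. This is our back-of-the-envelope reconciliation, not Nvidia’s published derivation.

```latex
\[
\frac{\text{bytes}_{\text{Kepler}}}{\text{bytes}_{\text{Maxwell}}}
  = \frac{1}{1 - 0.25} = \frac{4}{3} \approx 1.33,
\qquad
7010\,\text{MHz} \times \frac{4}{3} \approx 9347\,\text{MHz}
\]
```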

When combined with a G-SYNC display, the GeForce GTX 960 can deliver a gaming experience free of the screen tearing that currently plagues gaming when VSYNC is disabled. G-SYNC also eliminates much of the display stutter and reduces input lag.

IQ

Maxwell GPUs offer several new features for more flexible sampling, which enable further advances in anti-aliasing. Maxwell GPUs support multi-pixel programmable sampling for rasterization, opening up more flexible AA techniques in both deferred and conventional forward rendering.

ROMs that were formerly used to store standard sample positions have been replaced with RAMs. The RAMs may be programmed with the standard patterns, but now the driver or application may also load the RAMs with custom positions which may vary from frame to frame or within a frame.

In a 16×16 grid per pixel, there are 256 different locations to choose from for each sample.  This sample randomization can reduce the quantization artifacts that occur with regular forms of AA.
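A 16×16 grid gives 16 × 16 = 256 slots, so each sample position fits in a single byte. The Python sketch below encodes and decodes such positions and models a small programmable table whose contents can change between frames. It is illustrative only: the names are hypothetical, and we assume each sample sits at the center of its grid cell, which the article does not specify.

```python
from typing import List, Tuple

GRID = 16  # 16x16 sub-pixel grid per pixel


def encode_position(x: int, y: int) -> int:
    """Pack a grid cell (x, y), each in 0..15, into one of 256 slot indices."""
    if not (0 <= x < GRID and 0 <= y < GRID):
        raise ValueError("grid coordinates must be in 0..15")
    return y * GRID + x


def decode_position(index: int) -> Tuple[float, float]:
    """Offset of the sample inside the pixel, in [0, 1), at the cell center (assumed)."""
    if not 0 <= index < GRID * GRID:
        raise ValueError("index must be in 0..255")
    y, x = divmod(index, GRID)
    return ((x + 0.5) / GRID, (y + 0.5) / GRID)


# A hypothetical "sample position RAM": one list of slot indices per frame.
# Unlike a fixed ROM table, the driver could rewrite it between frames.
sample_ram: List[List[int]] = [
    [encode_position(6, 2), encode_position(10, 14)],   # frame N
    [encode_position(14, 6), encode_position(2, 10)],   # frame N + 1
]

if __name__ == "__main__":
    for frame, slots in enumerate(sample_ram):
        print(frame, [decode_position(s) for s in slots])
```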

Best of all, these freely specified sampling positions may be used in the development of effective new AA algorithms such as MFAA.

MFAA

Nvidia engineers have done just that, so the sample patterns can be varied per pixel, either spatially within a single frame or interleaved across multiple frames in time.  MFAA is a new AA mode that gives the same quality as 4xMSAA with only the performance cost of 2xMSAA, or the same image quality as 8xMSAA with the performance cost of 4xMSAA. MFAA is based on a Temporal Synthesis Filter with coverage samples per frame and per pixel.

The filter’s performance hit is low.  According to Nvidia, the typical performance advantage over MSAA is 10 to 30%.

MFAA is available exclusively on Maxwell-based GPUs: the GeForce GTX 980, 970 and 960.  Without AA, jagged edges, and especially shimmering in motion, are quite noticeable.  For example, texture crawling in Assassin’s Creed Unity is obvious with no AA enabled.  MSAA reduces the prominence of jagged edges, but at a substantial performance cost.

Nvidia’s engineers developed MFAA to reduce this performance cost while delivering image quality comparable to MSAA, by varying the sample patterns used per pixel both spatially within a single frame and in interleaved fashion across multiple frames over time.

By alternating AA sample patterns both temporally and spatially, 4x MFAA has the performance cost of 2x MSAA, with image quality equivalent to 4x MSAA.
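The core idea of alternating sample patterns can be sketched in a few lines of Python. Under our own simplifying assumptions, a rotated-grid 4-sample pattern is split into two hypothetical 2-sample patterns that alternate every frame, and a plain average of the last two frames stands in for Nvidia’s Temporal Synthesis Filter. For a static edge, each frame pays for only two samples, yet the two-frame result matches the coverage of all four samples.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Edge = Callable[[float, float], bool]   # True if (x, y) inside the pixel is covered

# Hypothetical split of a 4-sample rotated grid into two 2-sample patterns.
PATTERN_A: List[Point] = [(0.375, 0.125), (0.625, 0.875)]
PATTERN_B: List[Point] = [(0.875, 0.375), (0.125, 0.625)]


def coverage(inside: Edge, samples: List[Point]) -> float:
    """Fraction of sample positions covered by the primitive."""
    return sum(1 for x, y in samples if inside(x, y)) / len(samples)


def mfaa_like(inside: Edge, frames: int) -> Tuple[List[float], List[float]]:
    """Alternate the two patterns per frame, then average each pair of frames.

    The plain two-frame average is only a stand-in for Nvidia's Temporal
    Synthesis Filter; it holds up for a static scene, not for motion.
    """
    per_frame = [
        coverage(inside, PATTERN_A if f % 2 == 0 else PATTERN_B) for f in range(frames)
    ]
    blended = [(per_frame[f - 1] + per_frame[f]) / 2 for f in range(1, frames)]
    return per_frame, blended


if __name__ == "__main__":
    edge: Edge = lambda x, y: x < 0.7             # a vertical edge crossing the pixel
    per_frame, blended = mfaa_like(edge, 4)
    print(per_frame)                              # [1.0, 0.5, 1.0, 0.5]
    print(blended)                                # [0.75, 0.75, 0.75]
    print(coverage(edge, PATTERN_A + PATTERN_B))  # 0.75, the true 4-sample result
```

Individual frames disagree (1.0 and 0.5 coverage), which is exactly the kind of error the real temporal filter must hide, especially once the scene moves; this simple average does not handle motion.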

Previous-generation GPUs include fixed sample patterns for anti-aliasing that are stored in Read Only Memory. If a gamer selects 2x or 4x MSAA, fixed sample patterns are used. With Maxwell, Nvidia has introduced programmable sample positions for rasterization that are stored in Random Access Memory, thus creating opportunities for new AA techniques.

To enable MFAA, just go to Nvidia’s control panel.  The latest WHQL driver for the GTX 960’s launch brings some significant improvements, which we shall explore in a follow-up article devoted to IQ with a focus on MFAA.

Nvidia recommends that MFAA not be combined with other forms of post processing AA such as FXAA or TXAA.  And MFAA requires at least 2xMSAA to function.

And although MFAA cannot be imaged perfectly in a Fraps screenshot (the Temporal Filter is applied after the Fraps capture), we can illustrate the differences using ShadowPlay, which captures all of the AA settings accurately, including MFAA.  Here is our YouTube video, which illustrates the varying AA levels in Assassin’s Creed Unity, including TXAA and MFAA.  Pay particular attention to the wagon wheel on the right.

 

TXAA

TXAA is a cinematic-style anti-aliasing technique designed specifically to reduce temporal aliasing (crawling and flickering in motion). TXAA is a mix of hardware AA, a custom CG film-style AA resolve, and a temporal filter. To filter any given pixel on the screen, TXAA uses a contribution of samples both inside and outside of the pixel in conjunction with samples from prior frames. The trade-off is blur, which for some is intolerable and for others, cinematic. This editor much prefers the mild blur of TXAA to the texture crawling and flickering seen in motion without it.
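TXAA’s exact resolve is Nvidia’s own, so the Python below is only a generic temporal anti-aliasing sketch illustrating the two ingredients described above: a spatial kernel that takes samples from outside the pixel, and blending with prior frames. The kernel weights and blend factor are hypothetical. Run on a one-pixel detail that flickers on and off every frame, the flicker is strongly damped, but some of its energy spreads into the neighboring pixel, which is the blur trade-off in miniature.

```python
from typing import List

KERNEL = (0.25, 0.5, 0.25)   # hypothetical weights; neighbors outside the pixel contribute
ALPHA = 0.1                  # hypothetical weight of the new frame against the history


def spatial_resolve(row: List[float]) -> List[float]:
    """Weight each pixel with its left and right neighbors (edges clamp)."""
    n = len(row)
    return [
        KERNEL[0] * row[max(i - 1, 0)] + KERNEL[1] * row[i] + KERNEL[2] * row[min(i + 1, n - 1)]
        for i in range(n)
    ]


def temporal_blend(history: List[float], current: List[float], alpha: float = ALPHA) -> List[float]:
    """Exponential moving average: mostly history, a little of the new frame."""
    return [(1 - alpha) * h + alpha * c for h, c in zip(history, current)]


def run(frames: List[List[float]]) -> List[List[float]]:
    """Resolve every frame spatially and accumulate it into the history."""
    history = spatial_resolve(frames[0])
    outputs = [history]
    for frame in frames[1:]:
        history = temporal_blend(history, spatial_resolve(frame))
        outputs.append(history)
    return outputs


if __name__ == "__main__":
    # A one-pixel-wide detail that flickers on and off every frame (shimmering).
    frames = [[0.0, 0.0, 1.0 if f % 2 == 0 else 0.0, 0.0, 0.0] for f in range(60)]
    out = run(frames)
    print([round(v, 3) for v in out[-2]])
    print([round(v, 3) for v in out[-1]])
```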

The performance hit of TXAA varies from game to game and is directly correlated with the performance hit of MSAA. In Unity, TXAA takes less of a performance hit than 4xMSAA. Screenshots look better with MSAA, but playing the game with the camera in motion, or even watching a video capture, may show the advantages of TXAA if you don’t mind the blur.

NVIDIA HBAO+

An advance on Screen-Space Ambient Occlusion (SSAO) techniques, Nvidia’s HBAO+ looks better than the original HBAO algorithm, especially in scenes with thin objects such as grass and leaves. It is now fast enough to be used by top GPUs.

Percentage-Closer Soft Shadows (PCSS)

Percentage-Closer Soft Shadows (PCSS) is a technique designed to simulate the natural softening of shadows that occurs over increasing distance from the occluding object. PCSS provides three notable improvements over hard shadow projections: shadow edges become progressively softer the further they are from the shadow caster, high-quality filtering reduces the prominence of aliasing, and the use of a shadow buffer allows PCSS to handle overlapping character shadows without creating “double-darkened” portions.
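The softening with distance comes from PCSS’s penumbra estimate. In its classic form, the penumbra width is (receiver depth - blocker depth) × light size / blocker depth, which follows from similar triangles when the occluder, receiver and light are treated as parallel planes; percentage-closer filtering then averages shadow tests over a kernel scaled to that width. The Python sketch below uses a 1D shadow map and skips the blocker-search step (the blocker depth is passed in directly); the resolution constant and function names are hypothetical, not any shipping implementation.

```python
from typing import List

TEXELS_PER_UNIT = 4  # hypothetical shadow-map resolution along one axis


def penumbra_width(receiver_depth: float, blocker_depth: float, light_size: float) -> float:
    """Classic PCSS estimate from similar triangles (parallel-planes assumption)."""
    if blocker_depth <= 0 or receiver_depth <= blocker_depth:
        return 0.0
    return (receiver_depth - blocker_depth) * light_size / blocker_depth


def pcf(shadow_map: List[float], texel: int, receiver_depth: float, radius: int) -> float:
    """Percentage-closer filtering: fraction of nearby texels that do not occlude."""
    lo = max(0, texel - radius)
    hi = min(len(shadow_map) - 1, texel + radius)
    taps = shadow_map[lo:hi + 1]
    return sum(1 for d in taps if d >= receiver_depth) / len(taps)


def soft_shadow(shadow_map: List[float], texel: int, receiver_depth: float,
                blocker_depth: float, light_size: float) -> float:
    """Lit fraction, with the PCF kernel scaled to the estimated penumbra.

    A real PCSS first runs a blocker search to find blocker_depth; here it is given.
    """
    width = penumbra_width(receiver_depth, blocker_depth, light_size)
    radius = round(width * TEXELS_PER_UNIT / 2)
    return pcf(shadow_map, texel, receiver_depth, radius)


if __name__ == "__main__":
    # 1D shadow map: an occluder at depth 2 over texels 0..9, nothing over 10..19.
    shadow_map = [2.0] * 10 + [100.0] * 10
    print(soft_shadow(shadow_map, 12, 3.0, 2.0, 1.0))  # near receiver: 1.0, hard edge
    print(soft_shadow(shadow_map, 12, 6.0, 2.0, 1.0))  # far receiver: 7/9 lit, soft edge
```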

Dynamic Super Resolution
Many PC gamers have used downsampling, where the GPU renders the game at a resolution higher than the screen can display and then scales the image down to the display’s native resolution on output.  This usually makes the final image “crisper”, although downsampling typically requires gamers to create custom resolution profiles in the graphics driver control panel and then adjust the display settings. While downsampling can provide a significant improvement in IQ, artifacts are sometimes observed on textures and with post-processing effects.

To eliminate the artifacting and to simplify downsampling, Nvidia has developed a method called Dynamic Super Resolution. DSR works just like traditional downsampling, but it has a simple on/off user control, and it uses a 13-tap Gaussian filter to eliminate the aliasing artifacts caused by the simple box filter that traditional downsampling uses.
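The difference between a box filter and a Gaussian filter can be shown in one dimension. The Python sketch below downsamples by 2x per axis (the 4x DSR factor, e.g. 3840×2160 to 1920×1080) a pattern that repeats every 3 high-resolution pixels. The 13-tap count comes from the text above, but the Gaussian width (SIGMA) is our own assumption, since the article does not give it, and the function names are hypothetical.

```python
import math
from typing import List

TAPS = 13      # tap count quoted for DSR in this article
SIGMA = 1.0    # hypothetical filter width in high-resolution pixels


def box_downsample_2x(x: List[float]) -> List[float]:
    """Classic downsampling: average each pair of high-resolution pixels."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]


def gaussian_downsample(x: List[float], scale: float,
                        taps: int = TAPS, sigma: float = SIGMA) -> List[float]:
    """Resample by `scale` input pixels per output pixel with a Gaussian kernel."""
    n_out = int(len(x) / scale)
    half = taps // 2
    out = []
    for i in range(n_out):
        center = (i + 0.5) * scale - 0.5        # output pixel center, input coords
        start = math.ceil(center - half - 0.5)  # first of the `taps` nearest samples
        acc = weight_sum = 0.0
        for j in range(start, start + taps):
            w = math.exp(-((j - center) ** 2) / (2 * sigma ** 2))
            acc += w * x[min(max(j, 0), len(x) - 1)]  # clamp at the image edges
            weight_sum += w
        out.append(acc / weight_sum)
    return out


if __name__ == "__main__":
    # A fine detail that repeats every 3 high-resolution pixels.
    hi_res = [1.0 if k % 3 == 0 else 0.0 for k in range(60)]
    box = box_downsample_2x(hi_res)
    gauss = gaussian_downsample(hi_res, 2.0)
    print("box:     ", [round(v, 2) for v in box[5:11]])
    print("gaussian:", [round(v, 2) for v in gauss[5:11]])
```

The box filter turns the fine pattern into a coarse, false pattern that swings between 0 and 0.5, which is the aliasing that shows up as shimmering; the Gaussian output stays much closer to the pattern’s average brightness of 1/3.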


Dynamic Super Resolution can be found in the Nvidia control panel, as well as in the GeForce Experience.

GameWorks

GameWorks encompasses Nvidia’s entire library of tools freely available to game developers on every platform.  The latest developments were covered by ABT at Nvidia’s GTC 2014 (GPU Technology Conference) and include a unified physics solver.

Relatively new is Nvidia’s Turf Effects, which simulates and renders large grass areas with full geometric representation and support for physical interaction.  It is also scalable, working on both powerful and not-so-powerful PCs.


Maxwell brings a lot to the table, and the GTX 960 is no exception.  Let’s unbox the ASUS STRIX GTX 960 OC DirectCU II.