Maxwell Arrives! High Performance & Efficiency takes the crown

(This evaluation was originally published on AlienBabelTech by this author on September 18, 2014, when Nvidia’s NDA ended. It was subsequently lost from the database in a hard drive crash in December, and it has been republished and updated here.)

Nvidia is using Game24 today to showcase and release their brand new Maxwell architecture, the GTX 980 and the GTX 970, to replace the Kepler GTX 770, 780 and 780 Ti. They are promising some amazing performance and energy-saving improvements at 28nm without moving to a smaller process node.

At the same time, Nvidia is introducing their GTX 980 flagship pricing at $549, down from $700 for the now discontinued GTX 780 Ti at its launch. With the GTX 970 coming in at $329 and the GTX 760 officially dropping to $219, Nvidia is putting strong pricing pressure on the competing AMD lineup as well as on their own newly EoL’d video cards. Make no mistake that the GTX 980 is the new Nvidia GTX performance flagship, but at a lower price.

What makes Maxwell especially impressive is that the Kepler GK104 GTX 680 came out 2-1/2 years ago on a 256-bit bus and now we have its replacement on the same-sized bus and on the same 28nm process, but showing over 1.6 times the performance! That means that the GTX 980 is almost as fast as GTX 680 SLI, especially when it is overclocked. Surprisingly, the GTX 980 is only rated for 165W TDP, making it the most efficient architecture that Nvidia has ever created! In fact, we will bench the 300W TDP GTX 690 at GTX 680 SLI speeds to see if the 165W GTX 980 can catch it when overclocked.

ABT was invited along with the media to Nvidia’s Press Event in Monterey, California, last week for two intensive days of everything Maxwell-related. There was so much information to digest that by the end of the event, many of the editors were under the impression that the GTX 980 wasn’t any faster than the GTX 780 Ti. In fact, Nvidia expects the GTX 980 to be overall about 8% faster than the GTX 780 Ti that it replaces, making it the fastest GPU in the world, besides being the most energy-efficient. So as to avoid this kind of confusion, this evaluation will be a simple introduction to Maxwell architecture featuring the GTX 980, and a summary of its features, with a special emphasis given to performance.

ABT specializes in bringing our readers the largest and most comprehensive 28-game benching suite anywhere, so our focus will be on the GTX 980’s frame rates in 28 modern PC games. We will compare the stock and overclocked GTX 980 to the GTX 780 Ti, GTX 780, GTX 770, GTX 680 and GTX 680 SLI (using the GTX 690) to see where the new Maxwell card sits in relation to the Kepler top cards’ performance.

We also want to see how the competition compares, and we shall bench AMD’s top card, the R9 290X at Uber clocks, as well as the R9 280X which is a rebadged HD 7970 and the current competition to the GTX 770.

Pictured from right to left are the current cards that we have benchmarked for this evaluation, which include reference versions of the GTX 980, the GTX 780 Ti, the GTX 780, and the GTX 770. Below them are pictured the red and black VisionTek R9 280X and the black and red PowerColor R9 290X PCS+. Not shown are the reference GTX 690, which stands in for GTX 680 SLI, and the reference GTX 680. We did not have time to acquire a GTX 970 from one of Nvidia’s partners before the launch, but we will have one for review soon. The expectations are that it has about 80% of the performance of the GTX 980, and it will retail for $329, putting more strong pressure on AMD’s lineup, especially on the R9 290, as an overclocked GTX 970 should at least match the R9 290X’s performance.

We will use Intel’s Haswell platform so as to not bottleneck our graphics – Core i7 4770K at 4.0GHz, 2x8GB of Kingston “Beast” 2133MHz DRAM, on a Z87 ECS flagship Golden motherboard. Our resolutions for testing are 1920×1080 and 2560×1600. First, let’s look at what’s new in Maxwell.

Key Features of the Maxwell GTX 980

The GeForce GTX 980 and GTX 970 GPUs support all-new graphics features currently available only to the GTX 980 and the GTX 970. Nvidia’s Voxel Global Illumination (VXGI) technology allows the new GPUs to render fully dynamic global illumination at playable frame rates, bringing more realism and immersion to gamers. It is not real-time ray tracing yet, but it is a good step in that direction.

PC games can also perform and look better with new anti-aliasing modes like Multi-Frame sampled AA (MFAA). MFAA combines multiple AA sample positions to produce a result that looks like higher quality anti-aliasing but with better performance. From just briefly looking at MFAA, it appears to produce an image that looks similar to 4xMSAA at the performance cost of roughly 2xMSAA. For now, MFAA is only available to the GTX 980 and the GTX 970.

New GeForce Maxwell GPUs also support Dynamic Super Resolution (DSR), which is similar to driver-based supersampling and brings the crisp detail of 4K resolution to 1920×1080 displays. It looks great, but without an FCAT capture, it cannot be shown here with Fraps. These Maxwell GPUs retain and improve on features like ShadowPlay, which now supports recording at resolutions up to 4K at 60 fps. And with the new G-SYNC displays, gamers no longer have to put up with tearing or stutter as part of the current common gaming experience.

The next generation of games will not only look better and run faster on the GeForce GTX 980, they’ll also be more immersive thanks to virtual reality headsets like the Oculus Rift. With VR Direct, Nvidia has developed a number of advancements for virtual reality reducing latency, improving image quality, and bringing a whole range of new content to VR.

This editor got to experience an Oculus Rift demo created from Unreal Engine 4’s Infiltrator assets at the Press Event that is an awesome extension of S3D. VR frame rates need to be locked to a minimum of 75 fps for fluidity – and they need to be rendered twice, once for each eye. Eventually, as the resolution increases, the grainy look will disappear allowing for more realism, but at the price of requiring extreme graphics performance from the video card or cards.
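To put the 75 fps requirement in perspective, here is a minimal sketch of the frame-time budget it implies. It uses only the figures quoted above (75 fps, two eye views per frame); the even split of the budget between the eyes is an assumption made for illustration, since a real renderer may share work between the two views.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to render one frame at the given frame rate."""
    return 1000.0 / fps

MIN_VR_FPS = 75   # minimum frame rate quoted in the text
EYES = 2          # the scene is rendered once per eye

budget = frame_budget_ms(MIN_VR_FPS)
# Assumption: the two eye views are rendered one after the other with equal cost.
per_eye_budget = budget / EYES
eye_views_per_second = MIN_VR_FPS * EYES

print(f"frame budget: {budget:.2f} ms")
print(f"per-eye budget: {per_eye_budget:.2f} ms")
print(f"eye views per second: {eye_views_per_second}")
```

In other words, the card has to deliver the equivalent of 150 rendered views every second without ever dipping, which is why VR is so demanding.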

Key Points of the Maxwell GM204 GTX 980

First a look at the diagram:

The GTX 980 GM204 has 64 Raster Operations units (ROPs), double Kepler GK104’s 32.

There are 5.2 billion transistors packed into the GM204’s die size of 398 mm².

Dynamic Super Resolution (DSR) brings 4K sharpness to 1080P.

New anti-aliasing modes like Multi-Frame sampled AA provide 4xAA IQ with only a 2xAA performance penalty.

Multi-Pixel Programmable Sampling technology improves sample randomization and reduces artifacts.

The GTX 980 has a TDP of only 165W.

GeForce GTX 980 has HDMI 2.0 support and support for 4 displays.

What’s New with the Maxwell GTX 980 besides performance and efficiency?

Nvidia’s chart comparing the Kepler GK104 GTX 680 GPU with the Maxwell GM204 GTX 980 GPU is very helpful. Just two and one-half years ago, the GK104 GTX 680 was released as Nvidia’s Kepler flagship boasting good performance increases and energy efficiency over Fermi, and beating AMD’s then-flagship HD 7970 in price, energy-efficiency, and in performance.

In the meantime, months later, Nvidia released their big-die GK110 TITANs and the GTX 780 Ti series to secure the single-GPU performance crown at a much higher price. And now, the midrange-die GM204 GTX 980 takes the overall single-GPU performance crown with formerly unheard of efficiency, and at a much lower price than any TITAN or GTX 780/Ti.

The GM204 GeForce GTX 980 has doubled the SMs compared to the GK104 GPU used in the GeForce GTX 680. Because of the changes implemented in the new Maxwell SM, Nvidia engineers integrated twice the SMs without doubling the die size. With each SM containing its own dedicated PolyMorph Engine, GeForce GTX 980 also has twice the number of geometry units as the GTX 680 (and GTX 770).

Since eight texture units per SMM works best for Maxwell, the total number of texture units is the same 128 as on the Kepler GK104. Since the GeForce GTX 980 has higher clocks, the texture fill rate still improves by 12%. To improve performance in high AA/high resolution gaming scenarios, Nvidia doubled the number of ROPs from Kepler’s 32 to Maxwell’s 64. Combined with the higher clocks, pixel fill rate is more than double that of the GTX 680 – 72 Gpixels/sec for the GTX 980 versus 32.2 Gpixels/sec for the GTX 680.
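The fill-rate figures above follow directly from unit counts multiplied by clock speed. The short sketch below reproduces them, assuming peak fill rate is simply units × base clock (1006MHz for the GTX 680 and 1126MHz for the GTX 980, as listed in the specifications later in this article). The function and variable names are ours, chosen for illustration.

```python
def fill_rate_g(units: int, clock_mhz: float) -> float:
    """Peak fill rate in giga-operations per second: units * clock."""
    return units * clock_mhz / 1000.0

# Unit counts and base clocks as quoted in this article.
gtx680 = {"rops": 32, "tmus": 128, "base_mhz": 1006}
gtx980 = {"rops": 64, "tmus": 128, "base_mhz": 1126}

pixel_680 = fill_rate_g(gtx680["rops"], gtx680["base_mhz"])
pixel_980 = fill_rate_g(gtx980["rops"], gtx980["base_mhz"])
texel_680 = fill_rate_g(gtx680["tmus"], gtx680["base_mhz"])
texel_980 = fill_rate_g(gtx980["tmus"], gtx980["base_mhz"])

print(f"pixel fill: {pixel_680:.1f} -> {pixel_980:.1f} Gpixels/sec "
      f"({pixel_980 / pixel_680:.2f}x)")
print(f"texture fill: {texel_680:.1f} -> {texel_980:.1f} Gtexels/sec "
      f"(+{(texel_980 / texel_680 - 1) * 100:.0f}%)")
```

These are theoretical peaks at base clocks; real cards boost higher, and real games rarely hit either limit in isolation.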

Considering the architectural changes made, we expect the GTX 980’s lead over the large-die GK110 GTX 780 Ti, with its 384-bit interface, to be larger at 1920×1080 and at 4K resolution, and slightly smaller at 2560×1600. We will be able to confirm the 4K expectation when ABT gets a 4K display, but we did note a slight drop at 2560×1600 compared to 1920×1080.

How does the GTX 980 compare with its rival, AMD’s R9 290X?

This evaluation attempts to also analyze and compare GTX 980 and R9 290X performance and we will announce a performance winner. We expect that the GTX 980 will be solidly faster just as the GTX 780 Ti dominated, and we expect AMD to react with pricing cuts and game bundles which may have already started. We will also look at the details to see what the new Nvidia Maxwell GPU brings to the table.

Before we do performance testing, let’s take a look at the GTX 980 and quickly recap its new Maxwell DX12 architecture and features.

Maxwell GTX 980 Architecture and Features

We have posted just a brief introduction to Maxwell’s GTX 980 on page one. This page will expand on it, and most of the (obviously non-professional) pictures are taken from the sessions ABT attended at Nvidia’s Maxwell Press Event in Monterey, California. The professional pictures and graphs are courtesy of Nvidia’s press deck, whitepaper and reviewer’s guide.

Specifications

Here are the specifications for the GTX 980:

The GTX 970 is cut down from the GTX 980, so its performance is about 20% lower. Here are its specifications:

There were quite a few changes between the GTX 680 and the GTX 980.

Here are the specifications of the Kepler GK104 GTX 680, introduced barely 2-1/2 years ago on the same 28nm process.

We can see that the memory speed and bandwidth have taken a jump while the memory interface of the GTX 980 remains the same 256-bit width as the GTX 680’s. The vRAM has been increased from 2GB to 4GB and the core clock has gone up from 1006MHz to 1126MHz, while the power requirements have dropped from 195W down to 165W.
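As a rough sanity check on the efficiency claims, the sketch below combines the clock and TDP figures from this paragraph with the "over 1.6 times the performance" figure quoted in the introduction. It assumes TDP is a fair stand-in for real power draw, which is only approximately true, so treat the result as a ballpark.

```python
# Figures quoted in this article for the GTX 680 and GTX 980.
PERF_RATIO = 1.6                          # GTX 980 performance relative to the GTX 680
TDP_680_W, TDP_980_W = 195, 165
CLOCK_680_MHZ, CLOCK_980_MHZ = 1006, 1126

clock_gain = CLOCK_980_MHZ / CLOCK_680_MHZ - 1
tdp_drop = 1 - TDP_980_W / TDP_680_W
# Assumption: power draw scales with rated TDP.
perf_per_watt_gain = PERF_RATIO / (TDP_980_W / TDP_680_W)

print(f"core clock: +{clock_gain * 100:.1f}%")
print(f"TDP: -{tdp_drop * 100:.1f}%")
print(f"performance per watt: {perf_per_watt_gain:.2f}x")
```

The result lands close to the "twice the performance per watt" that Nvidia claims for Maxwell over Kepler.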

Here is the GTX 980 block diagram.

The SM is at the foundation of Nvidia GPUs. Almost every operation flows through the SM at some point in the rendering pipeline. Maxwell GPUs feature a new SM that has been designed to provide significantly improved performance per watt over Kepler GeForce GPUs, which were themselves a big improvement over Fermi.

Compared to GPUs based on the Kepler architecture, Maxwell’s new SMM has been redesigned to improve efficiency. Each SMM contains four warp schedulers, and each warp scheduler is capable of dispatching two instructions per warp every clock. Compared to Kepler’s scheduling logic, there are a number of improvements in the scheduler to further reduce redundancy, which improves energy efficiency.

Maxwell uses a completely new datapath organization. Whereas Kepler’s SM used 192 CUDA Cores in a non-power-of-two organization, the Maxwell SMM is divided into four 32-CUDA core processing blocks maintaining 128 CUDA cores total per SM, each with its own dedicated resources. Maxwell’s new datapath aligns better with warp size, saving area and power formerly wasted in managing data transfer by Kepler’s more complex datapath organization.

Compared to Kepler, the SMM’s memory hierarchy has also changed. As a result of these changes, each Maxwell CUDA core is able to deliver about 1.4x more performance per core compared to a Kepler CUDA core, and twice the performance per watt. With 33% fewer total cores per SM, but with 1.4 times the performance per core, each Maxwell SMM can deliver total per-SM performance similar to Kepler’s SMX, thus saving on die space.
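The per-SM claim can be checked with simple arithmetic. This sketch uses only the figures quoted above (192 cores per Kepler SMX, 128 per Maxwell SMM, about 1.4x performance per core) and assumes per-SM throughput is just cores × per-core performance.

```python
KEPLER_CORES_PER_SM = 192
MAXWELL_CORES_PER_SM = 128
PER_CORE_SPEEDUP = 1.4   # Maxwell core relative to a Kepler core, per the text

core_reduction = 1 - MAXWELL_CORES_PER_SM / KEPLER_CORES_PER_SM
# Assumption: SM throughput = core count * relative per-core performance.
relative_sm_perf = MAXWELL_CORES_PER_SM * PER_CORE_SPEEDUP / KEPLER_CORES_PER_SM

print(f"cores per SM: -{core_reduction * 100:.0f}%")
print(f"SMM throughput relative to SMX: {relative_sm_perf * 100:.0f}%")
```

So each SMM delivers a little over 90% of an SMX with a third fewer cores, which is how twice as many SMs fit without doubling the die.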

PolyMorph Engine 3.0

Tessellation is DirectX 11’s key feature and it will play a bigger role in future games. With the addition of more SMs in GM204, GTX 980 also benefits from twice the Polymorph Engines, compared to GTX 680. As a result, performance on geometry-heavy workloads doubles.

GM204 Memory Subsystem

In GM204, one ROP partition contains 16 ROP units compared with eight ROP units per partition in Kepler. Each ROP can process a single color sample. With four ROP partitions, a full GM204 has 64 ROPs, twice that of GK104.

GM204 has a 256-bit memory interface with 7Gbps GDDR5 memory. GM204 also features a unified 2048KB L2 cache that is shared across the GPU. In addition, GM204 has made significant enhancements to the memory compression implementation.

To reduce DRAM bandwidth demands, Nvidia GPUs make use of lossless compression techniques as data is written to memory. The bandwidth savings from this compression can be realized multiple times.

The effectiveness of Delta Color Compression depends on which pixel ordering is chosen for the delta color calculation. Maxwell uses Nvidia’s third generation of delta color compression to improve effectiveness by offering more calculation choices.
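To illustrate why pixel ordering matters for delta compression, here is a toy sketch. It is not Nvidia's actual scheme, whose details are not public in this article; it simply stores an anchor value plus the differences between consecutive pixels, and shows that the same tile needs fewer bits per delta under one traversal order than another. The tile contents and helper names are made up for the example.

```python
def delta_encode(values):
    """Return (anchor, deltas): the first value plus successive differences."""
    anchor = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    return anchor, deltas

def delta_decode(anchor, deltas):
    """Rebuild the original values, showing the encoding is lossless."""
    out = [anchor]
    for d in deltas:
        out.append(out[-1] + d)
    return out

def bits_per_delta(deltas):
    """Bits for a signed integer wide enough to hold the largest delta."""
    worst = max((abs(d) for d in deltas), default=0)
    return worst.bit_length() + 1  # +1 for the sign bit

# Hypothetical 4x4 single-channel tile: steep horizontally, gentle vertically.
tile = [[10 * x + y for x in range(4)] for y in range(4)]

row_major = [tile[y][x] for y in range(4) for x in range(4)]
col_major = [tile[y][x] for x in range(4) for y in range(4)]

for name, order in (("row-major", row_major), ("column-major", col_major)):
    anchor, deltas = delta_encode(order)
    assert delta_decode(anchor, deltas) == order
    print(name, "needs", bits_per_delta(deltas), "bits per delta")
```

Offering the hardware more orderings to choose from raises the odds that one of them yields small deltas for a given tile, which is the improvement the text describes.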

The Maxwell GPU is able to reduce the number of bytes that have to be fetched from memory per frame by about 25% compared with Kepler.
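The memory figures can be turned into bandwidth numbers with a little arithmetic. The sketch below uses the 256-bit interface, the 7Gbps data rate and the roughly 25% traffic reduction quoted above. Expressing the compression saving as an "effective" bandwidth is our own framing, and it assumes the 25% saving applies uniformly to all memory traffic.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth: bytes per transfer times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps

BUS_WIDTH_BITS = 256
DATA_RATE_GBPS = 7.0
TRAFFIC_REDUCTION = 0.25   # about 25% fewer bytes fetched per frame

raw = bandwidth_gb_s(BUS_WIDTH_BITS, DATA_RATE_GBPS)
# Assumption: fetching 25% fewer bytes is equivalent to having 1 / 0.75 of the bandwidth.
effective = raw / (1 - TRAFFIC_REDUCTION)
effective_rate = DATA_RATE_GBPS / (1 - TRAFFIC_REDUCTION)

print(f"raw bandwidth: {raw:.0f} GB/s")
print(f"effective bandwidth: {effective:.0f} GB/s "
      f"(like {effective_rate:.1f} Gbps memory without the savings)")
```

This helps explain how a 256-bit card can keep up with wider-bus Kepler parts.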

GeForce GTX 980 is also the world’s first GPU to support HDMI 2.0. HDMI 1.4 can only support a 4K display at 30Hz for “444” RGB pixels, and at 60Hz for “420” YUV pixels. However, with HDMI 2.0, the GPU can now drive full-resolution “444” RGB pixels at 4K resolution at 60Hz.
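The limits described above come down to link clock speed. The following sketch checks which 4K modes fit under each HDMI version. The numbers in it are not from this article: it assumes 8 bits per color component, the standard 4K timing of 4400×2250 total pixels per frame including blanking, maximum TMDS clocks of 340MHz for HDMI 1.4 and 600MHz for HDMI 2.0, and that 4:2:0 subsampling halves the required clock.

```python
HDMI_1_4_MAX_TMDS_MHZ = 340   # assumed HDMI 1.4 limit
HDMI_2_0_MAX_TMDS_MHZ = 600   # assumed HDMI 2.0 limit
UHD_H_TOTAL, UHD_V_TOTAL = 4400, 2250   # assumed 3840x2160 timing with blanking

def tmds_clock_mhz(refresh_hz: int, chroma: str = "444") -> float:
    """Required TMDS clock for 4K at 8 bits per component."""
    pixel_clock = UHD_H_TOTAL * UHD_V_TOTAL * refresh_hz / 1e6
    # 4:2:0 carries half the data per pixel, so it runs at half the clock.
    return pixel_clock / 2 if chroma == "420" else pixel_clock

def fits(refresh_hz: int, chroma: str, max_tmds_mhz: float) -> bool:
    """True when the mode's clock is within the link's maximum."""
    return tmds_clock_mhz(refresh_hz, chroma) <= max_tmds_mhz

for mode in [(30, "444"), (60, "420"), (60, "444")]:
    print(mode, f"{tmds_clock_mhz(*mode):.0f} MHz",
          "HDMI 1.4:", fits(*mode, HDMI_1_4_MAX_TMDS_MHZ),
          "HDMI 2.0:", fits(*mode, HDMI_2_0_MAX_TMDS_MHZ))
```

Under these assumptions only the 60Hz "444" mode exceeds the older link, which matches the behavior described above.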

The GeForce GTX 980 reference board design ships with three DisplayPort 1.2 connectors, one HDMI 2.0 connector, and one dual-link DVI connector. Up to four display heads can be driven simultaneously from one card.

When combined with a G-SYNC display, the GeForce GTX 980 delivers a gaming experience without screen tearing that currently plagues gaming when Vsync is disabled. G-SYNC also eliminates a lot of the display stutter and reduces input lag. Utilizing DisplayPort, the GeForce GTX 980 can drive up to three G-SYNC displays in Surround.

GTX 980 Maxwell also ships with an NVENC encoder that adds support for H.265 encoding. H.265 compression offers bandwidth savings versus H.264 at the same quality. Maxwell’s video encoder is supposed to improve H.264 video encode throughput by 2.5x over Kepler, enabling it to encode 4K video at 60 fps, including for ShadowPlay.

LIGHTING

In the real world, all objects are lit by a combination of direct light and indirect light. “Global illumination” (GI) is a term for lighting systems that model this effect. Without indirect lighting, scenes look harsh and artificial. However, while direct lighting is simple to compute, indirect lighting computations are highly complex and difficult to implement in real time on even powerful GPUs.

While some forms of GI have been used in many of today’s most popular games, their implementations have relied on “prebaked” lighting for performance reasons.

Because prebaked lighting is not dynamic, it’s often difficult or impossible to update the indirect light sources when in-game changes occur. Prebaked indirect lighting only models the static objects of the scene requiring a lot of artwork as workarounds.

Nvidia’s new GI technology uses a voxel grid to store scene and lighting information, and a new voxel cone tracing process to gather indirect lighting from the voxel grid.

The term “voxel” is related to “pixel” – a volumetric pixel. While a pixel represents a 2D point in space, a voxel represents a small cube which is a volume of 3D space. To perform global illumination, devs dice the entire 3D space of the scene in all three dimensions, into small cubes called voxels. “Voxelization” is the process of determining the content of the scene at every voxel, analogous to “rasterization” which is the process of determining the value of a scene at a given 2D coordinate.
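A minimal sketch of the bookkeeping behind voxelization follows: mapping a 3D position to the integer index of the voxel that contains it. Real voxelization rasterizes triangles into the grid on the GPU rather than binning loose points, so this only shows the grid idea; the grid origin, voxel size and sample points are hypothetical.

```python
import math

def voxel_index(point, grid_min, voxel_size):
    """Integer (i, j, k) index of the voxel containing a 3D point."""
    return tuple(math.floor((p - m) / voxel_size) for p, m in zip(point, grid_min))

def voxelize(points, grid_min, voxel_size):
    """Count how many sample points land in each occupied voxel."""
    occupied = {}
    for point in points:
        idx = voxel_index(point, grid_min, voxel_size)
        occupied[idx] = occupied.get(idx, 0) + 1
    return occupied

# Hypothetical scene samples inside a grid that starts at (-1, -1, -1).
samples = [(1.5, 0.2, -0.1), (1.6, 0.3, -0.2), (-0.9, -0.9, -0.9)]
grid = voxelize(samples, grid_min=(-1.0, -1.0, -1.0), voxel_size=0.5)
print(grid)
```

A sparse dictionary stands in here for the voxel grid; only voxels that actually contain something get an entry.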

Once a series of complicated steps is completed, involving describing voxel coverage and how the physical geometry will respond to light, light injection, and finally rasterization, the approach of calculating indirect lighting during the final rendering pass of VXGI is called cone tracing. Cone tracing is an approximation of the effect of secondary rays that are used in ray tracing methods. Using cones results in very realistic approximations of global illumination without the performance-crippling hit of ray tracing.

Rendering a real-time reflection from a glossy curved surface has always been difficult in a game. Using Nvidia’s new approach, thousands of secondary rays are replaced with just a few voxel cones that are traced through the voxel grid. Using only a comparatively few scattered cones, diffuse or specular lighting can be quickly computed and rendered in real time, even on glossy and metallic surfaces.

Hardware Acceleration for VXGI – Multi-Projection and Conservative Raster

One important property of VXGI is that it is very scalable. The voxelization stage is challenged by the need to analyze the same scene geometry from many views to determine coverage and lighting, called “multi-projection.” Acceleration of multi-projection is a useful capability that has been expanded in Maxwell by what Nvidia calls “Viewport Multicast.” Maxwell can use dedicated hardware to automatically broadcast input geometry to any number of desired render targets, avoiding geometry shader overhead.

“Conservative Raster” is another feature in Maxwell that accelerates the voxelization process. With conservative rasterization, a pixel is considered covered if any part of the pixel is covered by any part of the triangle.

Hardware support for conservative raster is very helpful for the coverage phase of voxelization as fractional coverage of each voxel needs to be determined with high accuracy. Conservative raster helps the hardware to perform this calculation efficiently.
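The difference between the two coverage rules can be shown in software. The sketch below compares the usual "pixel centre inside the triangle" test with a conservative test that marks a pixel as covered whenever the pixel square and the triangle overlap at all. The overlap check uses a separating-axis test, which is our choice for the illustration and not a description of how Maxwell implements it; the triangle is hypothetical, and a triangle that merely touches a pixel's border counts as overlapping here.

```python
def center_covered(tri, px, py):
    """Standard rule: covered when the pixel centre lies inside the triangle."""
    cx, cy = px + 0.5, py + 0.5
    signs = []
    for i in range(3):
        (x0, y0), (x1, y1) = tri[i], tri[(i + 1) % 3]
        signs.append((x1 - x0) * (cy - y0) - (y1 - y0) * (cx - x0))
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)

def conservative_covered(tri, px, py):
    """Conservative rule: covered when pixel square and triangle overlap."""
    corners = [(px, py), (px + 1, py), (px, py + 1), (px + 1, py + 1)]
    axes = [(1.0, 0.0), (0.0, 1.0)]                # normals of the pixel's edges
    for i in range(3):
        (x0, y0), (x1, y1) = tri[i], tri[(i + 1) % 3]
        axes.append((-(y1 - y0), x1 - x0))         # normal of each triangle edge
    for ax, ay in axes:
        t = [x * ax + y * ay for x, y in tri]
        s = [x * ax + y * ay for x, y in corners]
        if max(t) < min(s) or max(s) < min(t):
            return False                            # a separating axis exists
    return True

tri = [(0.2, 0.2), (3.6, 0.4), (0.4, 2.7)]          # hypothetical triangle, pixel units
pixels = [(x, y) for y in range(4) for x in range(4)]
standard = {p for p in pixels if center_covered(tri, *p)}
conservative = {p for p in pixels if conservative_covered(tri, *p)}
print(len(standard), "pixels by centre test,", len(conservative), "conservatively")
```

Pixel (3, 0) is a good example: the triangle's tip pokes into it, so the conservative rule marks it, while the centre test misses it. For voxelization, that missed sliver would be a hole in the voxel grid.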

Tiled Resources

DirectX 11.2 introduced a feature called Tiled Resources that could be accelerated with either Kepler’s or Maxwell’s hardware feature called Sparse Texture. With Tiled Resources, only the portions of the textures required for rendering are stored in the GPU’s memory. Tiled Resources works by breaking textures down into tiles, and the application determines which tiles need to be loaded into video memory. Devs can use the same texture tile in multiple textures without any additional texture memory cost; this is referred to as aliasing, and it can avoid redundancy.

DX12 introduces the concept of a “Raster Ordered View,” which Maxwell supports by adding a new interlock unit in the shader with similar functionality to the unit in ROP which may improve efficiency in rendering algorithms for Transparency AA.

DirectX 12

At Editor’s Day last week, the press had the advantage of having Microsoft’s D3D Development Lead, Max Mullen, give a technical presentation beyond what was presented anywhere earlier.

Microsoft’s upcoming DirectX 12 API has been designed to deliver greater CPU efficiency than earlier DirectX versions. This is accomplished by giving game developers more explicit control over hardware. DX12 has the potential to be much more efficient than DX11 at the cost of (a lot of) work on the part of the developer. All Fermi, Kepler, and Maxwell GPUs will fully support the DX12 API.

In addition, the DX12 release of DirectX will introduce a number of new features for graphics rendering. Microsoft has progressively disclosed some of these features, at GDC, and during Nvidia’s Editor’s conference last week.

Conservative Raster and Raster Ordered Views, both supported by Maxwell, give developers more control over rasterization and over the ordering of pixel shader operations. However, if they choose not to do the extra work, the devs can always fall back on the DX11.2 pathway.

IQ

Maxwell GPUs offer several new features for more flexible sampling which enable further advancements in Anti-Aliasing. Maxwell GPUs support multi-pixel programmable sampling for rasterization with extra opportunities for more flexible AA techniques in both deferred and conventional forward rendering.

ROMs that were formerly used to store standard sample positions are replaced with RAMs. The RAMs may be programmed with the standard patterns, but now the driver or application may also load the RAMs with custom positions which may vary from frame to frame or within a frame.

In a 16×16 grid per pixel, there are 256 different locations to choose from for each sample. This sample randomization can reduce the quantization artifacts that occur with regular forms of AA.

Best of all, these freely specified sampling positions may be used in the development of effective new AA algorithms.

Nvidia engineers have done just that, so the sample patterns can be used per pixel either spatially in a single frame or interleaved across multiple frames in time. Multi-Frame Sampled AA (MFAA) is a new AA technique that alternates AA sample patterns both temporally and spatially to produce a higher image quality while still offering a performance advantage over traditional MSAA. MFAA can deliver image quality approaching that of 4xAA at the performance cost of 2xAA. MFAA is still under development but looked quite promising in the demos at Editor’s Day.
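The idea of alternating sample patterns across frames can be sketched in a few lines. In the toy example below, each frame takes only two coverage samples per pixel, but the two frames use different positions on the 16×16 sub-pixel grid; blending them gives the same coverage estimate as taking all four samples in one frame. The sample positions, the test edge and the plain averaging are all assumptions for illustration: Nvidia's actual filter is not described here, and the equivalence only holds while the scene is static.

```python
# Sample offsets in units of 1/16 pixel (hypothetical positions).
PATTERN_A = [(4, 4), (12, 12)]     # used on even frames
PATTERN_B = [(12, 4), (4, 12)]     # used on odd frames

def coverage(pattern, inside):
    """Fraction of the pattern's samples that fall inside the geometry."""
    hits = sum(1 for sx, sy in pattern if inside(sx / 16, sy / 16))
    return hits / len(pattern)

def edge(x, y):
    """Hypothetical geometry: everything below a diagonal edge through the pixel."""
    return x + y < 0.9

frame0 = coverage(PATTERN_A, edge)               # 2 samples, even frame
frame1 = coverage(PATTERN_B, edge)               # 2 samples, odd frame
mfaa = (frame0 + frame1) / 2                     # blend of two cheap frames
msaa4 = coverage(PATTERN_A + PATTERN_B, edge)    # 4 samples in a single frame

print("frame 0:", frame0, "frame 1:", frame1)
print("blended:", mfaa, "4-sample reference:", msaa4)
```

Each frame pays for two samples, yet the blended result carries four distinct sample positions, which is where the "4x quality at 2x cost" claim comes from.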

Dynamic Super Resolution

Many PC gamers have used downsampling, where the GPU renders the game at a resolution higher than the screen can display, and then scales the image down to its native resolution on output to the user’s display. This has the advantage of making the final image usually “crisper”.

Downsampling usually requires work and the creation of profiles for gamers to set up custom displays with the graphics driver control panel, and then adjust the display settings. While downsampling can provide a significant improvement in IQ, artifacts are sometimes observed on textures and with post processing effects.

Nvidia has developed an easy method called Dynamic Super Resolution. Dynamic Super Resolution works just like traditional downsampling, but it has a simple on/off user control, and it uses a 13-tap Gaussian filter to eliminate the aliasing artifacts caused by the simple box filter that downsampling uses.
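Downsampling with a Gaussian-weighted filter can be sketched as follows. This is a generic separable Gaussian downsample, not Nvidia's 13-tap filter, whose exact tap layout is not given here; the `sigma` and `radius` values are arbitrary illustration parameters. Each output pixel is a weighted average of the input pixels around its centre, with weights falling off smoothly instead of the all-or-nothing weights of a box filter.

```python
import math

def downsample_row(row, factor, sigma, radius):
    """Shrink a 1D row by `factor`, Gaussian-weighting inputs near each output centre."""
    out = []
    for o in range(len(row) // factor):
        centre = (o + 0.5) * factor - 0.5          # output pixel centre in input coordinates
        lo = math.floor(centre - radius)
        hi = math.ceil(centre + radius)
        acc = total = 0.0
        for i in range(lo, hi + 1):
            w = math.exp(-((i - centre) ** 2) / (2 * sigma * sigma))
            acc += w * row[min(max(i, 0), len(row) - 1)]   # clamp at the borders
            total += w
        out.append(acc / total)                     # normalize so brightness is preserved
    return out

def downsample(image, factor, sigma=1.0, radius=2.0):
    """Separable filter: filter every row, then every column."""
    rows = [downsample_row(r, factor, sigma, radius) for r in image]
    cols = [downsample_row(c, factor, sigma, radius) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# Hypothetical 4x4 "rendered" image reduced to 2x2, like 4K going to 1080p.
rendered = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
print(downsample(rendered, factor=2))
```

Because the weights are normalized, flat areas keep their brightness, while hard edges are blended across neighboring output pixels rather than snapping from one to the next.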

Dynamic Super Resolution can be found in the control panel of the GeForce Release 343 driver, as well as in the GeForce Experience.

GameWorks

GameWorks encompasses Nvidia’s entire library of tools freely available to game developers of every platform. Just like with the GeForce Experience, GameWorks just gets better and better. The latest developments were covered by ABT at Nvidia’s GTC 2014 (GPU Technology Conference) and include a unified Physics solver.

What was new at the Editor’s Day Press Event is Nvidia’s Turf Effects which simulates and renders large grass areas with full geometric representation and support for physical interaction. It’s also scalable to work with powerful and not-so-powerful PCs.

Press Day, Monterey, California

Nvidia invited the media to Monterey, California, from Wednesday, September 12 through Friday, September 14. The weather was sunny and perfect for California’s Central Coast. It was an intensive deep dive into Maxwell architecture and Nvidia introduced the GTX 980 and the GTX 970. Besides the many hours spent in presentations, Nvidia provided top class lodgings, entertainment, excursions, food and drink. We want to thank Nvidia for their hospitality and we had a very good time as well as learned much!

PC gaming was at the fore of the Event and all of the Editors received SHIELD tablets. Nvidia thoughtfully included the SHIELD controller and stand as well as a HDMI cable so the editors were able to game in the hotel room using the big screen TV. PC Gaming is now estimated to have 335 million gamers, 200 million who game on GeForce-equipped PCs and devices. Nvidia wisely chooses to support this growing 140 billion dollar a year market, and they acknowledge professional and LAN gaming with Game24 which was held in several major cities around the globe. ABT was represented at Game24 in Indianapolis and we have a few images and comments from Dave McOwen, ABT’s co-founder, after the last Press Day-related story of the Apollo moon landings.

Finally, one presentation at Press Day cannot go without comment.

Nvidia Proves the Apollo Moon Landings were Real

Nvidia used Maxwell to digitally rebuild one of the moon landing’s most iconic photographs: Neil Armstrong’s shot of Buzz Aldrin clambering down the lunar module’s ladder years ago.

The photo to the left shows Buzz Aldrin lit up against the dark shape of the lunar module behind him.

Conspiracy theorists claim that because the sun is behind the lunar module, and Aldrin is in its shadow, Aldrin must have been lit by something other than the sun. They claim there must have been another auxiliary light source and perhaps it was produced in a movie studio by the US Government.

Well, Nvidia did a faithful recreation and modeled the entire scene from the photos available and were able to prove that the photos were not faked! Nvidia’s demo team rebuilt the scene of the moon landing in Unreal Engine 4, the latest version of the game engine developed by Epic Games. They simulated how the sun’s rays, coming from behind the lander, bounced off the moon’s surface, and Armstrong’s suit, to cast light on Aldrin as he stepped off the lander.

You can read about the full details on Nvidia’s blog:

http://blogs.nvidia.com/blog/2014/09/18/debunked/

Game24

Nvidia celebrated the public release of Maxwell’s GTX 980 and GTX 970 with a 24-hour celebration called Game24, which was held in cities all around the globe.

ABT’s co-founder attended the Indianapolis event and had so much fun gaming that he forgot to take pictures!

Here is what he said about the event:

The Indianapolis Gaming crew of 5 guys won all three Tournaments against Chicago and California.

Each one gets the new Maxwell Video card.

It was my first time seeing the Portable Nvidia Shield Android. Pretty impressive device and good price at $199.

They did not have any Shield Tablets with them though to see.

I had not gamed since Georgia. I got pretty tired out but surprising did pretty well.

Counter-Strike was the main game played. Was my first time with it but pretty much the same as Call of Duty and Battlefield which I used to play.

He got home pretty late but had a great time!

Let’s take a closer look at the GTX 980

A look at the cards

A good-looking card like the GTX 980 needs to arrive in a box that emphasizes it is special.

We can see that there is one dual-link DVI port, one HDMI 2.0 port, and three DisplayPort connectors.

Here you can see the GTX 980 with its backplate cover on. It looks really clean to have a backplate and it serves the useful function of also cooling the vRAM better.

And here is the GTX 980 with a view of the backplate. It looks much nicer than a raw PCB.

Here is the backplate with the cover removed for SLI, when the two cards must be slotted right next to each other. Nvidia says that the small opening makes all the difference.

UPDATED 01/14/2015

Yes, the tiny removable portion of the backplate does make a difference as our evaluation in January shows.

Here is the bare PCB.

Now here is the back with the cover off.

Here is the GTX 980 from another angle.

Now we look from the top down with the shroud removed.

The chip itself is quite small.

SLI and Tri-SLI

The GTX 980 is set up for SLI and Tri-SLI. We hope to cover SLI performance of the GTX 980 in an upcoming article.

The specifications look extraordinary with solid improvements over the Kepler-based GTX 680, GTX 770 and even over the GK110 large-die GTX 780/Ti. Let’s check out performance after we look at our test configuration on the next page.

Test Configuration – Hardware

  • Intel Core i7-4770K (reference 3.5GHz, HyperThreading and Turbo Boost are on to 3.7GHz; overclocked to 4.0GHz; DX11 CPU graphics), supplied by Intel.
  • ECS GANK Domination Z87H3-A2X motherboard (Intel Z87 chipset, latest BIOS, PCIe 3.0 specification, CrossFire/SLI 8x+8x) supplied by ECS
  • Kingston 16GB HyperX Beast DDR3 PC2133 RAM (2×8 GB, dual-channel at 2133MHz, supplied by Kingston)
  • GeForce GTX 980, 4GB, reference clocks and also further overclocked, supplied by Nvidia under NDA
  • GeForce GTX 780 reference design, 3GB reference clocks, supplied by Nvidia
  • Nvidia GTX 780 Ti, 3 GB reference design and clocks, supplied by Nvidia
  • Nvidia GTX 770 2GB reference design and clocks, supplied by Nvidia
  • Nvidia GTX 680 2GB reference design and clocks, supplied by Nvidia
  • Nvidia GTX 690 4GB (2 per GPU) reference design, and clocks overclocked +91MHz to simulate GTX 680 SLI, supplied by Nvidia
  • PowerColor R9 290X PCS+, 4GB reference designs at Uber clocks.
  • VisionTek R9 280X, 2GB, reference clocks, supplied by VisionTek
  • Two 2TB Toshiba 7200 rpm HDDs
  • Cooler Master Silent Pro Platinum 1000W power supply unit supplied by Cooler Master
  • Thermaltake Water2.0 Pro watercooler, supplied by Thermaltake
  • Onboard Realtek Audio
  • Genius SP-D150 speakers, supplied by Genius
  • Thermaltake Overseer RX-I full tower case, supplied by Thermaltake
  • ASUS 12X Blu-ray burner
  • HP LP 3065 2560×1600 thirty-inch LCD

Test Configuration – Software

  • Nvidia GeForce 344.07 release drivers for the GTX 980, GTX 680 and 690. GeForce 343.79 used for the GTX 770/780/780 Ti. High Quality, prefer maximum performance, single display.
  • AMD 14.7 RC Beta 7 Catalyst drivers for R9 290X and 280X. High Quality – optimizations off; use application settings
  • Windows 7 64-bit; very latest updates
  • Latest DirectX
  • All games are patched to their latest versions.
  • VSync is off in the control panels.
  • AA enabled as noted in games; all in-game settings are specified with 16xAF always applied; 16xAF forced in control panel for Crysis.
  • All results show average, minimum and maximum frame rates except as noted.
  • Highest quality sound (stereo) used in all games.
  • All DX9 titles were run under the DX9 render path; DX10 titles were run under DX10 render paths; DX11 titles under DX11 render paths.

The Benchmarks

Synthetic

  • 3DMark 11
  • Firestrike – Basic & Extreme
  • Heaven 4.0
DX9
  • The Witcher 2
  • Borderlands 2
  • Aliens: Colonial Marines
DX10
  • Crysis
DX11
  • STALKER, Call of Pripyat
  • Civilization V
  • Max Payne 3
  • The Secret World
  • Sleeping Dogs
  • Sniper Elite V2
  • Hitman: Absolution
  • Far Cry 3
  • CoD: Ghosts
  • Tomb Raider: 2013
  • Crysis 3
  • BioShock: Infinite
  • Metro: Last Light
  • GRID 2
  • Battlefield 4
  • Splinter Cell: Blacklist
  • ArmA 3*
  • Total War: Rome II
  • Batman: Arkham Origins
  • Assassin’s Creed IV: Black Flag
  • Thief
  • Sniper Elite 3
  • Watch_Dogs
  • GRID: Autosport

This is the second time that we are benching with CoD: Ghosts, ArmA 3, and Sniper Elite 3, and the first time with GRID: Autosport and Battlefield 4. All games tested are single player with the exception of The Secret World. We are still using Sniper Elite V2 and GRID 2 for comparison a final few times, as they are being replaced by Sniper Elite 3 and GRID: Autosport as part of ABT’s regular 28-game benchmark suite. Before we get to the GTX 980 performance charts, let’s look at overclocking, power draw and temperatures.

Overclocking, Power Draw & Temperatures

Overclocking the GTX 980 is just as easy as overclocking Kepler, and it uses the same features. We did notice that adding voltage did more to stabilize our GTX 980 than it did with Kepler GPUs. We were able to overclock further, adding a +200MHz offset to the core with complete stability, even though we adjusted neither the voltage nor our fan profile.

The only issues were again with GRID 2 and now with GRID: Autosport, where we had to drop the overclock a notch or increase the voltage to complete the benchmark loops. We managed a +225MHz overclock on all of the other 26 game benchmarks. However, we finally settled on a +200MHz offset to the core clocks and a +500MHz offset to the memory clocks with complete stability, and we might have been able to go +50MHz higher on the memory clocks.

January 14, 2015 notes. The next three screenshots of PrecisionX were lost.

Here is the GTX 980 at idle with stock settings; only the power and temp targets are maxed out

Using maxed-out Heaven 3.0 looped in a window at 1920×1080, with all settings at stock values (power and temperature targets are always maxed out), the peak boost observed was 1265MHz and it settled in at 1240MHz when temperatures reached 78C. Even with temperatures in our testing room at a hot 80F, the GPU never went over 82C under full load, and the fan was unnoticeable at 63%. With clocks at stock values under gaming load, voltage ranged from a low of 1.174V to a high of 1.225V, averaging 1.200V.

Maximum temps at stock settings

Adding +200MHz to the core brought the peak boost up to 1452MHz and it stabilized at 1440MHz when the temperature exceeded 78C. The fan needed to ramp up to a still very quiet 68% to keep the core temps below 84C. Boosting the memory clocks +500MHz brought the temps up to 85C, and the fan needed to ramp to 72% where it became more audible.
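
To put these clock figures in perspective, here is a minimal sketch of the arithmetic in Python. The function name is ours and purely illustrative; the inputs are the peak and settled boost clocks we observed at stock and with the +200MHz offset.

```python
def percent_gain(stock_mhz: float, overclocked_mhz: float) -> float:
    """Return the overclock as a percentage of the stock clock."""
    return (overclocked_mhz - stock_mhz) / stock_mhz * 100.0

# Peak boost: 1265MHz at stock vs. 1452MHz with the +200MHz offset
peak_gain = percent_gain(1265, 1452)

# Settled boost once temperatures passed 78C: 1240MHz vs. 1440MHz
settled_gain = percent_gain(1240, 1440)

print(f"Peak boost gain:    {peak_gain:.1f}%")
print(f"Settled boost gain: {settled_gain:.1f}%")
```

Note that the observed peak rose by 187MHz rather than the full 200MHz, since GPU Boost sets the final clock dynamically from the offset.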

Adding overvoltage

Adding the maximum voltage of +.88V brought the core temps to the same 85C, but now the fan needed to work harder and ramped up to 79%, where it was noticeable but not irritating.

Here is a chart that used Heaven 3.0 for testing overclocking stability. We finally settled on a +200MHz offset to the core and a +500MHz offset to the memory as stable for all benching and games that we tested on a very hot summer day.

Temperature

Our ambient room temperatures were hot and summerlike at 79-81F, as the desert Southwest was experiencing a major heatwave during testing. The GTX 980 runs quite cool at stock clocks even under load. However, as the core speed increased, so did the temperatures, and near the 1452MHz peak boost on the core they rose into the mid-80s C. 85C was the highest temperature that we observed and the highest the fan reached was 79%.
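
Since we quote room temperatures in Fahrenheit and GPU temperatures in Celsius, a short conversion sketch puts the two on the same scale. The helper name is ours for illustration, and the "delta over ambient" figure is simply derived from the numbers above rather than separately measured.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0

# Ambient range reported during testing: 79-81F
ambient_low_c = fahrenheit_to_celsius(79)
ambient_high_c = fahrenheit_to_celsius(81)

# Highest GPU temperature observed while overclocked
peak_gpu_c = 85.0
delta_over_ambient = peak_gpu_c - ambient_high_c

print(f"Ambient: {ambient_low_c:.1f}C to {ambient_high_c:.1f}C")
print(f"Peak GPU temperature over ambient: {delta_over_ambient:.1f}C")
```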

You can see from the performance charts what effects increasing core speed has on the GTX 980 – from the reference speed to our own +200MHz. The performance summary charts are up next and there is a separate overclocking chart showing the manual overclocking of the GTX 980 +200MHz on the core and +500MHz on the vRAM.

We are looking forward to our follow-up article which will include further testing for both the GTX 980 and also the GTX 970.

Noise

The GTX 980 is extraordinarily quiet for a high-end flagship card, in stark contrast to the reference version of the R9 290X. The GTX 680 and the reference GTX 770 are already quiet for powerful cards, but the GTX 980 and the GTX 780/Ti are noticeably quieter. We had to drop the overclock on our CPU and lower our CPU fans’ rpm to even notice them at all. The automatic fan profiles work well and needed no tweaking at our maximum overclock on each card.

It appears that Nvidia has especially tuned the GTX 980 to be quiet but not at the expense of cooling. It will be interesting to see what cooling designs their partners implement.

Let’s head to our performance charts.

Performance summary charts & graphs

Here are the summary charts of 28 games and 3 synthetic tests. The highest settings are always chosen and it is DX11 when there is a choice; DX10 is picked above DX9, and the settings are ultra or maxed. Specific settings are listed on the Main Performance chart at the end of this page. The benches are run at 1920×1200 and 2560×1600, with separate charts devoted to overclocking and to comparing certain cards directly against each other.
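
The render path rule above (DX11 first, then DX10, then DX9) can be sketched as a simple preference lookup. This is only an illustration of the rule, not a tool we used; the function name and the example sets of supported paths are hypothetical.

```python
# Preference order described above: DX11 over DX10 over DX9
PREFERENCE = ("DX11", "DX10", "DX9")

def pick_render_path(supported):
    """Return the most preferred render path that a game supports."""
    for path in PREFERENCE:
        if path in supported:
            return path
    raise ValueError("game supports none of the known render paths")

# Hypothetical examples of what a game might expose
print(pick_render_path({"DX9", "DX11"}))  # picks DX11
print(pick_render_path({"DX9", "DX10"}))  # picks DX10
print(pick_render_path({"DX9"}))          # picks DX9
```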

All results, except for Vantage and 3DMark11, show average framerates and higher is always better. In-game settings are fully maxed out and they are identically high or ultra across all platforms. We see some very impressive results with the GTX 980.

Main Overall Summary chart

The Big Picture

In the first column of the main performance summary chart, the GTX 780 is tested followed by the GTX 780 Ti and the 290X at Uber speeds in the third column. Column four is the new Maxwell GTX 980 followed by the +200MHz core/+500MHz memory overclocked GTX 980. Next to the overclocked 980 is GTX 680 SLI represented by a GTX 690 at GTX 680 SLI clocks in column 6 followed by a GTX 680 in Column 7. Columns 8 and 9 round out the chart with the GTX 770 and the R9 280X.

Assassin’s Creed IV has a hard framerate cap of 62.5 fps, which is why we used 4xMSAA with the most powerful cards at 1920×1080; results over 62 fps are considered capped. Watch_Dogs, although improved on 2GB cards with Ultra Textures, is completely unplayable on the GTX 690/GTX 680 SLI.

There is a lot of information to digest. So we will break down this Big Picture into two smaller charts which compare the new Maxwell card with targeted cards.

GTX 980 vs. GTX 780 Ti and GTX 980 OC vs. GTX 680 SLI

In our cut-down chart, we compare the GTX 780 Ti in the first column with the GTX 980 in the second, where ‘Wins’ are in bold and, if there is a tie, both numbers are bolded. The results of the first two columns are directly compared with each other, as are the third and fourth columns. In the third column we see the overclocked GTX 980 versus the GTX 690 at GTX 680 SLI clocks, with wins bolded as usual. The last column shows GTX 680 performance for comparison with GTX 680 SLI scaling and also with the GTX 980.

Nvidia’s estimate of the GTX 980 as being approximately 8% faster across the board than the GTX 780 Ti appears to be correct from our own benches, as the GTX 980 wins far more benches than it loses, and the performance delta will probably widen in Maxwell’s favor as drivers are optimized. It is also notable that the GTX 980 wins more of the newest DX11 games.
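
The bolding convention used in these comparison charts (the higher average framerate wins, and a tie bolds both numbers) can be sketched as a small tally. The function name is ours, and the sample numbers are placeholders for illustration only; they are not results from our charts.

```python
def tally(results):
    """Count wins and ties between two cards.

    `results` maps a game title to a (card_a_fps, card_b_fps) pair of
    average framerates, where higher is better.
    """
    wins_a = wins_b = ties = 0
    for fps_a, fps_b in results.values():
        if fps_a > fps_b:
            wins_a += 1
        elif fps_b > fps_a:
            wins_b += 1
        else:
            ties += 1  # both numbers would be bolded in the chart
    return wins_a, wins_b, ties

# Placeholder data, NOT taken from our benchmark results
sample = {
    "Game A": (60.0, 64.5),
    "Game B": (45.2, 45.2),
    "Game C": (88.1, 80.3),
}

wins_a, wins_b, ties = tally(sample)
print(f"Card A wins: {wins_a}, Card B wins: {wins_b}, ties: {ties}")
```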

GTX 980 vs. R9 290X-Uber vs. GTX 780 Ti

Finally, we compare the R9 290X at Uber clocks in the first column with the GTX 980 in the second column, and with the GTX 780 Ti in the third. This time only the wins (and ties) between the Radeon and the GTX 980 are in Bold.

It is an absolute blowout and the R9 290X is left in the dust by the GTX 980. There are only three games out of 28 where the Radeon even manages to tie. And we can see that the overclocked GTX 980 scales very well with increasing clockspeeds and no overclocked watercooled 290X will manage to make up the deficit. We can only assume that unless AMD has a strong reply in a successor to the R9 290X, they will have to cut prices. And this is especially so if the $329 GTX 970 beats the R9 290 and factory overclocked versions match the 290X performance.

Let’s head for our conclusion.

Conclusion

This has been quite an enjoyable, if far too short, 4-day exploration for us in evaluating our new GTX 980 since Maxwell’s Editor’s Day in Monterey, California, this past week. It did very well performance-wise compared to the GTX 780 Ti, bringing higher performance at a $549 launch price that is much lower than the GTX 780 Ti’s $699 launch price and even the GTX 780’s $649 pricing before it. We are totally impressed with the cool-running Maxwell GM204 chip that has such outstanding overclockability and a good price. It slots right above the GTX 780 Ti and far above the R9 290X or the GTX 780, and it offers more advantages than just price.
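
For a rough sense of how large the launch price gap is, here is a minimal sketch using only the three launch prices quoted above. The function name is ours for illustration.

```python
def percent_cheaper(new_price: float, old_price: float) -> float:
    """Return how much cheaper new_price is, as a percentage of old_price."""
    return (old_price - new_price) / old_price * 100.0

GTX_980_LAUNCH = 549.0
GTX_780_TI_LAUNCH = 699.0
GTX_780_LAUNCH = 649.0

vs_780_ti = percent_cheaper(GTX_980_LAUNCH, GTX_780_TI_LAUNCH)
vs_780 = percent_cheaper(GTX_980_LAUNCH, GTX_780_LAUNCH)

print(f"GTX 980 vs. GTX 780 Ti launch price: {vs_780_ti:.1f}% lower")
print(f"GTX 980 vs. GTX 780 launch price:    {vs_780:.1f}% lower")
```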

We see good overclockability together with quiet operation at stock voltage and fan profile from the reference design GTX 980. The GTX 980 is a better value overall than the GTX 780 Ti, even when you weigh its performance against probable price drops on the older and less energy-efficient GK110 cards.

Pros

  • Price – for $549 the GTX 980 is a versatile flagship card that is designed for heavy DX11 and DX12 gaming; it is much cheaper than this generation’s GTX 780 Ti flagship although it has more performance and new features. It also runs away from the Uber-clocked R9 290X.
  • TDP and power draw are superb at 165W. Performance per watt is almost two times that of the GTX 680, which debuted as Nvidia’s flagship just over two years ago. Unlike the R9 290X and the GTX 780s, the GTX 980 only needs two six-pin PCIe connectors, which still allows for good overclocking.
  • Overclockability is excellent – GPU Boost works as advertised and voltage controls seem to be more effective with Maxwell than with Kepler.
  • The reference design cooling is quiet and efficient; the card and well-ventilated case stay cool even well-overclocked on a hot Summer day.
  • It is possible to use three of these cards for extreme Tri-SLI performance without needing a massive PSU.
  • 3D Vision 2 and PhysX enhance gaming immersion and both are improved using the GTX 980 compared to the current generation. VR becomes possible.
  • GameWorks brings new features to gaming including realistic grass.
  • New MFAA allows for high performance without jaggies.
  • DSR allows 4K crispness to come to 1080p.
  • New ShadowPlay allows live streaming uploads at up to 60fps at 4K resolutions.
  • G-Sync displays reduce and eliminate stuttering while retaining the advantages of minimizing tearing.
  • The GTX 980 is the fastest single-GPU video card – period!
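
As a rough sanity check on the "almost two times" performance-per-watt point in the list above, here is a back-of-the-envelope sketch. It takes the roughly 1.6x performance figure over the GTX 680 from our introduction and the GTX 980's 165W TDP from this article; the GTX 680's 195W TDP is an assumption brought in from Nvidia's published specification, not a figure we measured. TDP is also only a proxy for real power draw, so treat the result as an estimate.

```python
# Relative performance of the GTX 980 over the GTX 680 (from our introduction)
RELATIVE_PERFORMANCE = 1.6

GTX_980_TDP_W = 165.0  # quoted in this article
GTX_680_TDP_W = 195.0  # ASSUMPTION: Nvidia's published TDP, not measured here

def perf_per_watt_ratio(relative_perf: float, new_tdp: float, old_tdp: float) -> float:
    """Return (new perf/W) / (old perf/W), using TDP as a stand-in for power draw."""
    return relative_perf * (old_tdp / new_tdp)

ratio = perf_per_watt_ratio(RELATIVE_PERFORMANCE, GTX_980_TDP_W, GTX_680_TDP_W)
print(f"Estimated performance per watt vs. GTX 680: {ratio:.2f}x")
```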

Cons

  • None

The Verdict:

  • If you are buying a flagship video card right now and looking for the highest performance, the GTX 980 is a great value and bang-for-buck gaming video card that will stand tall even among the fastest dual-GPU cards of the last generation with just a bit more overclocking. When a great value is offered like this in a flagship card, and it launches solidly faster than its competitor, we feel it deserves ABT’s highest award – the “Kick Ass” award.

We do not know what the future will bring, but the GTX 980 brings an excellent top-performer to the GeForce family. With great features like GameWorks and the GeForce Experience, you can be assured of immersive gaming by picking this card for 1080P, 4K, or even higher resolutions including for Surround, 3D Vision Surround, or even VR.

If you currently game on an older generation video card, you will do yourself a big favor by upgrading. The move to a GTX 980 will give you better visuals on the DX11 and DX12 pathways and you are no doubt thinking of SLI or even Tri-SLI if you want to get ultimate gaming performance.

Of course, AMD offers their own set of features including Eyefinity, GCN 2.0, and Mantle. However, we expect that they will be forced to drop pricing on the reference and stock-clocked R9 290s and offer aggressive game bundles. There is also the possibility of a successor to the 290X which would have to make up a lot of ground to be competitive with the GTX 980, and we can’t help but look to large die successors to the GK110 and Kepler-based TITANs.

Stay tuned, there is a lot coming from us at ABT. Next up is a long delayed Tt eSports DRACONIUM aluminum mouse pad review and the Kingston 256GB mSATA SSD. We are also working on acquiring a GTX 970 for evaluation shortly. And don’t forget to check our forums! Our tech discussions are becoming among the best to be found anywhere!!