NVIDIA CUDA Force P2 State – Performance Analysis (Off vs. On)


NVIDIA ‘CUDA – Force P2 State’ Feature Performance Analysis (Off vs. On) — 15 games benchmarked using an RTX 3080

Several years ago, NVIDIA added a driver feature called ‘CUDA – Force P2 State’, which is enabled by default. This performance analysis uses an RTX 3080 to compare 15 PC games with this driver feature off versus on, using our latest recommended GeForce Game Ready driver and the latest version of Windows 10.

The Nvidia Profile Inspector developed by Orbmu2k showing the CUDA Force P2 State driver feature and its available settings.

This driver feature is not accessible through the NVIDIA Control Panel, but you can use Nvidia Profile Inspector, a reliable third-party driver editor, to change its default setting. The current default is on, and you can switch it to off either globally or on a per-game profile basis with this tool.

Our testing platform is a recent install of Windows 10 64-bit Pro Edition, an i9-9900K with stock clocks, a Gigabyte Z390 AORUS PRO motherboard, and 32GB of Kingston DDR4 3333MHz. The games tested, settings, hardware, GeForce drivers, and Windows 10 build are identical except for the off versus on ‘CUDA – Force P2 State’ setting we compare.

Official Background on NVIDIA ‘CUDA – Force P2 State’

We asked NVIDIA about this feature and they replied:

[…] Basically, we added this p-state because running at max memory clocks for some CUDA applications can cause memory errors when running HUGE datasets. Think DL apps, oil exploration use cases, etc where you are crunching large numbers and it would error out with full memory clocks. These are the types of apps you really shouldn’t be running on GeForce anyway but since there are a lot of folks who do and were running into this issue we created this new mode for them.

It’s basically like a poor man’s version of ECC memory. That’s how we described it way back when…

[…] And if you’re gaming w/CUDA (say for instance using PhysX) it will give you full clocks. So gamers shouldn’t be affected by this mode.

We would like to thank NVIDIA for providing us with details about this driver feature. So let’s go to our analysis.

Before offering the performance data and charts of each different analysis scenario, it’s important to describe both the hardware and software configuration used in our testing as well as the analysis methodology.

Benching Methodology

Test Configuration – Hardware

  • Intel Core i9-9900K (Hyper-Threading/Turbo boost on; stock settings)
  • Gigabyte Z390 AORUS PRO motherboard (Intel Z390 chipset, v.F9 BIOS)
  • Kingston HyperX Predator 32GB DDR4 (2×16GB, dual-channel at 3333 MHz CL16)
  • Gigabyte AORUS GeForce RTX 3080 MASTER 10GB (rev. 1.0); v.F2 VBIOS, stock clocks
  • Samsung 500GB SSD 960 EVO NVMe M.2
  • Seagate 2TB Desktop SSHD SATA 3.1
  • Seagate 2TB FireCuda SATA 3.1
  • Corsair RM750x, 750W 80PLUS Gold power supply unit
  • ASUS ROG Swift PG279Q 27-inch IPS 2560 x 1440 165Hz 4ms G-Sync Monitor (G-Sync off, Fixed Refresh Rate on)

Test Configuration – Software

  • NVIDIA GeForce 460.89 drivers; High Quality & prefer maximum performance (on a per-game profile-basis); fixed refresh rate (globally).
  • V-Sync application controlled in the control panel, V-Sync off in-game.
  • AA and AF as noted in games; all in-game settings are specified.
  • Windows 10 64-bit Pro edition, latest updates v20H2, Game Mode, Game DVR & Game Bar features off.
  • GIGABYTE tools not installed.
  • Latest DirectX
  • All 15 games are patched to their latest versions at the time of publication.
  • 3DMark suite, the latest version
  • Basemark GPU benchmark, v.1.1
  • UNIGINE Superposition, v.1.1
  • CapFrameX (CX), the latest version
  • RivaTuner Statistics Server (RTSS), the latest version
  • ISLC (Purge Standby List) before each benchmark.
  • Nvidia Profile Inspector, the latest version

GeForce Driver Suite-related

  • Standard Game Ready drivers are used.
  • The display driver is installed.
  • The latest version of PhysX is installed.

Hybrid & Non-Synthetic Tests-related

  • Single run per test.

Game Benchmarks-related

  • The corresponding built-in benchmark sequence is used.

Frametimes Capture & Analysis tool-related

  • CapFrameX is used for capturing and analyzing the relevant performance numbers obtained from each recorded built-in benchmark sequence.
  • Consecutive runs are recorded until we have 3 valid runs (no outliers), which CapFrameX aggregates using the following method:
    • Aggregate excluding outliers:
      • Outlier metric: Third, P0.2 (0.2% FPS percentile).
      • Outlier percentage: 3% (the % the FPS of an entry can differ from the median of all entries before counting as an outlier).
  • To compare and assess the results and aggregated records in terms of percentage gain or loss, we set the following thresholds for treating a percentage change as significant (outside the margin of error) for our benchmarking purposes:
    • Score/Avg FPS > 3% when assessing hybrid & non-synthetic benchmarks;
    • Avg FPS > 3% when assessing raw performance;
    • P1/P0.2 > 3% when assessing frametimes consistency, after applying our custom formula

{[(LowPercentileFPS_2 / AvgFPS_2) / (LowPercentileFPS_1 / AvgFPS_1)] – 1} x 100

    • Adaptive STDEV (the standard deviation of values compared to their moving average) > 3% when assessing frametimes consistency.
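The outlier rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (an entry counts as an outlier when its P0.2 FPS differs from the median of all entries by more than 3%), not CapFrameX's actual code. The function name `split_outliers` and the sample run values are hypothetical.

```python
from statistics import median


def split_outliers(p02_values, max_diff_pct=3.0):
    """Split per-run P0.2 FPS values into (valid, outliers).

    A run is treated as an outlier when its P0.2 FPS differs from the
    median of all recorded runs by more than max_diff_pct percent.
    Assumption: this mirrors the rule as described in the text; the
    real tool may handle edge cases differently.
    """
    med = median(p02_values)
    valid, outliers = [], []
    for value in p02_values:
        diff_pct = abs(value - med) / med * 100.0
        if diff_pct <= max_diff_pct:
            valid.append(value)
        else:
            outliers.append(value)
    return valid, outliers


# Hypothetical P0.2 FPS values from four consecutive runs.
runs = [100.0, 101.0, 99.5, 92.0]
valid, outliers = split_outliers(runs)
print("valid runs:", valid)
print("outliers:", outliers)
```

With these made-up numbers the fourth run deviates from the median by more than 3%, so it would be discarded and the remaining three runs would be aggregated.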

Benchmark Suite: 15 PC Games, 4 Hybrid & 4 Non-Synthetic Tests

Hybrid Tests (3DMark)

  • Fire Strike Ultra
  • Time Spy Extreme
  • DirectX Raytracing feature test
  • Port Royal

Non-Synthetic Tests

  • Basemark GPU
  • UNIGINE Superposition
  • Neon Noir (Benchmark)
  • Boundary: Raytracing Benchmark

DX11 Games

  • Borderlands 3 (DX11)
  • Deus Ex: Mankind Divided (DX11)
  • Far Cry New Dawn
  • Tom Clancy’s Ghost Recon Breakpoint (DX11)
  • Neon Noir (Loop Mode)

DX12 Games

  • Borderlands 3 (DX12)
  • Cyberpunk 2077 (DX12)
  • Horizon Zero Dawn
  • Metro Exodus (DX12)
  • Tom Clancy’s The Division 2 (DX12)
  • Watch Dogs: Legion (DX12)
  • Shadow of the Tomb Raider (DX12)
  • Godfall (NEW addition)

Vulkan Games

  • Tom Clancy’s Ghost Recon Breakpoint (VK)
  • Strange Brigade (VK)
  • Wolfenstein Youngblood
  • Quake 2 RTX (v.1.4.0)

NVIDIA Control Panel settings

Here are the global NVIDIA Control Panel settings:

Nvidia Control Panel – Global 3D Settings.

Both the ‘High Quality’ value for the texture filtering quality setting and ‘Prefer maximum performance’ for the power management mode are set on a per-game or per-program profile basis via the Manage 3D Settings > Program Settings tab.

The Performance Summary Charts with 15 Games

Below are the summary charts for the 15 games, 4 hybrid benchmarks, and 4 non-synthetic benchmarks used to compare performance with ‘CUDA Force P2 State’ disabled and enabled on the AORUS RTX 3080 MASTER. We list the graphics settings on the charts, and we run each built-in game benchmark sequence at 2560×1440, except for Borderlands 3 and Far Cry New Dawn, which are tested at 150% resolution scaling.

Results give average framerates, and higher is better. We display the low FPS percentiles (P1 and P0.2) below the corresponding averages. This time, we also include the adaptive STDEV values and treat the percentage gain or loss in adaptive STDEV as an additional criterion for assessing frametimes stability; for this metric, lower is better. We use CapFrameX to record frametimes over time, visualize them, and convert them into the corresponding average FPS, P1, P0.2, and adaptive STDEV values. We also show the percentage gain or loss in both raw performance (average FPS) and, when applicable, in frametimes consistency or stability between the different testing scenarios. To calculate the gains or losses in stability based on the P1 and P0.2 FPS percentiles, we applied our custom formula:

{[(LowPercentileFPS_2 / AvgFPS_2) / (LowPercentileFPS_1 / AvgFPS_1)] – 1} x 100

We mark significant performance changes (higher than 3%) in bold and use purple or orange font for the significant improvements or regressions respectively.
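As a quick worked example of the custom formula and the 3% threshold, here is a minimal Python sketch. The function names and the sample FPS values are hypothetical, and we assume scenario 1 is the baseline and scenario 2 is the setting being compared against it.

```python
def stability_change_pct(low_1, avg_1, low_2, avg_2):
    """Apply the custom formula from the text:

    {[(LowPercentileFPS_2 / AvgFPS_2) / (LowPercentileFPS_1 / AvgFPS_1)] - 1} x 100

    A positive result means the low percentile sits closer to the
    average in scenario 2, i.e. more consistent frametimes.
    """
    return ((low_2 / avg_2) / (low_1 / avg_1) - 1.0) * 100.0


def is_significant(change_pct, threshold_pct=3.0):
    """True when a gain or loss exceeds the 3% margin-of-error threshold."""
    return abs(change_pct) > threshold_pct


# Hypothetical numbers: same average FPS, but P1 rises from 80 to 84.
change = stability_change_pct(low_1=80.0, avg_1=100.0, low_2=84.0, avg_2=100.0)
print(f"stability change: {change:+.2f}%")
print("significant:", is_significant(change))
```

Because the formula normalizes each low percentile by its own average, a run whose average and lows both rise by the same proportion shows no change in stability, which is the point of the metric.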

Hybrid Benchmarks.
Non-Synthetic Benchmarks.
DirectX 11 Games – Built-in benchmarks.
DirectX 12 Games – Built-in benchmarks (except Cyberpunk 2077, tested using the BTR custom sequence).
Vulkan Games – Built-in benchmarks.
DirectX Raytracing Games – Built-in benchmarks (except Cyberpunk 2077, tested using the BTR custom sequence).
Vulkan Raytracing Games – Built-in benchmarks.

Notes on NVIDIA ‘CUDA – Force P2 State’ performance (Off vs. On)

From the charts, we see no significant performance changes between ‘CUDA – Force P2 State’ off and on in either the hybrid or the non-synthetic tests.

For the game benchmarks, although there aren’t significant differences in raw performance with ‘CUDA – Force P2 State’ disabled, there is a mix of significant performance improvements and regressions in frametimes stability that affect different games and 3D API scenarios. The games that show significant improvements are Borderlands 3 (DX12 API mode), Metro Exodus (DX12 API mode, with and without RT), Cyberpunk 2077 (with RT features on), and Quake 2 RTX (KHR Vulkan API mode). On the other hand, the games that present significant regressions are Deus Ex: Mankind Divided (DX11 API mode), Watch Dogs: Legion (DX12 API mode), Godfall, and Wolfenstein Youngblood (with RT features off). The rest of the games show no significant changes in terms of frametimes consistency.

Disclaimer

Please be aware that the following results, notes, and the corresponding ‘CUDA – Force P2 State’ setting recommendation are valid for similar Ampere gaming rigs using the GeForce Game Ready 460.89 driver and Windows 10 v20H2. Their representativeness, applicability, and usefulness on different GPU architectures, test benches, GPU drivers, and Windows 10 versions may vary.

Conclusion

Based on our mixed and somewhat inconsistent results and findings, we recommend keeping the NVIDIA ‘CUDA – Force P2 State’ driver feature globally enabled (default) for gaming. You can always disable this feature on a per-game profile basis after running your own tests if you find any significant benefit in frametimes consistency. But pay special attention to the changes in the adaptive STDEV values and not only the usual lows.
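If you run your own tests and want a feel for what adaptive STDEV measures, the sketch below follows the definition used in this article: the standard deviation of values compared to their moving average. It is a rough approximation in Python, not CapFrameX's implementation. The trailing window of 5 samples, the use of frametimes in milliseconds, and the function names are all our assumptions.

```python
from math import sqrt


def moving_average(values, window):
    """Trailing moving average; early samples average what is available so far."""
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        averages.append(running / min(i + 1, window))
    return averages


def adaptive_stdev(values, window=5):
    """Root-mean-square deviation of each value from its moving average.

    Assumption: window size and trailing (not centered) averaging are
    illustrative choices; the capture tool may use different ones.
    """
    averages = moving_average(values, window)
    squared = [(v - m) ** 2 for v, m in zip(values, averages)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical frametime traces in milliseconds.
smooth = [10.0] * 20
spiky = [10.0, 14.0] * 10
print("smooth:", adaptive_stdev(smooth))
print("spiky:", adaptive_stdev(spiky))
```

Because deviations are measured against a local moving average instead of the global mean, a slow drift in frametimes across a benchmark scene contributes little, while frame-to-frame spikes raise the value. That is why a lower adaptive STDEV points to smoother frame delivery.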

Let’s play!

***

Rodrigo González (aka “RodroG”) is an enthusiast gamer and tech reviewer interested especially in shooter games, open-world role-playing games, and software and hardware benchmarking. He is the author of the NVIDIA WHQL Driver Performance Benchmarks Series and founder and moderator of the r/allbenchmarks community on Reddit.