Original Link: https://www.anandtech.com/show/6973/nvidia-geforce-gtx-780-review
NVIDIA GeForce GTX 780 Review: The New High End
by Ryan Smith on May 23, 2013 9:00 AM EST

As the two-year GPU cycle continues in earnest, we’ve reached the point where NVIDIA is gearing up for their annual desktop product line refresh. With the GeForce 600 series proper having launched over a year ago, all the way back in March of 2012, most GeForce 600 series products are at or approaching a year old, putting us roughly halfway through Kepler’s expected two-year lifecycle. With their business strongly rooted in annual upgrades, this means NVIDIA’s GPU lineup is due for a refresh.
How NVIDIA goes about their refreshes has differed throughout the years. Unlike the CPU industry (specifically Intel), the GPU industry doesn’t currently live on any kind of tick-tock progression method. New architectures are launched on new process nodes, which in turn ties everything to the launch of those new process nodes by TSMC. Last decade saw TSMC doing yearly half-node steps, allowing incremental fab-driven improvements every year. But with TSMC no longer doing half-node steps as of 40nm, this means fab-driven improvements now come only every two years.
In lieu of new process nodes and new architectures, NVIDIA has opted to refresh based on incremental improvements within their product lineups. With the Fermi generation, NVIDIA initially shipped most GeForce 400 Fermi GPUs with one or more disabled functional units. This helped to boost yields on a highly temperamental 40nm process, but it also left NVIDIA an obvious route of progression for the GeForce 500 series. With the GeForce 600 series on the other hand, 28nm is relatively well behaved and NVIDIA has launched fully-enabled products at almost every tier, leaving them without an obvious route of progression for the Kepler refresh.
So where does NVIDIA go from here? As it turns out NVIDIA’s solution for their annual refresh is essentially the same: add more functional units. NVIDIA of course doesn’t have more functional units to turn on within their existing GPUs, so instead they’re doing the next best thing, acquiring more functional units by climbing up the GPU ladder itself. And with this in mind, this brings us to today’s launch, the GeForce GTX 780.
The GeForce GTX 780 is the follow-up to last year’s GeForce GTX 680, and is a prime example of refreshing a product line by bringing in a larger, more powerful GPU that was previously relegated to a higher tier product. Whereas GTX 680 was based on a fully-enabled GK104 GPU, GTX 780 is based on a cut-down GK110 GPU, NVIDIA’s monster GPU first launched into the prosumer space with GTX Titan earlier this year. Going this route doesn’t offer much in the way of surprises since GK110 is a known quantity, but as we’ll see it allows NVIDIA to improve performance while slowly bringing down GPU prices.
 | GTX Titan | GTX 780 | GTX 680 | GTX 580
Stream Processors | 2688 | 2304 | 1536 | 512
Texture Units | 224 | 192 | 128 | 64
ROPs | 48 | 48 | 32 | 48
Core Clock | 837MHz | 863MHz | 1006MHz | 772MHz
Shader Clock | N/A | N/A | N/A | 1544MHz
Boost Clock | 876MHz | 900MHz | 1058MHz | N/A
Memory Clock | 6GHz GDDR5 | 6GHz GDDR5 | 6GHz GDDR5 | 4GHz GDDR5
Memory Bus Width | 384-bit | 384-bit | 256-bit | 384-bit
VRAM | 6GB | 3GB | 2GB | 1.5GB
FP64 | 1/3 FP32 | 1/24 FP32 | 1/24 FP32 | 1/8 FP32
TDP | 250W | 250W | 195W | 244W
Transistor Count | 7.1B | 7.1B | 3.5B | 3B
Manufacturing Process | TSMC 28nm | TSMC 28nm | TSMC 28nm | TSMC 40nm
Launch Price | $999 | $649 | $499 | $499
As the first of the desktop GeForce 700 lineup, GeForce GTX 780 is in almost every sense of the word a reduced price, reduced performance version of GTX Titan. This means that on the architectural side we’re looking at the same GK110 GPU, this time with fewer functional units. Titan’s 14 SMXes have been reduced to just 12 SMXes, reducing the shader count from 2688 to 2304, and the texture unit count from 224 to 192.
At the same time because NVIDIA has gone from disabling 1 SMX (Titan) to disabling 3 SMXes, GTX 780’s GPC count is now going to be variable thanks to the fact that GK110 packs 3 SMXes to a GPC. GTX 780 cards will either have 5 GPCs or 4 GPCs depending on whether the 3 disabled SMXes are all in the same GPC or not. This is nearly identical to what happened with the GTX 650 Ti, and as with the GTX 650 Ti it’s largely an intellectual curiosity since the difference in GPCs won’t notably impact performance. But it is something worth pointing out.
Moving on with our Titan comparison, much to our surprise NVIDIA has not touched the ROP/memory blocks at all (something they usually do), meaning GTX 780 comes with all 48 ROPs tied to a 384-bit memory bus just as Titan does. Clockspeeds aside, this means that GTX 780 maintains Titan’s ROP/memory throughput rather than taking a performance hit, which bodes well for ROP and memory-bound scenarios. Note however that while the memory bus is the same width, NVIDIA has dropped Titan’s massive 6GB of RAM for a more conservative 3GB, giving GTX 780 the same memory bandwidth while giving it less RAM overall.
As for clockspeeds, they have actually improved slightly, thanks to the fact that fewer SMXes need to be powered. Whereas GTX Titan had a base clockspeed of 837MHz, GTX 780 is 2 bins higher at 863MHz, with the boost clock having risen from 876MHz to 900MHz. Memory clocks meanwhile are still at 6GHz, the same as Titan, giving GTX 780 the full 288GB/sec of memory bandwidth to work from.
Taken altogether, when it comes to theoretical performance GTX 780 should have 88% of Titan’s shading, texturing, and geometry performance, and 100% of Titan’s memory bandwidth. Meanwhile on the ROP side of matters, we actually have an interesting edge case where thanks to GTX 780’s slightly higher clockspeeds, its theoretical ROP performance exceeds Titan’s by about 3%. In practice this doesn’t occur – the loss of the SMXes is far more significant – but in ROP-bound scenarios GTX 780 should be able to stay close to Titan.
For better or worse, power consumption is also going to be very close between GTX 780 and Titan. Titan had a 250W TDP and so does GTX 780, so there won’t be much of a decrease in power consumption despite the decrease in performance. This is somewhat atypical for NVIDIA since lower tier products usually have lower TDPs, but ultimately it comes down to leakage, binning, and the other factors that dictate how GPU tiers need to be structured so that NVIDIA can harvest as many GPUs as possible. On the other hand the fact that the TDP is still 250W (with the same +6% kicker) means that GTX 780 should have a bit more TDP headroom than Titan since GTX 780 has fewer SMXes and RAM chips to power.
On a final note from a feature/architecture standpoint there are a couple of differences between the GTX 780 and GTX Titan that buyers will want to be aware of. Even though Titan is being sold under the GeForce label, it was essentially NVIDIA’s first prosumer product, crossing over between gaming and compute. GTX 780 on the other hand is a pure gaming/consumer part like the rest of the GeForce lineup, meaning NVIDIA has stripped it of Titan’s marquee compute feature: uncapped double precision (FP64) performance. As a result GTX 780 can offer 90% of GTX Titan’s gaming performance, but it can only offer a fraction of GTX Titan’s FP64 compute performance, topping out at 1/24th FP32 performance rather than 1/3rd like Titan. Titan essentially remains as NVIDIA’s entry-level compute product, leaving GTX 780 to be a high-end gaming product.
Meanwhile, compared to the GTX 680 which it will be supplanting, the GTX 780 should be a big step up in virtually every way. As NVIDIA likes to put it, GTX 780 is 50% more of everything than GTX 680. 50% more SMXes, 50% more ROPs, 50% more RAM, and 50% more memory bandwidth. In reality due to the clockspeed differences the theoretical performance difference isn’t nearly as large – we’re looking at just a 29% increase in shading/texturing/ROP performance – but this still leaves GTX 780 as being much more powerful than its predecessor. The tradeoff of course being that with a 250W TDP versus GTX 680’s 195W TDP, GTX 780 also draws around 28% more power; without a process node improvement, performance improvements generally come about by moving along the power/performance curve.
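For those who want to check the math, the theoretical figures quoted above fall straight out of the spec table. The quick sketch below reproduces them from base clocks and unit counts; using boost clocks instead shifts the exact percentages only slightly.

```python
# Reproducing the theoretical deltas quoted above from the spec table,
# using base clocks; boost clocks change the exact percentages only slightly.

def ratio(units_a, clock_a, units_b, clock_b):
    """Relative throughput of configuration A versus B (units x clock)."""
    return units_a * clock_a / (units_b * clock_b)

# GTX 780 vs GTX Titan: 2304 shaders @ 863MHz vs 2688 shaders @ 837MHz
print(f"Shading/texturing vs Titan:   {ratio(2304, 863, 2688, 837):.0%}")   # ~88%
print(f"ROP throughput vs Titan:      {ratio(48, 863, 48, 837):.0%}")       # ~103%

# GTX 780 vs GTX 680: 2304 shaders @ 863MHz vs 1536 shaders @ 1006MHz
print(f"Shading/texturing vs GTX 680: {ratio(2304, 863, 1536, 1006):.0%}")  # ~129%

# Memory bandwidth: 384-bit bus at 6GHz effective = 384/8 * 6 = 288GB/sec,
# identical to Titan and 50% more than GTX 680's 256-bit/192GB/sec.
print(f"Memory bandwidth: {384 / 8 * 6:.0f}GB/sec")

# Power: 250W TDP vs GTX 680's 195W TDP
print(f"TDP vs GTX 680: {250 / 195:.0%}")   # ~128%, i.e. around 28% more power
```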
Moving on to pricing and competitive positioning, it unfortunately won’t just be GTX 780’s performance that’s growing. As we’ve already seen clearly with the launch of GTX Titan, GK110 is in a class of its own as far as GPUs go; AMD simply doesn’t have a GPU big enough to compete on raw performance. Consequently NVIDIA is under no real pricing pressure and can price GTX 780 wherever they want. In this case GTX 780 isn’t just 50% more hardware than the GTX 680, but it’s about 50% more expensive too. NVIDIA will be pricing the GTX 780 at $650, $350 below the GTX Titan and GTX 690, and around $200-$250 more than the GTX 680. This has the benefit of bringing the price of Titan-like performance down considerably, but as an x80 card it’s priced well above its predecessor, which launched back at the more traditional price point of $500. NVIDIA is no stranger to the $650 price point – they initially launched the GTX 280 there back in 2008 – but this is the first time in years they’ll be able to hold that position.
At $650, the GTX 780 is more of a gap filler than it is a competitor. Potential Titan buyers will want to pay close attention to the GTX 780 since it offers 90% of Titan’s gaming performance, but that’s about it for GTX 780’s competition. Above it the GTX 690 and Radeon HD 7990 offer much better gaming performance for much higher prices (AFR issues aside), and the next-closest cards below GTX 780 will be the GTX 680 and Radeon HD 7970 GHz Edition, both of which GTX 780 beats by 20% or more. As a cheaper Titan this is a solid price, but otherwise it’s still somewhat of a luxury card compared to the GTX 680 and its ilk.
Meanwhile as far as availability goes this will be a standard hard launch. And unlike GTX Titan and GTX 690 all of NVIDIA’s usual partners will be participating, so there will be cards from a number of companies available from day one, with semi-custom cards right around the corner.
Finally, looking at GTX 780 as an upgrade path, NVIDIA’s ultimate goal here isn’t to sell the card as an upgrade to existing GTX 680 owners, but rather as with past products the upgrade path is targeted at those buying video cards at 2+ year intervals. GTX 580 is 2.5 years old, while GTX 480 and GTX 280 are older still. A $650 card won’t move GTX 680 owners, but with GTX 780 in some cases doubling GTX 580’s performance NVIDIA believes it may very well move Fermi owners, and they’re almost certainly right.
May 2013 GPU Pricing Comparison
AMD | Price | NVIDIA
AMD Radeon HD 7990 | $1000 | GeForce GTX Titan/GTX 690
 | $650 | GeForce GTX 780
Radeon HD 7970 GHz Edition | $450 | GeForce GTX 680
Radeon HD 7970 | $390 |
 | $350 | GeForce GTX 670
Radeon HD 7950 | $300 |
Meet The GeForce GTX 780
As we previously mentioned, the GTX 780 is very much a Titan Mini in a number of ways. This goes not only for the architecture, features, and performance, but as it turns out for the design too. For the reference GTX 780 NVIDIA will be straight-up reusing the GTX Titan’s board design, from the PCB to the cooler, and everything in between.
As a result the reference GTX 780 inherits all of the great things about the GTX Titan’s design. We won’t go into significant detail here – please read our GTX Titan review for a full breakdown and analysis of Titan’s design – but in summary this means we’re looking at a very well built blower design almost entirely constructed out of metal. GTX 780 is a 10.5” long card composed of a cast aluminum housing, a nickel-tipped heatsink, an aluminum baseplate, and a vapor chamber providing heat transfer between the GPU and the heatsink. The end result is that the reference GTX 780 like Titan before it is an extremely quiet card despite the fact that it’s a 250W blower design, while it also maintains the solid feel and eye-catching design of GTX Titan.
Drilling down, the PCB is also a re-use from Titan. It’s the same GK110 GPU mounted on the same PCB with the same 6+2 phase power design. This is part of the reason that GTX 780 has the same TDP as GTX Titan, while at the same time giving GTX 780 as much or more TDP headroom than Titan itself. Using the same PCB also means that GTX 780 has the same 6pin + 8pin power requirement and the same display I/O configuration of 2x DL-DVI, 1x HDMI, 1x DisplayPort 1.2.
Also being carried over from Titan is GPU Boost 2.0, which was first introduced there and has since been added to additional products (many GeForce 700M products already have it). GPU Boost is essentially a further min-maxed turbo scheme that more closely takes into account temperatures and GPU leakage characteristics to determine what boost bins can be used while staying below TDP. It’s more temperature dependent than the original GPU Boost and as a result more variable, but in cooler situations it allows tapping into that thermal headroom to hit higher clockspeeds and greater performance, TDP allowing. At the same time this means GTX 780 also gains GPU Boost 2.0’s temperature target functionality, which allows users to cap boost by temperature as well as TDP. As with Titan this limit is 80C by default, with the idea being that adjusting the limit is a proxy for adjusting the performance of the card and the amount of noise it generates.
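To make the described behavior a bit more concrete, the following is a deliberately simplified sketch of the kind of decision loop GPU Boost 2.0 runs. NVIDIA’s actual controller, its telemetry, and its bin tables are not public, so treat the logic here as illustrative only; the constants are simply the GTX 780’s stock limits from this review.

```python
# Very rough sketch of the behavior described above; the real GPU Boost 2.0
# controller also weighs voltage and leakage, which is omitted here.
# Numbers are the GTX 780 defaults from this review: 863MHz base clock,
# 1006MHz top standard bin, ~13MHz bins, 80C temperature target, 250W power target.

BASE_CLOCK_MHZ = 863
MAX_BOOST_MHZ  = 1006
BIN_STEP_MHZ   = 13
TEMP_TARGET_C  = 80
POWER_TARGET_W = 250

def next_clock(clock_mhz, gpu_temp_c, board_power_w):
    """Step the clock up one bin while under both targets; step back down
    as soon as either the temperature or the power target is reached."""
    if gpu_temp_c >= TEMP_TARGET_C or board_power_w >= POWER_TARGET_W:
        return max(clock_mhz - BIN_STEP_MHZ, BASE_CLOCK_MHZ)
    return min(clock_mhz + BIN_STEP_MHZ, MAX_BOOST_MHZ)

# A cool card under a TDP-limited load settles wherever power hits 250W,
# while a warm card bounces around the 80C target instead, e.g. between
# its two highest attainable bins.
```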
Meet The GeForce GTX 780, Cont
With all of that said, GTX 780 does make one notable deviation from GTX Titan. NVIDIA has changed their stock fan programming for GTX 780, essentially slowing down the fan response time to even out fluctuations in fan speeds. NVIDIA has told us that they’ve found that next to loud fans in general, the second most important factor in fan noise becoming noticeable is rapidly changing fan speeds, with the changing pitch and volume drawing attention to the card. Slowing down the response time in turn will in theory keep the fan speed from spiking so much, or quickly dropping (e.g. during a loading screen) only to have to immediately jump back up again.
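NVIDIA hasn’t published the new fan curve logic, but “slowing down the response time” effectively means filtering and rate-limiting the target fan speed so that brief dips and spikes in load don’t immediately translate into audible pitch changes. A minimal sketch of that idea, with entirely illustrative constants:

```python
# Illustrative only: the GTX 780's actual fan controller is NVIDIA's.
# The idea is to low-pass filter and rate-limit the target fan speed so
# short load dips/spikes barely move the fan, avoiding audible pitch swings.

def smooth_fan_speed(current_pct, target_pct, alpha=0.05, max_step_pct=1.0):
    """Move the fan toward its target slowly: a small smoothing factor plus a
    per-update step limit keeps brief transients from changing the pitch."""
    filtered = current_pct + alpha * (target_pct - current_pct)
    step = max(-max_step_pct, min(max_step_pct, filtered - current_pct))
    return current_pct + step

# Example: a loading screen briefly drops the target from 60% to 35%, but the
# actual fan speed barely moves before the game resumes and the target climbs back.
```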
In our experience fan response times haven’t been an issue with Titan or past NVIDIA cards, and we’d be hard pressed to tell the difference between GTX 780 and Titan. With that said there’s nothing to lose from this change; GTX 780 doesn’t seem to be in any way worse for it, so in our eyes there’s no reason for NVIDIA not to go ahead with it.
On that note, since this is purely a software (BIOS) change, we asked NVIDIA about whether this could be backported to the hardware-equivalent Titan. The answer is fundamentally yes, but because NVIDIA doesn’t have a backup BIOS system, they aren’t keen on using BIOS flashing any more than necessary. So an official (or even unofficial) update from NVIDIA is unlikely, but given the user community’s adept BIOS modding skills it’s always possible a 3rd party could accomplish this on their own.
Moving on, unlike Titan and GTX 690, NVIDIA will be allowing partners to customize GTX 780, making this the first line of GK110 cards to allow customization. Potential buyers that were for whatever reason uninterested in Titan due to its blower will find that NVIDIA’s partners are already putting together more traditional open air coolers for GTX 780. We can’t share any data about them yet – today is all about the reference card – but we already have one such card in-hand with EVGA’s GeForce GTX 780 ACX.
The reference GTX 780 sets a very high bar in terms of build quality and performance, so it will be interesting to see what NVIDIA’s partners can come up with. With NVIDIA testing and approving all designs under their Greenlight program, all custom cards have to meet or beat NVIDIA’s reference card in factors such as noise and power delivery, which for GTX 780 will not be an easy feat. However because of this requirement NVIDIA’s partners can deviate from NVIDIA’s reference design without buyers needing to be concerned that custom cards are significantly worse than the reference cards. This benefits NVIDIA’s partners, who can attest to the quality of their products (“it got through Greenlight”), and it benefits buyers, who know they’re getting something at least as good as the reference GTX 780, regardless of the specific make or model.
On that note, since we’re talking about card construction let’s quickly dive into overclocking. Overclocking is essentially unchanged from GTX Titan, especially since everything so far is using the reference PCB. The maximum power target remains at 106% (265W) and the maximum temperature target remains at 95C. Buyers will be able to adjust these as they please through Precision X and other tools, but no more than they already could on Titan, which means overclocking is fairly locked down.
Overvolting is also supported in a Titan-like manner, and once again is at the discretion of the card’s partner. By default GTX 780 has a maximum voltage of 1.1625v, with approved overvolting allowing the card to be pushed to 1.2v. This comes in the form of higher boost bins, so enabling overvolting is equivalent to unlocking a +13MHz bin and a +26MHz bin and their requisite voltages. However this also means that those bins aren’t typically reached, so overvolting only has a minimal effect in practice, as most overclocking attempts are going to hit TDP limits before they hit the unlocked boost bins.
GeForce Clockspeed Bins
Clockspeed | GTX Titan | GTX 780
1032MHz | N/A | 1.2v
1019MHz | 1.2v | 1.175v
1006MHz | 1.175v | 1.1625v
992MHz | 1.1625v | 1.15v
979MHz | 1.15v | 1.137v
966MHz | 1.137v | 1.125v
953MHz | 1.125v | 1.112v
940MHz | 1.112v | 1.1v
927MHz | 1.1v | 1.087v
914MHz | 1.087v | 1.075v
Software: GeForce Experience, Out of Beta
Along with the launch of the GTX 780 hardware, NVIDIA is also using this opportunity to announce and roll out new software. Though they are (and always will be) fundamentally a hardware company, NVIDIA has been finding that software is increasingly important to the sales of their products. As a result the company has taken on several software initiatives over the years, both on the consumer side and the business side. To that end the products launching today are essentially the spearhead of a larger NVIDIA software ecosystem.
The first item on the list is GeForce Experience, NVIDIA’s game settings advisor. You may remember GeForce Experience from the launch of the GTX 690, which is when GeForce Experience was first announced. The actual rollout of GeForce Experience was slower than NVIDIA projected, having gone from an announcement to a final release in just over a year. Nevertheless, there is a light at the end of the tunnel, and with version 1.5 GeForce Experience is finally out of beta and is being qualified as release quality.
So what is GeForce Experience? GFE is in a nutshell NVIDIA’s game settings advisor. The concept itself is not new, as games have auto-detected hardware and tried to set appropriate settings, and even NVIDIA has toyed with the concept before with their Optimal Playable Settings (OPS) service. The difference between those implementations and GFE comes down to who’s doing the work of figuring this out, and how much work is being done.
With OPS NVIDIA was essentially writing out recommended settings by hand based on human play testing. That process is of course slow, making it hard to cover a wide range of hardware and to get settings out for new games in a timely manner. Meanwhile with auto-detection built into games the quality of the recommendations is not a particular issue, but most games based their automatic settings around a list of profiles, which means most built-in auto-detection routines were fouled up by newer hardware. Simply put, it doesn’t do NVIDIA any good if a graphical showcase game like Crysis 3 selects the lowest quality settings because it doesn’t know what a GTX 780 is.
NVIDIA’s solution of choice is to take on most of this work themselves, and then move virtually all of it to automation. From a business perspective this makes great sense for NVIDIA as they already have the critical component for such a service, the hardware. NVIDIA already operates large GPU farms in order to test drivers, a process that isn’t all that different from what they would need to do to automate the search for optimal settings. Rather than regression testing and looking for errors, NVIDIA’s GPU farms can iterate through various settings on various GPUs in order to find the best combination of settings that can reach a playable level of performance.
By iterating through the massive matrix of settings most games offer, NVIDIA’s GPU farms can do most of the work required. What’s left for humans is writing test cases for new games, something again necessary for driver/regression testing, and then identifying which settings are more desirable from a quality perspective so that those can be weighted and scored in the benchmarking process. This means that it’s not entirely a human-free experience, but having a handful of engineers writing test cases and assigning weights is a much more productive use of time than having humans test everything by hand like it was for OPS.
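As a toy illustration of the process described above, the search boils down to benchmarking the settings matrix and keeping the highest weighted quality score that still hits a playable framerate. The real pipeline, the settings names, the weights, and the scoring are all NVIDIA’s own and unpublished; everything below is hypothetical.

```python
# Toy sketch of the kind of search described above: exhaustively benchmark the
# settings matrix, then keep the highest-quality combination that stays playable.
# The weights, setting names, and the benchmark() callable are all hypothetical.

from itertools import product

QUALITY_WEIGHT = {"texture_quality": {"low": 0, "high": 3},
                  "shadows":         {"off": 0, "high": 2},
                  "anti_aliasing":   {"off": 0, "fxaa": 1, "4x_msaa": 3}}
TARGET_FPS = 40  # minimum average framerate considered "playable"

def best_settings(benchmark):
    """benchmark(settings) -> average fps on a given GPU (stand-in for a farm run)."""
    best, best_score = None, -1
    options = [[(name, value) for value in values]
               for name, values in QUALITY_WEIGHT.items()]
    for combo in product(*options):
        settings = dict(combo)
        if benchmark(settings) < TARGET_FPS:
            continue  # not playable on this GPU, discard the combination
        score = sum(QUALITY_WEIGHT[name][value] for name, value in settings.items())
        if score > best_score:
            best, best_score = settings, score
    return best
```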
Moving on, all of this feeds into NVIDIA’s GFE backend service, which in turn feeds the frontend in the form of the GFE client. The GFE client has a number of features (which we’ll get into in a moment), but for the purposes of GFE its primary role is to find games on a user’s computer, pull optimal settings from NVIDIA, and then apply those settings as necessary. All of this is done through a relatively straightforward UI, which lists the detected games, the games’ current settings, and NVIDIA’s suggested settings.
The big question of course is whether GFE’s settings are any good, and in short the answer is yes. NVIDIA’s settings are overall reasonable, and more often than not have closely matched the settings we use for benchmarking. I’ve noticed that they do have a preference for FXAA and other pseudo-AA modes over real AA modes like MSAA, but at this point that’s probably a losing battle on my part given the performance hit of MSAA.
For casual users NVIDIA is expecting this to be a one-stop solution. Casual users will let GFE go with whatever it thinks are the best settings, and as long as NVIDIA has done their profiling right users will get the best mix of quality at an appropriate framerate. For power users on the other hand the expectation isn’t necessarily that those users will stick with GFE’s recommended settings, but rather GFE will provide a solid baseline to work from. Rather than diving into a new game blindly, power users can start with GFE’s recommended settings and then turn things down if the performance isn’t quite high enough, or adjust some settings for others if they favor a different tradeoff in quality. On a personal note this exactly matches what I’ve been using GFE for since the earlier betas landed in our hands, so it seems like NVIDIA is on the mark when it comes to power users.
With all of that said, GeForce Experience isn’t going to be a stand-alone game optimization product but rather the start of a larger software suite for consumers. GeForce Experience has already absorbed the NVIDIA Update functionality that previously existed as a small optional install in NVIDIA’s drivers. It’s from here that NVIDIA is going to be building further software products for GeForce users.
The first of these expansions will be for SHIELD, NVIDIA’s handheld game console launching next month. One of SHIELD’s major features is the ability to stream PC games to the console, which in turn requires a utility running on the host PC to provide the SHIELD interface, control mapping, and of course video encoding and streaming. Rather than roll that out as a separate utility, that functionality will be built into future versions of GeForce Experience.
To that end, with the next release of drivers for the GTX 780 GeForce Experience will be bundled with NVIDIA’s drivers, similar to how NVIDIA Update is today. Like NVIDIA Update it will be an optional-but-default item, so users can opt out of it, but if the adoption is anything like NVIDIA Update then the expectation is that most users will end up installing GFE.
It would be remiss of us to not point out the potential for bloat here, but we’ll have to see how this plays out. In terms of file size GeForce Experience is rather tiny at 11MB (versus 169MB for the 320.14 driver package), so after installer overhead is accounted for it should add very little to the size of the GeForce driver package. Similarly it doesn’t seem to have any real appetite for system resources, but this is the wildcard since it’s subject to change as NVIDIA adds more functionality to the client.
Software, Cont: ShadowPlay and "Reason Flags"
Along with providing the game optimization service and SHIELD’s PC client, GeForce Experience has another service that’s scheduled to be added this summer. That service is called ShadowPlay, and not unlike SHIELD it’s intended to serve as a novel software implementation of some of the hardware functionality present in NVIDIA’s latest hardware.
ShadowPlay will be NVIDIA’s take on video recording, the novel aspect of it coming from the fact that NVIDIA is basing the utility around Kepler’s hardware H.264 encoder. To be clear, video recording software is nothing new, as we have FRAPS, Afterburner, Precision X, and other utilities that all do basically the same thing. However all of those utilities work entirely in software, fetching frames from the GPU and then encoding them on the CPU. The overhead from this is not insignificant, especially due to the CPU time required for video encoding.
With ShadowPlay NVIDIA is looking to spur on software developers by getting into video recording themselves, and to provide superior performance by using hardware encoding. Notably this isn’t something that was impossible prior to ShadowPlay, but for some reason recording utilities that use NVIDIA’s hardware H.264 encoder have been few and far between. Regardless, the end result should be that most of the overhead is removed by relying on the hardware encoder, minimally affecting the GPU while freeing up the CPU, reducing the amount of time spent on data transit back to the CPU, and producing much smaller recordings all at the same time.
ShadowPlay will feature multiple modes. Its manual mode will be analogous to FRAPS, recording whenever the user desires it. The second mode, shadow mode, is perhaps the more peculiar mode. Because the overhead of recording with the hardware H.264 encoder is so low, NVIDIA wants to simply record everything in a very DVR-like fashion. In shadow mode the utility keeps a rolling window of the last 20 minutes of footage, with the goal being that should something happen that the user decides they want to record after the fact, they can simply pull it out of the ShadowPlay buffer and save it. It’s perhaps a bit odd from the perspective of someone who doesn’t regularly record their gaming sessions, but it’s definitely a novel use of NVIDIA’s hardware H.264 encoder.
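The shadow mode concept is easiest to picture as a DVR-style rolling buffer of encoded segments. The sketch below is purely illustrative of that idea and says nothing about how ShadowPlay itself is implemented.

```python
# Sketch of the "shadow mode" idea: keep only the last N minutes of encoded
# video in a rolling buffer so a highlight can be saved after the fact.
# ShadowPlay's actual implementation (and its use of the hardware H.264
# encoder) is NVIDIA's; this only illustrates the DVR-style buffering.

from collections import deque

class RollingRecorder:
    def __init__(self, window_seconds=20 * 60, segment_seconds=2):
        self.max_segments = window_seconds // segment_seconds
        self.segments = deque(maxlen=self.max_segments)  # oldest segments fall off

    def push(self, encoded_segment: bytes):
        """Called for each short encoded segment coming out of the encoder."""
        self.segments.append(encoded_segment)

    def save_clip(self, path: str):
        """User decides after the fact that the buffered footage is worth keeping."""
        with open(path, "wb") as f:
            for segment in self.segments:
                f.write(segment)
```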
NVIDIA hasn’t begun external beta testing of ShadowPlay yet, so for the moment all we have to work from is screenshots and descriptions. The big question right now is what the resulting quality will be like. NVIDIA’s hardware encoder does have some limitations that are necessary for real-time encoding, so as we’ve seen in the past with qualitative looks at NVIDIA’s encoder and offline H.264 encoders like x264, there is a quality tradeoff if everything has to be done in hardware in real time. As such ShadowPlay may not be the best tool for reference quality productions, but for the YouTube/Twitch.tv generation it should be more than enough.
Anyhow, ShadowPlay is expected to be released sometime this summer. But since 95% of the software ShadowPlay requires is also required for the SHIELD client, we wouldn’t be surprised if ShadowPlay was released shortly after a release quality version of the SHIELD client is pushed out, which may come as early as June alongside the SHIELD release.
Reasons: Why NVIDIA Cards Throttle
The final software announcement from NVIDIA to coincide with the launch of the GTX 780 isn’t a software product in and of itself, but rather an expansion of NVIDIA’s 3rd party hardware monitoring API.
One of the common questions/complaints about GPU Boost that NVIDIA has received over the last year is about why a card isn’t boosting as high as it should be, or why it suddenly drops down a boost bin or two for no apparent reason. For technically minded users who know the various cards’ throttle points and specifications this isn’t too complex – just look at the power consumption, GPU load, and temperature – but that’s a bit much to ask of most users. So starting with the recently released 320.14 drivers, NVIDIA is exposing a selection of flags through their API that indicate what throttle point is causing throttling or otherwise holding back the card’s clockspeed. There isn’t an official name for these flags, but “reasons” is as good as anything else, so that’s what we’re going with.
The reasons flags are a simple set of 5 binary flags that NVIDIA’s driver uses to indicate why it isn’t increasing the clockspeed of the card further. These flags are:
- Temperature Limit – the card is at its temperature throttle point
- Power Limit – The card is at its global power/TDP limit
- Voltage Limit – The card is at its highest boost bin
- Overvoltage Max Limit – The card’s absolute maximum voltage limit (“if this were to occur, you’d be at risk of frying your GPU”)
- Utilization Limit – The current workload is not high enough that boosting is necessary
As these are simple flags, it’s up to 3rd party utilities to decide how they want to present these flags. EVGA’s Precision X, which is NVIDIA’s utility of choice for sampling new features to the press, simply records the flags like it does the rest of the hardware monitoring data, and this is likely what most programs will do.
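To illustrate what a monitoring tool might do with these flags, a hypothetical decoding routine could look like the following. The flag names mirror the list above, but the bit positions and the helper itself are invented for the example; the real values and constants come from NVIDIA’s monitoring API.

```python
# Hypothetical decoding of the five "reasons" flags into readable labels.
# Bit layout is assumed for illustration; the actual flags are reported by
# NVIDIA's monitoring API alongside clocks, temperatures, and power data.

REASON_BITS = {
    0: "Temperature Limit",      # at the temperature throttle point
    1: "Power Limit",            # at the global power/TDP limit
    2: "Voltage Limit",          # at the highest boost bin
    3: "Overvoltage Max Limit",  # at the absolute maximum voltage
    4: "Utilization Limit",      # workload too light to need boosting
}

def decode_reasons(flags):
    """Turn a raw bitfield into the list of active throttle reasons."""
    return [name for bit, name in REASON_BITS.items() if flags & (1 << bit)]

print(decode_reasons(0b00011))  # ['Temperature Limit', 'Power Limit']
```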
With the reason flags NVIDIA is hoping that this will help users better understand why their card isn’t boosting as high as they’d like to. At the same time the prevalence of GPU Boost 2.0 and its much higher reliance on temperature makes exposing this data all the more helpful, especially for overclockers that would like to know what attribute they need to turn up to unlock more performance.
Our First FCAT & The Test
First announced back at the end of March, FCAT has been something of a bewildering experience for us. NVIDIA has actually done a great job on the software, but between picky games, flaky DVI cables, and dead SSDs (we killed an enterprise-grade Intel SSD 910 with FCAT) things have not gone quite to plan, pushing back our intended use of FCAT more than once. In any case, with most of the kinks worked out we’re ready to start integrating it into our major GPU reviews.
For the time being we’re putting FCAT on beta status, as we intend to try out a few different methods of presenting data to find something that’s meaningful, useful, and legible. To that end we’d love to get your feedback in our comments section so that we can further iterate on our presentation and data collection.
We’ve decided to go with two metrics for our first run with FCAT. The first metric is rather simple: 95th percentile frametimes. For years we’ve done minimum framerates (when practical), which are similar in concept, so this allows us to collect similar stats at the end of the rendering pipeline while hopefully avoiding some of the quirkiness that comes from looking at minimum framerates within games themselves. The 95th percentile frametime is quite simply the frametime that the slowest 5% of frames exceed – in other words, 95% of frames render at least this quickly. If a game or video card is introducing significant one-off stuttering by taking too long to render some frames, this will show us.
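For clarity, here is the calculation in miniature, run on a made-up list of frametimes in milliseconds:

```python
# Minimal version of the metric: sort the frametimes and take the value that
# the slowest 5% of frames exceed. Frametimes below are made up, in milliseconds.

def percentile_95_frametime(frametimes_ms):
    ordered = sorted(frametimes_ms)
    index = int(0.95 * (len(ordered) - 1))
    return ordered[index]

frames = [16.7, 16.9, 16.5, 17.1, 33.4, 16.8, 16.6, 16.7, 16.9, 45.0]
print(f"95th percentile frametime: {percentile_95_frametime(frames):.1f}ms")
```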
This is primarily meant to capture single-GPU issues, but in practice with AMD having fixed the bulk of their single-GPU issues months ago, we don’t actually expect much. None the less it’s a good way of showing that nothing interesting is happening in those situations.
Our second metric is primarily focused on multi-GPU setups, and is an attempt to quantify the wild frametime variations seen at times with multi-GPU setups, which show up as telltale zigzag lines in frametime graphs.
In this metric, which for the moment we’re calling Delta Percentages, we’re collecting the deltas (differences) between frametimes, averaging that out, and then running the delta average against the average frametime of the entire run. The end result of this process is that we can measure whether sequential frames are rendering in roughly the same amount of time, while controlling for performance differences by looking at the data relative to the average frametime (rather than as absolute time).
In general, a properly behaving single-GPU card should have a delta average of under 3%, with the specific value depending in part on how variable the workload is throughout any given game benchmark. 3% may sound small, but since we’re talking about an average it means it’s weighed against the entire run. The higher the percentage the more unevenly frames are arriving, and exceeding 3% is about where we expect players with good eyes to start noticing a difference. Alternatively in a perfectly frame metered situation, such as v-sync enabled with a setup that can always hit 60fps, then this would be a flat 0%, representing the pinnacle of smoothness.
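Expressed as code, the delta percentage calculation looks roughly like this, taking absolute frame-to-frame differences; the frametime lists here are made up purely to show the two extremes.

```python
# Sketch of the "Delta Percentage" metric as described: average the absolute
# frame-to-frame differences, then express that against the run's average frametime.

def delta_percentage(frametimes_ms):
    deltas = [abs(b - a) for a, b in zip(frametimes_ms, frametimes_ms[1:])]
    avg_delta = sum(deltas) / len(deltas)
    avg_frametime = sum(frametimes_ms) / len(frametimes_ms)
    return 100.0 * avg_delta / avg_frametime

smooth = [16.7] * 8                                   # perfectly metered: 0%
zigzag = [10.0, 23.0, 10.0, 23.0, 10.0, 23.0, 10.0]   # AFR-style alternation
print(f"{delta_percentage(smooth):.1f}%  vs  {delta_percentage(zigzag):.1f}%")
```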
Moving on, we’ll be running FCAT against 6 of our 10 games for the time being: Sleeping Dogs, Hitman: Absolution, Total War: Shogun 2, Battlefield 3, Bioshock, and Crysis 3. The rest of our games are either highly inconsistent or generally fussy, introducing too much variance into our FCAT results.
Finally, due to the amount of additional time it takes to put together FCAT results, we’re going to primarily publish FCAT results with major product launches and major driver updates. Due to how frame metering works, the only time frame consistency significantly changes is either with the introduction of new architectures/GPUs, or with the introduction of significant driver changes, so those are the scenarios we’ll be focusing on.
The Test
NVIDIA’s launch drivers for the GTX 780 are 320.18, drivers that are essentially identical to the public 320.14 drivers released last week.
CPU: | Intel Core i7-3960X @ 4.3GHz
Motherboard: | EVGA X79 SLI
Power Supply: | Antec True Power Quattro 1200
Hard Disk: | Samsung 470 (256GB)
Memory: | G.Skill Ripjaws DDR3-1867 4 x 4GB (8-10-9-26)
Case: | Thermaltake Spedo Advance
Monitor: | Samsung 305T
Video Cards: | AMD Radeon HD 7970 GHz Edition, AMD Radeon HD 7990, NVIDIA GeForce GTX 580, NVIDIA GeForce GTX 680, NVIDIA GeForce GTX 690, NVIDIA GeForce GTX 780, NVIDIA GeForce GTX Titan
Video Drivers: | NVIDIA ForceWare 320.14, NVIDIA ForceWare 320.18, AMD Catalyst 13.5 Beta 2
OS: | Windows 8 Pro
DiRT: Showdown
As always, starting off our benchmark collection is our racing benchmark, DiRT: Showdown. DiRT: Showdown is based on the latest iteration of Codemasters’ EGO engine, which has continually evolved over the years to add more advanced rendering features. It was one of the first games to implement tessellation, and also one of the first games to implement a DirectCompute based forward-rendering compatible lighting system. At the same time, as Codemasters is by far the most prevalent PC racing developer, this game is also a good proxy for some of the other racing games on the market like F1 and GRID.
DiRT: Showdown’s lighting system continues to befuddle us at times. Though GK10x Kepler parts generally have mediocre compute performance in pure compute tasks, NVIDIA’s DirectCompute performance has otherwise proven to be appropriately fast, except in the case of DiRT. The fact of the matter is that DiRT is easy enough to run even with its advanced lighting system that there’s no reason not to use it on a card like the GTX 780 at any single-monitor resolution, but doing so does put the GTX 780 in a bad light relative to AMD’s best cards. Nor does this put GK110 in a particularly good light, as its compute enhancements don’t bring it much of an advantage here beyond what the larger number of shaders affords.
Like Titan before it, the GTX 780 falls slightly behind AMD’s Radeon HD 7970 GHz Edition, the only such benchmark where this occurs. The end result being that the GTX 780 trails the 7970GE by about 7%, and the GTX Titan by 6%. Otherwise we’ve seen Titan (and will see GTX 780) do much better in virtually every other benchmark.
Total War: Shogun 2
Our next benchmark is Shogun 2, which is a continuing favorite in our benchmark suite. Total War: Shogun 2 is the latest installment of the long-running Total War series of turn-based strategy games, and alongside Civilization V is notable for just how many units it can put on a screen at once. Even 2 years after its release it’s still a very punishing game at its highest settings due to the amount of shading and memory those units require.
Compared to DiRT, Shogun 2 immediately presents us with a scenario that sees GTX 780 doing very well. At 51.6fps at 2560, the GTX 780 is some 34% ahead of the GTX 680, 39% ahead of the 7970GE, and only 11% behind Titan. Compared to the GTX 680 tier cards this is actually better than the GTX 780 will get on average, and like Titan before it the lead will depend on just what aspect of the GPU any given game is pushing the most. At the same time 11% is fairly consistent for how far behind Titan the GTX 780 will trail, reinforcing its position as a budget Titan card.
As an aside, this is by far the best game for the GTX 780 versus the old Fermi based GTX 580. Here we see the GTX 780 flat-out double the performance of the GTX 580. NVIDIA may have a tough time selling the GTX 780 on its price tag, but in cases like this the performance gains over last generation hardware are very remarkable.
Hitman: Absolution
The third game in our lineup is Hitman: Absolution. The latest game in Square Enix’s stealth-action series, Hitman: Absolution is a DirectX 11 based title that though a bit heavy on the CPU, can give most GPUs a run for their money. Furthermore it has a built-in benchmark, which gives it a level of standardization that fewer and fewer benchmarks possess.
Hitman is another game the 7970GE does fairly well at, leading it to nip at the heels of the GTX 780, but ultimately the GTX 780 hits its role as a gap filler, coming in between the sub-$500 class cards and the $1000 class cards. The best news here for the GTX 780 is that it trails the bigger Titan by just 4%, while the lead over the GTX 680 stands at 27%, and the lead over the 7970GE stands at only 7%.
Minimum framerates aren’t quite as good for the GTX 780, though they’re still more than respectable. Here the GTX 780 still looks good against Titan, trailing by 10%, but now the lead over the 7970GE stands at just 4%.
Sleeping Dogs
Another Square Enix game, Sleeping Dogs is one of the few open world games to be released with any kind of benchmark, giving us a unique opportunity to benchmark an open world game. Like most console ports, Sleeping Dogs’ base assets are not extremely demanding, but it makes up for it with its interesting anti-aliasing implementation, a mix of FXAA and SSAA that at its highest settings does an impeccable job of removing jaggies. However by effectively rendering the game world multiple times over, it can also require a very powerful video card to drive these high AA modes.
Sleeping Dogs is another game where the 780 fills out a gap, but falls closer to the 7970GE than NVIDIA would like to see. At 64.4fps it’s fast enough to crash past 60fps at 2560 with high AA, but this means it’s a narrower win for the GTX 780, beating the GTX 680 by 23% but the 7970GE by just 7%. Meanwhile the GTX 780 trails the GTX Titan by 12%.
The minimum framerates, though not bad on their own, do not do the GTX 780 any favors, and we see it fall behind the 7970GE here by over 10%. Interestingly this is just about an all-around worst case scenario for the GTX 780, which has it trailing the GTX Titan by almost the full 15% theoretical shader/texture performance gap, while the lead over the GTX 680 is only 10%. Sleeping Dogs’ use of SSAA in its higher anti-aliasing modes is very hard on the shaders, and this is a prime example of what GTX 780’s weak spot is going to be relative to GTX Titan.
Crysis: Warhead
Up next is our legacy title for 2013, Crysis: Warhead. The stand-alone expansion to 2007’s Crysis, at over 4 years old Crysis: Warhead can still beat most systems down. Crysis was intended to be future-looking as far as performance and visual quality goes, and it has clearly achieved that. We’ve only finally reached the point where single-GPU cards have come out that can hit 60fps at 1920 with 4xAA.
Thankfully for NVIDIA, Crysis: Warhead is the last of the games where we see the GTX 780 struggle to get well ahead of the 7970GE. Once more the performance advantage is just 8% over AMD’s best single-GPU card at 2560, with both cards hitting the mid-40s for frames per second. On the other hand Crysis benefits greatly from the extra memory bandwidth and ROPs of GTX 780 compared to the GTX 680, leading to a 37% performance gap in favor of the GTX 780. Finally, the GTX 780 once more trails the GTX Titan by about 11%.
When it comes to minimum framerates things actually improve a bit for the GTX 780. Against the 7970GE it retains its 8% lead, but versus the GTX 680 it’s now a sizable 47% lead.
Far Cry 3
The next game in our benchmark suite is Far Cry 3, Ubisoft’s island-jungle action game. A lot like our other jungle game Crysis, Far Cry 3 can be quite tough on GPUs, especially with MSAA and improved alpha-to-coverage checking thrown into the mix. On the other hand it’s still a bit of a pig on the CPU side, and seemingly inexplicably we’ve found that it doesn’t play well with HyperThreading on our testbed, making this the only game we’ve ever had to disable HT for to maximize our framerates.
Moving on to the back-half of our games, we’re starting to hit the games that traditionally favor NVIDIA’s architectures over AMD’s. Case in point, the GTX 780 has a very solid 23% lead over the 7970GE. Meanwhile the Titan gap is once more around 10%, and this is another title where the GTX 780 does particularly well relative to the GTX 580, clearing the last generation frontrunner by 77%. Far Cry 3 ends up being an excellent example of where the GTX 780 becomes a filler card; it cleanly fills the gap between the GTX 680 and GTX Titan.
Battlefield 3
The final action game in our benchmark suite is Battlefield 3, DICE’s 2011 multiplayer military shooter. Its ability to pose a significant challenge to GPUs has been dulled some by time and drivers, but it’s still a challenge if you want to hit the highest settings at the highest resolutions at the highest anti-aliasing levels. Furthermore while we can crack 60fps in single player mode, our rule of thumb here is that multiplayer framerates will dip to half our single player framerates, so framerates that look high here may not be high enough in practice.
Battlefield 3 is another game that NVIDIA traditionally does well in, despite the fact that both sides have wrung out some rather impressive performance increases over the last year. At 2560 the GTX 780 enjoys a 33% lead over the 7970GE, a 27% lead over the GTX 680, and a massive 85% lead over the GTX 580. Furthermore this is fast enough to get it past 60fps at 2560, which means our minimum framerates should dip no lower than the mid-30s even in the most hectic multiplayer maps.
Civilization V
Our other strategy game, Civilization V, gives us an interesting look at things that other strategy games cannot match, with a much weaker focus on shading in the game world and a much greater focus on creating the geometry needed to bring such a world to life. In doing so it uses a slew of DirectX 11 technologies, including tessellation for said geometry, driver command lists for reducing CPU overhead, and compute shaders for on-the-fly texture decompression.
Civilization V is another game that not only tends to favor NVIDIA video cards, but strongly favors GK110 cards in particular. As a result the lead over the 7970GE is an incredible 47%, and the lead over the GTX 680 is right next to it at 46%. Though admittedly we’ve reached a point where the difference is almost academic, since even a GTX 680 can hit 60fps at 2560.
Bioshock Infinite
Bioshock Infinite is Irrational Games’ latest entry in the Bioshock franchise. Though it’s based on Unreal Engine 3 – making it our obligatory UE3 game – Irrational has added a number of effects that make the game rather GPU-intensive on its highest settings. As an added bonus it includes a built-in benchmark composed of several scenes, a rarity for UE3 engine games, so we can easily get a good representation of what Bioshock’s performance is like.
Bioshock is another strong showing for the GTX 780, both against the 7970GE and the GTX 580. In the case of the former the GTX 780 leads by 30%, while against the GTX 580 it leads by 96%, falling just short of doubling the GTX 580’s performance again. Overall the framerate of 61.9fps makes this the slowest card that can do 60fps at 2560 at the game’s highest settings, and one of only two single-GPU cards that can perform such a feat.
Crysis 3
The final benchmark in our suite needs no introduction. With Crysis 3, Crytek has gone back to trying to kill computers, taking back the “most punishing game” title in our benchmark suite. Only in a handful of setups can we even run Crysis 3 at its highest (Very High) settings, and that’s still without AA. Crysis 1 was an excellent template for the kind of performance required to drive games for the next few years, and Crysis 3 looks to be much the same for 2013.
Even with just FXAA and High quality settings, Crysis 3 quashes any hope of running at 2560 with a single card at the game’s higher quality settings. 53.1fps is plenty playable, but GTX 780 users would need to give up a bit more if they want to push the averages above 60fps. Meanwhile looking at our percentages it’s another strong showing for the GTX 780, with the GTX 780 leading the GTX 680 by 30% and the 7970GE by 28%.
Synthetics
As always we’ll also take a quick look at synthetic performance, though as GTX 780 is just another GK110 card, there shouldn't be any surprises here. These tests are mostly for comparing cards from within a manufacturer, as opposed to directly comparing AMD and NVIDIA cards. We’ll start with 3DMark Vantage’s Pixel Fill test.
Pixel fill is traditionally bound by ROP and memory throughput, but with enough of both the bottleneck can shift back to the shader blocks. In this case that’s exactly what happens, with the GTX 780 trailing GTX Titan by about the theoretical difference between the two cards. On the other hand it’s very odd to see the GTX 680 get so close to the GTX 780 in this test, given the fact that the latter is more powerful in virtually every way possible.
Moving on, we have our 3DMark Vantage texture fillrate test, which does for texels and texture mapping units what the previous test does for ROPs.
Unlike pixel fill, texel fill is right where we expected it to come in compared to cards both above and below the GTX 780.
Finally we’ll take a quick look at tessellation performance with TessMark.
NVIDIA’s tessellation performance is strongly coupled to their SMX count, so the high number of SMXes (12) on the GTX 780 helps it keep well ahead of the pack. In fact we’re a bit surprised it didn’t fall behind GTX Titan by more than what we’re seeing. On the other hand the lead over the GTX 580 is right where we’d expect it to be, showcasing the roughly trebled geometry performance of GTX 780 over GTX 580.
Compute
Jumping into compute, we should see a mix of results here, with some tests favoring the GK110 based GTX 780’s more compute capable design, while other tests will punish it for not being a fast FP64 card like GTX Titan.
As always we'll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the only games with a benchmark that can isolate the use of DirectCompute and its resulting performance.
Civilization V’s DirectCompute performance is looking increasingly maxed out at the high end. At 402fps the GTX 780 may as well be tied with GTX Titan. On the other hand it’s a reminder that while we don’t always see NVIDIA do well in our more pure compute tests, it can deliver where it matters for games with DirectCompute.
Our next benchmark is LuxMark2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as ray tracing maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.
NVIDIA has never done well at LuxMark, and GTX 780 won’t change that. It’s greatly faster than GTX 680 and that’s about it. Kepler parts, including GK110, continue to have trouble with our OpenCL benchmarks, as evidenced by the fact that GTX 780 doesn’t beat GTX 580 by nearly as much as the generational improvements would suggest. GK110 is a strong compute GPU, but not in ways that LuxMark is going to benefit from.
Our 3rd benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.
GTX 780 still struggles some at compute with CLBenchmark, but less so than with LuxMark. 7970GE is the clear winner here in both tests, while GTX 780 stays remarkably close to GTX Titan in performance. The fluid simulation in particular makes GTX 780 look good on a generational basis, more than doubling GTX 580’s performance.
Moving on, our 4th compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that has work distributed to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, as Folding @ Home has moved exclusively to OpenCL this year with FAHCore 17.
The Folding@Home group recently pushed out a major core update (FAHBench 1.2.0), which we’ve rerun on a number of cards and is reflected in our results. Unfortunately this version also broke single precision implicit mode on AMD GPUs with AMD’s latest drivers, so we only have NVIDIA GPUs for that section.
In any case, despite the fact that this is an OpenCL benchmark this is one of the cases where NVIDIA GPUs do well enough for themselves in single precision mode, with GTX 780 surpassing 7970GE, and falling behind only GTX Titan and the 7990. GTX 780 doesn’t necessarily benefit from GK110’s extra compute functionality, but it does see a performance improvement over GTX 680 that’s close to the theoretical difference in shader performance. Meanwhile in double precision mode, the lack of an uncapped double precision mode for GTX 780 means that it brings up the bottom of the charts compared to Titan and its 1/3 FP64 rate. Compute customers looking for a bargain NVIDIA card (relatively speaking) will need to stick with Titan.
Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.
SystemCompute shows very clear gains for the GTX 780 over both the GTX 680 and GTX 580, while trailing the GTX Titan as expected. However, like Titan, it still trails the 7970GE.
Power, Temperature, & Noise
As always, last but not least is our look at power, temperature, and noise. Next to price and performance of course, these are some of the most important aspects of a GPU, due in large part to the impact of noise. All things considered, a loud card is undesirable unless there’s a sufficiently good reason – or sufficiently good performance – to ignore the noise.
GTX 780 comes into this phase of our testing with a very distinct advantage. Being based on an already exceptionally solid card in the GTX Titan, it’s guaranteed to do at least as well as Titan here. At the same time because its practical power consumption is going to be a bit lower due to the fewer enabled SMXes and fewer RAM chips, it effectively has Titan’s cooler paired with an even lower practical power draw, which can be a silent (but deadly) combination.
GeForce GTX 780 Voltages
GTX 780 Max Boost | GTX 780 Base | GTX 780 Idle
1.1625v | 1.025v | 0.875v
Unsurprisingly, voltages are unchanged from Titan. GK110’s max safe load voltage is 1.1625v, with 1.2v being the maximum overvoltage allowed by NVIDIA. Meanwhile idle remains at 0.875v, and as we’ll see idle power consumption is equal too.
Meanwhile we also took the liberty of capturing the average clockspeeds of the GTX 780 in all of the games in our benchmark suite. In short, although the GTX 780 has a higher base clock than Titan (863MHz versus 837MHz), the fact that it only goes one boost bin higher (1006MHz versus 992MHz) means that the GTX 780 doesn’t usually clock much higher than GTX Titan under load; for one reason or another it typically settles at the same boost bin as the GTX Titan on tests that offer consistent workloads. This means that in practice the GTX 780 is closer to a straight-up harvested GTX Titan, with no practical clockspeed differences.
GeForce GTX 780 & GTX Titan Average Clockspeeds
 | GTX 780 | GTX Titan
Max Boost Clock | 1006MHz | 992MHz
DiRT:S | 1006MHz | 992MHz
Shogun 2 | 966MHz | 966MHz
Hitman | 992MHz | 992MHz
Sleeping Dogs | 969MHz | 966MHz
Crysis | 992MHz | 992MHz
Far Cry 3 | 979MHz | 979MHz
Battlefield 3 | 992MHz | 992MHz
Civilization V | 1006MHz | 979MHz
Idle power consumption is by the book. With the GTX 780 equipped, our test system sees 110W at the wall, a mere 1W difference from GTX Titan, and tied with the 7970GE. Idle power consumption of video cards is getting low enough that there’s not a great deal of difference between the latest generation cards, and what’s left is essentially lost as noise.
Moving on to power consumption under Battlefield 3, we get our first real confirmation of our earlier theories on power consumption. Between the slightly lower load placed on the CPU from the lower framerate, and the lower power consumption of the card itself, GTX 780 draws 24W less at the wall. Interestingly our system draws exactly the same amount at the wall with the GTX 580; once the GTX 580 system’s lower CPU power consumption is accounted for, this means that video card power consumption on the GTX 780 is actually down compared to the GTX 580. GTX 780 being a harvested part helps a bit with that, but it still means we’re looking at quite the boost in performance relative to the GTX 580 for a simultaneous decrease in video card power consumption.
Moving along, we see that power consumption at the wall is higher than both the GTX 680 and 7970GE. The former is self-explanatory: the GTX 780 features a bigger GPU and more RAM, but is made on the same 28nm process as the GTX 680. So for a tangible performance improvement within the same generation, there’s nowhere for power consumption to go but up. Meanwhile as compared to the 7970GE, we are likely seeing a combination of CPU power consumption differences and at least some difference in video card power consumption, though it isn’t possible to say how much of the gap comes from each.
Switching to FurMark and its more purely GPU-bound load, our results become compressed somewhat as the GTX 780 moves slightly ahead of the 7970GE. Power consumption relative to Titan is lower than we expected given that both cards are hitting their TDP limits, though compared to GTX 680 it’s roughly where it should be. At the same time this reflects a somewhat unexpected advantage for NVIDIA; despite the fact that GK110 is a bigger and logically more power hungry GPU than AMD’s Tahiti, the power consumption of the resulting cards isn’t all that different. Somehow NVIDIA has a slight efficiency advantage here.
Moving on to idle temperatures, we see that GTX 780 hits the same 30C mark as GTX Titan and 7970GE.
With GPU Boost 2.0, load temperatures are kept tightly in check when gaming. The GTX 780’s default throttle point is 80C, and that’s exactly what happens here, with GTX 780 bouncing around that number while shifting between its two highest boost bins. Note that like Titan however this means it’s quite a bit warmer than the open air cooled 7970GE, so it will be interesting to see if semi-custom GTX 780 cards change this picture at all.
Whereas GPU Boost 2.0 keeps a lid on things when gaming, it’s apparently a bit more flexible on FurMark, likely because the video card is already heavily TDP throttled.
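As a rough mental model of the behavior described above, a temperature/TDP-limited boost scheme can be thought of as stepping between fixed clock bins. The sketch below is a simplified illustration of that idea, not NVIDIA’s actual GPU Boost 2.0 algorithm; the 13MHz bin spacing and the intermediate bin values are assumptions, while the 80C target, 250W TDP, and the base/top clocks come from the card’s specifications.

```python
# Simplified illustration of a temperature/TDP-limited boost scheme; this is
# not NVIDIA's actual GPU Boost 2.0 implementation. Bin spacing is assumed.
BOOST_BINS_MHZ = list(range(863, 1007, 13))  # 863MHz base up to the 1006MHz top bin
TEMP_TARGET_C = 80
TDP_LIMIT_W = 250

def next_bin(index, temp_c, power_w):
    """Step down one bin when over the temperature or power limit, otherwise step up."""
    if temp_c > TEMP_TARGET_C or power_w > TDP_LIMIT_W:
        return max(index - 1, 0)
    return min(index + 1, len(BOOST_BINS_MHZ) - 1)

# A card sitting right at its 80C target bounces between its top two bins,
# which matches what we see from GTX 780 while gaming.
idx = len(BOOST_BINS_MHZ) - 1
for temp_c, power_w in [(81, 240), (79, 238), (81, 241), (79, 239)]:
    idx = next_bin(idx, temp_c, power_w)
    print("%dC, %dW -> %dMHz" % (temp_c, power_w, BOOST_BINS_MHZ[idx]))
```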
Last but not least we have our look at idle noise. At 38dB the GTX 780 is essentially tied with GTX Titan, which again comes as no great surprise. At least in our testing environment one would be hard pressed to tell the difference between GTX 680, GTX 780, and GTX Titan at idle. They’re essentially as quiet as a card can get without being silent.
Under BF3 we see the payoff of NVIDIA’s fan modifications, along with the slightly lower effective heat load of GTX 780. Despite being built on the same platform as GTX Titan (or rather, because of it), there’s nowhere for load noise to go but down. As a result we have a 250W blower based card hitting 48.1dB under load, which is simply unheard of. At nearly a 4dB improvement over both GTX 680 and GTX 690 it’s a small but significant improvement over NVIDIA’s previous generation cards, and even Titan has the right to be embarrassed. Silent it is not, but this is incredibly impressive for a blower. The only way to beat something like this is with an open air card, as evidenced by the 7970GE, though that does come with the usual tradeoffs for using such a cooler.
Because of the slightly elevated FurMark temperatures we saw previously, GTX 780 ends up being a bit louder than GTX Titan under FurMark. This isn’t something that we expect to see under any non-pathological workload, and I tend to favor BF3 over FurMark here anyhow, but it does point to there being some kind of minor difference in throttling mechanisms between the two cards. At the same time this means that GTX 780 is still a bit louder than our open air cooled 7970GE, though not by as large a difference as we saw with BF3.
Overall the GTX 780 meets or exceeds the GTX Titan in our power, temperature, and noise tests, just as we’d expect for a card almost identical to Titan itself. The end result is that it maintains every bit of Titan’s luxury and stellar performance, and if anything improves on it slightly when it comes to the all-important matter of load noise. It’s a shame that coolers like the 780’s are not a common fixture on cheaper cards, as this is essentially unparalleled as far as blower based coolers are concerned.
At the same time this sets up an interesting challenge for NVIDIA’s partners. To pass Greenlight they need to produce cards with coolers that perform as well as or better than the reference GTX 780 in NVIDIA’s test environment. This is by no means impossible, but it’s not going to be an easy task. So it will be interesting to see what partners cook up, especially with the obligatory dual fan open air cooled models.
Final Thoughts
NVIDIA is primarily pitching the GeForce GTX 780 as the next step in their high-end x80 line of video cards, a role it fits into well. At the same time however I can’t help but keep going back to GTX Titan comparisons, due to the fact that the GTX 780 is by every metric a cut-down GTX Titan. Whether this is a good thing or not is open to debate, but NVIDIA’s emergence into the prosumer market with GTX Titan, and the fact that there’s now a single-GPU video card above the traditionally top-tier x80 card, complicates matters as compared to past x80 launches.
Anyhow, we’ll start with the obvious: the GeForce GTX 780 is a filler card whose most prominent role will be filling the gap between sub-$500 cards and the odd prosumer/luxury/ultra-enthusiast market that has taken root above $500. If there’s to be a $1000 single-GPU card in NVIDIA’s product stack then it’s simply good business to have something between that and the sub-$500 market, and that something is the GTX 780.
For the small number of customers that can afford a card in this price segment, the GTX 780 is an extremely strong contender. In fact it’s really the only contender – at least as far as single-GPU cards go – as AMD won’t directly be competing with GK110. The end result is that with the GTX 780 delivering an average of 90% of Titan’s gaming performance for 65% of the price, this is by all rights the Titan Mini, the cheaper video card Titan customers have been asking for. From that perspective the GTX 780 is nothing short of an amazing deal for the level of performance offered, especially since it maintains the high build quality and impressive acoustics that helped to define Titan.
On the other hand, as an x80 card the GTX 780 is pretty much a tossup. The full generational performance improvement is absolutely there, as the GTX 780 beats the last-generation GTX 580 by an average of 80%. NVIDIA knows their market well, and for most buyers in a 2-3 year upgrade cycle this is the level of performance necessary to spur on an upgrade.
The catch comes down to pricing. $650 for the GTX 780 makes all the sense in the world from NVIDIA’s perspective: GTX Titan sales have exceeded NVIDIA’s expectations, and between that and Tesla K20 sales the GK110 GPU is in high demand right now. At the same time the performance of the GTX 780 is high enough that AMD can’t directly compete with the card, leaving NVIDIA without competition and free to set prices as they would like, and that is exactly what they have done.
This doesn’t make GTX 780 a bad card; on the contrary, it’s probably a better card than any x80 card before it, particularly when it comes to build quality. But it’s $650 for a product tier that has been a $500 tier for the last 5 years. To that end no one likes a price increase, ourselves included. Ultimately some fraction of the traditional x80 market will make the jump to $650, and for the rest there will be the remainder of the GeForce 700 family, or holding out for the eventual GeForce 800 family.
Moving on, it’s interesting to note that with the launch of Titan and now the GTX 780, the high-end single-GPU market looks almost exactly like it did back in 2011. The prices have changed, but otherwise we’ve returned to unchallenged NVIDIA domination of the high end, with AMD fighting the good fight at lower price points. The 22% performance advantage that the GTX 780 enjoys over the Radeon HD 7970GHz Edition cements NVIDIA’s performance lead, while the price difference between the cards means that the 7970GE is still a very strong contender in its current $400 market and a clear budget-saving spoiler like the 6970 before it.
Finally, to bring things to a close we turn our gaze towards the future of the rest of the GeForce 700 family. The GTX 780 is the first of the GeForce 700 family but it clearly won’t be the last. A cut-down GK110 card as GTX 780 was the logical progression for NVIDIA, but what to use to replace GTX 670 is a far murkier question as NVIDIA has a number of good choices at their disposal. Mull that over for a bit, and hopefully we’ll be picking up the subject soon.