NVIDIA Pascal Architecture Review: Getting to Know the GP104 GPU at the Heart of Nvidia's Pascal Card

GP104 GPU Specifications

Chip code name: GP104
Production technology: 16 nm FinFET
Number of transistors: 7.2 billion
Core area: 314 mm²
Architecture: unified (Pascal)
DirectX hardware support: DirectX 12, feature level 12_1
Memory bus: 256-bit (eight independent 32-bit memory controllers) with GDDR5X support
GPU frequency: 1607 (1733) MHz
Compute units: 20 streaming multiprocessors comprising 2560 scalar ALUs for floating-point calculations per the IEEE 754-2008 standard
Texture units: 160 texture addressing and filtering units with support for FP16 and FP32 components in textures and trilinear and anisotropic filtering for all texture formats
Monitor support: outputs via Dual-Link DVI, HDMI 2.0b and DisplayPort

GeForce GTX 1080 Reference Graphics Card Specifications

Core frequency: 1607 (1733) MHz
Number of universal processors (CUDA cores): 2560
Number of texture units: 160
Number of blending units (ROPs): 64
Effective memory frequency: 10000 (4×2500) MHz
Memory type: GDDR5X
Memory bus: 256-bit
Memory size: 8 GB
Memory bandwidth: 320 GB/s
Compute performance (FP32): about 9 teraflops
Theoretical peak fill rate: 103 gigapixels/s
Theoretical texture sampling rate: 257 gigatexels/s
Bus interface: PCI Express 3.0
Connectors: one Dual-Link DVI, one HDMI 2.0b, three DisplayPort
Typical power consumption: up to 180 W
Additional power: one 8-pin connector
Number of expansion slots occupied: 2
Recommended price: $599-699 (USA), 54,990 RUB (Russia)

The new GeForce GTX 1080 received a logical name for the first solution of the new GeForce series: it differs from its direct predecessor only in the generation digit. The newcomer not only replaces the top-end solutions in the company's current lineup, but also served as the flagship of the new series for some time, until the Titan X arrived on an even more powerful GPU. Below it in the hierarchy sits the already announced GeForce GTX 1070, based on a cut-down version of the GP104 chip, which we will examine below.

The suggested prices for Nvidia's new graphics card are $599 and $699 for the regular and Founders Edition versions (see below), respectively, which is a pretty good deal considering that the GTX 1080 is ahead of not only the GTX 980 Ti but also the Titan X. Today the new product is unquestionably the best-performing single-chip video card on the market, and at the same time it is cheaper than the most powerful video cards of the previous generation. So far the GeForce GTX 1080 has essentially no competitor from AMD, so Nvidia was able to set a price that suits it.

The card in question is based on the GP104 chip, which has a 256-bit memory bus, but the new GDDR5X memory type operates at a very high effective frequency of 10 GHz, giving a peak bandwidth of 320 GB/s - almost on par with the GTX 980 Ti and its 384-bit bus. The amount of memory installed on a card with such a bus could be 4 or 8 GB, but it would make little sense to fit the smaller amount on such a powerful solution in modern conditions, so the GTX 1080 received 8 GB of memory, enough to run any 3D application at any quality settings for several years to come.
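
As a quick sanity check, the 320 GB/s figure follows directly from the bus width and the effective data rate of the memory. A minimal sketch of the arithmetic (Python used here purely for illustration):

```python
# Back-of-the-envelope check of the GTX 1080's peak memory bandwidth
# (a sketch of the standard formula, not a vendor tool).

bus_width_bits = 256          # GP104 memory bus width
effective_rate_gbps = 10.0    # GDDR5X effective data rate, Gbit/s per pin

peak_bandwidth_gb_s = bus_width_bits * effective_rate_gbps / 8
print(f"Peak bandwidth: {peak_bandwidth_gb_s:.0f} GB/s")  # -> 320 GB/s
```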

The GeForce GTX 1080 PCB is, understandably, quite different from the company's previous boards. The new card's typical power consumption is 180 W - slightly higher than that of the GTX 980, but noticeably lower than that of the slower Titan X and GTX 980 Ti. The reference board has the usual set of connectors for attaching image output devices: one Dual-Link DVI, one HDMI and three DisplayPort.

Founders Edition reference design

Along with the announcement of the GeForce GTX 1080 in early May, a special edition of the video card called Founders Edition was announced, carrying a higher price than regular cards from the company's partners. In essence, this edition is the reference design of the card and its cooling system, and it is produced by Nvidia itself. One can have different attitudes towards such versions of video cards, but the reference design developed by the company's engineers and built from quality components has its fans.

Whether they will pay several thousand rubles more for a video card from Nvidia itself is a question only practice can answer. In any case, at first it will be the reference cards from Nvidia that appear on sale at an increased price, and there is not much to choose from - this happens with every announcement, but the reference GeForce GTX 1080 is different in that it is planned to be sold in this form throughout its life cycle, right up to the release of next-generation solutions.

Nvidia believes that this edition has its merits even over the best partner designs. For example, the two-slot cooler design makes it easy to build both compact gaming PCs and multi-chip video systems based on this powerful card (even though the company does not recommend three- and four-chip configurations). The GeForce GTX 1080 Founders Edition has certain advantages in the form of an efficient cooler with a vapor chamber and a fan that exhausts heated air out of the case - this is the first time Nvidia has used such a solution on a card that consumes less than 250 W.

Compared to the company's previous reference designs, the power circuitry has been upgraded from four phases to five. Nvidia also points to the improved components the new product is built on, and electrical noise has been reduced to improve voltage stability and overclocking potential. As a result of all the improvements, the power efficiency of the reference board has increased by 6% compared to the GeForce GTX 980.

And in order to stand out from the "ordinary" GeForce GTX 1080 models visually as well, an unusual "chopped" shroud design was developed for the Founders Edition. It probably also complicated the shape of the vapor chamber and heatsink (see photo), which may have been one of the reasons for the $100 premium on this special edition. We repeat that at the start of sales buyers will not have much choice, but later it will be possible to choose between a custom design from one of the company's partners and a card made by Nvidia itself.

New generation of Pascal graphics architecture

The GeForce GTX 1080 video card is the company's first solution based on the GP104 chip, which belongs to the new generation of Nvidia's Pascal graphics architecture. Although the new architecture builds on the solutions worked out in Maxwell, it also has important functional differences, which we will cover later. Globally, the main change is the new process technology on which the new GPU is manufactured.

The use of the 16 nm FinFET process in the production of GP104 GPUs at the factories of the Taiwanese company TSMC made it possible to significantly increase the complexity of the chip while keeping its area and cost relatively low. Compare the transistor counts and die areas of GP104 and GM204: they are close in area (the new chip is even physically smaller), yet the Pascal chip has a noticeably larger number of transistors and, accordingly, of execution units, including those providing new functionality.

From an architectural point of view, the first gaming Pascal is very similar to comparable Maxwell solutions, although there are some differences. As with Maxwell, Pascal processors will come in different configurations of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. The SM is a highly parallel multiprocessor that schedules and runs warps (groups of 32 instruction threads) on the CUDA cores and other execution units within the multiprocessor. You can find detailed information about the design of all these blocks in our reviews of previous Nvidia solutions.

Each SM multiprocessor is paired with a PolyMorph Engine, which handles vertex data fetching, tessellation, viewport transformation, vertex attribute setup, and perspective correction. Unlike in the company's previous solutions, the PolyMorph Engine in GP104 also contains a new Simultaneous Multi-Projection block, which we will discuss below. Nvidia traditionally calls the combination of one SM multiprocessor with one PolyMorph Engine a TPC - Texture Processor Cluster.

In total, the GP104 chip in the GeForce GTX 1080 contains four GPC clusters and 20 SM multiprocessors, as well as eight memory controllers combined with 64 ROPs. Each GPC cluster has a dedicated rasterization engine and includes five SMs. Each multiprocessor, in turn, consists of 128 CUDA cores, 256 KB register file, 96 KB shared memory, 48 KB L1 cache, and eight TMU texture units. That is, in total, GP104 contains 2560 CUDA cores and 160 TMU units.

Also, the graphics processor on which the GeForce GTX 1080 is based contains eight 32-bit (as opposed to the 64-bit previously used) memory controllers, which gives us a final 256-bit memory bus. Eight ROPs and 256 KB of L2 cache are tied to each of the memory controllers. That is, in total, the GP104 chip contains 64 ROPs and 2048 KB of L2 cache.
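
The per-block counts above multiply out to the chip-wide totals quoted in the text; the short sketch below just spells out that arithmetic (illustrative only):

```python
# Deriving GP104's totals from the per-block configuration described above.

gpc_count = 4
sm_per_gpc = 5
cuda_cores_per_sm = 128
tmu_per_sm = 8
mem_controllers = 8            # 32-bit each -> 256-bit bus
rops_per_controller = 8
l2_kb_per_controller = 256

sm_total = gpc_count * sm_per_gpc                              # 20 SMs
print("CUDA cores:", sm_total * cuda_cores_per_sm)             # 2560
print("TMUs:", sm_total * tmu_per_sm)                          # 160
print("Memory bus:", mem_controllers * 32, "bit")              # 256-bit
print("ROPs:", mem_controllers * rops_per_controller)          # 64
print("L2 cache:", mem_controllers * l2_kb_per_controller, "KB")  # 2048 KB
```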

Thanks to architectural optimizations and the new process technology, the first gaming Pascal has become the most energy-efficient GPU ever. Both factors contribute: one of the most advanced fabrication processes, 16 nm FinFET, and the architectural optimizations carried out in Pascal compared to Maxwell. Nvidia was able to raise the clock speed even more than it expected when moving to the new process: GP104 runs at a higher frequency than a hypothetical GM204 manufactured on 16 nm would. To achieve this, Nvidia engineers had to carefully check and optimize all the bottlenecks of previous solutions that prevented them from clocking above a certain threshold. As a result, the new GeForce GTX 1080 runs at over 40% higher clock speeds than the GeForce GTX 980. But that is not all there is to the GPU clock changes.

GPU Boost 3.0 Technology

As we know from previous Nvidia graphics cards, they use GPU Boost hardware technology, designed to increase the GPU's operating clock speed in modes where it has not yet reached its power and thermal limits. Over the years this algorithm has undergone many changes, and the Pascal video chip already uses the third generation of the technology - GPU Boost 3.0, whose main innovation is finer control of turbo frequencies as a function of voltage.

If you recall how previous versions of the technology worked, the gap between the base frequency (the guaranteed minimum frequency below which the GPU does not fall, at least in games) and the turbo frequency was fixed. That is, the turbo frequency was always a set number of megahertz above the base. GPU Boost 3.0 introduces the ability to set turbo frequency offsets for each voltage point separately. The easiest way to understand this is with an illustration:

On the left is GPU Boost of the second version, on the right the third, which appeared in Pascal. The fixed difference between the base and turbo frequencies did not let the GPU reach its full potential: in some cases GPUs of previous generations could have run faster at a given voltage, but the fixed turbo offset did not allow it. In GPU Boost 3.0 this capability appeared, and the turbo frequency can be set for each individual voltage value, squeezing everything out of the GPU.
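
To make the difference concrete, here is a minimal sketch contrasting a single fixed offset (GPU Boost 2.0 style) with per-voltage-point offsets (GPU Boost 3.0 style). All voltages, frequencies and offsets are made-up illustrative numbers, not measurements of a real chip:

```python
# GPU Boost 2.0-style overclocking: one fixed offset for the whole curve,
# limited by the weakest voltage point of the chip.
# GPU Boost 3.0-style overclocking: an individual offset per voltage point.

voltage_points = [0.80, 0.90, 1.00, 1.05]                            # V (hypothetical)
stock_curve    = {0.80: 1500, 0.90: 1650, 1.00: 1800, 1.05: 1850}    # MHz (hypothetical)

fixed_offset = 120  # MHz, same everywhere (Boost 2.0 era)
boost2_curve = {v: f + fixed_offset for v, f in stock_curve.items()}

# Boost 3.0: "easy" points are no longer held back by the hardest one.
per_point_offset = {0.80: 180, 0.90: 160, 1.00: 130, 1.05: 120}      # MHz (hypothetical)
boost3_curve = {v: stock_curve[v] + per_point_offset[v] for v in voltage_points}

for v in voltage_points:
    print(f"{v:.2f} V: stock {stock_curve[v]} MHz, "
          f"fixed offset {boost2_curve[v]} MHz, per-point {boost3_curve[v]} MHz")
```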

Handy utilities are required to manage overclocking and set the turbo frequency curve. Nvidia does not make them itself but helps its partners create such utilities to facilitate overclocking (within reasonable limits, of course). For example, the new functionality of GPU Boost 3.0 is already exposed in EVGA Precision XOC, which includes a dedicated overclocking scanner that automatically finds and sets the non-linear difference between base and turbo frequency at different voltages by running a built-in performance and stability test. As a result, the user gets a turbo frequency curve that perfectly matches the capabilities of a particular chip, and it can also be modified however you like in manual mode.

As you can see in the screenshot of the utility, in addition to information about the GPU and the system, there are also overclocking settings: Power Target (defines the typical power draw during overclocking as a percentage of the standard value), GPU Temp Target (the maximum allowed core temperature), GPU Clock Offset (the offset above the base frequency for all voltage points), Memory Offset (the offset of the video memory frequency above the default value), and Overvoltage (an additional option to raise the voltage).

The Precision XOC utility includes three overclocking modes: Basic, Linear, and Manual. In Basic mode you can set a single overclock value (a fixed turbo frequency) above the base one, as was the case for previous GPUs. Linear mode lets you set a frequency ramp from the minimum to the maximum voltage values for the GPU. And in Manual mode you can set unique GPU frequency values for each voltage point on the graph.

The utility also includes a special scanner for automatic overclocking. You can either set your own frequency levels or let Precision XOC scan the GPU at all voltages and find the most stable frequencies for each point on the voltage and frequency curve fully automatically. During the scanning process, Precision XOC incrementally increases the frequency of the GPU and checks its operation for stability or artifacts, building an ideal frequency and voltage curve that will be unique to each specific chip.

This scanner can be customized to your own requirements by setting the time interval to test each voltage value, the minimum and maximum frequency to be tested, and its step. It is clear that in order to achieve stable results, it would be better to set a small step and a decent duration of testing. During testing, unstable operation of the video driver and the system may be observed, but if the scanner does not freeze, it will restore operation and continue to find the optimal frequencies.
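
Conceptually, such a scanner is a simple search loop: for each voltage point, it steps the frequency up until the stability test fails and keeps the last stable value. The sketch below illustrates the idea only; `run_stability_test` is a hypothetical stand-in, not the actual test used by Precision XOC:

```python
import random

def run_stability_test(voltage, freq_mhz, duration_s):
    """Hypothetical stand-in for a render/compute stress test: pretend higher
    frequencies at lower voltages are more likely to fail (duration ignored)."""
    limit = 1500 + (voltage - 0.80) * 1600   # made-up V/F limit
    return freq_mhz <= limit + random.uniform(-15, 15)

def scan_voltage_point(voltage, start_mhz, max_mhz, step_mhz, duration_s=20):
    stable = start_mhz
    freq = start_mhz
    while freq <= max_mhz:
        if run_stability_test(voltage, freq, duration_s):
            stable = freq
            freq += step_mhz
        else:
            break   # a real tool would recover the driver and stop stepping here
    return stable

curve = {v: scan_voltage_point(v, 1400, 2100, 25) for v in (0.80, 0.90, 1.00, 1.05)}
print(curve)   # last stable frequency found for each voltage point
```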

New type of video memory GDDR5X and improved compression

So, GPU power has grown significantly while the memory bus has remained only 256 bits wide - will memory bandwidth limit overall performance, and what can be done about it? The promising second-generation HBM is apparently still too expensive to manufacture, so other options had to be found. Ever since the introduction of GDDR5 memory in 2009, Nvidia engineers have been exploring the possibilities of new memory types. As a result, this work led to the new GDDR5X memory standard - the most complex and advanced standard to date, providing a transfer rate of 10 Gbps.

Nvidia gives an interesting example of just how fast this is. Only 100 picoseconds elapse between transmitted bits - during this time, a beam of light will travel a distance of only one inch (about 2.5 cm). And when using GDDR5X memory, the data-receiving circuits have to choose the value of the transmitted bit in less than half of this time before the next one is sent - this is just so you understand what modern technology has come to.

Achieving this speed required the development of a new I/O system architecture that required several years of joint development with memory chip manufacturers. In addition to the increased data transfer rate, energy efficiency has also increased - GDDR5X memory chips use a lower voltage of 1.35 V and are manufactured using new technologies, which gives the same power consumption at a 43% higher frequency.

The company's engineers had to rework the data transmission lines between the GPU core and memory chips, paying more attention to preventing signal loss and signal degradation all the way from memory to GPU and back. So, in the illustration above, the captured signal is shown as a large symmetrical "eye", which indicates good optimization of the entire circuit and the relative ease of capturing data from the signal. Moreover, the changes described above have led not only to the possibility of using GDDR5X at 10 GHz, but also should help to get a high memory bandwidth on future products using the more familiar GDDR5 memory.

Well, we got more than 40% increase in memory bandwidth from the use of the new memory. But isn't that enough? To further increase memory bandwidth efficiency, Nvidia continued to improve the advanced data compression introduced in previous architectures. The memory subsystem in the GeForce GTX 1080 uses improved and several new lossless data compression techniques designed to reduce bandwidth requirements - already the fourth generation of on-chip compression.

Algorithms for data compression in memory bring several positive aspects at once. Compression reduces the amount of data written to memory, the same applies to data transferred from video memory to the second level cache, which improves the efficiency of using the L2 cache, since a compressed tile (a block of several framebuffer pixels) has a smaller size than an uncompressed one. It also reduces the amount of data sent between different points, like the TMU texture module and the framebuffer.

The data compression pipeline in the GPU uses several algorithms, chosen according to how "compressible" the data is - the best available algorithm is selected for it. One of the most important is delta color compression. This compression method encodes the data as the difference between consecutive values instead of the data itself. The GPU calculates the difference in color values between the pixels of a block (tile) and stores the block as an average color for the whole block plus per-pixel difference data. For graphics data this method usually works well, since the colors of all pixels within small tiles often do not differ much.
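
The principle is easy to show in miniature. The toy example below stores one anchor color per tile plus small per-pixel deltas and reconstructs the tile losslessly; it is meant to illustrate the general idea, not Nvidia's actual on-chip algorithm:

```python
# Toy illustration of delta color compression: one anchor color per tile
# plus small per-pixel differences that need far fewer bits to store.

def compress_tile(tile):
    """tile: list of (R, G, B) pixels from one small block."""
    n = len(tile)
    anchor = tuple(sum(p[c] for p in tile) // n for c in range(3))   # average color
    deltas = [tuple(p[c] - anchor[c] for c in range(3)) for p in tile]
    return anchor, deltas

def decompress_tile(anchor, deltas):
    return [tuple(anchor[c] + d[c] for c in range(3)) for d in deltas]

tile = [(120, 200, 90), (121, 199, 91), (119, 201, 90), (120, 200, 92)]
anchor, deltas = compress_tile(tile)
print(anchor, deltas)                              # small deltas -> cheap to encode
assert decompress_tile(anchor, deltas) == tile     # lossless round trip
```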

The GP104 GPU in the GeForce GTX 1080 supports more compression algorithms than previous Maxwell chips. Thus, the 2:1 compression algorithm has become more efficient, and in addition to it, two new algorithms have appeared: a 4:1 compression mode, suitable for cases where the difference in the color value of the pixels of a block is very small, and an 8:1 mode, which combines a constant 4:1 compression of 2×2 pixel blocks with 2x delta compression between blocks. When compression is not possible at all, it is not used.

However, in reality, the latter happens very infrequently. This can be seen from the example screenshots from the game Project CARS, which Nvidia cited to illustrate the increased compression ratio in Pascal. In the illustrations, those frame buffer tiles that the GPU could compress were shaded in magenta, and those that could not be compressed without loss remained with the original color (top - Maxwell, bottom - Pascal).

As you can see, the new compression algorithms in GP104 really do work much better than in Maxwell. Although the old architecture was also able to compress most of the tiles in the scene, a lot of grass and trees around the edges, as well as car parts, could not be handled by the older compression algorithms. With the new techniques in Pascal, only a very small number of image areas remained uncompressed - the improved efficiency is evident.

As a result of improvements in data compression, the GeForce GTX 1080 is able to significantly reduce the amount of data sent per frame. In numbers, improved compression saves an additional 20% of effective memory bandwidth. In addition to the more than 40% increase in memory bandwidth of the GeForce GTX 1080 relative to the GTX 980 from using GDDR5X memory, all together this gives about a 70% increase in effective memory bandwidth compared to the previous generation model.
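
For reference, the arithmetic behind these percentages, using the published peak bandwidth figures (224 GB/s for the GTX 980) and the roughly 20% compression gain stated above:

```python
# Rough arithmetic behind the "about 70%" effective-bandwidth figure.

gtx980_bw  = 224.0   # GB/s, GTX 980 (7 GHz GDDR5 on a 256-bit bus)
gtx1080_bw = 320.0   # GB/s, GTX 1080 (10 GHz GDDR5X on a 256-bit bus)

raw_gain = gtx1080_bw / gtx980_bw        # ~1.43x from GDDR5X alone
effective_gain = raw_gain * 1.20         # plus ~20% from improved compression
print(f"raw: +{(raw_gain - 1) * 100:.0f}%, effective: +{(effective_gain - 1) * 100:.0f}%")
# -> raw: +43%, effective: +71%
```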

Support for Async Compute

Most modern games use complex computations in addition to graphics. For example, physics simulation can be carried out not before or after the graphics work, but simultaneously with it, since the two are not related and do not depend on each other within the same frame. Another example is post-processing of already rendered frames and audio processing, which can also run in parallel with rendering.

Another clear example of this functionality is the Asynchronous Time Warp technique used in VR systems to adjust the output frame to the player's head movement right before it is displayed, interrupting the rendering of the next one. Such asynchronous loading of GPU capacity allows more efficient use of its execution units.

These workloads create two new GPU usage scenarios. The first of these includes overlapping loads, since many types of tasks do not fully use the capabilities of GPUs, and some resources are idle. In such cases, you can simply run two different tasks on the same GPU, separating its execution units to get more efficient use - for example, PhysX effects that run in conjunction with the 3D rendering of the frame.

To improve the performance of this scenario, the Pascal architecture introduced dynamic load balancing. In the previous Maxwell architecture, overlapping workloads were implemented as a static distribution of GPU resources between graphics and compute. This approach is effective provided that the balance between the two workloads roughly corresponds to the division of resources and the tasks run equally in time. If non-graphical calculations take longer than graphical ones, and both are waiting for the completion of the common work, then part of the GPU will be idle for the remaining time, which will cause a decrease in overall performance and nullify all the benefits. Hardware dynamic load balancing, on the other hand, allows you to use the freed up GPU resources as soon as they become available - for understanding, we will give an illustration.
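
A toy timing model makes the benefit clear: with a static split the frame finishes only when the slower partition does, while dynamic balancing lets the freed-up half pick up the remaining work. The numbers below are arbitrary illustrative values:

```python
# Static vs dynamic load balancing, modeled as simple per-frame timings.

graphics_ms = 4.0   # graphics work on its half of the GPU (illustrative)
compute_ms  = 6.0   # compute work on its half of the GPU (illustrative)

# Static partitioning: both halves wait for the slower task.
static_frame_ms = max(graphics_ms, compute_ms)

# Dynamic balancing (idealized): once graphics finishes, its half helps with
# the remaining compute work, so the tail runs on the whole GPU at ~2x speed.
tail_ms = compute_ms - graphics_ms
dynamic_frame_ms = graphics_ms + tail_ms / 2

print(f"static: {static_frame_ms:.1f} ms, dynamic: {dynamic_frame_ms:.1f} ms")
# -> static: 6.0 ms, dynamic: 5.0 ms
```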

There are also tasks that are time-critical, and this is the second scenario for asynchronous computing. For example, the asynchronous time warp algorithm in VR must complete before scan-out or the frame will be discarded. In such cases the GPU must support very fast task interruption and switching, removing a less critical task from the GPU to free its resources for critical tasks - this is called preemption.

A single render command from a game engine can contain hundreds of draw calls, each draw call in turn contains hundreds of rendered triangles, each containing hundreds of pixels to be calculated and drawn. The traditional GPU approach uses only high-level task interruption, and the graphics pipeline has to wait for all that work to complete before switching tasks, resulting in very high latency.

To fix this, the Pascal architecture first introduced the ability to interrupt a task at the pixel level - Pixel Level Preemption. Pascal GPU execution units can constantly monitor the progress of rendering tasks, and when an interrupt is requested, they can stop execution, saving the context for later completion by quickly switching to another task.

Thread-level preemption for compute operations works similarly to pixel-level preemption for graphics. Compute workloads consist of multiple grids, each containing multiple threads. When an interrupt request is received, the threads running on the multiprocessor finish executing, other units save their state so they can resume from the same point later, and the GPU switches to another task. The entire task-switching process takes less than 100 microseconds after the running threads exit.

For gaming workloads, the combination of pixel-level interrupts for graphics, and thread-level interrupts for compute tasks gives Pascal architecture GPUs the ability to quickly switch between tasks with minimal time loss. And for computing tasks on CUDA, it is also possible to interrupt with minimal granularity - at the instruction level. In this mode, all threads stop execution at once, immediately switching to another task. This approach requires saving more information about the state of all registers of each thread, but in some cases of non-graphical calculations it is quite justified.

Fast preemption and task switching in graphics and compute was added to the Pascal architecture so that graphics and non-graphics tasks can be interrupted at the level of individual pixels and instructions rather than entire draw calls, as was the case with Maxwell and Kepler. These technologies improve the asynchronous execution of different GPU workloads and responsiveness when running multiple tasks simultaneously. At its event, Nvidia showed a demonstration of asynchronous compute using physics effects as an example: without asynchronous compute, performance sat at 77-79 FPS, while with these features enabled the frame rate rose to 93-94 FPS.

We have already given an example of one use of this functionality in games: asynchronous time warp in VR. The illustration shows how this technique works with traditional preemption and with fast preemption. In the first case, the asynchronous time warp is scheduled as late as possible, but still before the display update begins. Yet the algorithm has to be submitted to the GPU a few milliseconds earlier, since without fast preemption there is no way to execute the work at exactly the right moment, and the GPU sits idle for some time.

With precise pixel- and thread-level preemption (shown on the right), the moment of interruption can be determined much more accurately, and asynchronous time warp can be started much later with confidence that the work will complete before the display update begins. And the GPU, which in the first case sat idle for a while, can instead be loaded with additional graphics work.
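
The timing logic can be sketched in a few lines: the later ATW can be submitted, the fresher the head-tracking data, but it must still fit before scan-out, and the preemption latency eats into that margin. All numbers below are hypothetical, chosen only to illustrate the relationship:

```python
# Why fine-grained preemption matters for asynchronous time warp (ATW).

refresh_hz = 90.0                      # typical VR headset refresh rate
frame_budget_ms = 1000.0 / refresh_hz  # ~11.1 ms per refresh
atw_work_ms = 1.0                      # time ATW itself needs (assumed)

def latest_atw_start(preemption_latency_ms):
    """Latest point in the frame at which ATW can be submitted and still
    make scan-out, given how long it takes to preempt the current work."""
    return frame_budget_ms - (preemption_latency_ms + atw_work_ms)

for name, latency_ms in [("coarse (draw-call level) preemption", 3.0),
                         ("fine-grained preemption", 0.1)]:
    print(f"{name}: start ATW no later than {latest_atw_start(latency_ms):.1f} ms "
          f"into the {frame_budget_ms:.1f} ms frame")
```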

Simultaneous Multi-Projection Technology

The new GP104 GPU adds support for a new Simultaneous Multi-Projection (SMP) technology that allows the GPU to render data more efficiently on modern display systems. SMP allows the video chip to simultaneously display data in several projections, which required the introduction of a new hardware block in the GPU as part of the PolyMorph engine at the end of the geometric pipeline before the rasterization block. This block is responsible for working with multiple projections for a single geometry stream.

The multi-projection engine processes geometry simultaneously for 16 pre-configured projections that share a projection center (camera); the projections can be independently rotated or tilted. Since each geometry primitive can appear in several projections at once, the SMP engine provides this functionality, allowing the application to instruct the GPU to replicate geometry up to 32 times (16 projections for each of two projection centers) without additional processing.

The entire process is hardware-accelerated, and since multi-projection works after the geometry engine, the stages of geometry processing do not have to be repeated several times. The saved resources matter when rendering speed is limited by geometry processing performance, such as tessellation, where the same geometric work would otherwise be performed once per projection. Accordingly, in the peak case, multi-projection can reduce the required geometry work by up to 32 times.
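
In terms of workload, the saving is simply "one geometry pass instead of one per projection"; the sketch below counts triangle passes to illustrate the up-to-32x figure (counts only, no real rendering):

```python
# Conceptual saving from Simultaneous Multi-Projection: geometry is processed
# once and replicated in hardware to every configured viewport.

triangles_in_scene = 1_000_000
projections = 16 * 2   # up to 16 projections for each of 2 projection centers

geometry_work_without_smp = triangles_in_scene * projections   # one pass per projection
geometry_work_with_smp = triangles_in_scene                    # one pass, replicated

print(f"without SMP: {geometry_work_without_smp:,} triangle passes")
print(f"with SMP:    {geometry_work_with_smp:,} triangle passes "
      f"({geometry_work_without_smp // geometry_work_with_smp}x less)")
```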

But why is all this necessary? There are several good examples where multi-projection technology can be useful. For example, a multi-monitor system of three displays mounted at an angle to each other close enough to the user (surround configuration). In a typical situation, the scene is rendered in one projection, which leads to geometric distortions and incorrect geometry rendering. The correct way is three different projections for each of the monitors, according to the angle at which they are located.

With a video card on a chip with Pascal architecture, this can be done in one geometry pass, specifying three different projections, each for a different monitor. And the user, thus, will be able to change the angle at which the monitors are located to each other not only physically, but also virtually - by rotating the projections for the side monitors in order to get the correct perspective in the 3D scene with a noticeably wider viewing angle (FOV). True, there is a limitation here - for such support, the application must be able to render the scene with a wide FOV and use special SMP API calls to set it. That is, you can’t do this in every game, you need special support.

In any case, the days of a single projection on a single flat monitor are over, there are now many multi-monitor configurations and curved displays that can also use this technology. Not to mention virtual reality systems that use special lenses between the screens and the user's eyes, which require new techniques for projecting a 3D image into a 2D image. Many of these technologies and techniques are still in early development, the main thing is that older GPUs cannot effectively use more than one planar projection. They require multiple rendering passes, multiple processing of the same geometry, and so on.

Maxwell chips had limited multi-projection support to improve efficiency: Maxwell could rotate a projection by 90 degrees for cube mapping or render to different projection resolutions, but this was only useful in a narrow range of applications such as VXGI. Pascal's SMP can do much more.

Other possibilities for using SMP include rendering at different resolutions and single-pass stereo rendering. For example, rendering at different resolutions (Multi-Res Shading) can be used in games to optimize performance. When applied, a higher resolution is used in the center of the frame, and at the periphery it is reduced to obtain a faster rendering speed.

Single-pass stereo rendering is used in VR, it has already been added to the VRWorks package and uses the multi-projection feature to reduce the amount of geometric work required in VR rendering. If this feature is used, the GeForce GTX 1080 GPU processes the scene geometry only once, generating two projections for each eye at once, which reduces the geometric load on the GPU by half, and also reduces the losses from the driver and OS.

An even more advanced technique for improving the efficiency of VR rendering is Lens Matched Shading, which uses multiple projections to simulate the geometric distortions required in VR rendering. This method uses multi-projection to render a 3D scene onto a surface that approximates the lens-adjusted surface when rendered for VR headset output, avoiding many extra pixels on the periphery that would be discarded. The easiest way to understand the essence of the method is by illustration - four slightly expanded projections are used in front of each eye (in Pascal, you can use 16 projections for each eye - to more accurately simulate a curved lens) instead of one:

This approach can lead to significant performance savings. For example, a typical Oculus Rift image per eye is 1.1 megapixels. But due to the difference in projections, to render it, the original image is 2.1 megapixels - 86% more than necessary! The use of multi-projection, implemented in the Pascal architecture, allows reducing the resolution of the rendered image to 1.4 megapixels, obtaining a 1.5-fold saving in pixel processing speed, and also saves memory bandwidth.
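
Using the rounded megapixel figures quoted above (the exact render-target sizes are what give the "86% more" number), the savings work out as follows:

```python
# Pixel-count arithmetic behind the Oculus Rift / Lens Matched Shading example,
# using the rounded megapixel values from the text.

displayed_mp    = 1.1   # megapixels actually shown per eye
naive_render_mp = 2.1   # render target with a single planar projection per eye
lms_render_mp   = 1.4   # render target with Lens Matched Shading

print(f"pixels rendered vs displayed (no LMS): {naive_render_mp / displayed_mp:.2f}x")
print(f"pixel-work saving from LMS:            {naive_render_mp / lms_render_mp:.2f}x")
# -> roughly 1.9x oversampling without LMS and a 1.5x saving with it
```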

And along with a twofold saving in geometry processing speed due to single-pass stereo rendering, the GeForce GTX 1080 graphics processor is able to provide a significant increase in VR rendering performance, which is very demanding on geometry processing speed, and even more so on pixel processing.

Improvements in video output and processing blocks

In addition to performance and new functionality related to 3D rendering, it is necessary to maintain a good level of image output, as well as video decoding and encoding. And the first Pascal architecture graphics processor did not disappoint - it supports all modern standards in this sense, including the hardware decoding of the HEVC format, which is necessary for viewing 4K videos on a PC. Also, future owners of GeForce GTX 1080 graphics cards will soon be able to enjoy streaming 4K video from Netflix and other providers on their systems.

In terms of display output, the GeForce GTX 1080 has support for HDMI 2.0b with HDCP 2.2 as well as DisplayPort. So far, the DP 1.2 version has been certified, but the GPU is ready for certification for newer versions of the standard: DP 1.3 Ready and DP 1.4 Ready. The latter allows 4K screens to be displayed at 120Hz, and 5K and 8K displays at 60Hz using a pair of DisplayPort 1.3 cables. If for the GTX 980 the maximum supported resolution was 5120x3200 at 60Hz, then for the new GTX 1080 model it has grown to 7680x4320 at the same 60Hz. The reference GeForce GTX 1080 has three DisplayPort outputs, one HDMI 2.0b and one digital Dual-Link DVI.

The new model of the Nvidia video card also received an improved block for decoding and encoding video data. Thus, the GP104 chip complies with the high standards of PlayReady 3.0 (SL3000) for streaming video playback, which allows you to be sure that playing high-quality content from well-known providers such as Netflix will be of the highest quality and energy efficient. Details about the support of various video formats during encoding and decoding are given in the table, the new product clearly differs from previous solutions for the better:

But an even more interesting novelty is support for so-called High Dynamic Range (HDR) displays, which are about to become widespread on the market. HDR TVs are already on sale in 2016 (with four million HDR TVs expected to be sold within the year), and monitors should follow next year. HDR is the biggest breakthrough in display technology in years: it delivers twice as many color tones (75% of the visible spectrum versus 33% for RGB), brighter displays (1000 nits), higher contrast (10000:1), and rich colors.

The emergence of the ability to play content with a greater difference in brightness and richer and more saturated colors will bring the image on the screen closer to reality, the black color will become deeper, the bright light will dazzle, just like in the real world. Accordingly, users will see more detail in bright and dark areas of images compared to standard monitors and TVs.

To support HDR displays, the GeForce GTX 1080 has everything needed: 12-bit color output, support for the BT.2020 and SMPTE 2084 standards, and 10/12-bit 4K HDR output over HDMI 2.0b, which Maxwell also had. In addition, Pascal adds support for decoding HEVC in 4K at 60 Hz with 10- or 12-bit color, which is used for HDR video, as well as encoding the same format with the same parameters, but only with 10-bit color, for HDR video recording or streaming. The novelty is also ready for DisplayPort 1.4 standardization for HDR data transmission over this connector.

By the way, HDR video encoding may be needed in the future in order to transfer such data from a home PC to a SHIELD game console that can play 10-bit HEVC. That is, the user will be able to broadcast the game from a PC in HDR format. Wait, where can I get games with such support? Nvidia is constantly working with game developers to implement this support, giving them everything they need (driver support, code samples, etc.) to correctly render HDR images that are compatible with existing displays.

At the time of the GeForce GTX 1080's release, HDR output support was announced for games such as Obduction, The Witness, Lawbreakers, Rise of the Tomb Raider, Paragon, The Talos Principle, and Shadow Warrior 2, and this list is expected to grow in the near future.

Changes to multi-chip SLI rendering

There have also been some changes to the proprietary SLI multi-chip rendering technology, although nobody expected that. SLI is used by PC gaming enthusiasts either to push performance to the extreme by running the most powerful single-chip graphics cards in tandem, or to get very high frame rates with a couple of mid-range solutions that are sometimes cheaper than one top-end card (a controversial decision, but people do it). With 4K monitors, players have almost no other option than to install a pair of video cards, since even top models often cannot provide comfortable gameplay at maximum settings in such conditions.

One of the important components of Nvidia SLI are bridges that connect video cards into a common video subsystem and serve to organize a digital channel for data transfer between them. GeForce graphics cards have traditionally featured dual SLI connectors, which served to connect between two or four graphics cards in 3-Way and 4-Way SLI configurations. Each of the video cards had to be connected to each, since all the GPUs sent the frames they rendered to the main GPU, which is why two interfaces were needed on each of the boards.

Starting with the GeForce GTX 1080, all Nvidia graphics cards based on the Pascal architecture have two SLI interfaces linked together to increase the performance of data transfer between graphics cards, and this new dual-channel SLI mode improves performance and comfort when displaying visual information on very high-resolution displays or multi-monitor systems.

This mode also required new bridges, called SLI HB. They link a pair of GeForce GTX 1080 cards over both SLI channels at once, although the new cards remain compatible with older bridges. For resolutions of 1920x1080 and 2560x1440 pixels at a 60 Hz refresh rate, standard bridges can be used, but in more demanding modes (4K, 5K and multi-monitor systems) only the new bridges deliver the best frame pacing, although the old ones will still work, just somewhat worse.

Also, when SLI HB bridges are used, the GeForce GTX 1080 data interface runs at 650 MHz, compared to 400 MHz for conventional SLI bridges on older GPUs. Moreover, for some of the rigid old bridges, the higher transfer rate is also available with Pascal video chips. With the increased data rate between GPUs over the doubled SLI interface running at a higher frequency, smoother frame delivery to the screen is provided compared to previous solutions:

It should also be noted that multi-chip rendering support in DirectX 12 differs somewhat from what was customary before. In the latest version of the graphics API, Microsoft made many changes related to the operation of such video systems. DX12 offers software developers two multi-GPU options: Multi Display Adapter (MDA) and Linked Display Adapter (LDA) modes.

Moreover, LDA mode comes in two forms: Implicit LDA (which Nvidia uses for SLI) and Explicit LDA (where the game developer takes on the task of managing multi-chip rendering). The MDA and Explicit LDA modes were introduced in DirectX 12 precisely to give game developers more freedom and options when using multi-chip video systems. The difference between the modes is clearly visible in the following table:

In LDA mode, the memory of each GPU can be connected to the memory of another and displayed as a large total volume, of course, with all the performance limitations when the data is taken from "foreign" memory. In MDA mode, each GPU's memory works separately, and different GPUs cannot directly access data from another GPU's memory. LDA mode is designed for multi-chip systems of similar performance, while MDA mode is less restrictive and can work together with discrete and integrated GPUs or discrete solutions with chips from different manufacturers. But this mode also requires more attention and work from developers when programming collaboration so that GPUs can communicate with each other.

By default, an SLI system based on the GeForce GTX 1080 supports only two GPUs, and three- and four-GPU configurations are officially deprecated, since it is becoming increasingly difficult for modern games to gain performance from adding a third and fourth GPU. For example, many games rely on the system's CPU when driving multi-chip video systems, and new games increasingly use temporal techniques that reuse data from previous frames, with which the efficient operation of several GPUs at once is simply impossible.

However, operation in other (non-SLI) multi-chip configurations remains possible, such as MDA or LDA Explicit modes in DirectX 12, or a two-chip SLI system with a dedicated third GPU for PhysX effects. But what about benchmark records - is Nvidia really abandoning them altogether? No, of course not, but since such systems are in demand among only a handful of users worldwide, a special Enthusiast Key was devised for these ultra-enthusiasts; it can be downloaded from the Nvidia website and unlocks the feature. To do this, you first obtain a unique GPU ID by running a special application, then request the Enthusiast Key on the website and, after downloading it, install the key into the system, thereby unlocking 3-Way and 4-Way SLI configurations.

Fast Sync technology

There have also been some changes in the synchronization technologies used when displaying information. Looking ahead: nothing new has appeared in G-Sync, and Adaptive Sync is not supported either. But Nvidia decided to improve output smoothness and synchronization for games that show very high performance, where the frame rate significantly exceeds the monitor's refresh rate. This is especially important for games that require minimal latency and fast response, such as multiplayer battles and competitive titles.

Fast Sync is a new alternative to vertical sync that does not have visual artifacts such as tearing in the image and is not tied to a fixed refresh rate, which increases latency. What is the problem with vertical sync in games like Counter-Strike: Global Offensive? This game on powerful modern GPUs runs at several hundred frames per second, and the player has a choice whether to enable v-sync or not.

In multiplayer games, users most often chase after minimal delays and disable VSync, getting clearly visible tearing in the image, which is extremely unpleasant even at high frame rates. If you turn on v-sync, then the player will experience a significant increase in delays between his actions and the image on the screen, when the graphics pipeline slows down to the monitor's refresh rate.

This is how a traditional pipeline works. But Nvidia decided to separate the process of rendering and displaying the image on the screen using Fast Sync technology. This allows the part of the GPU that renders frames at full speed to continue to work at maximum efficiency by storing those frames in a special temporary Last Rendered Buffer.

This method allows you to change the display method and take the best from the VSync On and VSync Off modes, getting low latency, but without image artifacts. With Fast Sync, there is no frame flow control, the game engine runs in sync-off mode and is not told to wait to draw another one, so latencies are almost as low as VSync Off mode. But since Fast Sync independently selects a buffer for displaying on the screen and displays the entire frame, there are no picture breaks either.

Fast Sync uses three different buffers, the first two of which work like double buffering in a classic pipeline. The front buffer (FB) is the buffer whose contents are shown on the display - a fully rendered frame. The back buffer (BB) is the buffer that is currently being rendered into.

When using vertical sync in high frame rate conditions, the game waits until the refresh interval is reached in order to swap the primary buffer with the secondary buffer to display the image of a single frame on the screen. This slows things down, and adding more buffers like traditional triple buffering will only add to the delay.

With Fast Sync, a third Last Rendered Buffer (LRB) is added, which is used to store all the frames that have just been rendered in the secondary buffer. The name of the buffer speaks for itself, it contains a copy of the last fully rendered frame. And when the time comes to update the primary buffer, this LRB buffer is copied to the primary in its entirety, and not in parts, as from the secondary with disabled vertical synchronization. Since copying information from buffers is inefficient, they are simply swapped (or renamed, as it will be more convenient to understand), and the new logic of swapping buffers, introduced in GP104, manages this process.
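
The buffer bookkeeping can be modeled in a few lines: the renderer keeps updating the Last Rendered Buffer at its own pace, and each display refresh simply swaps it with the front buffer, so only the newest complete frame is ever shown. This is a simplified conceptual model, not driver code:

```python
# Simplified model of the Fast Sync three-buffer scheme described above.
# Only buffer bookkeeping is modeled; no actual rendering or timing.

class FastSyncBuffers:
    def __init__(self):
        self.front = "frame0"   # FB: currently being scanned out
        self.back = None        # BB: frame being rendered right now
        self.last = "frame0"    # LRB: most recent fully rendered frame

    def render_complete(self, frame):
        # Renderer finished a frame: it becomes the new "last rendered" one.
        self.back, self.last = None, frame

    def display_refresh(self):
        # On refresh, show the most recent complete frame (swap, not copy).
        self.front, self.last = self.last, self.front
        return self.front

buffers = FastSyncBuffers()
for i in range(1, 6):                  # the game renders frames 1..5 quickly
    buffers.render_complete(f"frame{i}")
print(buffers.display_refresh())       # -> frame5: only the newest frame is shown
```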

In practice, enabling the new Fast Sync synchronization method still adds slightly more latency than completely disabled vertical sync - on average 8 ms more - but it delivers frames to the monitor in their entirety, without unpleasant tearing artifacts on the screen. The new method can be enabled from the Nvidia control panel's graphics settings, in the vertical sync control section. However, the default remains application-controlled, and there is no need to enable Fast Sync in all 3D applications; it is better to choose this method specifically for games with high FPS.

Virtual reality technology Nvidia VRWorks

We've touched on the hot topic of VR more than once in this article, but mostly in terms of boosting frame rates and ensuring low latency, which are very important for VR. All of this matters and progress is indeed being made, but so far VR games look nowhere near as impressive as the best "regular" modern 3D games. This is not only because leading game developers have not yet invested heavily in VR applications, but also because VR is more demanding on frame rate, which prevents many of the usual techniques from being used in such games.

To reduce the quality gap between VR games and regular games, Nvidia decided to release a whole package of related technologies called VRWorks, which includes a large number of APIs, libraries, engines and technologies that can significantly improve both the quality and the performance of VR applications. How does this relate to the announcement of the first gaming Pascal solution? It's simple: some technologies that help increase performance and improve quality have been built into it, and we have already written about them.

Although VRWorks covers more than graphics, we will start with that part. The VRWorks Graphics set includes the previously mentioned technologies, such as Lens Matched Shading, which uses the multi-projection feature that appeared in the GeForce GTX 1080. The new product allows a performance gain of 1.5-2x relative to solutions without such support. We also mentioned other technologies, such as MultiRes Shading, designed to render at different resolutions in the center of the frame and at its periphery.

But much more unexpected was the announcement of VRWorks Audio technology, designed for high-quality calculation of sound data in 3D scenes, which is especially important in virtual reality systems. In conventional engines, the positioning of sound sources in a virtual environment is calculated quite correctly, if the enemy shoots from the right, then the sound is louder from this side of the audio system, and such a calculation is not too demanding on computing power.

But in reality, sounds go not only towards the player, but in all directions and bounce off various materials, similar to how light rays bounce. And in reality, we hear these reflections, although not as clearly as direct sound waves. These indirect sound reflections are usually simulated by special reverb effects, but this is a very primitive approach to the task.

VRWorks Audio uses sound wave rendering similar to ray tracing in rendering, where the path of light rays is traced to multiple reflections from objects in a virtual scene. VRWorks Audio also simulates the propagation of sound waves in the environment when direct and reflected waves are tracked, depending on their angle of incidence and the properties of reflective materials. In its work, VRWorks Audio uses the high-performance Nvidia OptiX ray tracing engine known for graphic tasks. OptiX can be used for a variety of tasks, such as indirect lighting calculation and lightmapping, and now also for sound wave tracing in VRWorks Audio.
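
To give a feel for the geometric core of such an approach, here is a toy, single-bounce version: one direct path and one reflected path, each yielding a delay and a crude distance-based attenuation. VRWorks Audio itself traces thousands of rays with many bounces and material models; the wall position and reflectivity below are arbitrary assumptions:

```python
# Toy single-bounce "audio ray" example: path length -> delay and 1/d falloff.

import math

SPEED_OF_SOUND = 343.0   # m/s in air

def path(points):
    """Length (m), delay (s) and 1/d amplitude falloff along a polyline."""
    length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return length, length / SPEED_OF_SOUND, 1.0 / max(length, 1e-6)

source   = (0.0, 0.0, 1.5)
listener = (4.0, 0.0, 1.5)
wall_hit = (2.0, 3.0, 1.5)         # assumed reflection point on a side wall

direct = path([source, listener])
reflected = path([source, wall_hit, listener])
reflectivity = 0.7                  # assumed material property of the wall

print(f"direct:    {direct[1]*1000:.1f} ms, amplitude {direct[2]:.3f}")
print(f"reflected: {reflected[1]*1000:.1f} ms, amplitude {reflected[2] * reflectivity:.3f}")
```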

Nvidia has built accurate sound wave calculation into its VR Funhouse demo, which uses several thousand rays and traces up to 12 reflections from objects. To see the advantages of the technology with a clear example, we suggest watching a video (in Russian) showing the technology in action:

It is important that Nvidia's approach differs from traditional sound engines, including the hardware-accelerated method from the main competitor using a special block in the GPU. All of these methods provide only accurate positioning of sound sources, but do not calculate the reflections of sound waves from objects in a 3D scene, although they can simulate this using the reverb effect. However, the use of ray tracing technology can be much more realistic, since only such an approach will provide an accurate imitation of various sounds, taking into account the size, shape and materials of objects in the scene. It is difficult to say whether such computational accuracy is required for a typical player, but we can say for sure: in VR, it can add to users the very realism that is still lacking in conventional games.

Well, it remains for us to tell only about the VR SLI technology, which works in both OpenGL and DirectX. Its principle is extremely simple: a two-GPU video system in a VR application will work in such a way that each eye is allocated a separate GPU, as opposed to the AFR rendering familiar to SLI configurations. This greatly improves the overall performance, which is so important for virtual reality systems. Theoretically, more GPUs can be used, but their number must be even.

This approach was needed because AFR is poorly suited to VR: with AFR, the first GPU draws an even frame for both eyes and the second an odd one, which does not reduce the latency that is critical for virtual reality systems, even though the frame rate ends up quite high. With VR SLI, work on each frame is instead divided between two GPUs - one works on the part of the frame for the left eye, the other for the right, and these halves are then combined into a whole.

Splitting work like this between a pair of GPUs brings about a 2x performance boost, allowing for higher frame rates and lower latency compared to systems based on a single GPU. True, the use of VR SLI requires special support from the application in order to use this scaling method. But VR SLI technology is already built into VR demo apps like Valve's The Lab and ILMxLAB's Trials on Tatooine, and that's just the beginning - Nvidia promises other apps to come soon, as well as bringing the technology to Unreal Engine 4, Unity, and Max Play.

Ansel Game Screenshot Platform

One of the most interesting software announcements was the release of a technology for capturing high-quality screenshots in games, named after the famous photographer Ansel Adams. Games have long been not just games but also an outlet for creative people: some modify game scripts, some release high-quality texture packs, and some make beautiful screenshots.

Nvidia decided to help the latter by introducing a new platform for creating (namely, creating, because this is not such an easy process) high-quality shots from games. They believe that Ansel can help create a new kind of contemporary art. After all, there are already quite a few artists who spend most of their lives on the PC, creating beautiful screenshots from games, and they still did not have a convenient tool for this.

Ansel allows you not only to capture an in-game image but also to modify it as the creator wishes. Using this technology, you can move the camera around the scene, rotating and tilting it in any direction to obtain the desired composition. In games like first-person shooters you can normally only move the player and cannot really change anything else, so all the screenshots end up rather monotonous. With Ansel's free camera you can go far beyond the game camera, choosing the angle needed for a good picture, or even capture a full 360-degree stereo image from a chosen point, in high resolution, for later viewing in a VR headset.

Ansel works quite simply - with the help of a special library from Nvidia, this platform is embedded in the game code. To do this, its developer only needs to add a small piece of code to his project to allow the Nvidia video driver to intercept buffer and shader data. There is very little work to be done, bringing Ansel into the game takes less than one day to implement. So, the inclusion of this feature in The Witness took about 40 lines of code, and in The Witcher 3 - about 150 lines of code.

Ansel will come with an open development kit - an SDK. The main thing is that the user receives a standard set of settings that allow changing the camera position and angle, adding effects, and so on. The Ansel platform works like this: it pauses the game, enables the free camera, and lets you adjust the frame to the desired view, recording the result as a regular screenshot, a 360-degree image, a stereo pair, or simply a high-resolution panorama.

The only caveat is that not all games will support all the features of the Ansel screenshot platform. Some game developers, for one reason or another, do not want a completely free camera in their games - for example, because cheaters could exploit it. Or they want to limit changes to the viewing angle for the same reason, so that no one gains an unfair advantage. Or so that users do not see low-resolution sprites in the background. All of these are perfectly normal wishes on the part of game developers.

One of the most interesting features of Ansel is the creation of screenshots of simply enormous resolution. It does not matter that the game supports, say, resolutions only up to 4K and the user's monitor is Full HD: using the screenshot platform you can capture a much higher-quality image, limited mainly by the size and speed of the drive. The platform easily captures screenshots of up to 4.5 gigapixels, stitched together from 3600 pieces!

It is clear that in such pictures you can see all the details, up to the text on the newspapers lying in the distance, if such a level of detail is provided in principle in the game - Ansel can also control the level of detail, setting the maximum level to get the best picture quality. But you can still enable supersampling. All this allows you to create images from games that you can safely print on large banners and be calm about their quality.

Interestingly, a special hardware-accelerated code based on CUDA is used to stitch large images. After all, no video card can render a multi-gigapixel image in its entirety, but it can do it in pieces, which you just need to combine later, taking into account the possible difference in lighting, color, and so on.
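
The tiling idea itself is straightforward: render the scene piece by piece with a cropped projection and copy each piece into one large canvas. The sketch below fakes the "rendering" with a dummy pattern and skips the lighting/color equalization a real stitcher needs:

```python
# Minimal sketch of tiled capture: generate the huge image in GPU-sized pieces
# and assemble them into one canvas.

import numpy as np

full_w, full_h = 8192, 8192        # target canvas (kept small here)
tile = 2048                        # largest piece the renderer handles at once

def render_tile(x0, y0, w, h):
    """Stand-in for re-rendering the scene with a cropped projection."""
    yy, xx = np.mgrid[y0:y0 + h, x0:x0 + w]
    return ((xx + yy) % 256).astype(np.uint8)   # dummy gradient pattern

canvas = np.zeros((full_h, full_w), dtype=np.uint8)
for y in range(0, full_h, tile):
    for x in range(0, full_w, tile):
        canvas[y:y + tile, x:x + tile] = render_tile(x, y, tile, tile)

print(canvas.shape, canvas.dtype)   # one seamless 8192x8192 image
```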

After such panoramas are stitched, a special post-processing pass, also GPU-accelerated, is applied to the whole frame. And to capture images with a higher dynamic range you can use a special image format - EXR, an open standard from Industrial Light and Magic, in which the color values of each channel are recorded in 16-bit floating-point format (FP16).

This format allows you to change the brightness and dynamic range of the image in post-processing, bringing it to the desired for each specific display in the same way as it is done with RAW formats from cameras. And for the subsequent use of post-processing filters in image processing programs, this format is very useful, since it contains much more data than the usual image formats.
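
A tiny example of why a floating-point format matters here: values far above 1.0 survive in FP16 and can be re-exposed later, whereas an 8-bit buffer clips them irreversibly (NumPy is used just to illustrate the numeric behavior, not the EXR file format itself):

```python
# FP16 keeps highlight detail that an 8-bit buffer throws away.

import numpy as np

hdr_pixels = np.array([0.02, 0.5, 1.0, 8.0, 250.0], dtype=np.float16)  # scene luminance
ldr_pixels = np.clip(hdr_pixels * 255, 0, 255).astype(np.uint8)        # what 8-bit keeps

exposure = 1.0 / 16.0
print("re-exposed from FP16: ", np.clip(hdr_pixels * exposure, 0, 1))        # highlights preserved
print("re-exposed from 8-bit:", np.clip(ldr_pixels / 255 * exposure, 0, 1))  # all bright values clipped
```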

But the Ansel platform itself contains a lot of post-processing filters, which is especially important because it has access not only to the final image, but also to all the buffers used by the game when rendering, which can be used for very interesting effects, like depth of field. To do this, Ansel has a special post-processing API, and any of the effects can be included in the game with support for this platform.

Ansel's post-filters include: color curves, color space, transform, desaturation, brightness/contrast, film grain, bloom, lens flare, anamorphic glare, distortion, heat haze, fisheye, chromatic aberration, tone mapping, lens dirt, light shafts, vignette, gamma correction, convolution, sharpening, edge detection, blur, sepia, denoise, FXAA and others.

As for Ansel support appearing in games, we will have to wait a little until developers implement and test it. But Nvidia promises that such support will soon appear in well-known games such as The Division, The Witness, Lawbreakers, The Witcher 3, Paragon, Fortnite, Obduction, No Man's Sky, Unreal Tournament and others.

The new 16nm FinFET process technology and architectural optimizations have allowed the GeForce GTX 1080, based on the GP104 GPU, to reach a high clock speed of 1.6-1.7 GHz even in reference form, and the new generation of GPU Boost technology keeps frequencies as high as possible in games. Together with the increased number of execution units, these improvements make it not only the highest-performing single-chip graphics card of all time, but also the most energy-efficient solution on the market.

The GeForce GTX 1080 is the first graphics card to use the new GDDR5X memory, a new generation of high-speed chips with very high data rates. On the GeForce GTX 1080, this memory operates at an effective frequency of 10 GHz. Combined with improved framebuffer compression algorithms, this results in a 1.7x increase in effective memory bandwidth compared to its direct predecessor, the GeForce GTX 980.
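The 1.7x figure combines the raw bandwidth gain with the compression savings; a quick sketch of that arithmetic (the ~20% average compression saving is used here as an assumption based on Nvidia's own estimate):

```python
def raw_bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_bits / 8 * data_rate_gbps

gtx980  = raw_bandwidth_gbs(256, 7.0)   # GDDR5  at 7 Gb/s  -> 224 GB/s
gtx1080 = raw_bandwidth_gbs(256, 10.0)  # GDDR5X at 10 Gb/s -> 320 GB/s

compression_gain = 1.20  # assumed average saving from 4th-gen delta compression
print(f"{gtx1080:.0f} GB/s raw, ~{gtx1080 * compression_gain / gtx980:.2f}x effective vs GTX 980")
# -> 320 GB/s raw, ~1.71x effective
```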

Nvidia prudently decided not to release a radically new architecture on a completely new process technology, so as not to run into unnecessary problems in development and production. Instead, they seriously improved the already good and very efficient Maxwell architecture by adding new features. As a result, everything is fine with the production of the new GPUs, and in the case of the GeForce GTX 1080 the engineers have achieved very high frequency potential: in overclocked partner versions, GPU frequencies of up to 2 GHz are expected. Such impressive frequencies became possible thanks to the mature process technology and the painstaking work of Nvidia engineers on the Pascal GPU.

And while Pascal is a direct successor of Maxwell and the two graphics architectures are fundamentally not that different, Nvidia has introduced many changes and improvements: to the display capabilities, to the video encoding and decoding engine, to asynchronous execution of different types of calculations on the GPU, and to multi-chip rendering, along with a new synchronization method, Fast Sync.

Simultaneous Multi-Projection technology deserves special mention: it improves performance in virtual reality systems, produces more correct rendering on multi-monitor setups, and enables new performance optimization techniques. VR applications will see the biggest speed boost once they support multi-projection, which can cut the GPU's geometry workload roughly in half and reduce per-pixel work by up to one and a half times.

Among the purely software changes, the Ansel screenshot platform stands out - it will be interesting to try not only for avid gamers, but also for anyone interested in high-quality 3D graphics, as it takes the art of creating and retouching screenshots to a new level. As for the developer packages GameWorks and VRWorks, Nvidia simply keeps improving them step by step: the latter, for example, has gained high-quality audio simulation that accounts for numerous reflections of sound waves using hardware ray tracing.

All in all, the Nvidia GeForce GTX 1080 enters the market as a true leader, with all the necessary qualities: high performance, broad functionality, and support for new features and algorithms. Early buyers will be able to enjoy many of these advantages immediately, while other capabilities will reveal themselves a little later, once software support becomes widespread. The main thing is that the GeForce GTX 1080 turned out very fast and efficient, and, as we very much hope, Nvidia's engineers have managed to fix some of the previous problem areas (such as asynchronous compute).

Graphics accelerator GeForce GTX 1070

Parameter: Value
Chip code name: GP104
Production technology: 16nm FinFET
Number of transistors: 7.2 billion
Core area: 314 mm²
Architecture: Unified, with an array of common processors for stream processing of numerous types of data: vertices, pixels, etc.
DirectX hardware support: DirectX 12, with support for Feature Level 12_1
Memory bus: 256-bit: eight independent 32-bit memory controllers supporting GDDR5 and GDDR5X memory
GPU frequency: 1506 (1683) MHz
Computing blocks: 15 active (out of 20 in the chip) streaming multiprocessors, including 1920 (out of 2560) scalar ALUs for floating point calculations within the IEEE 754-2008 standard
Texturing blocks: 120 active (out of 160 in the chip) texture addressing and filtering units with support for FP16 and FP32 components in textures and support for trilinear and anisotropic filtering for all texture formats
Raster Operations Units (ROPs): 8 wide ROPs (64 pixels) with support for various anti-aliasing modes, including programmable and with FP16 or FP32 frame buffer format. Blocks consist of an array of configurable ALUs and are responsible for depth generation and comparison, multisampling and blending
Monitor support: Integrated support for up to four monitors connected via Dual Link DVI, HDMI 2.0b and DisplayPort 1.2 (1.3/1.4 Ready)

GeForce GTX 1070 Reference Graphics Specifications
Parameter: Value
Core frequency: 1506 (1683) MHz
Number of universal processors: 1920
Number of texture blocks: 120
Number of blending blocks: 64
Effective memory frequency: 8000 (4×2000) MHz
Memory type: GDDR5
Memory bus: 256-bit
Memory size: 8 GB
Memory bandwidth: 256 GB/s
Computing performance (FP32): about 6.5 teraflops
Theoretical maximum fill rate: 96 gigapixels/s
Theoretical texture sampling rate: 181 gigatexels/s
Bus: PCI Express 3.0
Connectors: one Dual Link DVI, one HDMI and three DisplayPort
Typical power consumption: up to 150 W
Auxiliary power: one 8-pin connector
Number of slots occupied in the system chassis: 2
Recommended price: $379-449 (US), 34,990 RUB (Russia)

The GeForce GTX 1070 also received a logical name: it differs from its direct predecessor, the GeForce GTX 970, only in the changed generation number. The new card sits one step below the current top solution, the GeForce GTX 1080, which became the temporary flagship of the new series until the release of solutions based on even more powerful GPUs.

The suggested prices for the new graphics card are $379 and $449 for regular partner cards and the Founders Edition, respectively. Compared to the top model, this is a very good price considering that the GTX 1070 is at worst about 25% behind it. And at the time of the announcement and release, the GTX 1070 is the best-performing solution in its class. Like the GeForce GTX 1080, the GTX 1070 has no direct competitor from AMD and can only be compared with the Radeon R9 390X and Fury.

For the GeForce GTX 1070 modification of the GP104, Nvidia decided to keep the full 256-bit memory bus, although it uses not the new GDDR5X but very fast GDDR5 memory running at a high effective frequency of 8 GHz. The amount of memory installed on a card with such a bus can be 4 or 8 GB, and to ensure maximum performance at high settings and rendering resolutions, the GeForce GTX 1070 was equipped with 8 GB of video memory, like its older sister. This volume is enough to run any 3D application with maximum quality settings for several years to come.

GeForce GTX 1070 Founders Edition

With the announcement of the GeForce GTX 1080 in early May, a special edition called Founders Edition was introduced, priced higher than the regular cards from the company's partners. The same applies to the new card: in this article we will again be talking about the special Founders Edition version of the GeForce GTX 1070. As with the older model, Nvidia decided to release this reference version of the card at a higher price, arguing that many gamers and enthusiasts who buy expensive top-end graphics cards want a product with an appropriately "premium" look and feel.

Accordingly, it is for such users that the GeForce GTX 1070 Founders Edition will be released: it is designed and manufactured by Nvidia engineers from premium materials and components, including an aluminum shroud and a low-profile back plate that covers the back of the PCB and is quite popular among enthusiasts.

As you can see from the photos of the board, the GeForce GTX 1070 Founders Edition inherited exactly the same industrial design from the reference version of the GeForce GTX 1080 Founders Edition. Both models use a radial fan that blows heated air out, which is very useful in both small cases and multi-chip SLI configurations with limited physical space. By blowing heated air out instead of circulating it inside the case, you can reduce thermal stress, improve overclocking results, and extend the life of system components.

Under the cover of the GeForce GTX 1070 reference cooling system is a specially shaped aluminum heatsink with three built-in copper heat pipes that remove heat from the GPU itself. The heat carried away by the heat pipes is then dissipated by the aluminum fins, and the low-profile metal plate on the back of the board also helps thermal performance. It features a retractable section for better airflow between multiple graphics cards in SLI configurations.

As for the board's power system, the GeForce GTX 1070 Founders Edition has a four-phase power system optimized for a stable power supply. Nvidia claims that the use of special components in the GTX 1070 Founders Edition improves power efficiency, stability, and reliability over the GeForce GTX 970, delivering better overclocking performance. In the company's own tests, the GeForce GTX 1070 GPUs easily surpassed 1.9 GHz, which is close to the results of the older GTX 1080 model.

The Nvidia GeForce GTX 1070 will be available in retail stores starting June 10th. The recommended prices for the GeForce GTX 1070 Founders Edition and partner solutions differ, and this is the main question surrounding this special edition. If Nvidia's partners sell their GeForce GTX 1070 cards starting at $379 (in the US market), Nvidia's reference-design Founders Edition costs a full $449. Are there many enthusiasts ready to overpay for what are, let's face it, the dubious advantages of the reference version? Time will tell, but we believe the reference card is most interesting as an option available at the very start of sales; later, the point of buying it (especially at a higher price!) drops to practically zero.

It remains to add that the printed circuit board of the reference GeForce GTX 1070 is similar to that of the older card, and both differ from the design of the company's previous boards. The typical power consumption of the new product is 150 W, almost 20% less than that of the GTX 1080 and close to the power consumption of the previous-generation GeForce GTX 970. The Nvidia reference board has the familiar set of connectors for image output devices: one Dual-Link DVI, one HDMI and three DisplayPort. Moreover, the new versions of HDMI and DisplayPort are supported, which we discussed above in the GTX 1080 review.

Architectural changes

The GeForce GTX 1070 is based on the GP104 chip, the first of a new generation of Nvidia's Pascal graphics architecture. This architecture was based on the solutions developed back in Maxwell, but it also has some functional differences, which we wrote about in detail above - in the part devoted to the top GeForce GTX 1080 video card.

The main change of the new architecture is the process technology on which all the new GPUs are manufactured. The use of the 16 nm FinFET manufacturing process for GP104 made it possible to significantly increase the complexity of the chip while keeping area and cost relatively low, so the very first Pascal chip has noticeably more execution units, including ones providing new functionality, than Maxwell chips of similar positioning.

The GP104 video chip is similar in design to comparable Maxwell architecture solutions, and you can find detailed information about the design of modern GPUs in our reviews of previous Nvidia solutions. Like previous GPUs, chips of the new architecture come in different configurations of Graphics Processing Clusters (GPC), Streaming Multiprocessors (SM) and memory controllers, and the GeForce GTX 1070 already has some changes - part of the chip is locked and inactive (highlighted in grey in the block diagram):

Although the GP104 GPU includes four GPC clusters and 20 SM multiprocessors, in the GeForce GTX 1070 it comes in a cut-down configuration with one GPC cluster disabled in hardware. Since each GPC cluster has a dedicated rasterization engine and includes five SMs, and each multiprocessor consists of 128 CUDA cores and eight texture units (TMUs), this version of GP104 has 1920 of the 2560 CUDA cores and 120 of the 160 physical texture units active.
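The unit counts follow directly from that cut-down configuration; a quick sketch of the arithmetic:

```python
# Full GP104: 4 GPCs x 5 SMs; the GTX 1070 ships with one GPC disabled
gpc_total, gpc_disabled, sm_per_gpc = 4, 1, 5
cuda_per_sm, tmu_per_sm = 128, 8

active_sm = (gpc_total - gpc_disabled) * sm_per_gpc        # 15 of 20 SMs
total_sm = gpc_total * sm_per_gpc
print(active_sm * cuda_per_sm, "of", total_sm * cuda_per_sm, "CUDA cores active")  # 1920 of 2560
print(active_sm * tmu_per_sm, "of", total_sm * tmu_per_sm, "texture units active")  # 120 of 160
```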

The graphics processor on which the GeForce GTX 1070 is based contains eight 32-bit memory controllers, giving a total 256-bit memory bus - exactly as in the older GTX 1080. The memory subsystem was not trimmed, in order to provide sufficiently high bandwidth given the use of GDDR5 memory in the GeForce GTX 1070. Each memory controller is paired with eight ROPs and 256 KB of L2 cache, so in this modification the GP104 chip also contains 64 ROPs and 2048 KB of L2 cache.

Thanks to architectural optimizations and the new process technology, the GP104 has become the most energy-efficient GPU to date. Nvidia's engineers were able to raise the clock speed more than they expected when moving to the new process, for which they had to work hard, carefully checking and optimizing all the bottlenecks of previous solutions that prevented operation at higher frequencies. Accordingly, the GeForce GTX 1070 also operates at a very high frequency, more than 40% above the reference value for the GeForce GTX 970.

Since the GeForce GTX 1070 is, in essence, just a slightly less productive GTX 1080 with GDDR5 memory, it supports absolutely all the technologies we described in the previous section. For more details about the Pascal architecture and the technologies it supports - the improved display and video processing units, Async Compute support, Simultaneous Multi-Projection, the changes to SLI multi-chip rendering and the new Fast Sync synchronization mode - refer to the GTX 1080 section.

High-performance GDDR5 memory and its efficient use

We wrote above about the changes in the memory subsystem of the GP104 GPU, on which the GeForce GTX 1080 and GTX 1070 models are based: the memory controllers in this GPU support both the new GDDR5X video memory, described in detail in the GTX 1080 review, and the good old GDDR5 memory we have known for several years.

In order not to lose too much memory bandwidth in the younger GTX 1070 compared to the older GTX 1080, all eight 32-bit memory controllers were left active, giving the full 256-bit video memory interface. In addition, the card was equipped with the fastest GDDR5 memory available on the market, with an effective operating frequency of 8 GHz. All this provides 256 GB/s of memory bandwidth, versus 320 GB/s for the older solution - and since the computing capabilities were cut by roughly the same proportion, the balance is maintained.
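To see that the trimmed compute and the slower memory stay in step, compare the two ratios; the clock figures are the reference boost values from the spec tables above:

```python
# GTX 1080 vs GTX 1070, reference figures
bw_1080, bw_1070 = 320.0, 256.0              # GB/s
tflops_1080 = 2560 * 2 * 1.733e9 / 1e12      # ~8.9 TFLOPS FP32 at boost clock
tflops_1070 = 1920 * 2 * 1.683e9 / 1e12      # ~6.5 TFLOPS FP32 at boost clock

print(f"bandwidth ratio: {bw_1070 / bw_1080:.2f}")          # 0.80
print(f"compute ratio:   {tflops_1070 / tflops_1080:.2f}")  # ~0.73
# both resources are cut by a comparable fraction, so the balance holds
```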

Keep in mind that while peak theoretical bandwidth is important for GPU performance, you need to pay attention to its efficiency as well. During the rendering process, many different bottlenecks can limit the overall performance, preventing the use of all available memory bandwidth. To minimize these bottlenecks, GPUs use special lossless data compression to improve the efficiency of data reads and writes.

The fourth generation of delta compression of buffer information has already been introduced in the Pascal architecture, which allows the GPU to more efficiently use the available capabilities of the video memory bus. The memory subsystem in the GeForce GTX 1070 and GTX 1080 uses improved old and several new lossless data compression techniques designed to reduce bandwidth requirements. This reduces the amount of data written to memory, improves L2 cache efficiency, and reduces the amount of data sent between different points on the GPU, like the TMU and the framebuffer.

GPU Boost 3.0 and overclocking features

Most of Nvidia's partners have already announced factory-overclocked solutions based on the GeForce GTX 1080 and GTX 1070. And many of the video card manufacturers also create special overclocking utilities that allow you to use the new functionality of GPU Boost 3.0 technology. One example of such utilities is EVGA Precision XOC, which includes an automatic scanner to determine the voltage-to-frequency curve - in this mode, for each voltage, by running a stability test, a stable frequency is found at which the GPU provides a performance boost. However, this curve can also be changed manually.

We know GPU Boost technology well from previous Nvidia graphics cards: it is a hardware feature designed to increase the GPU's operating clock speed in modes where the limits of power consumption and heat dissipation have not yet been reached. In Pascal GPUs the algorithm has undergone several changes, the main one being finer control of turbo frequencies depending on voltage.

If previously the difference between the base frequency and the turbo frequency was fixed, in GPU Boost 3.0 it became possible to set a turbo frequency offset for each voltage point separately. Now the turbo frequency can be tuned for each individual voltage value, which lets you squeeze every last bit of overclocking capability out of the GPU. We covered this feature in detail in the GeForce GTX 1080 review; the EVGA Precision XOC and MSI Afterburner utilities can be used for it.
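A toy model of the difference between the old and new behavior; the voltage/frequency points and offsets below are invented for illustration, not measured values:

```python
# Hypothetical stock voltage -> frequency curve (V : MHz)
stock_curve = {0.80: 1500, 0.90: 1650, 1.00: 1750, 1.05: 1800}

# Pre-Pascal style: one fixed offset shifts the whole curve
fixed_offset = 100
curve_old = {v: f + fixed_offset for v, f in stock_curve.items()}

# GPU Boost 3.0 style: an individual offset per voltage point, e.g. found
# by the automatic scanner in a tool such as EVGA Precision XOC
per_point_offset = {0.80: 150, 0.90: 130, 1.00: 90, 1.05: 60}
curve_new = {v: f + per_point_offset[v] for v, f in stock_curve.items()}

print(curve_old)  # the whole curve moves by the same amount
print(curve_new)  # each point gets exactly as much as it can sustain
```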

Since some details have changed in the overclocking methodology with the release of video cards with support for GPU Boost 3.0, Nvidia had to make additional explanations in the instructions for overclocking new products. There are different overclocking techniques with different variable characteristics that affect the final result. For each particular system, a particular method may be better suited, but the basics are always about the same.

Many overclockers use the Unigine Heaven 4.0 benchmark to check system stability: it loads the GPU well, has flexible settings, and can be run in windowed mode next to an overclocking and monitoring utility such as EVGA Precision or MSI Afterburner. However, such a check is only enough for an initial estimate; to firmly confirm the stability of an overclock, it has to be tested in several games, because different games load different functional units of the GPU differently: math, texturing, geometry. The Heaven 4.0 benchmark is also convenient for overclocking because it has a looped mode, in which it is easy to change overclocking settings, and a built-in benchmark for evaluating the speed gain.

When overclocking the new GeForce GTX 1080 and GTX 1070, Nvidia advises running the Heaven 4.0 and EVGA Precision XOC windows together. It is advisable to raise the fan speed first. For serious overclocking, you can set the fan speed straight to 100%, which makes the card very loud, but cools the GPU and the other components as much as possible, keeping the temperature at the lowest possible level and preventing throttling (frequency reduction when the GPU temperature rises above a certain value).

Next, set the power target (Power Target) to the maximum as well. This gives the GPU the largest possible power budget by raising the power consumption limit and the GPU temperature target (GPU Temp Target). For some purposes, the second value can be decoupled from the Power Target and adjusted individually - for example, to keep the video chip cooler.

The next step is to increase the GPU Clock Offset value - how much higher the turbo frequency will be during operation. This value raises the frequency at all voltage points and results in better performance. As usual when overclocking, check stability while increasing the GPU frequency in small steps of 10 to 50 MHz, until you notice a hang, a driver or application error, or visual artifacts. When this limit is reached, step the frequency back down and verify stability and performance once more.
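In pseudocode form, the procedure described above looks roughly like this; `is_stable` stands in for your own test pass (Heaven 4.0 plus a few games) and is not a real API:

```python
def find_stable_offset(is_stable, start=0, limit=400, step=25):
    """Raise the clock offset in small steps until instability appears,
    then back off by one step. `is_stable` is a user-supplied check."""
    offset = start
    while offset + step <= limit and is_stable(offset + step):
        offset += step
    return offset  # the last offset that passed the stability check

# Example with an imaginary stability boundary at +230 MHz
print(find_stable_offset(lambda mhz: mhz <= 230))  # -> 225
```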

In addition to the GPU frequency, you can also raise the video memory frequency (Memory Clock Offset), which is especially worthwhile on the GeForce GTX 1070 with its GDDR5 memory, which usually overclocks well. The process exactly mirrors finding a stable GPU frequency, except that the steps can be larger - 50-100 MHz at a time.

Beyond the steps above, you can also raise the Overvoltage limit, because a higher GPU frequency is often only achievable at increased voltage, when marginally unstable parts of the GPU receive extra power. The potential downside of raising this value is the risk of damaging the video chip and accelerating its failure, so increase the voltage with extreme caution.

Overclocking enthusiasts use slightly different techniques and change the parameters in a different order. For example, some overclockers separate the experiments for finding a stable GPU frequency and a stable memory frequency so that they do not interfere with each other, and only then test the combined overclock of both the video chip and the memory chips - but these are minor details of an individual approach.

Judging by opinions on forums and in article comments, some users did not like the new GPU Boost 3.0 algorithm, in which the GPU frequency first rises very high, often above the turbo frequency, but then, as the GPU temperature rises or power consumption exceeds the set limit, it can drop to much lower values. This is simply how the updated algorithm works; you need to get used to the new behavior of the dynamically changing GPU frequency, but it has no negative consequences.
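The behavior users noticed can be reproduced with a trivial model in which the clock backs off as the GPU approaches its temperature or power limit; all numbers below are invented for illustration:

```python
def boost_clock(temp_c, power_w, base=1506, peak=1900,
                temp_target=83, power_limit=180):
    """Toy GPU Boost model: scale the clock down as the GPU nears its
    temperature or power limit (illustrative only, not Nvidia's algorithm)."""
    headroom = min(max(temp_target - temp_c, 0) / 20,
                   max(power_limit - power_w, 0) / 40,
                   1.0)
    return round(base + (peak - base) * headroom)

print(boost_clock(temp_c=60, power_w=140))  # cool card: close to the peak clock
print(boost_clock(temp_c=82, power_w=178))  # hot and power-limited: near base clock
```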

The GeForce GTX 1070 is the second model after the GTX 1080 in Nvidia's new line of graphics cards based on the Pascal family. The new 16nm FinFET manufacturing process and architectural optimizations allow this graphics card to reach high clock speeds, helped by the new generation of GPU Boost. Even though the number of functional blocks in the form of stream processors and texture units has been reduced, it remains sufficient for the GTX 1070 to be the best-value and most energy-efficient solution in its class.

Using GDDR5 memory on the younger of the two GP104-based models, rather than the new GDDR5X that distinguishes the GTX 1080, does not prevent it from reaching high performance. First, Nvidia decided not to cut the memory bus of the GeForce GTX 1070, and second, it installed the fastest GDDR5 memory available, with an effective frequency of 8 GHz - only slightly lower than the 10 GHz of the GDDR5X used in the older model. Together with the improved delta compression algorithms, the GPU's effective memory bandwidth ends up higher than that of the comparable previous-generation model, the GeForce GTX 970.

The appeal of the GeForce GTX 1070 is that it offers very high performance and support for new features and algorithms at a noticeably lower price than the older model announced a little earlier. If only a few enthusiasts can afford to buy a GTX 1080 for 55,000 rubles, a much wider circle of potential buyers will be able to pay 35,000 rubles for a solution that is only about a quarter slower, with exactly the same capabilities. It is this combination of relatively low price and high performance that made the GeForce GTX 1070 perhaps the best-value purchase at the time of its release.

Graphics accelerator GeForce GTX 1060

Parameter: Value
Chip code name: GP106
Production technology: 16nm FinFET
Number of transistors: 4.4 billion
Core area: 200 mm²
Architecture: Unified, with an array of common processors for stream processing of numerous types of data: vertices, pixels, etc.
DirectX hardware support: DirectX 12, with support for Feature Level 12_1
Memory bus: 192-bit: six independent 32-bit memory controllers supporting GDDR5 memory
GPU frequency: 1506 (1708) MHz
Computing blocks: 10 streaming multiprocessors, including 1280 scalar ALUs for floating point calculations within the IEEE 754-2008 standard
Texturing blocks: 80 texture addressing and filtering units with support for FP16 and FP32 components in textures and support for trilinear and anisotropic filtering for all texture formats
Raster Operations Units (ROPs): 6 wide ROPs (48 pixels) with support for various anti-aliasing modes, including programmable and with FP16 or FP32 frame buffer format. Blocks consist of an array of configurable ALUs and are responsible for depth generation and comparison, multisampling and blending
Monitor support: Integrated support for up to four monitors connected via Dual Link DVI, HDMI 2.0b and DisplayPort 1.2 (1.3/1.4 Ready)

GeForce GTX 1060 Reference Graphics Specifications
Parameter: Value
Core frequency: 1506 (1708) MHz
Number of universal processors: 1280
Number of texture blocks: 80
Number of blending blocks: 48
Effective memory frequency: 8000 (4×2000) MHz
Memory type: GDDR5
Memory bus: 192-bit
Memory size: 6 GB
Memory bandwidth: 192 GB/s
Computing performance (FP32): about 4 teraflops
Theoretical maximum fill rate: 72 gigapixels/s
Theoretical texture sampling rate: 121 gigatexels/s
Bus: PCI Express 3.0
Connectors: one Dual Link DVI, one HDMI and three DisplayPort
Typical power consumption: 120 W
Auxiliary power: one 6-pin connector
Number of slots occupied in the system chassis: 2
Recommended price: $249 ($299) in the US, 18,990 RUB in Russia

The GeForce GTX 1060 also received a name similar to the corresponding solution of the previous GeForce series, differing from its direct predecessor, the GeForce GTX 960, only in the generation digit. In the company's current line-up, the new card sits one step below the previously released GeForce GTX 1070, the middle model of the new series in terms of speed.

The recommended prices for the new card are $249 and $299 for the regular partner versions and the special Founder's Edition, respectively. Compared to the two older models this is a very favorable price: the new GTX 1060 is inferior to the top-end boards, but by nowhere near as much as it is cheaper. At the time of the announcement, the new card definitely became the best-performing solution in its class and one of the best-value offers in this price range.

This model of Nvidia's Pascal family came out to counter a fresh solution from rival AMD, which released the Radeon RX 480 slightly earlier. The two can be compared, although not quite head-to-head, since they still differ noticeably in price: the GeForce GTX 1060 is more expensive ($249-299 versus $199-229), but it is also clearly faster than its competitor.

The GP106 graphics processor has a 192-bit memory bus, so the amount of memory installed on a card with such a bus can be 3 or 6 GB. The smaller value is frankly insufficient in modern conditions: many game projects even at Full HD resolution will run into a lack of video memory, which seriously affects the smoothness of rendering. To ensure maximum performance at high settings, the GeForce GTX 1060 was equipped with 6 GB of video memory, which is enough to run any 3D application with any quality settings. In practice there is currently little difference between 6 and 8 GB, and this choice saves some money.

The typical power consumption of the new product is 120 W, 20% less than that of the GTX 1070 and equal to the power consumption of the previous-generation GeForce GTX 960, which has much lower performance and capabilities. The reference board has the usual set of connectors for image output devices: one Dual-Link DVI, one HDMI and three DisplayPort. Moreover, the new versions of HDMI and DisplayPort are supported, which we discussed in the GTX 1080 review.

The GeForce GTX 1060 reference board is 9.8 inches (25 cm) long. Among the differences from the older models, we note separately that the GeForce GTX 1060 does not support SLI multi-chip rendering and has no connector for it. Since the board consumes less power than the older models, a single 6-pin PCI-E connector was installed for auxiliary power.

GeForce GTX 1060 cards have been on the market since the day of the announcement as products from the company's partners: Asus, EVGA, Gainward, Gigabyte, Innovision 3D, MSI, Palit and Zotac. The special GeForce GTX 1060 Founder's Edition, produced by Nvidia itself, will be released in limited quantities at a price of $299, sold exclusively on the Nvidia website and not officially offered in Russia. The Founder's Edition is distinguished by high-quality materials and components, including an aluminum casing, an efficient cooling system, low-resistance power circuits and specially designed voltage regulators.

Architectural changes

The GeForce GTX 1060 is based on the completely new GP106 graphics processor, which is functionally no different from the firstborn of the Pascal architecture, the GP104 chip used in the GeForce GTX 1080 and GTX 1070 models described above. This architecture builds on solutions worked out back in Maxwell, but it also has some functional differences, which we covered in detail earlier.

The GP106 video chip is similar in its design to the top-end Pascal chip and similar solutions of the Maxwell architecture, and you can find detailed information about the design of modern GPUs in our reviews of previous Nvidia solutions. Like previous GPUs, the new architecture chips have a different configuration of Graphics Processing Cluster (GPC), Streaming Multiprocessor (SM) and memory controllers:

The GP106 graphics processor incorporates two GPC clusters comprising a total of 10 streaming multiprocessors (SM) - exactly half of GP104. As in the older GPU, each multiprocessor contains 128 cores, 8 TMU texture units, 256 KB of register file, 96 KB of shared memory and 48 KB of L1 cache. As a result, the GeForce GTX 1060 contains a total of 1280 compute cores and 80 texture units - half as many as the GTX 1080.

But the memory subsystem of the GeForce GTX 1060 was not halved relative to the top solution: it contains six 32-bit memory controllers, giving a 192-bit memory bus. With the GeForce GTX 1060's GDDR5 running at an effective 8 GHz, bandwidth reaches 192 GB/s, which is quite good for a solution in this price segment, especially considering how efficiently Pascal uses it. Each memory controller is paired with eight ROPs and 256 KB of L2 cache, so the full version of GP106 contains 48 ROPs and 1536 KB of L2 cache in total.
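The back-end figures follow from the per-controller resources quoted above; a short sketch:

```python
def backend(controllers, rops_per_ctrl=8, l2_kb_per_ctrl=256, gbps_per_pin=8.0):
    """Scale ROPs, L2 and bandwidth with the number of 32-bit memory controllers."""
    bus_bits = controllers * 32
    return {"bus": f"{bus_bits}-bit",
            "ROPs": controllers * rops_per_ctrl,
            "L2, KB": controllers * l2_kb_per_ctrl,
            "bandwidth, GB/s": bus_bits / 8 * gbps_per_pin}

print(backend(8))  # GP104 with 8 Gb/s GDDR5 (GTX 1070): 256-bit, 64 ROPs, 2048 KB, 256 GB/s
print(backend(6))  # GP106 (GTX 1060): 192-bit, 48 ROPs, 1536 KB, 192 GB/s
```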

To reduce memory bandwidth requirements and use the available bandwidth more efficiently, the Pascal architecture further improves lossless on-chip data compression, which compresses data in buffers for efficiency and performance gains. In particular, new 4:1 and 8:1 delta compression modes were added in the new family of chips, providing an additional 20% of bandwidth efficiency compared to the previous solutions of the Maxwell family.

The base frequency of the new GPU is 1506 MHz - the frequency should not fall below this mark in principle. The typical Boost Clock is much higher, at 1708 MHz, which is the average of the real frequency that the GeForce GTX 1060 graphics chip runs at in a wide range of games and 3D applications. The actual Boost frequency depends on the game and the conditions in which the test takes place.

Like the rest of the solutions of the Pascal family, the GeForce GTX 1060 model not only operates at a high clock frequency, providing high performance, but also has a decent margin for overclocking. The first experiments indicate the possibility of reaching frequencies of the order of 2 GHz. It is not surprising that the company's partners are also preparing factory overclocked versions of the GTX 1060 video card.

So, the main change of the new architecture is the 16 nm FinFET process: its use in the production of GP106 made it possible to significantly increase the complexity of the chip while keeping the area relatively small at 200 mm², so this Pascal chip has noticeably more execution units than a Maxwell chip of similar positioning produced on the 28 nm process.

Where the GM206 (GTX 960), with an area of 227 mm², had 3 billion transistors, 1024 ALUs, 64 TMUs, 32 ROPs and a 128-bit bus, the new GPU packs 4.4 billion transistors, 1280 ALUs, 80 TMUs and 48 ROPs with a 192-bit bus into 200 mm². Moreover, it runs at almost one and a half times the clock frequency: 1506 (1708) versus 1126 (1178) MHz - and all within the same 120 W power budget. As a result, GP106 has become one of the most energy-efficient GPUs, alongside GP104.

New Nvidia Technologies

One of the most interesting technologies supported by the GeForce GTX 1060 and other Pascal solutions is Nvidia Simultaneous Multi-Projection. We already wrote about this technology in the GeForce GTX 1080 review; it enables several new techniques for optimizing rendering, in particular projecting a VR image for both eyes at once, significantly increasing GPU efficiency in virtual reality.

To support SMP, all Pascal GPUs have a dedicated engine located in the PolyMorph Engine at the end of the geometry pipeline, before the rasterizer. With it, the GPU can simultaneously project a geometric primitive onto several projections from a single point, and these projections can be paired for stereo (up to 16 projections, or 32 when each is duplicated for the two eyes). This capability allows Pascal GPUs to accurately reproduce a curved surface for VR rendering and to display correctly on multi-monitor systems.
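A much simplified numpy sketch of the idea: the vertex data is processed once and then re-projected with several view-projection matrices in a single pass, instead of re-running the whole geometry pipeline per view. The matrices here are crude placeholders, not real SMP state:

```python
import numpy as np

# One triangle in homogeneous coordinates, processed by the geometry pipeline once
triangle = np.array([[0.0, 0.0, -2.0, 1.0],
                     [1.0, 0.0, -2.0, 1.0],
                     [0.0, 1.0, -2.0, 1.0]])

def placeholder_proj(x_offset):
    """Stand-in view-projection matrix; shifts the viewpoint horizontally."""
    m = np.eye(4)
    m[0, 3] = x_offset   # e.g. left/right eye offset for a stereo pair
    return m

views = [placeholder_proj(off) for off in (-0.03, +0.03)]  # two stereo viewpoints
projected = [triangle @ v.T for v in views]                # N projections, one geometry pass
print(len(projected), "projections of the same, once-processed geometry")
```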

It is important that Simultaneous Multi-Projection is already being integrated into popular game engines (Unreal Engine and Unity) and games; to date, support has been announced for more than 30 games in development, including such well-known projects as Unreal Tournament, Poolnation VR, Everest VR, Obduction, Adr1ft and Raw Data. Interestingly, although Unreal Tournament is not a VR game, it uses SMP to achieve better visuals and performance.

Another long-awaited technology is Nvidia Ansel, a powerful tool for creating in-game screenshots. It allows you to create unusual and very high-quality screenshots with previously unavailable features - saving them at very high resolution and supplementing them with various effects - and to share your creations. Ansel lets you literally compose a screenshot the way an artist would: place a camera with any parameters anywhere in the scene, apply powerful post-filters to the image, or even capture a 360-degree shot for viewing in a virtual reality headset.

Nvidia has standardized the integration of Ansel into games, and doing so is as easy as adding a few lines of code. There is no need to wait for the feature to appear in games: you can try Ansel right now in Mirror's Edge: Catalyst, and a little later it will become available in The Witcher 3: Wild Hunt. In addition, many Ansel-enabled projects are in development, including Fortnite, Paragon, Unreal Tournament, Obduction, The Witness, Lawbreakers, Tom Clancy's The Division, No Man's Sky and others.

The new GeForce GTX 1060 also supports the Nvidia VRWorks toolkit, which helps developers create impressive virtual reality projects. The package includes many utilities and tools, among them VRWorks Audio, which performs very accurate calculation of sound wave reflections off scene objects using GPU ray tracing. The package also includes integration of VR and PhysX physics effects to ensure physically correct behavior of objects in the scene.

One of the most exciting VR games to take advantage of VRWorks is VR Funhouse, Nvidia's own VR game that's available for free on Valve's Steam service. This game is powered by Unreal Engine 4 (Epic Games) and runs on GeForce GTX 1080, 1070 and 1060 graphics cards in conjunction with HTC Vive VR headsets. Moreover, the source code of this game will be publicly available, which will allow other developers to use ready-made ideas and code already in their VR attractions. Take our word for it, this is one of the most impressive demonstrations of the possibilities of virtual reality.

Thanks in part to SMP and VRWorks, the GeForce GTX 1060 delivers sufficient performance for entry-level virtual reality; the GPU meets the minimum hardware requirements, including for SteamVR, making it one of the most sensible purchases for systems with official VR support.

Since the GeForce GTX 1060 is based on the GP106 chip, which is functionally identical to the GP104 that underpins the older modifications, it supports absolutely all the technologies described above.

The GeForce GTX 1060 is the third model in Nvidia's new line of graphics processors based on the Pascal family. The new 16nm FinFET process technology and architecture optimizations have allowed all new graphics cards to achieve high clock speeds and place more functional blocks in the GPU in the form of stream processors, texture modules and others, compared to previous generation video chips. That is why the GTX 1060 has become the most profitable and energy efficient solution in its class and in general.

It is especially important that the GeForce GTX 1060 offers sufficiently high performance and support for new features and algorithms at a much lower price compared to older solutions based on the GP104. The GP106 graphics chip used in the new model delivers best-in-class performance and power efficiency. The GeForce GTX 1060 is specially designed and perfectly suited for all modern games at high and maximum graphics settings at a resolution of 1920x1080 and even with full-screen anti-aliasing enabled by various methods (FXAA, MFAA or MSAA).

And for those who want even more performance, or displays with ultra-high resolution, Nvidia offers the top-of-the-line GeForce GTX 1070 and GTX 1080, which are also very good in terms of performance and energy efficiency. Still, the combination of low price and sufficient performance sets the GeForce GTX 1060 apart from the older solutions. Compared to the competing Radeon RX 480, Nvidia's solution is slightly faster with a less complex and smaller GPU, and has significantly better energy efficiency. True, it sells for a little more, so each card occupies its own niche.

Jen-Hsun Huang took the stage last week and officially unveiled the Nvidia GeForce GTX 1070 and GTX 1080 graphics cards. In addition to presenting the accelerators themselves and their overclocking potential, he demonstrated the new technologies used in the Pascal architecture, and it is to them that this material is dedicated. Of course, not all of the innovations will be covered here; some of the new and/or updated technologies will be discussed in the upcoming GTX 1080 review.

Pascal and the GP104 GPU

The first and most important change in Pascal is the departure from the 28nm process technology that had been used in consumer graphics cards since the release of the GeForce GTX 600 series in March 2012. The Pascal architecture is built on TSMC's new 16nm FinFET manufacturing process, and the move to finer lithography brings impressive improvements in power consumption and performance scaling.

First of all, a finer process technology usually allows higher frequencies. At stock, the video card operates at more than 1700 MHz, and judging by numerous reviews, the GTX 1080 is capable of overclocking to 2100+ MHz - and that is a reference card that is also seriously limited in power.

It is worth noting that more than just the process shrink made such frequencies possible. According to Jonah Alben, senior vice president of GPU engineering, after moving to 16nm FinFET the new GPUs could run at around 1325 MHz, and the Nvidia team worked for a long time on raising the frequencies. The result is the GTX 1080, which operates at 1733 MHz.

How did you manage to achieve such a level of improvement in clock speed and performance relative to the Maxwell architecture? Pascal combines several interesting innovations to increase efficiency significantly.

The optimizations made it possible to increase not only the clock frequency but also the efficiency of the CUDA cores of the GP104 relative to its predecessor, the GM204. Proof of this is a performance increase of up to 70% over the GTX 980 - and that is on drivers that have not yet fully matured.

One of the changes can be seen in the block diagram above: one GPC cluster now contains five SM (streaming multiprocessor) blocks instead of four.

PolyMorph Engine 4.0

There is only one significant addition to the GPU itself: a new module in the PolyMorph Engine, the simultaneous multi-projection block. The new block sits at the very end of the geometry processing path and creates several projections from a single geometry stream.

Without going into details (and everything is rather complicated there), the new block takes over a significant part of the geometry processing - not all of it - thereby reducing the load on other GPU units. In addition, PolyMorph helps form a correctly angled image on multi-monitor configurations, but more on that later.

Nvidia GeForce GTX 1080 Pascal Review | Meet the GP104 GPU

On the eve of Computex, Nvidia decided to present its long-awaited novelty - the Pascal architecture adapted for gamers. In the new GeForce GTX 1080 and 1070 graphics cards, the manufacturer installs the GP104 graphics processor. Today, we will review the older model, and the younger should be in our hands in early June.

The Pascal architecture promises faster and more efficient performance, more compute modules, reduced die area, and faster memory with an upgraded controller. It is better suited for VR, 4K gaming, and other performance-intensive applications.

As always, we will try to understand the promises of the manufacturer and test them in practice. Let's start.

Will the GeForce GTX 1080 change the balance of power in the high-end segment?

The Nvidia GeForce GTX 1080 is the faster of the two gaming graphics cards announced earlier this month. Both use the GP104 GPU, which, incidentally, is already the second GPU with the Pascal microarchitecture (the first was the GP100, which appeared at GTC in April). Nvidia CEO Jen-Hsun Huang teased enthusiasts when he unveiled the new product to the general public, claiming that the GeForce GTX 1080 would outperform two 980s in SLI.

He also noted that the GTX 1080, despite its greater performance, has lower power consumption than the 900 series: it is claimed to be twice as fast and three times as efficient as the former flagship GeForce Titan X, although the accompanying graphs and charts show that such an impressive difference appears mainly in certain tasks related to virtual reality. But even if these promises are only partially confirmed, very interesting times still await us in high-end PC gaming.

Virtual reality is slowly gaining momentum, but the high hardware requirements for the graphics subsystem create a significant barrier to entry. In addition, most games available today cannot take advantage of multi-GPU rendering, so you are usually limited to the capabilities of a single fast video adapter with one GPU. The GTX 1080 is capable of outperforming two 980s and should have no trouble with today's VR games, eliminating the need for multi-GPU configurations in the future.

The 4K ecosystem is progressing just as fast. Higher-bandwidth interfaces such as HDMI 2.0b and DisplayPort 1.3/1.4 should open the door to 4K monitors with 120 Hz panels and support for dynamic refresh rates by the end of this year. While previous generations of top-end GPUs from AMD and Nvidia were positioned as solutions for 4K gaming, users had to compromise on quality to maintain acceptable frame rates. The Nvidia GeForce GTX 1080 could be the first graphics card fast enough to maintain high frame rates at 3840x2160 with maximum graphics detail settings.

What about multi-monitor configurations? Many gamers are willing to set up three monitors in Surround, but only if the graphics system can handle the load: at a combined resolution of 7680x1440 the card has to render about 11 million pixels per frame. There are even enthusiasts ready to take three 4K displays with a combined resolution of 11520x2160 pixels.
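The pixel counts involved are easy to put in perspective:

```python
for w, h, label in [(1920, 1080, "single Full HD monitor"),
                    (3840, 2160, "single 4K monitor"),
                    (7680, 1440, "triple-monitor Surround, 7680x1440"),
                    (11520, 2160, "triple 4K Surround, 11520x2160")]:
    print(f"{label}: {w * h / 1e6:.1f} million pixels per frame")
# -> ~2.1, ~8.3, ~11.1 and ~24.9 million pixels respectively
```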

The latter option is too exotic even for a new gaming flagship graphics card. However, the Nvidia GP104 processor is equipped with technology that promises to improve the experience for typical tasks of the new model, i.e. 4K and Surround. But before we move on to new technologies, let's take a closer look at the GP104 processor and its underlying Pascal architecture.

What is GP104 made of?

Since the beginning of 2012, AMD and Nvidia have been using the 28nm process technology. By switching to it, both companies made a significant leap forward, introducing the Radeon HD 7970 and GeForce GTX 680. However, over the following four years they had to contrive a great deal to extract more performance from the existing technology. The accomplishments of the Radeon R9 Fury X and GeForce GTX 980 Ti are true marvels given their complexity. The first chip Nvidia built on the 28nm process was the GK104 with 3.5 billion transistors; the GM200 found in the GeForce GTX 980 Ti and Titan X already has eight billion.

The transition to 16nm TSMC FinFET Plus technology allowed Nvidia's engineers to implement new ideas. According to the technical data, 16FF+ chips can be 65% faster, have twice the density of 28HPM, or consume 70% less power, and Nvidia uses the optimal combination of these advantages when creating its GPUs. TSMC states that the process is based on the engineering of its existing 20 nm node but uses FinFET transistors instead of planar ones, which it says reduces scrap and increases the yield of working wafers; a 20-nanometer process with fast transistors was never offered. Once again, the world of computer graphics had been sitting on the 28 nm process technology for more than four years.


GP104 Processor Block Diagram

The successor to the GM204 consists of 7.2 billion transistors on an area of 314 mm². For comparison, the GM204 die area is 398 mm² with 5.2 billion transistors. In its full version, the GP104 GPU has four Graphics Processing Clusters (GPCs). Each GPC includes five Thread/Texture Processing Clusters (TPCs) and a rasterizer. A TPC combines one streaming multiprocessor (SM) with the PolyMorph engine. The SM combines 128 single-precision CUDA cores, 256 KB of register memory, 96 KB of shared memory, 48 KB of L1/texture cache, and eight texture units. The fourth generation of the PolyMorph engine includes a new logic block, located at the end of the geometry pipeline before the rasterization block, that drives the Simultaneous Multi-Projection function (more on that below). In total, we get 20 SMs, 2560 CUDA cores and 160 texture units.


One streaming multiprocessor (SM) in GP104

The GPU back-end includes eight 32-bit memory controllers (256-bit total channel width), eight rasterization units, and 256KB of L2 cache per unit. We end up with 64 ROPs and 2MB of shared L2 cache. Although the block diagram of the Nvidia GM204 processor showed four 64-bit controllers and 16 ROPs, they were grouped together and are functionally equivalent.

Some of the structural elements of the GP104 are similar to those of the GM204, as the new GPU was built from the "building blocks" of its predecessor. There is nothing wrong with that: if you remember, in the Maxwell architecture the company bet on energy efficiency and did not radically rework the blocks that were Kepler's strong points. We see a similar picture here.

Adding four SMs may not noticeably affect performance on its own, but the GP104 has a few more tricks up its sleeve. The first trump card is significantly higher clock frequencies: the base GPU clock is 1607 MHz, whereas the GM204 specification lists 1126 MHz. GPU Boost tops out at 1733 MHz, but we pushed our sample to 2100 MHz using EVGA's PrecisionX beta utility. Where does such an overclocking reserve come from? According to Jonah Alben, senior vice president of GPU engineering, his team knew that the TSMC 16FF+ process would affect the chip's architecture, so they focused on optimizing the chip's timings to remove the bottlenecks preventing higher clock speeds. As a result, the GP104's single-precision compute rate reaches 8228 GFLOPs (at the base clock) compared to the 4612 GFLOPs ceiling of the GeForce GTX 980, and the texel fill rate jumped from 155.6 Gtex/s on the 980 (with GPU Boost) to 277.3 Gtex/s.
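The headline numbers come straight from the unit counts and clocks; a quick check of the arithmetic:

```python
def fp32_gflops(cuda_cores, clock_mhz):
    """Each CUDA core can issue one FMA (2 FLOPs) per clock."""
    return cuda_cores * 2 * clock_mhz / 1000

def texel_rate_gtex(tmus, clock_mhz):
    return tmus * clock_mhz / 1000

print(fp32_gflops(2560, 1607))     # GTX 1080 at base clock -> ~8228 GFLOPs
print(fp32_gflops(2048, 1126))     # GTX 980 at base clock  -> ~4612 GFLOPs
print(texel_rate_gtex(160, 1733))  # GTX 1080 at boost      -> ~277.3 Gtex/s
```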

GPU: GeForce GTX 1080 (GP104) / GeForce GTX 980 (GM204)
SM: 20 / 16
Number of CUDA cores: 2560 / 2048
Base GPU frequency, MHz: 1607 / 1126
GPU frequency in Boost mode, MHz: 1733 / 1216
Compute rate, GFLOPs (at base frequency): 8228 / 4612
Number of texture units: 160 / 128
Texel fill rate, Gtex/s: 277.3 / 155.6
Memory data rate, Gbps: 10 / 7
Memory bandwidth, GB/s: 320 / 224
Number of rasterization blocks: 64 / 64
L2 cache size, MB: 2 / 2
Thermal package, W: 180 / 165
Number of transistors: 7.2 billion / 5.2 billion
Die area, mm²: 314 / 398
Process technology, nm: 16 / 28

The back end still includes 64 ROPs and a 256-bit memory bus, but Nvidia has introduced GDDR5X memory to increase the available bandwidth. The company has put a lot of effort into promoting the new memory type, especially against the backdrop of the HBM memory used in various AMD graphics cards and the HBM2 that Nvidia installs in the Tesla P100. There appears to be a shortage of HBM2 on the market right now, and the company is not ready to accept the limits of first-generation HBM (four 1 GB stacks, or the difficulty of implementing eight 1 GB stacks). So we get GDDR5X video memory, whose supply apparently is also limited, since the GeForce GTX 1070 already uses regular GDDR5. But this does not diminish the advantages of the new solution. The GDDR5 memory in the GeForce GTX 980 had a data rate of 7 Gb/s, which provided 224 GB/s of bandwidth over a 256-bit bus; GDDR5X starts at 10 Gb/s, raising throughput to 320 GB/s (a ~43% increase). According to Nvidia, the increase is achieved through an upgraded I/O scheme without increasing power consumption.

The Maxwell architecture became more efficient in its use of bandwidth by optimizing the caches and compression algorithms, and Pascal follows the same path with new lossless compression methods that use the available bandwidth of the memory subsystem more economically. The delta color compression algorithm tries to achieve a 2:1 gain, and this mode has been improved so it can be used more often. There is also a new 4:1 mode for cases where the per-pixel differences are very small. Finally, Pascal introduces a new 8:1 algorithm that applies 4:1 compression to 2x2 blocks and then compresses the differences between the blocks with the 2:1 algorithm.
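A toy model of the delta idea (not Nvidia's actual hardware algorithm): store one anchor pixel per block plus small differences, and fall back to uncompressed storage when the deltas do not fit the reduced encoding.

```python
import numpy as np

def delta_compressible(block, delta_bits=4):
    """Toy check: can this block be stored as an anchor pixel plus small deltas?
    Real ROP/framebuffer compression works per channel on small tiles in hardware."""
    anchor = block.flat[0].astype(np.int16)
    deltas = block.astype(np.int16) - anchor
    limit = 2 ** (delta_bits - 1)
    return bool((deltas >= -limit).all() and (deltas < limit).all())

sky  = np.array([[120, 121], [122, 121]], dtype=np.uint8)  # smooth gradient
edge = np.array([[120, 240], [10, 200]], dtype=np.uint8)   # high-contrast detail
print(delta_compressible(sky))   # True  -> the kind of block shaded purple below
print(delta_compressible(edge))  # False -> stored uncompressed
```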



The difference is not difficult to illustrate. The first image shows an uncompressed screenshot from Project CARS. The following image shows the elements that the Maxwell card can compress, they are shaded in purple. In the third shot, you can see that Pascal compresses the scene even more. According to Nvidia, this difference translates into about a 20% reduction in the amount of information in bytes that must be fetched from memory for each frame.

Nvidia GeForce GTX 1080 Pascal Review | Reference card design

Nvidia has changed its approach to card design. Instead of "reference", it calls its own version of the card the Founders Edition. It is impossible not to notice that the GeForce GTX 1080 has become more angular in appearance, yet the cooling system still uses the same proven mechanism of ejecting hot air through the side panel.

The card weighs 1020 g and has a length of 27 cm. It is quite pleasant to the touch, because the cooler casing not only looks like metal, it is actually made of metal, to be more precise, aluminum. The matte silver parts are lacquered, and if the card is not handled very carefully, they will quickly get scratched.

The back plate is divided into two parts. It serves only as a decoration and does not carry a cooling function. Later we will find out if this is the right decision. Nvidia recommends removing this plate when using SLI in order to achieve better airflow between cards mounted close to each other.

There is nothing interesting at the bottom, although we noticed that parts of the black cover can come into contact with elements of the motherboard located under it, such as the chipset cooler and SATA ports.

At the top of the card there is a single auxiliary eight-pin power connector. Given the card's official specifications and the 60 W of power drawn from the motherboard slot, one such connector should be enough for the nominal 180 W TDP. Naturally, we will check how much power this card actually consumes and whether it overloads the power lines.

There are also two SLI connectors. Alongside the new Pascal graphics cards, Nvidia introduced new high-bandwidth bridges, which we will look at in more detail later. In short, only two-card SLI configurations are officially supported so far, and both connectors are used for the dual-channel interface between the GPUs.

Three full-fledged DisplayPort connectors are available on the I/O panel. Specifications list DisplayPort 1.2 but are expected to be compatible with DisplayPort 1.3/1.4 (at least the display controller can work with the new standards). There's also an HDMI 2.0 output and dual-link DVI-D. You can not look for analog connectors.

On the other end of the card, there is a large slot for air capture and three screw holes for additional fixation of the card in the case.

Cooler design and power

After carefully studying the appearance, it's time to look at the stuffing hidden under the aluminum casing. This turned out to be more difficult than it might seem at first glance. After disassembly, we counted 51 parts on the table, including screws. If you remove the fans, 12 more will be added.

Nvidia is finally back to using a real vapor chamber. It is attached to the board with four screws on top of the GPU.

The centrifugal fan should be familiar to you. Direct heat removal involves air intake in one place, its passage through the radiator fins and out of the case. The cooler shroud, which also doubles as a frame, not only stabilizes the card, but also helps cool the voltage converters and memory modules.

Having removed all external components, we got to the printed circuit board. Unlike previous solutions, Nvidia uses a six-phase power supply. Five phases serve the GPU, and the remaining phase powers the GDDR5X memory.

On the board, you can see a place for another phase, which is empty.

The GP104 GPU covers an area of 314 mm², which is much smaller than its predecessor. Traces of the board's other layers are visible around the processor: to achieve high clock frequencies, the conductors must be as short as possible. Because of these stringent requirements, Nvidia's partners will likely need more time to get production up and running.

GDDR5X memory is represented by Micron 6HA77 chips. They only recently entered mass production; the earlier leaked pictures of the new Nvidia card showed 6GA77 chips.

A total of eight memory modules are connected to the 256-bit memory bus via 32-bit controllers. At a memory clock of 1251 MHz, the bandwidth reaches 320 GB/s.

Micron's GDDR5X modules use a 170-pin package instead of the 190-pin package of GDDR5. In addition, they are slightly smaller: 14x10 mm instead of 14x12 mm. That is, they are denser and require better cooling.

Turning the card over, we found free space for the second power connector. Thus, Nvidia partners can install a second auxiliary connector to add power or move the existing one to another position.

The board also has a slot that allows you to turn the power connector 180 degrees.

Capacitors are located directly below the GPU to smooth out possible surges. The PWM controller is also on this side of the board (previously it sat on the front side). This layout gives Nvidia's partners the option of installing other PWM controllers.

But back to the PWM voltage-regulator controller. Nvidia's GPU Boost 3.0 technology imposes a new set of voltage-regulation requirements, which resulted in significant changes. We expected to see an International Rectifier IR3536A controller paired with the 5+1 phase design, but Nvidia used the µP9511P. This is not the best news for overclockers, since the card does not support the interfaces and protocols of tools such as MSI Afterburner and Gigabyte OC Guru. The switch to a new, not yet well documented controller is most likely due to technical requirements.

Since the PWM controller cannot directly drive the individual phases of the voltage converter, Nvidia uses 53603A MOSFET drivers to drive the MOSFET gates. Compared to some other layouts, though, the circuit looks neat and tidy.

There are different types of MOSFETs here. The 4C85N is a fairly flexible dual channel voltage conversion MOSFET. It serves all six phases of the power supply and has large enough electrical and thermal reserves to withstand the loads of the reference design.


I wonder how Nvidia's GPU Boost 3.0 technology and modified voltage regulator circuitry will affect power consumption. We will definitely check it out.

Nvidia GeForce GTX 1080 Pascal Review | Simultaneous Multi-Projection and Async Compute Technology

Simultaneous Multi-Projection Engine

The increased core count, core clock speed, and 10Gbps GDDR5X memory performance speed up every game we've tested. However, the Pascal architecture includes several features that we will only be able to appreciate in future games.

One of the new features, which Nvidia calls the Simultaneous Multi-Projection Engine, is a hardware block added to the PolyMorph engines. The new engine can create up to 16 projections of the geometry data from a single viewpoint. It can also offset the viewpoint to create a stereoscopic image, replicating the geometry up to 32 times in hardware, without the performance hit you would see if you tried to achieve the same effect without SMP.


One Plane Projection

Let's try to understand the advantages of this technology. Suppose we have three monitors in a Surround configuration. They are angled slightly inward to "wrap around" the user, which is more convenient for playing and working. But games don't know about this and render the image in a single plane, so it appears bent at the junctions of the monitor bezels and the overall picture looks distorted. For such a configuration it would be more correct to render one projection straight ahead, a second one to the left, as if from the panoramic cockpit of an aircraft, and a third to the right. That way the previously bent panorama appears smooth and the user gets a much wider viewing angle. The entire scene still has to be rasterized and shaded, but the GPU does not have to render the scene three times, which eliminates the overhead.


Incorrect perspective on angled displays



SMP Corrected Perspective
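Conceptually, SMP lets the driver hand the GPU one view orientation per monitor instead of a single wide one; here is a minimal sketch of the three yaw-rotated views for an angled Surround setup (the 25-degree inward tilt is an arbitrary assumption for illustration, not a value from Nvidia):

import numpy as np

def yaw_matrix(degrees: float) -> np.ndarray:
    """Rotation about the vertical (Y) axis, used to aim a projection left or right."""
    a = np.radians(degrees)
    return np.array([[ np.cos(a), 0.0, np.sin(a)],
                     [ 0.0,       1.0, 0.0      ],
                     [-np.sin(a), 0.0, np.cos(a)]])

monitor_angle = 25.0  # assumed inward tilt of the side monitors, degrees
views = {
    "left":   yaw_matrix(-monitor_angle),
    "center": yaw_matrix(0.0),
    "right":  yaw_matrix(+monitor_angle),
}
# With SMP the geometry is submitted once and projected three times;
# without it, the scene would have to be rendered once per view.
for name, m in views.items():
    print(name, np.round(m @ np.array([0.0, 0.0, -1.0]), 3))  # forward axis of each view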

However, the application must support wide viewing-angle settings and use SMP API calls. This means game developers have to adopt the feature before you can take advantage of it. We're not sure how much effort they are willing to put in for a handful of multi-monitor Surround users. But there are other applications where it makes sense to implement this feature as soon as possible.


using single-pass stereo rendering, SMP creates one projection for each eye

Take virtual reality as an example. It already needs an individual projection for each eye. Today, games simply render images to two separate viewports, with all the associated drawbacks and efficiency losses. But since SMP supports two projection centers, the scene can be rendered in one pass using Nvidia's Single Pass Stereo feature: the geometry is processed once, and SMP creates a projection for each of the left and right eyes. SMP can then apply additional projections for a feature called Lens Matched Shading.


Images after the first pass with Lens Matched Shading features



The final scene that is sent to the headset

In a nutshell, Lens Matched Shading tries to make VR rendering more efficient by avoiding the heavy work of traditional planar rendering, which draws a full flat image and then warps it to match the distortion of the headset lenses, wasting pixels where the curvature is strongest. The effect can be approximated by using SMP to divide the viewport into quadrants: instead of rendering to a single square projection, the GPU creates images that already match the lens distortion, which prevents the generation of extra pixels. You won't notice a quality difference as long as developers meet or exceed the eye's sampling rate on the HMD.

According to Nvidia, the combination of Single Pass Stereo and Lens Matched Shading can deliver up to a 2x performance gain in VR compared to GPUs without SMP. Part of it comes from pixel rendering: by using Lens Matched Shading to avoid processing pixels that would never be displayed, the shading work in a scene with Nvidia's balanced presets dropped from 4.2 Mpixels (Oculus Rift) to 2.8 Mpixels, reducing the shader load on the GPU by a factor of 1.5. Single Pass Stereo, which processes geometry only once instead of re-rendering it for the second eye, effectively eliminates half of today's geometry work. Now it is clear what Jen-Hsun Huang meant when he claimed "a twofold increase in performance and a threefold increase in efficiency compared to Titan X."
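The claimed savings are easy to reproduce from the numbers Nvidia quotes (a back-of-the-envelope check, not a measurement of ours):

planar_mp       = 4.2   # Mpixels shaded per frame for the Rift with plain planar rendering
lens_matched_mp = 2.8   # Mpixels with Lens Matched Shading at Nvidia's balanced preset
print(f"pixel work reduced {planar_mp / lens_matched_mp:.2f}x")   # ~1.5x
print("geometry work halved by Single Pass Stereo (processed once, not once per eye)")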

Asynchronous computing

The Pascal architecture also brings changes to asynchronous compute, which matter for several reasons related to DirectX 12, VR, and AMD's architectural advantage in this area.

Nvidia has supported static partitioning of GPU resources between graphics and compute workloads since the Maxwell architecture. In theory this approach works well when both partitions are active at the same time. But suppose 75% of the processor's resources are devoted to graphics and that partition finishes its share of the work first. It then sits idle, waiting for the compute partition to finish, and any benefit of running the two tasks concurrently is lost. Pascal addresses this shortcoming with dynamic load balancing: if the driver decides one partition is underutilized, it can shift its resources to help the other, preventing idle time that hurts performance.
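A toy model of the difference between static partitioning and dynamic load balancing, using the 75/25 split from the example above (our own illustration of the idea, not a model of the real scheduler; "work" is measured in whole-GPU milliseconds):

def frame_time_static(gfx_work, compute_work, gfx_share=0.75):
    """With a static split, the frame ends when the slower partition finishes."""
    return max(gfx_work / gfx_share, compute_work / (1 - gfx_share))

def frame_time_dynamic(gfx_work, compute_work):
    """With dynamic balancing, freed units are reassigned, so total work fills the whole GPU."""
    return gfx_work + compute_work

# Graphics finishes early and compute dominates: static partitioning leaves 75% of the GPU idle.
print(frame_time_static(6.0, 4.0))   # 16.0 ms (the small compute partition is the bottleneck)
print(frame_time_dynamic(6.0, 4.0))  # 10.0 ms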

Nvidia also improved Pascal's preemption capabilities - the ability to stop the current task in order to handle a more "urgent" one with a very short execution time. As you know, GPUs are highly parallel machines with large buffers designed to keep their many similar execution resources busy. An idle shader is useless, so it must be kept fed with work by all means.


It is better for VR to have interrupt requests sent as late as possible to capture the latest tracking data

A great example is the Asynchronous Time Warp (ATW) feature that Oculus introduced with the Rift. If the video card cannot produce a new frame every 11 ms on a 90 Hz display, ATW generates an intermediate frame from the last one, adjusted for the updated head position. But there must be enough time left to create such a frame, and unfortunately graphics preemption is not very fine-grained. The Fermi, Kepler, and Maxwell architectures only support draw-call-level preemption, meaning work can only be switched at draw-call boundaries rather than within one, which can potentially stall the ATW technique.

Pascal implements pixel-level preemption for graphics, so GP104 can stop the current operation at pixel granularity, save its state, and switch to a different context. Instead of the millisecond-scale preemption that Oculus complained about, Nvidia claims less than 100 microseconds.
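The timing constraint is easy to put into numbers (a rough sketch: the 11.1 ms budget follows from the 90 Hz refresh, the millisecond figure is the older-architecture behavior described above, and the 100 µs figure is Nvidia's claim):

refresh_hz         = 90
frame_budget_ms    = 1000 / refresh_hz          # ~11.1 ms per frame
preempt_maxwell_ms = 1.0                        # roughly a millisecond on draw-call boundaries
preempt_pascal_ms  = 0.1                        # <100 microseconds, per Nvidia
print(f"budget {frame_budget_ms:.1f} ms; ATW can be scheduled "
      f"{preempt_maxwell_ms:.1f} ms vs {preempt_pascal_ms:.1f} ms before scan-out")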

In the Maxwell architecture, the equivalent of a pixel-level interrupt in a compute unit was implemented via a thread-level interrupt. Pascal also retained this technique, but added support for instruction-level interrupts in CUDA computational tasks. At the moment, Nvidia drivers do not include this feature, but it will be available soon along with pixel-level interruption.

Nvidia GeForce GTX 1080 Pascal Review | Output pipeline, SLI and GPU Boost 3.0

Pascal display channel: HDR-Ready

Last year, we met with AMD in Sonoma, California, where they shared some of the details of their new Polaris architecture, such as the imaging pipeline supporting high dynamic range content and related displays.

Not surprisingly, Nvidia's Pascal architecture is packed with similar features, some of which were already present in Maxwell. For example, the display controller in GP104 supports 12-bit color, the BT.2020 wide color gamut, the SMPTE 2084 electro-optical transfer function, and HDMI 2.0b with HDCP 2.2.

To this list Pascal adds accelerated 4K60p HEVC decoding with 10/12-bit color via a dedicated hardware block that is claimed to support the HEVC Version 2 profile. Previously, Nvidia used a hybrid approach that relied partly on software, and it was limited to eight bits of color information per pixel. We suspect that supporting Microsoft's controversial PlayReady 3.0 specification required a faster and more capable solution.

The architecture also supports HEVC encoding at 4K60p in 10-bit color for recording or streaming in HDR, and Nvidia even has a dedicated application for it. Using the GP104's encoder and the upcoming GameStream HDR software, you will be able to stream high-dynamic-range games to a Shield device connected to an HDR-compatible TV. The Shield has its own HEVC decoder with 10-bit color support, which further offloads the output pipeline.

                            GeForce GTX 1080                      GeForce GTX 980
H.264 encoding              Yes (2x 4K60p)                        Yes
HEVC encoding               Yes (2x 4K60p)                        Yes
HEVC 10-bit encoding        Yes                                   No
H.264 decoding              Yes (4K120p, up to 240 Mbps)          Yes
HEVC decoding               Yes (4K120p/8K30p, up to 320 Mbps)    No
VP9 decoding                Yes (4K120p, up to 320 Mbps)          No
HEVC 10/12-bit decoding     Yes                                   No

In addition to HDMI 2.0b support, the GeForce GTX 1080 is DisplayPort 1.2 certified and DP 1.3/1.4 ready. In this respect it already surpasses the as-yet-unreleased Polaris, whose display controller only supports DP 1.3 so far. Luckily for AMD, the DP 1.4 specification does not add a faster transfer mode; the ceiling is still the 32.4 Gbps of HBR3.

As mentioned earlier, the GeForce GTX 1080 Founders Edition has three DisplayPort outputs, one HDMI 2.0b connector, and one dual-link DVI digital output. Like the GTX 980, the newcomer can drive four independent monitors simultaneously. But where the GTX 980 topped out at 5120x3200 over two DP 1.2 cables, the GTX 1080's maximum is 7680x4320 pixels at a 60 Hz refresh rate.
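Whether a given output mode fits into DisplayPort's budget is a matter of arithmetic; a simplified estimate that ignores blanking overhead and assumes 24-bit color:

def data_rate_gbps(width, height, hz, bits_per_pixel=24):
    """Approximate uncompressed video data rate, ignoring blanking intervals."""
    return width * height * hz * bits_per_pixel / 1e9

hbr3_payload_gbps = 25.92   # usable payload of a four-lane DP 1.3/1.4 link (32.4 Gbps raw, 8b/10b coding)
print(data_rate_gbps(3840, 2160, 120))   # ~23.9 Gbps: 4K at 120 Hz fits into one HBR3 link
print(data_rate_gbps(7680, 4320, 60))    # ~47.8 Gbps: 8K at 60 Hz needs two cables (or compression)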

SLI now only officially supports two GPUs

Traditionally, high-end Nvidia graphics cards have two connectors for linking two, three, or even four accelerators in SLI. As a rule, the best scaling is achieved with two GPUs; beyond that the cost often isn't justified, since many pitfalls appear. However, some enthusiasts still use three or four graphics adapters in pursuit of every additional frame and the chance to show off to friends.

But the situation has changed. Because of performance-scaling problems in new games, no doubt related to DirectX 12, the GeForce GTX 1080 only officially supports two-GPU SLI configurations, according to Nvidia. So why does the card need two connectors? With the new SLI bridges, both connectors can be used simultaneously to transfer data over a dual-link interface. Besides the dual-link mode, the interface's I/O frequency has been raised from 400 MHz to 650 MHz. As a result, throughput between the processors more than doubles.
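The "more than doubles" claim follows directly from the two changes (a rough proportional estimate, assuming throughput scales with clock and link count):

old_link = 1 * 400   # one connector at 400 MHz (relative units)
new_link = 2 * 650   # both connectors at 650 MHz with an SLI HB bridge
print(f"~{new_link / old_link:.2f}x the old bridge's throughput")  # ~3.25x, comfortably more than double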


Frame render time in Middle earth: Shadow of Mordor with new (blue line on graph) and old (black) SLI bridge

However, many gamers will not notice the benefit of the faster link. It matters primarily at high resolutions and refresh rates. Nvidia showed an FCAT capture of two GeForce GTX 1080s running Middle-earth: Shadow of Mordor on three 4K displays. Connecting the two cards with the old bridge produced constant frame-time spikes, which lead to predictable timing problems that show up as stuttering. With the new bridge, the spikes are fewer and less pronounced.

According to Nvidia, SLI HB bridges are not the only ones that support dual-link mode. The familiar LED bridges can also transfer data at 650 MHz when connected to Pascal cards, while flexible and standard bridges are best avoided if you want to run at 4K or higher. Detailed compatibility information is in the table provided by Nvidia:

                             1920x1080@60Hz   2560x1440   2560x1440@120Hz+   4K   5K   Surround
Standard bridge              x                x
LED bridge                   x                x           x                  x
High Data Rate (HB) bridge   x                x           x                  x    x    x

What prompted the abandonment of three- and four-GPU configurations? After all, the company always strives to sell more and to deliver higher performance. The cynical view is that Nvidia doesn't want to be held responsible for the loss of benefit from linking three or four cards in SLI at a time when video games use increasingly subtle and complex rendering approaches. But the company insists it is acting in customers' best interests, since Microsoft is handing more control over multi-GPU configurations to game developers, who in turn are exploring techniques such as single-frame co-rendering instead of the current alternate-frame rendering (AFR).

Enthusiasts who only care about speed records and don't mind the caveats described above can still link three or four GTX 1080s in SLI using the old software path. They need to generate a unique "hardware" signature with a program from Nvidia, which can then request an "unlock" key. Naturally, the new HB SLI bridges do not work with more than two GPUs, so you will have to fall back to the old LED bridges to run three or four GP104s at 650 MHz.

Briefly about GPU Boost 3.0

In an effort to get even more performance out of their GPUs, Nvidia has again improved its GPU Boost technology.

In the previous generation (GPU Boost 2.0), the clock speed was set by shifting the sloped voltage/frequency line by a single fixed offset. The potential headroom above that line was usually left unused.


GPU Boost 3.0 - setting the frequency increase per step of increasing the voltage

GPU Boost 3.0 now lets you set the frequency offset for individual voltage points, limited only by temperature. And you do not have to experiment and verify the card's stability across the entire curve yourself: Nvidia provides an algorithm that automates the process, creating a voltage/frequency curve unique to your GPU.
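The practical difference from GPU Boost 2.0 is that the overclocking offset becomes a per-point table rather than a single number; a minimal sketch of the two approaches (the voltage and frequency values are made up for illustration, not measured data):

# Stock voltage/frequency points (illustrative values only).
vf_curve = {0.80: 1607, 0.90: 1733, 1.00: 1823, 1.06: 1885}

def boost2_offset(curve, offset_mhz):
    """GPU Boost 2.0 style: one fixed offset shifts the whole curve."""
    return {v: f + offset_mhz for v, f in curve.items()}

def boost3_offsets(curve, per_point_mhz):
    """GPU Boost 3.0 style: each voltage point gets its own offset, as found by a stability scan."""
    return {v: f + per_point_mhz.get(v, 0) for v, f in curve.items()}

print(boost2_offset(vf_curve, 100))
print(boost3_offsets(vf_curve, {0.80: 60, 0.90: 110, 1.00: 150, 1.06: 90}))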

Nvidia GeForce GTX 1080 Pascal Review | Meet the GP104 GPU

On the eve of Computex, Nvidia presented its long-awaited novelty: the Pascal architecture adapted for gamers. The new GeForce GTX 1080 and 1070 graphics cards use the GP104 graphics processor. Today we review the older model; the younger one should be in our hands in early June.

The Pascal architecture promises faster and more efficient performance, more compute modules, reduced die area, and faster memory with an upgraded controller. It is better suited for VR, 4K gaming, and other performance-intensive applications.

As always, we will try to understand the promises of the manufacturer and test them in practice. Let's start.

Will the GeForce GTX 1080 change the balance of power in the high-end segment?

The Nvidia GeForce GTX 1080 is the faster of the two gaming graphics cards announced earlier this month. Both use the GP104 GPU, which, incidentally, is the second Pascal-microarchitecture GPU (the first was GP100, shown at GTC in April). Nvidia CEO Jen-Hsun Huang teased enthusiasts when he unveiled the new product, claiming that the GeForce GTX 1080 would outperform two 980s in SLI.

He also noted that the GTX 1080 delivers higher performance at lower power consumption than the 900 series. It is claimed to be twice as fast and three times as efficient as the former flagship GeForce Titan X, but a look at the accompanying graphs and charts shows that such an impressive difference appears only in certain virtual-reality-related tasks. Even if these promises are only partially fulfilled, though, we are still in for very interesting times in high-end PC gaming.

Virtual reality is slowly gaining momentum, but the steep hardware requirements of the graphics subsystem remain a significant barrier to entry. In addition, most games available today cannot take advantage of multi-GPU rendering, so you are usually limited to what a single fast GPU can do. The GTX 1080 is capable of outperforming two 980s and should have no trouble with today's VR games, eliminating the need for multi-GPU configurations for now.

The 4K ecosystem is progressing just as fast. Higher bandwidth interfaces such as HDMI 2.0b and DisplayPort 1.3/1.4 should open the door to 4K monitors with 120Hz panels and support for dynamic refresh rates by the end of this year. While previous generations of top-end GPUs from AMD and Nvidia were positioned as solutions for 4K gaming, users had to compromise on quality in order to maintain acceptable frame rates. The GeForce Nvidia GTX 1080 could be the first graphics card to be fast enough to maintain high frame rates at 3840x2160 resolution with maximum graphics detail settings.

What about multi-monitor configurations? Many gamers are willing to set up three 2560x1440 monitors, provided the graphics subsystem can handle the load: in that case the card has to render around eleven million pixels per frame, since the combined resolution is 7680x1440. There are even enthusiasts willing to take three 4K displays with a combined resolution of 11520x2160 pixels.

The latter option is too exotic even for a new flagship gaming graphics card. However, the GP104 is equipped with technology that promises to improve the experience in the tasks this new model is actually aimed at, i.e. 4K and Surround. But before we move on to the new technologies, let's take a closer look at the GP104 processor and its underlying Pascal architecture.

What is GP104 made of?

Since the beginning of 2012, AMD and Nvidia have been using the 28 nm process. By switching to it, both companies made a significant leap forward, introducing the Radeon HD 7970 and GeForce GTX 680. But over the following four years they had to work hard to extract more performance from the existing node; what the Radeon R9 Fury X and GeForce GTX 980 Ti achieve is remarkable given their complexity. The first chip Nvidia built on 28 nm was GK104, with 3.5 billion transistors; the GM200 found in the GeForce GTX 980 Ti and Titan X already has eight billion.

The move to TSMC's 16 nm FinFET Plus technology let Nvidia's engineers implement new ideas. According to the technical data, 16FF+ chips can be up to 65% faster, twice as dense as 28HPM, or consume 70% less power, and Nvidia uses the optimal combination of these advantages in its GPUs. TSMC says the node is based on the engineering of its existing 20 nm process but uses FinFET transistors instead of planar ones; this approach is said to reduce scrap and improve wafer yields. It is also noted that TSMC never offered a 20 nm process with fast transistors, which is why the world of computer graphics has been stuck on 28 nm for more than four years.

GP104 Processor Block Diagram

The successor to GM204 consists of 7.2 billion transistors on a 314 mm² die; for comparison, the GM204 die is 398 mm² with 5.2 billion transistors. In its full configuration, a GP104 GPU has four Graphics Processing Clusters (GPCs). Each GPC includes five Thread/Texture Processing Clusters (TPCs) and a rasterizer. A TPC combines one streaming multiprocessor (SM) with a PolyMorph engine. The SM contains 128 single-precision CUDA cores, a 256 KB register file, 96 KB of shared memory, 48 KB of L1/texture cache, and eight texture units. The fourth-generation PolyMorph engine adds a new logic block at the end of the geometry pipeline, ahead of the rasterizer, that drives the Simultaneous Multi-Projection feature (more on that below). In total we get 20 SMs, 2560 CUDA cores, and 160 texture units.
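The totals in that paragraph follow directly from the hierarchy; a small sketch that recomputes them from the counts given above:

GPCS, TPCS_PER_GPC, SMS_PER_TPC = 4, 5, 1
CUDA_PER_SM, TMUS_PER_SM = 128, 8

sms   = GPCS * TPCS_PER_GPC * SMS_PER_TPC    # 20 streaming multiprocessors
cores = sms * CUDA_PER_SM                    # 2560 CUDA cores
tmus  = sms * TMUS_PER_SM                    # 160 texture units
print(sms, cores, tmus)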

One streaming multiprocessor (SM) in GP104

The GPU back end includes eight 32-bit memory controllers (256 bits of total bus width), eight rasterization partitions, and 256 KB of L2 cache per partition, for a total of 64 ROPs and 2 MB of shared L2 cache. The block diagram of the GM204 showed four 64-bit controllers and 16 ROPs per partition instead; the grouping is different, but the two arrangements are functionally equivalent.

Some of the structural elements of the GP104 are similar to those of the GM204, as the new GPU was built from the "building blocks" of its predecessor. There is nothing wrong. If you remember, in the Maxwell architecture, the company relied on energy efficiency and did not shake up the blocks, which were Kepler's strengths. We see a similar picture here.

Adding four SMs might not seem like it would noticeably affect performance, but GP104 has a few tricks up its sleeve. The first trump card is significantly higher clock frequencies. The base GPU clock is 1607 MHz; the GM204 specifications, for comparison, list 1126 MHz. GPU Boost tops out at 1733 MHz, but we pushed our sample to 2100 MHz using a beta of EVGA's PrecisionX utility. Where does such overclocking headroom come from? According to Jonah Alben, senior vice president of GPU engineering, his team knew the TSMC 16FF+ process would affect the chip's architecture, so they focused on optimizing its timing paths to remove the bottlenecks that prevented higher clock speeds. As a result, GP104's single-precision compute rate reaches 8228 GFLOPS at the base clock, compared with the GeForce GTX 980's ceiling of 4612 GFLOPS. The texel fill rate jumps from 155.6 Gtex/s on the 980 (with GPU Boost) to 277.3 Gtex/s.
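Both headline numbers can be reproduced from the unit counts and clocks (counting an FMA as two floating-point operations per core per clock):

cuda_cores, tmus = 2560, 160
base_mhz, boost_mhz = 1607, 1733

gflops_base   = cuda_cores * 2 * base_mhz / 1000     # ~8228 GFLOPS single precision
gtexels_boost = tmus * boost_mhz / 1000              # ~277.3 Gtex/s
print(f"{gflops_base:.0f} GFLOPS, {gtexels_boost:.1f} Gtex/s")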

GPU                                      GeForce GTX 1080 (GP104)   GeForce GTX 980 (GM204)
SMs                                      20                         16
CUDA cores                               2560                       2048
Base GPU frequency, MHz                  1607                       1126
GPU Boost frequency, MHz                 1733                       1216
Compute rate at base clock, GFLOPS       8228                       4612
Texture units                            160                        128
Texel fill rate, Gtex/s                  277.3                      155.6
Memory data rate, Gbps                   10                         7
Memory bandwidth, GB/s                   320                        224
ROPs                                     64                         64
L2 cache, MB                             2                          2
TDP, W                                   180                        165
Transistors                              7.2 billion                5.2 billion
Die area, mm²                            314                        398
Process technology, nm                   16                         28

NVIDIA is preparing to release a new series of gaming graphics cards, opening with the GeForce GTX 1080. This model will be the first gaming-class product based on the Pascal architecture. The GeForce GTX 1080 brings a number of technological innovations, which we will discuss in this article. The material is theoretical in nature: it covers the architectural features and new capabilities of the GeForce GTX 1080. Testing and comparisons with other video cards will follow later.

Rapid progress in the miniaturization of silicon chips has slowed in recent years. Intel even abandoned its tick-tock strategy of regularly moving to a thinner process. Several generations of NVIDIA and AMD products came and went on the graphics market within a single 28 nm process. In part this was beneficial, forcing manufacturers to pay more attention to architecture. That qualitative transition was clearly visible in the switch from Kepler to Maxwell, when the new generation proved faster and more energy efficient without increasing the transistor count, and even with smaller dies. For example, the GeForce GTX 980 is based on the more compact GM204 chip, yet it delivers higher performance than the GeForce GTX 780 Ti with its more complex GK110.

The new generation of GeForce gets both a new architecture and a thinner process, and the GeForce GTX 1080 is a trailblazer in many respects. It is the first Pascal-architecture card, with a GP104 GPU built on the 16 nm FinFET process. Among the important innovations, NVIDIA highlights the fast GDDR5X memory. The new manufacturing technology makes it possible to raise clock frequencies to record levels, and new gaming technologies expand GeForce's capabilities, especially for VR content. These are the main features the manufacturer highlights in the new product.

It is worth noting that the Tesla P100 specialized compute accelerator, based on the GP100 processor, was the original pioneer of the Pascal architecture. But since that product targets a completely different application area, it is the GeForce GTX 1080 that is the pioneer among desktop graphics accelerators.

The GP104 GPU is the heir to GM204, so when studying the GeForce GTX 1080 you can start from the GeForce GTX 980, even though the newcomer is faster than the GeForce GTX 980 Ti and GeForce GTX Titan X. Pascal processors use a cluster structure similar to their predecessors, where a GPC (Graphics Processing Cluster) is essentially an independent computing unit. GP100 is built from six clusters, GP104 has four, and the upcoming GP106 chip should get two. Four GPCs make the new GP104 as close as possible to GM204, and the block diagram of the new chip also resembles the older processor.

The differences in structure become apparent on closer examination. In the previous generation, a cluster included four large SMM multiprocessor units. In GP104, the execution units are grouped into five SM multiprocessors per cluster. Each of these large data-processing units is paired with its own PolyMorph Engine geometry unit, of which there are now 20 instead of the GM204's 16.

One SM is divided into four processing arrays with their own control logic, which again resembles the structure of the older GPUs. In both cases the multiprocessor operates with 128 streaming (CUDA) cores. The SM has 96 KB of shared memory, a separate texture cache, and eight texture units. The result is a configuration of 2560 stream processors and 160 texture units. The new processor has 64 ROPs and 2 MB of L2 cache - no differences from GM204 there.

There are more memory controllers, however; Pascal reworks the entire memory subsystem. Instead of four 64-bit controllers, eight 32-bit controllers are implemented, giving a 256-bit memory bus. After the successful GeForce GTX 980, such a bus in a top product is no longer surprising. At the same time, the bus efficiency of the GeForce GTX 1080 is higher thanks to new data compression algorithms. Bandwidth also grows thanks to the new GDDR5X chips, whose effective data rate is equivalent to a 10 GHz clock, whereas ordinary GDDR5 was limited to 7 GHz. The video memory has been increased to 8 GB.

Thanks to the new process technology, GP104 is more compact than GM204 while containing more compute units. The new processor also has more headroom for higher frequencies: out of the box it runs at a base clock of 1607 MHz with an average Boost Clock of 1733 MHz, and peak values are even higher. Despite such record frequencies, the GeForce GTX 1080 fits into a 180 W TDP, slightly above the GeForce GTX 980, yet the newcomer is faster than the top Ti version, which has a noticeably higher TDP.

For a visual comparison, let's summarize the characteristics of the GeForce GTX 1080 and top-end video cards of previous generations in one table.

Video adapter                    GeForce GTX 1080   GeForce GTX Titan X   GeForce GTX 980 Ti   GeForce GTX 980   GeForce GTX 780 Ti
Core                             GP104              GM200                 GM200                GM204             GK110
Transistors, millions            7200               8000                  8000                 5200              7100
Process technology, nm           16                 28                    28                   28                28
Die area, mm²                    314                601                   601                  398               561
Stream processors                2560               3072                  2816                 2048              2880
Texture units                    160                192                   176                  128               240
ROPs                             64                 96                    96                   64                48
Core frequency, MHz              1607-1733          1000-1075             1000-1075            1126-1216         875-926
Memory bus, bits                 256                384                   384                  256               384
Memory type                      GDDR5X             GDDR5                 GDDR5                GDDR5             GDDR5
Effective memory frequency, MHz  10010              7010                  7010                 7010              7010
Memory size, MB                  8192               12288                 6144                 4096              3072
DirectX support                  12.1               12.1                  12.1                 12.1              12.0
Interface                        PCI-E 3.0          PCI-E 3.0             PCI-E 3.0            PCI-E 3.0         PCI-E 3.0
Power, W                         180                250                   250                  165               250

NVIDIA's mid-range and high-end graphics cards have long featured GPU Boost technology, which raises the GPU frequency until the card hits a temperature or power limit. The base clock is the minimum for 3D mode, but under a typical gaming load the frequencies are usually higher. The new GeForce cards receive the improved GPU Boost 3.0, with a more flexible algorithm for varying frequency with supply voltage in boost mode. GPU Boost 2.0 used a fixed offset between the base value and the turbo frequency; GPU Boost 3.0 allows a different offset at each voltage point, which better exploits the GPU's potential. In theory, as the voltage rises or falls in boost mode the frequency now changes non-linearly, and at some points the boost delta can be larger than it would have been under the old GPU Boost.

Flexible boost tuning is also exposed to users. The latest version of the EVGA Precision utility already supports the GeForce GTX 1080, including an automatic scanner with a stability test that builds a non-linear boost frequency curve for different voltages. The move to the new process and the optimization of the core design have enabled such a significant frequency uplift that the maximum Boost clock can be pushed toward 2 GHz relative to the declared values.

Since the advent of GDDR5, NVIDIA has been working on the next generation of high-speed memory. The result of its collaboration with memory developers is GDDR5X, with a data rate of 10 Gbps. Working with such fast memory imposes new requirements on board routing, so the data lines between the GPU and the memory chips were redesigned and the structure of the chip itself was changed. All this makes it possible to work effectively with the ultra-fast video buffer. Among GDDR5X's other advantages is a lower operating voltage of 1.35 V.

With an effective memory frequency of 10,000 MHz, the increase in bandwidth over the 7,012 MHz typical of the current generation is almost 43%. But Pascal's advantages don't stop there. GeForce supports special data-compression algorithms in memory that make better use of the cache and transfer more data over the same bus. Several techniques are supported, and a different compression algorithm is chosen depending on the data type. An important role is played by delta color compression: instead of encoding the color of each individual pixel, it encodes the differences between pixels during serial transfer. An average tile color and the color offsets for each pixel of that tile are computed.

This compression already makes Maxwell very effective, but Pascal is even more efficient: the GP104 GPU additionally supports new modes with higher compression ratios for cases where the differences between colors are minimal.

As an example, NVIDIA shows two slides from the game Project CARS. Tiles where compression was applied are painted pink; the top slide shows compression at work on Maxwell, the bottom one on Pascal.

As you can see, Pascal also compresses areas that Maxwell cannot, so almost the entire frame ends up compressed. Of course, the efficiency of such algorithms depends on the particular scene. According to NVIDIA, the difference in compression efficiency between the GeForce GTX 1080 and GeForce GTX 980 ranges from 11% to 28%. Taking 20% as an average and adding the higher memory frequency, the resulting increase in effective throughput is about 70%.
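The ~70% figure is just the two multipliers combined (a quick check of the arithmetic, not an independent measurement):

raw_bandwidth_gain = 10.0 / 7.0          # GDDR5X 10 Gbps vs GDDR5 7 Gbps -> ~1.43x
compression_gain   = 1.20                # NVIDIA's ~20% average from better compression
print(f"effective bandwidth ~{raw_bandwidth_gain * compression_gain:.2f}x")  # ~1.71x, i.e. about +70%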

The next generation of GeForce supports Async Compute, with better utilization of the compute units across different types of tasks. In modern games the GPU can perform other work alongside image rendering: physics calculations, image post-processing, and the asynchronous time warp (Asynchronous Time Warp) technique for virtual reality. Not all compute units are involved in every task, and each task can take a different amount of time. For example, if non-graphics computations take longer than the graphics work, the GPU still has to wait for every process to finish before switching to new tasks, leaving part of its resources idle. Pascal introduces dynamic load balancing: if one task finishes early, the freed resources are assigned to another.

This avoids downtime and raises overall performance under a combined GPU load. With such loads, the speed of switching between tasks also matters. Pascal supports task preemption at several levels for the fastest possible switching. When a new command arrives, the processor interrupts tasks at the pixel and thread level, saving their state so they can be completed later, and the compute units take up the new task. Pascal supports preemption down to the instruction level; Maxwell and Kepler only support it at the thread level.

Preemption at different levels makes it possible to pick the moment of task switching more precisely. This is important for the Asynchronous Time Warp technique, which warps an already rendered image just before output to correct it for the current head position. With Asynchronous Time Warp you need to switch quickly, strictly before the frame is displayed, otherwise artifacts in the form of picture "jitter" are possible. Pascal handles this task best.

Pascal introduces hardware support for multi-projection technology, which allows it to work with several image projections at once. A dedicated Simultaneous Multi-Projection block inside the PolyMorph Engine generates the different projections from a single geometry stream. It processes geometry simultaneously for 16 projections with one or two perspective centers; this does not require reprocessing the geometry and allows the data to be replicated up to 32 times (16 projections times two viewpoints).

Thanks to this technology you can get a correct image on multi-monitor configurations. With three monitors, the image is normally rendered for a single projection, and if the outer monitors are angled slightly inward to create a wrap-around effect, the geometry in the side areas is incorrect. Multi-projection produces a correct picture, generating the proper projection for each monitor's angle. The only requirement is that the application itself must support a wide FOV.

This imaging technique allows for the most efficient use of curved panels, and also opens up opportunities for correct rendering on other display devices, even on a spherical screen.

The technology also extends Pascal's capabilities for stereo imaging and virtual reality (VR). In stereo mode, two images of the same scene are generated, one per eye. Hardware Simultaneous Multi-Projection support lets both eye projections be created from a single pass over the geometry, using Single Pass Stereo technology, which significantly speeds up this mode.

In VR systems the user wears a headset whose lenses introduce certain distortions. To compensate, the image is warped at the edges, and the user ultimately sees a picture corrected by the lens. But the video card initially renders the image in an ordinary flat projection, and part of the peripheral image is then thrown away.

Lens Matched Shading technology splits the image into four quadrants and then samples pixels accordingly. That is, the picture is initially projected onto several planes that approximate the curved shape of the lens.

The intermediate image is rendered at a lower resolution and the unnecessary areas are cut off. With an Oculus Rift, each eye needs a 1.1-megapixel image, but the original flat projection is rendered at 2.1 megapixels. With Lens Matched Shading the initial render shrinks to 1.4 megapixels, which significantly increases performance in VR mode.

Virtual reality is a promising direction that will expand the ways we interact with virtual environments and give players new sensations, and NVIDIA actively supports its development. One factor limiting the popularization of VR systems is the high performance requirement on the graphics accelerator; dedicated technologies and hardware optimizations deliver a qualitative performance boost here. The company has released the comprehensive VRWorks set of special APIs, libraries and software engines. It includes tools for Single Pass Stereo and Lens Matched Shading, as well as MultiRes Shading technology, which lowers the rendering resolution in the peripheral zones during VR rendering to reduce the load.

The sense of presence depends not only on visuals but also on the other senses, and sound plays an important role. That is why NVIDIA developed VRWorks Audio, which recreates realistic sound by taking into account the position of sound sources and the reflection of sound waves from surfaces. The technology uses the OptiX engine, originally employed for ray-traced lighting: it traces the path of sound "rays" from the source to reflective surfaces and back. This method reproduces realistic sound that accounts for the acoustics of the virtual room and includes reflected sounds. Learn more about NVIDIA VRWorks Audio in the video:

Immersion can be enhanced further by interacting with the virtual environment. Today, interactivity is implemented through positional tracking and hand-controller tracking. On top of PhysX, a mechanism has been created that determines whether a virtual contact with an object results in an interaction; PhysX can also be used for physically plausible effects when acting on the virtual environment.

The new generation of video cards supports VR SLI. In this mode a separate GPU renders the image for each eye, which eliminates the delays inherent in normal SLI operation and provides better performance. VR SLI support will be implemented in Unreal Engine 4 and Unity, which gives hope for wider adoption of this technology as virtual reality systems become more available.

Regular SLI technology has also been updated. Older GeForce cards always had two connectors for SLI bridges, which were needed to link all the cards together in 3-Way and 4-Way SLI modes. Now, in ordinary two-card SLI, both communication interfaces can be used at once, increasing the overall throughput.

The new linking scheme requires the new dual SLI HB bridges, though operation over a simple single bridge is still supported. The dual bridge is recommended for high resolutions - 4K, 5K and multi-monitor systems - and a high-speed bridge is also recommended at 2K with a 120 Hz or faster monitor. In simpler modes you can get by with an old-style bridge.

The GeForce GTX 1080 also raises the speed of the interface itself, from 400 MHz to 650 MHz. The higher speed is available with the new bridges and with some versions of the old format. The increased data rate in SLI provides smoother frame delivery and some extra performance in heavy modes.

Multi-GPU rendering capabilities in DirectX 12 have also been expanded. Two main modes are supported for such configurations: Multi Display Adapter (MDA) and Linked Display Adapter (LDA). The first allows different GPUs to work together, including combining the potential of integrated and discrete graphics; LDA is intended for linking similar solutions. Implicit LDA is essentially what SLI uses, and it offers broad application compatibility at the software level. Explicit LDA and MDA give developers more options, but it is up to them to support these modes in each application.

It is also worth noting that SLI support is officially announced only for a configuration of two GeForce GTX 1080s; more complex configurations are theoretically possible in Explicit LDA and MDA modes. Interestingly, NVIDIA at the same time offers enthusiasts the ability to unlock 3-Way and 4-Way modes with a special key; to get it, you must submit a request on the company's website with your GPU's identifier.

Fast Sync support has been added to the GP104 GPU. This technology is an alternative to simply turning V-Sync on or off. In fast-paced games (especially multiplayer), high frame rates ensure maximum responsiveness to user input, but if the frame rate exceeds the monitor's refresh rate, artifacts in the form of screen tearing appear. Vertical sync eliminates the tearing but adds latency. Fast Sync lets the GPU render as many frames as possible while avoiding tearing: this is achieved through changes in the image output pipeline, where a triple buffer is used instead of the traditional double buffer and only a fully rendered frame is sent to the display.
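The key idea is the extra back buffer: the GPU keeps rendering at full speed, and at each refresh only the most recently completed frame is handed to scan-out. A highly simplified sketch of that behavior (our own illustration of the principle, not NVIDIA's implementation):

def fast_sync_scanout(rendered_frames, refresh_ticks):
    """At every refresh tick, display the most recently completed frame; frames finished
    in between are simply dropped, so there is no tearing and the renderer never stalls."""
    displayed = []
    for tick in refresh_ticks:
        ready = [fid for done_at, fid in rendered_frames if done_at <= tick]
        displayed.append((round(tick, 1), ready[-1] if ready else None))
    return displayed

# 200 fps of rendering (a frame every 5 ms) shown on a 60 Hz display (a refresh every ~16.7 ms).
frames  = [(5 * i, i) for i in range(1, 20)]        # (completion time in ms, frame id)
refresh = [16.7 * i for i in range(1, 5)]
print(fast_sync_scanout(frames, refresh))           # e.g. frame 3 at 16.7 ms, frame 6 at 33.4 ms, ...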

With Fast Sync you can play on a regular monitor at 100-200 fps without visual artifacts and with minimal latency, much as with V-Sync disabled. Below are the results of a study of display latency in different modes in Counter-Strike: Global Offensive.

As you can see, there is a slight difference between Fast Sync and disabled VSync, but it cannot be compared with frame output delays with active VSync.

If we talk not about maximum responsiveness, but about maximum image smoothness, then it is provided by G-Sync technology, which is implemented in conjunction with special monitors. G-Sync provides full hardware synchronization of the displayed frames with the screen refresh rate.

The GeForce GTX 1080 can output via DVI, HDMI and DisplayPort. DisplayPort 1.2 and HDMI 2.0b with HDCP 2.2 are supported, and the card is also DisplayPort 1.3/1.4 ready. With the latter, it can drive a 4K display at 120 Hz, or an 8K display (7680x4320) at 60 Hz over two DisplayPort 1.3 cables. For comparison, the GeForce GTX 980 could only reach 5120x3200 over two DisplayPort cables.

The standard version of the GeForce GTX 1080 is equipped with three DisplayPort ports, one HDMI and one Dual-Link DVI.

The GP104 processor receives an improved video decode/encode block with support for the PlayReady 3.0 (SL3000) standard and hardware HEVC decoding of high-quality 4K/8K video. The full capabilities of the GeForce GTX 1080 versus the GeForce GTX 980 are shown in the table above.

Among the GeForce GTX 1080's innovations is support for HDR content and displays. This standard is a major step forward, covering 75% of the visible color space instead of 33% for RGB, with 10/12-bit color depth. HDR displays show more shades, offer higher brightness and deeper contrast, revealing subtler color nuances. HDR-capable TVs are already shipping; monitors are expected next year.

In addition to HDR decoding, hardware encoding is also supported, which makes it possible to record video in this standard. An HDR streaming feature for the Shield console will also be added soon.

NVIDIA is working with developers to bring HDR to PC gaming. As a result, Rise of the Tomb Raider, Tom Clancy's The Division, The Talos Principle, Paragon, the second Shadow Warrior, and other games will receive HDR support.

Modern gaming is changing; players have new interests and want to look at their favorite games from new angles. Sometimes an ordinary screenshot becomes more than a simple frame from the game, and with NVIDIA Ansel every screenshot can be extraordinary. Ansel is a new image-capture technology with a set of special features: it lets you apply filters, enhance the image, use a free camera and create panoramas. Full functionality requires application support, for which Ansel provides simple integration: integrating Ansel into The Witcher 3 took the developers only about 150 lines of code, and the puzzle game The Witness needed about 40.

Ansel puts the game in pause mode and then allows you to perform various operations. For example, you can change the camera and choose any angle. Some restrictions are possible only if the developers intentionally limit the movement of the free camera.

You can increase the resolution of the final image and increase the LOD level to achieve maximum clarity in all details. The upscaling is combined with additional anti-aliasing for the best effect.

Moreover, Ansel can create gigantic images up to 4.5 gigapixels, stitched from separate fragments at the hardware level. Various post-effects can be applied to the final image, and it can be saved in RAW or EXR format with 16-bit color, leaving ample scope for later editing.

You can create stereo panoramas and 360-degree shots, which can then be viewed in virtual reality glasses.

There are a huge variety of effects that can be applied to the captured image - Grain, Bloom, Sepia, Lens effects and many more, up to creating a picture with a fisheye effect. The wide possibilities of Ansel are amazing. The player gets opportunities that simply did not exist before.

After studying the architecture and new technologies, we need to take a look at the GeForce GTX 1080 graphics card itself. The reference version looks like previous models with a slightly updated design and sharper outlines.

The reverse side is protected by two plates, reminiscent of the backplate "armor" of the GeForce GTX 980.

The overall cooling design remained unchanged. The cooler works on the principle of a turbine. There is a large base, a ribbed heatsink for cooling the GPU, and an additional heatsink near the power node for better cooling of power elements.

We will cover all the remaining details in a separate article that will also include comparative testing. Going by the manufacturer's preliminary estimates, NVIDIA compares the new card to the GeForce GTX 980 and talks of roughly a 70% advantage in ordinary games and a gap of more than 2.5x in VR mode. The difference from the GeForce GTX 980 Ti will be smaller, but we will only be able to give concrete figures after practical tests.

Conclusions

It is time to sum up our theoretical introduction to the GeForce GTX 1080. This video card is currently the most technologically advanced graphics accelerator: it is the first to feature a 16 nm Pascal processor and the new GDDR5X memory. The architecture itself is an evolution of Maxwell, with optimizations and new features for DirectX 12, and the architectural improvements are amplified by a large jump in GPU and memory frequencies. Progress in VR rendering is especially significant thanks to the new technologies that speed up this mode. A forward-looking innovation is support for HDR displays and content, and the new video engine adds more options for playing and recording high-definition video, including HDR material. Fans of fast-paced multiplayer games will appreciate Fast Sync, and connoisseurs of virtual beauty will enjoy Ansel's possibilities. By buying a GeForce GTX 1080, you get not only the fastest video accelerator of the moment but also the most feature-rich one.

Officially, this model will be available to customers from May 27. The Founders Edition reference versions will go on sale first and will carry a higher price tag; non-reference versions costing $100 less will follow a little later. And once the GeForce GTX 1080 reaches the domestic market, we will try, as part of a large test, to fully reveal its potential in comparison with existing top-end video cards.