NVIDIA Hopper H100 pictured: the world’s fastest 4nm GPU and the first with HBM3 memory



NVIDIA’s flagship datacenter GPU, the Hopper H100, is featured in all its glory. (image credit: CNET)

At GTC 2022, NVIDIA unveiled its Hopper H100 GPU, a compute powerhouse designed for next-generation data centers. It’s been a while since we talked about this powerful chip, but it looks like NVIDIA has given select media a close-up of its flagship chip.

NVIDIA Hopper H100 GPU: the first with 4nm and HBM3 technology gets high-resolution pictures

CNET managed to capture not only the graphics board on which the H100 GPU is mounted but also the H100 chip itself. The H100 GPU is a monster chip built on the latest 4nm process technology, packing 80 billion transistors alongside bleeding-edge HBM3 memory. According to the tech outlet, the H100 sits on the PG520 PCB, which has over 30 power VRMs and a massive integral interposer that uses TSMC’s CoWoS technology to combine the H100 GPU with a 6-stack HBM3 design.

NVIDIA Hopper H100 GPU images (Image credit: CNET):



One of the six stacks is held back to ensure yield integrity, which leaves the five active HBM3 stacks listed for the SXM5 board. Even so, the new HBM3 standard allows up to 80 GB of capacity at 3 TB/s, which is insane. For comparison, the current fastest gaming graphics card, the RTX 3090 Ti, offers only 1 TB/s of bandwidth and 24 GB of VRAM. In addition, the Hopper H100 GPU supports the new FP8 data format, and its new SXM connection helps accommodate the 700W power design the chip is built around.

NVIDIA Hopper H100 GPU Specifications at a Glance

As per the specifications, the NVIDIA Hopper GH100 GPU is made up of a massive 144 SM (Streaming Multiprocessor) chip layout arranged in a total of 8 GPCs. Each GPC holds 9 TPCs, and each TPC is in turn made up of 2 SM units. This gives us 18 SMs per GPC and 144 SMs across the full 8-GPC configuration. Each SM is made up of 128 FP32 units, which gives us a total of 18,432 CUDA cores. The following are some of the configurations you can expect from the H100 chip:

The full implementation of the GH100 GPU consists of the following units:

  • 8 GPC, 72 TPC (9 TPC/GPC), 2 SM/TPC, 144 SM per full GPU
  • 128 FP32 CUDA cores per SM, 18432 FP32 CUDA cores per full GPU
  • 4 4th generation Tensor cores per SM, 576 per full GPU
  • 6 HBM3 or HBM2e stacks, 12 512-bit memory controllers
  • 60 MB L2 Cache
  • Fourth Generation NVLink and PCIe Gen 5

The NVIDIA H100 GPU with SXM5 board form-factor consists of the following units:

  • 8 GPC, 66 TPC, 2 SM/TPC, 132 SM per GPU
  • 128 FP32 CUDA Cores per SM, 16896 FP32 CUDA Cores per GPU
  • 4 4th generation Tensor cores per SM, 528 per GPU
  • 80 GB HBM3, 5 HBM3 stacks, 10 512-bit memory controllers
  • 50 MB L2 Cache
  • Fourth Generation NVLink and PCIe Gen 5
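The unit counts in the two lists above follow from simple multiplication, and a short Python sketch makes the arithmetic explicit. The function name `gpu_unit_counts` and its parameter names are hypothetical, chosen only for illustration; the input figures are the ones quoted in the lists.

```python
def gpu_unit_counts(tpcs, sms_per_tpc=2, fp32_per_sm=128, tensor_per_sm=4):
    """Derive SM, FP32 core and Tensor core totals from a TPC count.

    Default per-unit figures are the ones quoted in the article for Hopper.
    """
    sms = tpcs * sms_per_tpc
    return {
        "sm": sms,
        "fp32_cores": sms * fp32_per_sm,
        "tensor_cores": sms * tensor_per_sm,
    }


# Full GH100: 8 GPCs with 9 TPCs each.
full_gh100 = gpu_unit_counts(tpcs=8 * 9)

# H100 in the SXM5 form factor: 66 TPCs enabled.
h100_sxm5 = gpu_unit_counts(tpcs=66)

print("Full GH100:", full_gh100)
print("H100 SXM5: ", h100_sxm5)
```

Running it reproduces the 144 SM / 18,432 core / 576 Tensor core totals of the full chip and the 132 SM / 16,896 core / 528 Tensor core totals of the SXM5 board.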

The 18,432 FP32 cores of the full GH100 are a 2.25x increase over the full GA100 GPU configuration. NVIDIA is also packing more FP64, FP16 and Tensor cores into its Hopper GPUs, which will greatly increase performance. That is going to be a necessity to rival Intel’s Ponte Vecchio, which is also expected to feature 1:1 FP64.
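The 2.25x figure can be checked by comparing FP32 core counts of the two full chips. In the sketch below, the GH100 layout comes from this article, while the full GA100 layout (8 GPCs, 8 TPCs per GPC, 64 FP32 cores per SM) is not stated here and is an assumption based on NVIDIA's published Ampere specifications. The dictionary and function names are hypothetical.

```python
# Full-chip layouts. GH100 figures are from the article; the GA100 figures
# are an assumption (not given in the article).
GH100_FULL = {"gpcs": 8, "tpcs_per_gpc": 9, "sms_per_tpc": 2, "fp32_per_sm": 128}
GA100_FULL = {"gpcs": 8, "tpcs_per_gpc": 8, "sms_per_tpc": 2, "fp32_per_sm": 64}


def fp32_cores(cfg):
    """Total FP32 CUDA cores for a full-chip layout."""
    return cfg["gpcs"] * cfg["tpcs_per_gpc"] * cfg["sms_per_tpc"] * cfg["fp32_per_sm"]


ratio = fp32_cores(GH100_FULL) / fp32_cores(GA100_FULL)
print(fp32_cores(GH100_FULL), fp32_cores(GA100_FULL), ratio)
```

Under those assumptions the full GH100 has 18,432 FP32 cores against 8,192 for the full GA100, a ratio of 2.25.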

Cache is another area where NVIDIA has paid a lot of attention, bumping it up to 50 MB on the H100 and 60 MB on the full GH100 GPU. That is up from the 40 MB of cache on the Ampere-based A100 and roughly 3 times the size of the cache on AMD’s flagship Aldebaran MCM GPU, the MI250X.

Rounding out the performance figures, NVIDIA’s GH100 Hopper GPU will offer 4000 TFLOPs of FP8, 2000 TFLOPs of FP16, 1000 TFLOPs of TF32 and 60 TFLOPs of FP64 compute performance. These record-breaking figures eclipse every HPC accelerator that came before it. For comparison, it is 3.3x faster than NVIDIA’s own A100 GPU and 28% faster than AMD’s Instinct MI250X in FP64 compute. In FP16 compute, the H100 GPU is 3x faster than the A100 and 5.2x faster than the MI250X, which is really bonkers.

The PCIe variant, which is a cut-down model, was recently listed in Japan for over US$30,000, so one can imagine that the SXM variant with its beefier configuration would cost around US$50,000.

NVIDIA Hopper GH100 GPU Based H100 Specs, Compared With Earlier Data Center GPUs:

| NVIDIA Tesla Graphics Card | H100 (SXM5) | H100 (PCIe) | A100 (SXM4) | A100 (PCIe4) | Tesla V100S (PCIe) | Tesla V100 (SXM2) | Tesla P100 (SXM2) | Tesla P100 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla K40 (PCI-Express) |
|---|---|---|---|---|---|---|---|---|---|---|
| GPU | GH100 (Hopper) | GH100 (Hopper) | GA100 (Ampere) | GA100 (Ampere) | GV100 (Volta) | GV100 (Volta) | GP100 (Pascal) | GP100 (Pascal) | GM200 (Maxwell) | GK110 (Kepler) |
| Process Node | 4nm | 4nm | 7nm | 7nm | 12nm | 12nm | 16nm | 16nm | 28nm | 28nm |
| Transistors | 80 billion | 80 billion | 54.2 billion | 54.2 billion | 21.1 billion | 21.1 billion | 15.3 billion | 15.3 billion | 8 billion | 7.1 billion |
| GPU Die Size | 814 mm² | 814 mm² | 826 mm² | 826 mm² | 815 mm² | 815 mm² | 610 mm² | 610 mm² | 601 mm² | 551 mm² |
| SMs | 132 | 114 | 108 | 108 | 80 | 80 | 56 | 56 | 24 | 15 |
| TPCs | 66 | 57 | 54 | 54 | 40 | 40 | 28 | 28 | 24 | 15 |
| FP32 CUDA Cores per SM | 128 | 128 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 192 |
| FP64 CUDA Cores per SM | 128 | 128 | 32 | 32 | 32 | 32 | 32 | 32 | 4 | 64 |
| FP32 CUDA Cores | 16896 | 14592 | 6912 | 6912 | 5120 | 5120 | 3584 | 3584 | 3072 | 2880 |
| FP64 CUDA Cores | 16896 | 14592 | 3456 | 3456 | 2560 | 2560 | 1792 | 1792 | 96 | 960 |
| Tensor Cores | 528 | 456 | 432 | 432 | 640 | 640 | n/a | n/a | n/a | n/a |
| Texture Units | 528 | 456 | 432 | 432 | 320 | 320 | 224 | 224 | 192 | 240 |
| Boost Clock | TBD | TBD | 1410 MHz | 1410 MHz | 1601 MHz | 1530 MHz | 1480 MHz | 1329 MHz | 1114 MHz | 875 MHz |
| TOPs (DNN/AI) | 2000 TOPs; 4000 TOPs | 1600 TOPs; 3200 TOPs | 1248 TOPs; 2496 TOPs with sparsity | 1248 TOPs; 2496 TOPs with sparsity | 130 TOPs | 125 TOPs | n/a | n/a | n/a | n/a |
| FP16 Compute | 2000 TFLOPs | 1600 TFLOPs | 312 TFLOPs; 624 TFLOPs with sparsity | 312 TFLOPs; 624 TFLOPs with sparsity | 32.8 TFLOPs | 30.4 TFLOPs | 21.2 TFLOPs | 18.7 TFLOPs | n/a | n/a |
| FP32 Compute | 1000 TFLOPs | 800 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard) | 16.4 TFLOPs | 15.7 TFLOPs | 10.6 TFLOPs | 10.0 TFLOPs | 6.8 TFLOPs | 5.04 TFLOPs |
| FP64 Compute | 60 TFLOPs | 48 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard) | 8.2 TFLOPs | 7.80 TFLOPs | 5.30 TFLOPs | 4.7 TFLOPs | 0.2 TFLOPs | 1.68 TFLOPs |
| Memory Interface | 5120-bit HBM3 | 5120-bit HBM2e | 6144-bit HBM2e | 6144-bit HBM2e | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 384-bit GDDR5 | 384-bit GDDR5 |
| Memory Size | Up to 80 GB HBM3 @ 3.0 Gbps | Up to 80 GB HBM2e @ 2.0 Gbps | Up to 40 GB HBM2 @ 1.6 TB/s; up to 80 GB HBM2 @ 1.6 TB/s | Up to 40 GB HBM2 @ 1.6 TB/s; up to 80 GB HBM2 @ 2.0 TB/s | 16 GB HBM2 @ 1134 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 732 GB/s; 12 GB HBM2 @ 549 GB/s | 24 GB GDDR5 @ 288 GB/s | 12 GB GDDR5 @ 288 GB/s |
| L2 Cache Size | 51200 KB | 51200 KB | 40960 KB | 40960 KB | 6144 KB | 6144 KB | 4096 KB | 4096 KB | 3072 KB | 1536 KB |
| TDP | 700W | 350W | 400W | 250W | 250W | 300W | 300W | 250W | 250W | 235W |
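For the older cards, the standard (non-Tensor) FP32 figures in the table can be reproduced from the core count and boost clock. The sketch below assumes the common convention of 2 floating-point operations per core per clock (one fused multiply-add), which the article does not state explicitly. The function name `peak_tflops` is hypothetical, and the inputs are taken from the table.

```python
def peak_tflops(cores, boost_mhz):
    """Theoretical peak TFLOPs, assuming 2 FLOPs per core per clock (one FMA)."""
    # cores * MHz gives millions of core-cycles per second; dividing by 1e6
    # converts MFLOPs to TFLOPs.
    return 2 * cores * boost_mhz / 1e6


# (FP32 CUDA cores, boost clock in MHz) as listed in the table.
cards = {
    "A100 (SXM4)": (6912, 1410),
    "Tesla V100S (PCIe)": (5120, 1601),
    "Tesla V100 (SXM2)": (5120, 1530),
    "Tesla P100 (SXM2)": (3584, 1480),
    "Tesla M40 (PCI-Express)": (3072, 1114),
    "Tesla K40 (PCI-Express)": (2880, 875),
}

for name, (cores, mhz) in cards.items():
    print(f"{name}: {peak_tflops(cores, mhz):.2f} TFLOPs FP32")
```

The same formula applied to the A100's 3456 FP64 cores gives about 9.7 TFLOPs, in line with the standard FP64 figure in the table. The H100 rows cannot be checked this way because their boost clocks are still listed as TBD.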
