World's strongest AI chip is coming! Nvidia's H200 performance soars by 90%

Carter West joined discussion · Nov 14, 2023 18:10

On Monday, Nvidia announced the HGX H200 Tensor Core GPU, which utilizes the Hopper architecture to accelerate AI applications. It's a follow-up to the H100 GPU, released last year and previously Nvidia's most powerful AI GPU chip. If widely deployed, it could lead to far more powerful AI models—and faster response times for existing ones like ChatGPT—in the near future.

Nvidia will make the H200 available in several form factors. This includes Nvidia HGX H200 server boards in four- and eight-way configurations, compatible with both hardware and software of HGX H100 systems. It will also be available in the Nvidia GH200 Grace Hopper Superchip, which combines a CPU and GPU into one package for even more AI oomph (that's a technical term).

Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure will be the first cloud service providers to deploy H200-based instances starting next year, and Nvidia says the H200 will be available "from global system manufacturers and cloud service providers" starting in Q2 2024.

Performance increased by 1.4-1.9 times

Lack of computing power (often called "compute") has been a major bottleneck of AI progress this past year, hindering deployments of existing AI models and slowing the development of new ones. Shortages of powerful GPUs that accelerate AI models are largely to blame. One way to alleviate the compute bottleneck is to make more chips, but you can also make AI chips more powerful. That second approach may make the H200 an attractive product for cloud providers.

Let's take a closer look at where the H200's performance gains over the H100 come from.

According to Nvidia, the H200 is the company's first chip to use HBM3e memory. This type of memory is faster and has a larger capacity, making it more suitable for large language models. The following figure shows the relative performance comparison between H100 and H200 on a range of AI inference workloads:

As can be seen, the H200's main improvement over the H100 is in inference performance on large models. The H200 nearly doubles inference performance compared to the H100 when handling large language models such as Llama2 70B.

Achieving a roughly 2x performance increase within the same power envelope means that the energy consumed per unit of work, and with it the power-related share of total cost of ownership (TCO), falls by about 50%. Therefore, in theory, Nvidia could price the H200 similarly to the H100.
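The arithmetic behind the 50% figure can be sketched as follows. This is only an illustration: the 700 W power draw and the baseline throughput are placeholder assumptions, not figures from Nvidia, and the function name is hypothetical. Only the 2x speedup at unchanged power comes from the text above.

```python
def energy_per_unit_work(power_watts: float, throughput_per_s: float) -> float:
    """Joules consumed per unit of work (for example, per generated token)."""
    if throughput_per_s <= 0:
        raise ValueError("throughput must be positive")
    return power_watts / throughput_per_s


POWER_W = 700.0             # assumed, identical power envelope for both chips
H100_TOKENS_PER_S = 1000.0  # hypothetical baseline throughput
SPEEDUP = 2.0               # the roughly 2x inference gain discussed above

h100_energy = energy_per_unit_work(POWER_W, H100_TOKENS_PER_S)
h200_energy = energy_per_unit_work(POWER_W, H100_TOKENS_PER_S * SPEEDUP)
saving = 1 - h200_energy / h100_energy

print(f"energy per token: {h100_energy:.3f} J -> {h200_energy:.3f} J "
      f"({saving:.0%} lower)")
```

Note that the saving applies per unit of work. The chip itself still draws the same power; it simply finishes twice as much work with it.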

With the introduction of H200, energy efficiency and TCO reach new levels. This cutting-edge technology offers unparalleled performance, all within the same power profile as the H100 Tensor Core GPU. AI factories and supercomputing systems that are not only faster but also more eco-friendly deliver an economic edge that propels the AI and scientific communities forward.

Thanks to the Transformer Engine, reduced floating-point precision, and faster HBM3 memory, the H100, which has been shipping in volume since this year, already delivers an 11-fold increase in inference performance on the GPT-3 175B model compared to the A100. With larger and faster HBM3e memory, the H200 raises that figure to as much as 18 times the A100 without any hardware or code changes. Compared to the H100 itself, that is a gain of about 1.64 times (18 / 11), purely due to the growth in memory capacity and bandwidth.
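As a quick check on how these multiples relate, here is a minimal sketch using only the two figures quoted above (both measured against the A100 on GPT-3 175B inference). The helper name is made up for illustration.

```python
def relative_speedup(new_vs_base: float, old_vs_base: float) -> float:
    """Speedup of the new chip over the old one, given both relative to one baseline."""
    return new_vs_base / old_vs_base


H100_VS_A100 = 11.0  # quoted above
H200_VS_A100 = 18.0  # quoted above

h200_vs_h100 = relative_speedup(H200_VS_A100, H100_VS_A100)
print(f"H200 vs H100: {h200_vs_h100:.2f}x")  # prints "H200 vs H100: 1.64x"
```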

To avoid upsetting customers who have stockpiled large numbers of H100 GPUs, Nvidia seems to have only one option: to price the Hopper equipped with 141 GB of HBM3e memory at 1.5 to 2 times the price of the 80 GB or 96 GB HBM3 version. Imagine what level of performance could be achieved if future devices had 512 GB of HBM memory and 10 TB/s of bandwidth. How much would you be willing to pay for such a fully functional GPU? The final product might sell for $60,000 or even $90,000, since many users are already willing to pay $30,000 for products that currently cannot be fully utilized.

More memory

For various technical and economic reasons, processors have for decades been built with more computing power than their memory bandwidth can feed, while actual memory capacity has depended on the requirements of the device and workload. In HPC simulation and modeling, and even in AI training and inference, the memory bandwidth and capacity of even the most advanced GPUs fall short, making it impossible to substantially improve the utilization of the vector and matrix engines already on the chip. As a result, these GPUs spend much of their time waiting for data and cannot play to their strengths.
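One common way to reason about this imbalance is a roofline-style estimate: attainable throughput is the smaller of peak compute and memory bandwidth multiplied by the workload's arithmetic intensity. The sketch below is illustrative only. The bandwidth figures (3.35 TB/s for H100 SXM, 4.8 TB/s for H200) are taken from Nvidia's public spec sheets rather than from this article, while the shared peak compute value and the arithmetic intensity are hypothetical placeholders.

```python
def attainable_flops(peak_flops: float,
                     bandwidth_bytes_per_s: float,
                     intensity_flop_per_byte: float) -> float:
    """Roofline estimate: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, bandwidth_bytes_per_s * intensity_flop_per_byte)


PEAK_FLOPS = 1.0e15  # hypothetical peak, assumed identical for both chips
INTENSITY = 100.0    # hypothetical workload: 100 FLOP per byte moved

# Memory bandwidth in bytes per second (assumed from public spec sheets).
BANDWIDTH = {"H100": 3.35e12, "H200": 4.8e12}

results = {}
for name, bandwidth in BANDWIDTH.items():
    results[name] = attainable_flops(PEAK_FLOPS, bandwidth, INTENSITY)
    print(f"{name}: {results[name] / 1e12:.0f} TFLOP/s attainable, "
          f"{results[name] / PEAK_FLOPS:.1%} of peak")

print(f"H200 / H100: {results['H200'] / results['H100']:.2f}x")
```

Under these assumptions both chips are memory-bound, so the gain tracks the bandwidth ratio rather than any change in compute. A workload with much higher arithmetic intensity would hit the compute ceiling instead and see no benefit from faster memory.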

Memory bandwidth is crucial for HPC applications, as it enables faster data transfer and reduces bottlenecks in complex processing. For memory-intensive HPC applications such as simulations, scientific research, and artificial intelligence, the H200's higher memory bandwidth ensures that data can be accessed and manipulated efficiently, leading, according to Nvidia, to 110X faster time to results compared to CPUs.

B100 is coming

At a financial conference about a month ago, Nvidia released its technology roadmap. It showed that the GH200 GPU and H200 GPU accelerators would serve as transitional products before the release of the "Blackwell" GB100 GPU and B100 GPU, which are planned for launch in 2024.

Regardless of how the Blackwell B100 GPU accelerator from Nvidia performs, it can be assumed that it will bring more powerful inference performance, and this performance improvement is likely to come from breakthroughs in memory rather than upgrades at the computational level. Here's a look at the inference performance improvement of the B100 GPU on the GPT-3 175B parameter model:

Finally, although the Blackwell B100 accelerator will debut at the GTC 2024 conference in March next year, actual shipments are not expected until the end of 2024.

When evaluating the competitive landscape in the chip industry, it's clear that Nvidia faces several contenders and potential threats:

AMD: AMD is a well-funded chipmaker with strong GPU expertise. However, its relative weakness on the software front may hinder its ability to compete effectively with Nvidia.

Intel: Although Intel hasn't seen much success in AI accelerators or GPUs, it should not be underestimated. As a major player in the semiconductor industry, Intel has the resources and capacity to make significant advancements in this field.

In-House Solutions from Hyperscalers: Companies like Google, Amazon, Microsoft, and Meta Platforms are developing their own in-house chips, such as TPUs, Trainium, and Inferentia. While these chips may excel in specific workloads, they might not outperform Nvidia's GPUs across a wide range of applications.

Cloud Computing Companies: Cloud providers will need to offer a variety of GPUs and accelerators to cater to their enterprise customers running AI workloads. While Amazon and Google may use their in-house chips for their own AI models, convincing a broad range of enterprise customers to optimize their AI models for these proprietary semiconductors will be difficult, because doing so could lead to vendor lock-in, which enterprises typically avoid.

In the fiercely competitive AI chip market, just as investors are questioning whether Nvidia's leading position will be challenged, the release of such a chip undoubtedly gives Nvidia shareholders a lot of confidence. Thanks to continuous product innovation, Nvidia still maintains a significant lead over its peers in the industry. However, the biggest uncertainty lies in China. Nvidia has customized the H20 chip for the Chinese market to get around US export restrictions. In the next article, I will examine whether Nvidia's China-specific chips can meet demand there.