Nvidia's 2025 roadmap leaks! Jensen Huang bets big on the B100 to crush AMD, secret weapon X100 revealed

wallstreetcn ·  Oct 11, 2023 01:57

Recently, foreign media revealed a new Nvidia GPU roadmap, exposing technical details of the B100, the most powerful GPU yet. The most mysterious entry, the X100, is reportedly slated for launch in 2025.

Nvidia's supremacy in AI hardware has gone unchallenged for too long!

Now, every major technology company is watching, waiting to overthrow its dominance in one fell swoop.

Of course, Nvidia won't sit idly by and await its demise.

Recently, the foreign outlet SemiAnalysis revealed Nvidia's hardware roadmap for the next few years, including the much-anticipated H200, B100, and “X100” GPUs.

Along with it, other hard details also leaked: Nvidia's process-node plans, HBM3E speeds and capacities, PCIe 6.0 and PCIe 7.0, NVLink, and 1.6T 224G SerDes plans.

If these plans succeed, Nvidia will go on crushing its rivals.

Of course, the top spot is no comfortable seat: AMD's MI300 and MI400, Amazon's Trainium2, Microsoft's Athena, and Intel's Gaudi 3 will all make life harder for Nvidia.

Brace yourself: there's a lot of firepower ahead!

Nvidia doesn't just want to be a hardware hegemon

Google has already begun building out its own AI infrastructure. The TPUv5 and TPUv5e it has built serve not only internal training and inference but also external customers such as Apple, Anthropic, CharacterAI, and Midjourney.

Google isn't Nvidia's only threat.

On the software side, Meta's PyTorch 2.0 and OpenAI's Triton are also growing rapidly, extending compatibility to other hardware vendors.

Today, the software gap still exists, but it's far less than it used to be.

In terms of software stacks, AMD's GPUs, Intel's Gaudi, Meta's MTIA, and Microsoft's Athena have all achieved a certain level of development.

Although Nvidia continues to lead the way in hardware, the gap will close faster and faster.

The Nvidia H100 won't hold center stage for much longer.

Over the next few months, both AMD's MI300 and Intel's Gaudi 3 will launch as hardware that is technically superior to the H100.

Beyond formidable rivals such as Google, AMD, and Intel, however, other companies are also putting heavy pressure on Nvidia.

Although these companies temporarily lag in hardware design, they can draw subsidies from the giants behind them. The world has suffered under Nvidia for a long time, and they all want to break its hugely profitable monopoly.

Amazon's upcoming Trainium2 and Inferentia3, and Microsoft's upcoming Athena, are investments that have been in the works for years.

The competition is coming on fierce, and of course Nvidia won't take it lying down.

According to SemiAnalysis, whether in management style or roadmap decisions, Nvidia is “one of the biggest risk-takers in the industry.”

And Jensen Huang embodies the spirit of Andy Grove.

Success breeds complacency. Complacency breeds failure. Only the paranoid survive.

To stay in first place, Nvidia has ambitiously adopted a multi-pronged, high-risk strategy.

It no longer wants to compete with Intel and AMD in traditional markets; instead, it wants to become a tech giant on the order of Google, Microsoft, Amazon, Meta, and Apple.

Behind Nvidia's DGX Cloud, its software push, and its acquisitions outside the semiconductor field lies a much bigger game of chess.

The latest roadmap details have come to light!

Important details of Nvidia's latest roadmap have been revealed.

The content covers details such as the networking, memory, packaging, and process nodes used, the various GPUs, SerDes choices, PCIe 6.0, co-packaged optics, and optical circuit switches.

Clearly, under competitive pressure from Google, Amazon, Microsoft, AMD, and Intel, Nvidia has accelerated development of the B100 and the “X100” overnight.

B100: Time to market is above everything else

According to internal sources, Nvidia's B100 will be mass-produced in the third quarter of 2024, and some early samples will be shipped in the second quarter of 2024.

Judging by performance and TCO, Amazon's Trainium2, Google's TPUv5, AMD's MI300X, Intel's Gaudi 3, and Microsoft's Athena are all blown out of the water by comparison.

Even taking into account subsidies received from design partners, AMD, or TSMC, none of them measure up.

In order to bring the B100 to market as soon as possible, Nvidia made a number of compromises.

For example, Nvidia wanted to set power consumption at a higher level (1000W), but in the end, they chose to continue using the H100's 700W.

That way, the B100 can stick with air cooling when it launches.

Also, in the early B100 series, Nvidia will stick to PCIe 5.0.

The combination of PCIe 5.0 and 700W means the B100 can slot directly into existing H100 HGX servers, greatly improving supply chain readiness and allowing earlier mass production and delivery.

Part of the reason they stuck with 5.0 is that AMD and Intel still lag far behind on PCIe 6.0 integration, and even Nvidia's own in-house team doesn't have PCIe 6.0-ready CPUs.

Also, they'll be using faster C2C-style links.

Down the road, ConnectX-8 will come with an integrated PCIe 6.0 switch, but for now no one is ready.

Reportedly, Broadcom and Astera Labs won't have mass-production PCIe 6.0 retimers ready until the end of the year, and given the size of these baseboards, even more retimers will be needed.

This also means the initial B100 will be limited to 3.2T, running at just 400G per GPU on ConnectX-7 rather than the 800G per GPU Nvidia claims in its slides.
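
As a sanity check on those figures, here is the simple arithmetic, assuming the usual 8-GPU HGX baseboard (the GPU count is our assumption, not part of the leak):

```python
# Aggregate server network bandwidth = GPUs per baseboard x per-GPU NIC speed.
# ConnectX-7 tops out at 400G per GPU; the claimed ConnectX-8 doubles that.
GPUS_PER_SERVER = 8  # assumed HGX-style baseboard

def aggregate_tbps(per_gpu_gbps: int) -> float:
    """Total network bandwidth of the server in Tb/s."""
    return GPUS_PER_SERVER * per_gpu_gbps / 1000

print(aggregate_tbps(400))  # 3.2 -> the "3.2T" ConnectX-7 limit
print(aggregate_tbps(800))  # 6.4 -> only once ConnectX-8 arrives
```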

Keeping air cooling and holding power, PCIe, and network speeds constant makes both manufacturing and deployment easy.

Later, Nvidia will launch a 1,000W+ version of the B100 that requires water cooling.

This version of the B100 will provide each GPU with full 800G network connectivity through ConnectX-8.

For Ethernet/InfiniBand, these SerDes are still 8x100G.

Although per-GPU network speed doubles, the radix is cut in half, because traffic still passes through the same 51.2T switches; 102.4T switches won't arrive within the B100 generation.
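
The radix point is simple division: a switch ASIC has a fixed aggregate bandwidth, so doubling the per-port speed halves the number of ports it can offer. A minimal sketch of that arithmetic (ours, not SemiAnalysis's):

```python
# Port count (radix) of a switch ASIC = aggregate bandwidth / per-port speed.
def radix(switch_tbps: float, port_gbps: int) -> int:
    return int(switch_tbps * 1000 // port_gbps)

print(radix(51.2, 400))   # 128 ports at 400G
print(radix(51.2, 800))   # 64 ports at 800G -> radix halved
print(radix(102.4, 800))  # 128 again, but 102.4T won't ship this generation
```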

Interestingly, the NVLink components on the B100 are revealed to use 224G SerDes. If Nvidia can actually pull this off, it would be a huge step forward.

Most industry insiders, except those at Nvidia, consider 224G unreliable and impossible to achieve in 2024.

Bear in mind that Google, Meta, and Amazon have all set their targets for mass-producing 224G AI accelerators at 2026/2027.

If Nvidia achieves it in 2024/2025, it will positively crush its opponents.

Reportedly, the B100 still uses TSMC's N4P rather than a 3nm-class process.

Clearly, TSMC's 3nm process is not yet mature enough for a die this large.

Judging from the substrate dimensions revealed by Nvidia's substrate supplier Ibiden, Nvidia appears to have switched to an MCM design combining two large monolithic dies, with 8 or 12 HBM stacks.

SambaNova's and Intel's chips next year both use a similar macro design.

The reason Nvidia isn't using hybrid bonding like AMD is that it needs volume production, and cost is one of its major concerns.

SemiAnalysis estimates that the memory capacity of the two-die B100 will be similar to or higher than AMD's MI300X, using 24GB HBM stacks.

The air-cooled B100's HBM can run at 6.4 Gbps per pin, while the liquid-cooled version may go as high as 9.2 Gbps.
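
Turning those per-pin speeds into aggregate bandwidth is straightforward if we assume the standard 1024-bit HBM stack interface and the 8-stack configuration mentioned above (both assumptions are ours, for illustration):

```python
# Aggregate HBM bandwidth = pin speed x stack bus width / 8 bits-per-byte x stacks.
BITS_PER_STACK = 1024  # standard HBM stack interface width (assumed)

def hbm_bandwidth_tb_s(pin_gbps: float, stacks: int = 8) -> float:
    """Aggregate HBM bandwidth in TB/s."""
    return pin_gbps * BITS_PER_STACK / 8 * stacks / 1000

print(f"{hbm_bandwidth_tb_s(6.4):.1f} TB/s")  # air-cooled: ~6.6 TB/s
print(f"{hbm_bandwidth_tb_s(9.2):.1f} TB/s")  # liquid-cooled: ~9.4 TB/s
```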

Additionally, Nvidia also showcased the GB200 and B40 in the roadmap.

Both the GB200 and GX200 carry a “G”, which is clearly a placeholder: Nvidia will launch a new Arm-based CPU and won't keep using Grace for long.

The B40 is likely just half a B100: a single N4P die with at most 4 or 6 HBM stacks. Unlike the L40S, it makes a lot of sense for inference on small models.

“X100”: The Killer Blow

The most notable part of the revealed roadmap is Nvidia's “X100” schedule.

Interestingly, it lines up exactly with AMD's current MI400 schedule, echoing AMD's strategy of releasing the MI300X just one year after the H100 launched.

AMD's packaging on the MI300X is impressive. They packed in far more compute and memory, hoping to overtake the year-old H100 and thereby beat Nvidia on pure hardware.

Nvidia has also realized that its two-year cadence for new GPUs gives competitors a wide-open window to grab market share.

Under pressure, Nvidia is accelerating its product cycle to once a year, leaving rivals no opening: it plans to launch the “X100” in 2025, just one year after the B100.

Of course, unlike the B100, the “X100” has not yet gone into mass production, so everything is still up in the air.

Mind you, Nvidia has never before discussed products beyond the next generation; this is unprecedented.

Also, the name probably isn't “X100.”

Nvidia's tradition has always been to name GPUs after prominent female scientists such as Ada Lovelace, Grace Hopper, and Elizabeth Blackwell.

As for “X,” the only name that fits is Xie Xide, who studied semiconductors and the band structure of metals; but given her profile, the odds don't seem great.

Supply Chain Master: Jensen Huang's Big Gamble

Since Nvidia's inception, Jensen Huang has aggressively pursued control of the supply chain to support enormous growth targets.

Not only is Nvidia willing to take on non-cancellable orders, up to $11.15 billion in purchase, capacity, and inventory commitments, it has also signed $3.81 billion in prepayment agreements.

Arguably, no rival can match that.

And Nvidia's history has shown more than once that it can creatively expand supply when supply runs short.

A conversation between Jensen Huang and Morris Chang in 2007

In 1997, when Morris Chang and I met, Nvidia had only 100 people and earned $27 million in revenue that year. You might not believe it, but back then Morris Chang would make sales calls himself, and he even paid us visits in person. I explained to him what Nvidia did and how big our chips needed to be, growing bigger every year. Since then, Nvidia has made a total of 127 million wafers and has grown nearly 100% every year until now; over the past 10 years, the compound annual growth rate has been about 70%.

At the time, Morris Chang couldn't believe Nvidia needed so many wafers, but Jensen Huang persisted.

Nvidia's bold bets on supply have paid off handsomely. Even though it has to write down billions of dollars of inventory from time to time, it still comes out ahead by over-ordering.

This time, Nvidia has outright seized most of the supply of upstream GPU components:

It placed very large orders with all three HBM suppliers, SK Hynix, Samsung, and Micron, squeezing out everyone's supply except Broadcom's and Google's. It also bought up most of TSMC's CoWoS supply, along with Amkor's capacity.

In addition, Nvidia is also locking up the downstream components needed for HGX boards and servers, such as retimers, DSPs, and optics.

Suppliers who turn a deaf ear to Nvidia's demands face Jensen Huang's carrot and stick:

On the one hand, they will get unimaginable orders from Nvidia; on the other hand, they may be removed from Nvidia's current supply chain.

Of course, Nvidia only resorts to commitments and non-cancellable orders when a supplier is critical and cannot be dropped or diversified away.

Every vendor seems to think it is an AI winner, partly because Nvidia has placed large orders with all of them and each believes it has won most of the business. In reality, that's just how fast Nvidia is growing.

Back to market dynamics: although Nvidia is targeting over $70 billion in data center sales next year, only Google has sufficient upstream capacity, at over 1 million devices. AMD's total AI production capacity remains very limited, a few hundred thousand units at most.

Business Strategy: Potentially Anti-Competitive

As we all know, Nvidia is taking advantage of the huge demand for GPUs to market and cross-sell products to customers.

Plenty of information has surfaced from the supply chain: Nvidia grants certain companies priority allocation based on a range of factors, including but not limited to whether they have multi-sourcing plans, whether they plan to develop their own AI chips, and whether they buy Nvidia's DGX systems, NICs, switches, and/or optics.

In fact, Nvidia's bundling has been very successful. Although Nvidia was previously only a very small optical transceiver supplier, that business tripled in a single quarter, and next year's shipments are expected to exceed $1 billion, far outpacing the growth of its GPU or network chip business.

These tactics, it must be said, are thorough.

For example, if you want 3.2T networking and reliable RDMA/RoCE on Nvidia's systems, the only option is Nvidia's own NICs. To be fair, this is partly because the offerings from Intel, AMD, and Broadcom genuinely lack competitiveness: they are still stuck at the 200G level.

Through supply chain management, Nvidia has also made lead times for the 400G InfiniBand NIC significantly shorter than for the 400G Ethernet NIC, even though the two NICs (ConnectX-7) are exactly the same in chip and board design.

This comes down to Nvidia's SKU management rather than any real supply bottleneck, and it forces businesses to buy the more expensive InfiniBand switches instead of standard Ethernet switches.

That's not all. Seeing how obsessed the supply chain is with the L40 and L40S GPUs, you can tell Nvidia is steering allocation again: to win more H100 allocation, OEMs need to buy more L40S units.

This is the same playbook Nvidia runs in the PC market: notebook manufacturers and AIB partners must buy larger volumes of G106/G107 (mid- and low-end GPUs) to get the scarcer, more profitable G102/G104 (high-end and flagship GPUs).

Partners in the supply chain have likewise been fed the line that the L40S is better than the A100 because it has higher FLOPS.

In reality, these GPUs are ill-suited to LLM inference: their memory bandwidth is less than half the A100's, and they lack NVLink.

This means it's nearly impossible to run an LLM on the L40S with good TCO unless the model is very small. And at high batch sizes, the tokens/s left for each user become almost unusable, rendering the theoretical FLOPS meaningless in practice.
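
The reasoning behind that claim: at low batch sizes, generating each token requires streaming all model weights from memory, so memory bandwidth, not FLOPS, caps tokens/s. A rough upper-bound sketch using public spec-sheet bandwidths (the 13B fp16 model is an arbitrary example chosen to fit both GPUs' memory):

```python
# Single-stream decode upper bound: every token reads all weights once,
# so tokens/s <= memory bandwidth / model size in bytes.
MODEL_BYTES = 13e9 * 2  # hypothetical 13B-parameter model at fp16 (~26 GB)

def max_tokens_per_s(mem_bw_gb_s: float) -> float:
    return mem_bw_gb_s * 1e9 / MODEL_BYTES

print(f"L40S: {max_tokens_per_s(864):.0f} tok/s")   # ~33 (864 GB/s spec)
print(f"A100: {max_tokens_per_s(2039):.0f} tok/s")  # ~78 (80GB SXM, 2039 GB/s)
```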

Additionally, Nvidia's MGX modular platform eliminates the hard work of server design, but at the same time reduces OEM profit margins.

Dell, HP, and Lenovo are clearly boycotting MGX, but the likes of Supermicro, Quanta, Asus, and Gigabyte are scrambling to fill the gap and commoditize low-cost “enterprise AI.”

And the OEM/ODMs taking part in the L40S and MGX push get better allocation of Nvidia's mainline GPU products in return.

Co-Packaged Optics (CPO)

Nvidia also attaches great importance to CPO.

They have been working through a variety of solutions, including Ayar Labs', as well as in-house options built with GlobalFoundries and TSMC.

Nvidia has examined several startups' CPO approaches but has yet to make a final decision.

Analysts believe Nvidia is likely to integrate CPO into the “X100”-generation NVSwitch.

Integrating it directly into the GPU itself would be too expensive and too risky in terms of reliability.

Optical Circuit Switches (OCS)

One of Google's greatest strengths in AI infrastructure is its optical circuit switches.

Nvidia is clearly chasing something similar; it has approached a number of companies in hopes of joint development.

Nvidia has realized that the fat tree topology is reaching its scaling limits, so a different topology is needed.

Unlike Google, which opted for the 6D torus, Nvidia favors the Dragonfly topology.
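
The scaling argument can be made concrete. With radix-k switches, a classic three-tier fat tree tops out at k³/4 endpoints, while a balanced Dragonfly (per Kim et al.'s 2008 formulation, with p terminals, a routers per group, and h global links per router) reaches a·p·(a·h+1). A purely illustrative sketch at radix 64:

```python
# Endpoint counts supported by each topology at switch radix k.
def fat_tree_hosts(k: int) -> int:
    # 3-tier folded-Clos fat tree supports k**3 / 4 hosts
    return k**3 // 4

def dragonfly_hosts(k: int) -> int:
    # Balanced Dragonfly (Kim et al. 2008): a = 2p = 2h,
    # radix used: k = p + (a - 1) + h; max endpoints = a * p * (a*h + 1)
    p = h = (k + 1) // 4  # terminals / global links per router
    a = 2 * p             # routers per group
    return a * p * (a * h + 1)

print(fat_tree_hosts(64))   # 65,536
print(dragonfly_hosts(64))  # 262,656 -- same radix, ~4x the endpoints
```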

By all accounts, Nvidia is still far from shipping an OCS; it hopes to be closer to that goal by 2025, but will likely fall short.

OCS plus CPO is the holy grail; in particular, once an OCS can switch packet by packet, it will change the rules of the game outright.

However, no one has demonstrated this ability so far, not even Google.

Although Nvidia's OCS and CPO are still just two slide decks in the research department, analysts believe CPO will advance further toward commercialization in 2025 to 2026.

Source: Xinzhiyuan. Original title: “Nvidia's 2025 roadmap leaks! Jensen Huang bets big on the B100 to crush AMD, secret weapon X100 revealed”

Disclaimer: This content is for informational and educational purposes only and does not constitute a recommendation or endorsement of any specific investment or investment strategy.