Delay to Nvidia's new AI chip could affect Microsoft, Google and Meta.
Nvidia’s upcoming artificial intelligence chips will be delayed by three months or more due to design flaws, a snafu that could affect customers such as Meta Platforms, Google and Microsoft that have collectively ordered tens of billions of dollars worth of the chips, according to two people who help produce the chip and server hardware for it.
Nvidia this week told Microsoft, one of its biggest customers, and another large cloud provider about a delay involving the most advanced AI chip in its new Blackwell series of chips, according to a Microsoft employee and another person with direct knowledge.
Nvidia told customers its new GPUs would be delayed three months or longer
The delay was caused by design flaws that decreased the production yield of the chip
The snag could impact AI development plans for Microsoft, OpenAI and others
Nvidia unveiled Blackwell in March and CEO Jensen Huang said in May that the company planned to ship large numbers of Blackwells later this year. That was before design problems arose unusually late in the production process. Nvidia is conducting new test production runs with its chip manufacturer, Taiwan Semiconductor Manufacturing Company, to sort out the kinks, the people involved with the Blackwell chip said.
As a result, big shipments aren’t expected until the first quarter. After receiving chips, it typically takes cloud providers about three months to get large clusters of them up and running, according to someone who works on these types of data centers.
The design and production snag adds to the concerns hanging over Nvidia, which the U.S. Department of Justice is now investigating over complaints from rivals about alleged anticompetitive behavior. The company is still in a strong position, as the performance of its chips is far ahead of those of its competitors.
Shareholder expectations for the Blackwell chips are running high. An analyst at KeyBanc Capital Markets projected that the Blackwell chips could drive Nvidia’s data center revenue to more than $200 billion in 2025 from $47.5 billion in 2024. (Such estimates may not account for the new delay.)
“We will see a lot of Blackwell revenue this year,” Huang said in a May earnings call with analysts.
Nvidia’s AI server chips, known as graphics processing units, have been the lifeblood of conversational and video AI from developers such as OpenAI. They’ve also helped cloud providers such as Microsoft juice sales from renting out the chips to other developers.
If the upcoming AI chips, known as the B100, B200 and GB200, are delayed three months or more, it may prevent some customers from operating large clusters of the chips in their data centers in the first quarter of 2025, as they had planned.
The biggest customers, including Microsoft, OpenAI and Meta, plan to use the new chips to develop future generations of large language models, the software behind ChatGPT, the Meta AI assistant and other automation features.
These companies said they need many times more computing power to achieve big leaps in the performance of their software so it can give better answers to complex queries, automate multi-step tasks or generate realistic video. They expect Nvidia’s next AI chips to enable such leaps, especially when they are grouped together in a cluster, also known as a supercomputer.
An Nvidia spokesperson wouldn’t comment about its statements to customers about the delay but said customers are testing samples of Blackwell chips and “production is on track to ramp” later this year.
Spokespeople for Microsoft, Google, Amazon Web Services and Meta declined to comment. A TSMC spokesperson didn’t respond to a request for comment sent after office hours.
Nvidia’s biggest customers have big plans for GB200 chips in particular. Over the past week, Google, Meta and Microsoft have disclosed unprecedented increases in spending on data centers and AI chips, temporarily lifting Nvidia shares and prompting questions about when those companies would produce revenue and profits from the investments.
Giant Blackwell Orders
Google, for example, has ordered more than 400,000 GB200 chips, said the two people who work on the chip. Together with server hardware, the cost of Google’s orders could be well north of $10 billion, though it isn’t clear when Google expects to receive them.
To put that amount into context, Google has been on pace to spend about $50 billion this year on chips and other equipment and property, up more than 50% from last year.
Meta also placed an order worth at least $10 billion, while Microsoft in recent weeks increased the size of its order 20%, the two people said, though its total order size couldn’t be learned. Microsoft was planning to have between 55,000 and 65,000 GB200 chips ready for OpenAI to use by the first quarter of 2025, according to a person with direct knowledge of the order.
Microsoft managers had planned to make Blackwell-powered servers available to OpenAI by January but may need to plan for March or early spring, said a person with knowledge of the situation.
The Blackwell design problem came up in recent weeks, as engineers at TSMC discovered flaws in preparation for mass production, said the two people involved with the Blackwell chip production.
The GB200 chips contain two connected Blackwell GPUs alongside a Grace central processing unit. The problem involved a processor die—a piece of silicon that holds circuits for a chip—that connected the two Blackwell GPUs. The snag decreased the yield, or number of chips TSMC was able to produce for Nvidia. Such problems typically prompt companies to stop production.
As a result, Nvidia has been making adjustments to the design and will have to conduct a new production test run at TSMC before mass production can begin, the people said.
Nvidia told at least one cloud provider that it might consider producing a version of the chip that only contains one Blackwell chip, in an effort to avoid the die issue and ship chips faster, according to someone who spoke with Nvidia about the delay.
Uncommon Delay
TSMC initially planned to start mass production of the Blackwell chips in the third quarter and ship them en masse to Nvidia customers starting in the fourth quarter. The Blackwell chips are now expected to go into mass production in the fourth quarter, with the servers slated for mass shipment in the subsequent quarters if no further issue arises, they said.
Chip delays aren’t unheard of. Nvidia experienced some delays with earlier versions of its flagship GPU in 2020, according to someone with direct knowledge. But the stakes for Nvidia were lower then, and fewer customers were counting on orders to arrive so they could begin generating revenue from their data center and chip investments.
Still, it is highly unusual to uncover significant design flaws right before mass production. Chip designers typically work with chip makers like TSMC to conduct multiple production test runs and simulations to ensure the viability of the product and a smooth manufacturing process before taking large orders from customers.
It’s also uncommon for TSMC, the world’s largest chipmaker, to halt its production lines and go back to the drawing board with a high-profile product that’s so close to mass production, according to two TSMC employees. TSMC has freed up machine capacity in anticipation of the mass production of GB200s but will have to let its machinery sit idle until the snags are fixed.
The design flaw will also impact the production and delivery of Nvidia’s NVLink server racks because the companies that work on the servers have to wait for a new chip sample before finalizing a server rack design.
Disclaimer: Community is offered by Moomoo Technologies Inc. and is for educational purposes only.