
Dojo: Musk's Bet on Self-Driving Cars

wallstreetcn ·  Aug 4 19:01

According to the technology outlet TechCrunch, the core of the Dojo plan is Tesla's proprietary D1 chip, which means Tesla may not have to rely on Nvidia's chips in the future and can obtain large amounts of computing power at low cost. By the end of this year, Dojo 1 is expected to have online training capacity equivalent to about 8,000 H100s.

The importance of the Dojo supercomputer to Tesla (TSLA.US) is growing by the day.

Dojo is not just a supercomputer used by Tesla to train its autonomous driving models in the cloud. In fact, it has become the cornerstone of Musk's AI business empire.

Goldman Sachs even compared Dojo to "Tesla's AWS" and believes it will be Tesla's biggest value driver in the future.

On Saturday morning, TechCrunch journalist Rebecca Bellan published an in-depth report titled "Tesla Dojo: Elon Musk's Big Plan to Build an AI Supercomputer, Explained". Starting from Dojo, the article explains Musk's AI plan in detail.

Highlights of the article include:

1. Tesla's reliance on a supercomputer stems mainly from its vision-only approach (capturing data solely through cameras rather than lidar or radar).

2. Tesla's goal is to achieve a "half-Tesla AI hardware, half-NVIDIA/other" configuration in the next 18 months. The "other" may be an AMD chip.

3. The core of the Dojo plan is Tesla's proprietary D1 chip, which means Tesla may not have to rely on NVIDIA chips in the future and can access a large amount of computing power at low cost.

4. The Dojo chip is Tesla's insurance policy and could bring dividends.

5. It is expected that by October of this year, the total computing power of Dojo will reach 100 exaflops, which is equivalent to the computing power of about 320,500 NVIDIA A100 GPUs. By the end of this year, Dojo 1 is expected to achieve online training equivalent to about 8,000 H100s.

The full article is as follows:

For years, Elon Musk has been talking about Dojo, the artificial-intelligence supercomputer that is meant to become the cornerstone of Tesla's AI ambitions. The project matters enormously to Musk: he recently said that as Tesla prepares to unveil its robotaxi in October, the company's AI team will "double down" on Dojo.

But what exactly is Dojo? Why is it so critical to Tesla's long-term strategy?

In short, Dojo is a custom supercomputer built by Tesla to train the neural networks behind "Full Self-Driving" (FSD). Improving Dojo is closely tied to Tesla's goal of achieving full autonomy and bringing robotaxis to market. FSD is currently available on about 2 million Tesla vehicles and can perform some automated driving tasks, but a human must still pay attention from the driver's seat.

Tesla's announcement of its robotaxi was postponed from August to October. But both Musk's public remarks and internal sources at Tesla tell us that the goal of autonomous driving has not disappeared.

Tesla seems to be preparing to invest heavily in artificial intelligence and Dojo to achieve this feat.

The story behind Tesla's Dojo

Musk does not want Tesla to be just an automaker, nor merely a provider of solar panels and energy-storage systems. Instead, he wants Tesla to be an AI company, one that cracks the code of self-driving cars by imitating human perception.

Most other companies developing autonomous-driving technology rely on a combination of sensors (such as lidar, radar, and cameras) to perceive the world and on high-definition maps to localize the vehicle. Tesla believes it can capture visual data using only cameras, process that data with advanced neural networks, and quickly decide how the car should behave.

As Tesla's former AI chief Andrej Karpathy said at the company's first AI Day in 2021, the company is essentially trying to build "a synthetic animal from the ground up". (Musk had been teasing Dojo since 2019, but Tesla officially announced it at that AI Day.)

Companies like Alphabet's Waymo have already commercialized Level 4 autonomous vehicles through a more traditional combination of sensors and machine learning. SAE defines Level 4 as a system that can drive itself under specific conditions without human intervention. Tesla has yet to produce an autonomous driving system that requires no human involvement.

About 1.8 million people have paid for Tesla's pricey FSD subscription, which currently costs $8,000 and was at one point priced as high as $15,000. The pitch is that the Dojo-trained AI software will eventually be pushed to Tesla customers via over-the-air updates. FSD's scale also means Tesla has been able to collect millions of miles of driving video for training; in theory, the more data Tesla collects, the closer the automaker gets to true full self-driving.

However, some industry experts say that the method of simply feeding more data into the model and expecting it to become smarter may have limitations.

"First, there are economic constraints, and this quickly becomes too expensive," Anand Raghunathan, Purdue University's Silicon Valley professor of electrical and computer engineering, told TechCrunch. He added, "Some argue that we may actually run out of meaningful data to train models on. More data doesn't necessarily mean more information, so it depends on whether that data contains useful information for creating a better model, and whether the training process can really distill that information into a better model."

Despite these concerns, Raghunathan said, data still appears to be getting more abundant, at least in the short term. More data means more computing power is needed to store and process it in order to train Tesla's AI models. That is where the Dojo supercomputer comes in.

What is a supercomputer?

Dojo is a supercomputer system designed by Tesla for artificial intelligence, specifically for training FSD. The name is a tribute to martial arts practice dojos.

Supercomputers are made up of thousands of smaller computers called nodes. Each node has its own CPU (central processing unit) and GPU (graphics processing unit). The former handles overall node management, while the latter does the heavy lifting: splitting tasks into many parts and working on them simultaneously. GPUs are essential for machine-learning workloads, like the simulations that underpin FSD training. They also power large language models, which is why the rise of generative AI has made Nvidia the most valuable company on earth.
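The division of labor described above, a coordinating role that splits a task into parts and many workers that process those parts simultaneously, can be sketched in miniature with Python's standard-library process pool. This is a toy analogy for illustration, not Dojo's actual software stack:

```python
from multiprocessing import Pool

def process_shard(shard):
    # Stand-in for GPU-style work: each worker handles one slice of the data.
    return sum(x * x for x in shard)

def main():
    data = list(range(1_000))
    # "CPU" role: split the task into parts...
    shards = [data[i::4] for i in range(4)]
    # ..."GPU" role: process the parts simultaneously, then combine results.
    with Pool(4) as pool:
        partials = pool.map(process_shard, shards)
    total = sum(partials)
    assert total == sum(x * x for x in data)  # same answer as doing it serially
    print(total)

if __name__ == "__main__":
    main()
```

The point of the sketch is only the structure: splitting work and merging partial results is what both a supercomputer scheduler and a single GPU do, at vastly different scales.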

Even Tesla buys Nvidia GPUs to train its AI (more on that later).

Why does Tesla need a supercomputer?

Tesla's vision-only approach is the main reason it needs a supercomputer. The neural networks behind FSD are trained on vast amounts of driving data to recognize and classify objects around the vehicle and then make driving decisions. That means when FSD is engaged, the networks must continuously collect and process visual data at speeds matching a human's ability to perceive depth and motion.

In other words, Tesla wants to create a digital version of the human visual cortex and brain function.

To achieve this goal, Tesla needs to store and process all the video data collected from cars around the world and run millions of simulations to train the data on its model.

Tesla currently relies on Nvidia to power its Dojo training computers, but it doesn't want to put all its eggs in one basket, especially since Nvidia chips are expensive. Tesla also hopes to do better: to increase bandwidth and reduce latency. That's why the automaker's AI division decided to develop its own custom hardware program, one that aims to train AI models more efficiently than conventional systems.

The core of this plan is Tesla's proprietary D1 chip, which the company says has been optimized for AI workloads.

More about these chips

Tesla and Apple share a similar view that hardware and software should be designed to work together. That's why Tesla is working to get rid of standard GPU hardware and design its own chips to power Dojo.

Tesla showed off its D1 chip, a palm-sized piece of silicon, at its 2021 AI Day. As of May this year, the D1 is in production. Taiwan Semiconductor Manufacturing Company (TSMC) manufactures the chips for Tesla on a 7-nanometer process. According to Tesla, the D1 has 50 billion transistors on a large 645-square-millimeter die, all of which promises to make it powerful, efficient, and able to handle complex tasks quickly.

"We can do computation and data transfer simultaneously, and our custom ISA (instruction set architecture) is fully optimized for machine-learning workloads," said Ganesh Venkataramanan, Tesla's former senior director of Autopilot hardware, at Tesla AI Day 2021. "This is a pure machine-learning machine."

However, the D1 is still not as powerful as Nvidia's A100 chip, which TSMC also manufactures on a 7-nanometer process. With 54 billion transistors and a die measuring 826 square millimeters, the A100 slightly outperforms Tesla's D1.
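Using the die sizes and transistor counts quoted above, a quick back-of-the-envelope comparison of transistor density is possible (the inputs are from the article; the rounding is mine):

```python
# Figures quoted in the article for each chip.
d1_transistors, d1_area_mm2 = 50e9, 645      # Tesla D1
a100_transistors, a100_area_mm2 = 54e9, 826  # Nvidia A100

# Transistors per square millimeter of die area.
d1_density = d1_transistors / d1_area_mm2        # ~77.5M per mm^2
a100_density = a100_transistors / a100_area_mm2  # ~65.4M per mm^2

print(f"D1:   {d1_density / 1e6:.1f}M transistors/mm^2")
print(f"A100: {a100_density / 1e6:.1f}M transistors/mm^2")
```

By raw density the D1 actually comes out slightly ahead, which is plausible given both chips use the same 7-nanometer node; the A100's edge described in the article is about overall performance, not packing density.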

To achieve higher bandwidth and computing power, Tesla's AI team combines 25 D1 chips into a single unit, known as a training tile, that functions as a unified computer system. Each tile delivers 9 petaflops of compute and 36 TB/s of bandwidth, and contains all the hardware needed for power, cooling, and data transfer. You can think of a tile as a self-sufficient computer made up of 25 smaller ones. Six tiles make up a rack, and two racks make a cabinet. Ten cabinets constitute an ExaPOD. At AI Day 2022, Tesla said Dojo would scale by deploying multiple ExaPODs. All of this together makes up the supercomputer.
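The hierarchy just described (25 D1 chips per tile, 6 tiles per rack, 2 racks per cabinet, 10 cabinets per ExaPOD) can be tallied up directly. The per-tile figures come from the article; the aggregate assumes ideal linear scaling, which real systems never quite reach:

```python
# Per-tile figures quoted by Tesla (from the article).
CHIPS_PER_TILE = 25
TILE_PFLOPS = 9           # compute per tile, in petaflops
TILE_BANDWIDTH_TBPS = 36  # bandwidth per tile, in TB/s

# The assembly hierarchy described in the article.
TILES_PER_RACK = 6
RACKS_PER_CABINET = 2
CABINETS_PER_EXAPOD = 10

tiles_per_exapod = TILES_PER_RACK * RACKS_PER_CABINET * CABINETS_PER_EXAPOD
chips_per_exapod = tiles_per_exapod * CHIPS_PER_TILE
exapod_pflops = tiles_per_exapod * TILE_PFLOPS  # idealized, no scaling losses

print(f"{tiles_per_exapod} tiles, {chips_per_exapod} D1 chips per ExaPOD")
print(f"~{exapod_pflops} PFLOPS (~{exapod_pflops / 1000:.2f} exaflops) per ExaPOD")
```

Under this idealized assumption, one ExaPOD works out to 120 tiles, 3,000 D1 chips, and roughly 1.08 exaflops, which is where the "Exa" in ExaPOD comes from.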

Tesla is also developing the next-generation D2 chip to address information-flow bottlenecks. Instead of connecting individual chips, the D2 puts an entire Dojo tile onto a single wafer of silicon.

Tesla has not yet confirmed how many D1 chips it has ordered or expects to receive, nor has it provided a timetable for running the Dojo supercomputer on D1 chips.

In June, a post on X read: "Elon is building a giant GPU cooler in Texas." Musk replied that Tesla's goal is to reach "half Tesla AI hardware, half Nvidia/other" over the next 18 months. Based on Musk's comments in January, the "other" may be AMD chips.

What does Dojo mean for Tesla?

Controlling its own chip production means Tesla may one day be able to add significant computing power to its AI training projects at a low cost, especially as Tesla and TSMC expand chip production. This also means Tesla may not have to rely on Nvidia's chips in the future, which are becoming increasingly expensive and difficult to secure.

In Tesla's Q2 earnings call, Musk said demand for Nvidia hardware was "so high that it's often difficult to get a GPU." He said he was "pretty concerned" about being able to obtain GPUs stably when needed, "so I think this requires us to put more effort into Dojo to make sure that we have the compute capacity that we need."

However, Tesla is still buying Nvidia's chips to train its AI today. In a post on X in June, Musk said:"Of the approximately $10 billion in AI-related spending that Tesla will do this year, about half is internal, primarily Tesla designed AI inference computers and sensors present in all cars, plus Dojo. Nvidia hardware accounts for approximately 2/3 of the cost for building the AI training supercluster. My best guess for how much Nvidia hardware Tesla will buy this year is $3 billion to $4 billion."


Inference compute refers to the AI calculations Tesla vehicles perform in real time; it is separate from the training compute that Dojo handles.

Dojo is a risky bet, and Musk has hedged it by repeatedly saying that Tesla may not succeed with it.

In the long run, Tesla could theoretically create a new business model based on its AI division. Musk has said that the first version of Dojo will be specifically tailored for Tesla's computer vision tagging and training, which is beneficial for FSD and training Optimus (Tesla's humanoid robot), but not much use for anything else.

Musk has said that later versions of Dojo will be more geared toward general AI training. A related potential problem is that nearly all existing AI software is written for GPUs. Using Dojo to train generalized AI models would require rewriting software.

That is, unless Tesla rents out its computing power, much as AWS and Azure rent out cloud compute. On the Q2 earnings call, Musk also noted that he sees "a path to competing more directly with Nvidia through Dojo."

Morgan Stanley predicted in a September 2023 report that Dojo could increase Tesla's market cap by $500 billion by unlocking new revenue streams for robotaxis and software services.

In short, Dojo chips are Tesla's insurance policy and could pay dividends for the automaker.

How is Dojo progressing?

According to a Reuters report last year, Tesla began production of Dojo in July 2023, but in a June post on X Musk suggested Dojo had already been "online and running useful tasks for a few months."

Around the same time, Tesla predicted that Dojo would rank among the world's five most powerful supercomputers by February 2024, a feat that has not been publicly announced, leaving us to wonder whether it actually happened.

The company also expects Dojo's total computing power to reach 100 exaflops by October 2024. (One exaflop is one quintillion, or 10^18, computer operations per second. Assuming one D1 achieves 362 teraflops, reaching 100 exaflops would require more than 276,000 D1 chips, or about 320,500 Nvidia A100 GPUs.)
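The chip-count arithmetic in that parenthetical can be checked directly. The 362-teraflop D1 figure is from the article; the ~312-teraflop A100 figure is my assumption, inferred because it is what makes the article's 320,500 count work out:

```python
TARGET_EXAFLOPS = 100
TARGET_FLOPS = TARGET_EXAFLOPS * 1e18  # 1 exaflop = 10^18 operations/sec

D1_FLOPS = 362e12    # 362 teraflops per D1 (article's figure)
A100_FLOPS = 312e12  # ~312 teraflops per A100 (my assumption, see lead-in)

d1_count = TARGET_FLOPS / D1_FLOPS
a100_count = TARGET_FLOPS / A100_FLOPS

print(f"D1 chips needed:  ~{d1_count:,.0f}")   # ~276,243
print(f"A100 GPUs needed: ~{a100_count:,.0f}")  # ~320,513
```

Both results match the article's "more than 276,000 D1s" and "about 320,500 A100s", so the quoted figures are internally consistent.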

In January 2024, Tesla also pledged to invest $500 million to build a Dojo supercomputer at its gigafactory in Buffalo, New York.

In May 2024, Musk noted that the rear portion of Tesla's Austin gigafactory would be reserved for a "super-dense, water-cooled supercomputer cluster."

Just after Tesla's second-quarter earnings call, Musk posted on X that the automaker's AI team is using Tesla's HW4 AI computer (renamed AI4), the hardware installed in Tesla vehicles, in the training loop alongside Nvidia GPUs. He noted that the mix is roughly 90,000 Nvidia H100s plus about 40,000 AI4 computers. "Dojo 1 will achieve online training equivalent to about 8,000 H100s by the end of this year. Not a lot, but not a little, either," he continued.
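Musk's "not a lot, but not a little" can be put in perspective with the numbers he gives, comparing Dojo 1's 8,000 H100-equivalents against the roughly 90,000-H100 Nvidia fleet:

```python
dojo1_h100_equiv = 8_000  # Dojo 1's end-of-year target, per Musk
nvidia_h100s = 90_000     # H100s in Tesla's training loop, per Musk

share = dojo1_h100_equiv / nvidia_h100s
print(f"Dojo 1 ~ {share:.0%} of the H100 fleet's training capacity")
```

In other words, by the end of the year Dojo 1 would contribute on the order of a tenth of what Tesla's Nvidia hardware provides, a meaningful hedge but not yet a replacement.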


Editor/Somer

Disclaimer: This content is for informational and educational purposes only and does not constitute a recommendation or endorsement of any specific investment or investment strategy.