Why Li Auto is revving up its smart driving efforts, chasing Tesla
Just a day before the China Auto Chongqing Forum (CACS), Li Auto CEO Li Xiang decided at the last minute to change the focus of his speech from artificial intelligence to autonomous driving. At the forum, Li emphasized that future autonomous driving will emulate human capabilities, including quick response times and logical reasoning to handle complex scenarios.
Subsequent events offer some insight into the last-minute change. A month later, Li Auto revealed an end-to-end system enhanced by a vision language model (VLM).
Unlike the segmented approaches used by its domestic rivals, Li Auto's solution more closely resembles Tesla's "One Model" design.
Historically seen as a follower in smart driving, Li Auto frequently shifted its approach last year amid fierce competition, first relying on high-definition maps, then lightweight maps, before discarding the maps altogether.
In a recent interview with 36Kr, Lang Xianpeng, vice president of smart driving R&D, and Jia Peng, head of smart driving technology R&D, discussed Li Auto's ongoing journey. Reflecting on efforts to catch up, Lang concluded that the core principle is to identify the essence of the problem and then correct course quickly and decisively.
Choosing the end-to-end (E2E) technical route is an extension of this principle. Lang explained that previous smart driving solutions were fundamentally map-based. They followed the conventional pipeline that runs from perception through to control, in which upstream flaws had to be compensated for downstream, demanding significant investment.
However, the core issue isn't just resource allocation but the fact that rule-based smart driving has a ceiling and is unlikely to fully emulate human driving.
Combining E2E with a VLM and what Li Auto calls a "world model" is the paradigm the automaker has now arrived at and considers optimal.
In simple terms, Li Auto's approach eliminates the previously separate modules for perception, prediction, planning, and control, merging them into a single neural network.
The VLM acts as a ChatGPT-like plugin for the system. While the E2E system's behavior depends on the data it receives, the VLM provides cognitive and logical reasoning capabilities. In complex scenarios, the system can query the VLM in real time for driving assistance.
The world model serves as a massive problem set. It generates simulated data through reconstruction and generation methods and combines that data with real-world cases Li Auto has previously accumulated, creating a mix of real and simulated test scenarios that challenge the E2E model. Only models that score well on these tests are released to users.
Internally, these three models are referred to as Systems 1, 2, and 3. System 1 corresponds to the brain's fast, immediate mode of thinking, System 2 to logical thinking, and System 3 acts as an examination model, assessing the learning outcomes of Systems 1 and 2.
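The division of labor among the three systems can be illustrated with a toy Python sketch. It is not Li Auto's implementation: every name, threshold, speed, and scenario below is a hypothetical stand-in chosen for illustration. The sketch assumes a fast E2E policy that always runs, a VLM that is consulted only when a scene is judged complex, and a scenario suite that must be passed before release.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Scene:
    name: str
    complexity: float       # hypothetical score in [0, 1]
    safe_speed_mps: float   # hypothetical ground truth, used only for grading


@dataclass(frozen=True)
class Action:
    speed_mps: float


def system1_e2e(scene: Scene) -> Action:
    """Stand-in for the single end-to-end network (fast path)."""
    return Action(speed_mps=15.0)


def system2_vlm(scene: Scene) -> Optional[float]:
    """Stand-in for the VLM: suggests a speed cap for scenes it recognizes."""
    caps = {"school zone": 8.0, "construction zone": 6.0, "roundabout": 7.0}
    return caps.get(scene.name)


def drive(scene: Scene, complexity_threshold: float = 0.5) -> Action:
    """System 1 always acts; System 2 is queried only for complex scenes."""
    action = system1_e2e(scene)
    if scene.complexity >= complexity_threshold:
        cap = system2_vlm(scene)
        if cap is not None:
            action = Action(speed_mps=min(action.speed_mps, cap))
    return action


def system3_exam(policy: Callable[[Scene], Action], scenarios: List[Scene]) -> float:
    """Stand-in for the world-model test set: fraction of scenarios passed."""
    passed = sum(policy(s).speed_mps <= s.safe_speed_mps for s in scenarios)
    return passed / len(scenarios)


def ready_for_release(score: float, pass_mark: float = 0.95) -> bool:
    """Only policies that score well on the exam are released (assumed pass mark)."""
    return score >= pass_mark


# A made-up mix of scenarios standing in for real and simulated test cases.
SCENARIOS = [
    Scene("highway", 0.1, 30.0),
    Scene("school zone", 0.9, 8.0),
    Scene("construction zone", 0.8, 6.0),
    Scene("roundabout", 0.7, 7.0),
]

if __name__ == "__main__":
    for name, policy in [("E2E only", system1_e2e), ("E2E + VLM", drive)]:
        score = system3_exam(policy, SCENARIOS)
        print(f"{name}: score={score:.2f}, release={ready_for_release(score)}")
```

In this toy setup the E2E-only policy passes just the highway scenario and is held back, while the combined policy passes all four and clears the gate, which mirrors the article's point that the VLM raises the ceiling of what the E2E model can handle alone.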
Tesla pioneered E2E smart driving technology. In August 2023, CEO Elon Musk demonstrated version 12 of the company's Full Self-Driving (FSD) software during a live stream, and the software has since iterated to V12.5. Unlike Tesla, however, Li Auto incorporates a VLM in addition to the E2E and world models.
Jia Peng told 36Kr that he spent a week each in the west and east of the US testing Tesla's FSD and found that even E2E tech has limitations. In complex road conditions in East Coast cities such as New York and Boston, the takeover rate of Tesla's system increased significantly. Jia said that the E2E model that can run on HW3.0 does not have a particularly large parameter count, so the model's capacity has a natural upper limit.
Li Auto designed the VLM's role to raise the system's upper limit. The VLM can learn from various scenarios, such as bumpy roads, schools, construction zones, and roundabouts, providing crucial decision-making support.
Lang and Jia both believe that the VLM is a significant variable in Li Auto's smart driving system. The VLM already has 2.2 billion parameters and a response time of 300 milliseconds. With more powerful chips, its deployable parameter count could reach hundreds of billions, which they see as the best path toward Level 3 and 4 autonomous driving.
"VLM itself is also following the development of large language models (LLMs), and no one can yet answer how large the parameter count will eventually be," Jia said.
Judging by the trajectory of data-driven vision language models, the smart driving industry appears to have joined the computational power contest initiated by companies like OpenAI, Microsoft, and Tesla.
Lang candidly stated that, at this stage, the competition is all about the quantity and quality of data and computational power reserves. High-quality data depends on sheer data scale, and supporting Level 4 model training requires computational power on the order of tens of EFLOPS.
"No company without a net profit of USD 1 billion can afford future autonomous driving," Lang asserted.
Currently, Li Auto's cloud computing power is 4.5 EFLOPS, rapidly closing the gap with leading companies like Huawei. According to 36Kr, Li Auto has recently bought a large number of Nvidia's cloud chips, purchasing almost all available stock from distributors.
CEO Li is well aware of this competitive landscape and is leveraging resources and intelligent technology to outpace competitors. He often asks Lang whether there is sufficient computational power and, if not, tells him to source more from Xie Yan, the company's CTO.
With cars on the road and more money than other players, Li Auto has an opportunity to widen the gap on this path. Financial reports show that, as of the first quarter of this year, Li Auto's cash reserves stood at close to RMB 99 billion (USD 13.8 billion).
Data from Li Auto also indicates that the commercial loop of smart driving is starting to take shape. In early July, Li Auto began delivering the 6.0 smart driving version, which can operate nationwide, to Max model drivers. Lang observed that the proportion of Max models quickly surpassed 50%, with over 10% growth each month.
Lang also understands that, although the long-term vision of Level 4 autonomous driving is becoming clearer, its implementation path remains unchanged. "We need to quickly help the company sell cars. Only by selling cars can we afford to buy chips to train smart driving."
If smart driving is the decisive factor in the automaking race, it represents a fiercely competitive and resource-intensive arena. Li Auto has proactively prepared by integrating top-level strategic planning, technical advancements, and substantial resource investments. But what about the others?