SharesGrow com
commented on
$NVIDIA (NVDA.US)$
These modifications go far beyond standard CUDA-level development and are notoriously difficult to maintain, so this level of optimization reflects the exceptional skill of DeepSeek's engineers, who come from top Chinese universities. The global GPU shortage, amplified by U.S. restrictions, has compelled companies like DeepSeek to adopt innovative solutions, and DeepSeek has m...
$NVIDIA (NVDA.US)$
The remarkably low cost of DeepSeek V3 is not entirely due to engineering innovation. It rests on an important but easily overlooked foundation: V3 did not have to learn everything from scratch. V3 uses "knowledge distillation", a technique proposed by Geoffrey Hinton and colleagues in 2015, in which a powerful model (the teacher) imparts its knowledge to a new, typically smaller model (the student), significantly reducing the resources and time needed for training.
In the case of DeepSeek V3, the technique is applied as follows:
The teacher model, DeepSeek R1, was already available as a preview (R1-Lite-Preview) in November 2024.
V3 inherits R1's reasoning ability through knowledge distillation, especially in mathematics and programming.
This approach lets V3 inherit much of the teacher's capability through the teacher's outputs rather than learning it from scratch.
Many key hyperparameter tuning runs can be skipped.
Therefore, when we discuss the training cost of V3, we cannot just look at the headline numbers. It is like calculating the construction cost of a building: if a complete foundation and framework already exist, it will be much cheaper than starting from scratch.
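For readers unfamiliar with the idea, here is a minimal sketch of the classic soft-target distillation loss from the 2015 formulation. It is an illustration only, not DeepSeek's actual training code: the function names, the logits, and the temperature value are all assumptions made up for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration; the temperature is an assumed value.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Made-up logits over three candidate answers.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different answer

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The student that agrees with the teacher gets the smaller loss, which is the signal training would minimize. Note that what is transferred is the teacher's output behaviour, not its weights.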
Link: Fair Price DBS = $70
According to my analysis in September 2024, the sum of the next few years' earnings and dividends remains intact, hence the fair price of $70.
SharesGrow com
$Microsoft (MSFT.US)$ Microsoft's Q1 2025 earnings conference call is scheduled for October 30 at 5:30 PM ET / October 31 at 5:30 AM SGT / October 31 at 8:30 AM AEDT. Subscribe to join the live earnings conference with management NOW!
Beat or Miss?
What do you expect from Microsoft's Q1 earnings? Will the company beat or miss the estimates? Make sure to click the "Book" button to hear what management has to say!
Disclaimer:
This presentation is for informatio...
![](https://usliveimg.moomoo.com/live_client/77777055/20241025/1436e4ad90d126a76ef043d384078fdb.png/thumb?area=100&is_public=true)
Microsoft Q1 2025 earnings conference call
Oct 31 05:30
SharesGrow com OP : 3/44=6.8%
DBS will pay a 60c regular dividend plus a 15c special dividend for the next few years, distributing its excess assets by way of special dividends.
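The 3/44 figure above can be reproduced with a short calculation. This is only a sketch under two assumptions that the comment does not state: that the 60c and 15c dividends are paid quarterly (giving $3 a year), and that $44 is the share price the yield is measured against.

```python
def dividend_yield(annual_dividend, share_price):
    """Annual dividend per share divided by the price paid per share."""
    return annual_dividend / share_price

# Assumption: 60c regular + 15c special, paid four times a year.
quarterly_dividend = 0.60 + 0.15
annual_dividend = quarterly_dividend * 4

# Assumption: $44 is the reference share price behind "3/44".
share_price = 44.0

yield_on_price = dividend_yield(annual_dividend, share_price)
print(f"{yield_on_price:.1%}")  # prints 6.8%
```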