1. $Advanced Micro Devices (AMD.US)$'s answer to $NVDA's CUDA is ROCm, but he sees big differences between the two. CUDA is a mature ecosystem that most developers already know, as it has been around since 2006. It is also much better optimized and more stable.
2. $AMD has made significant strides in optimizing ROCm, but it still needs to catch up with CUDA. He also sees ROCm's heterogeneous design, which supports $AMD and other hardware, as a disadvantage because it adds complexity. CUDA, by contrast, is optimized for $NVDA hardware only.
3. CUDA has an extensive layer of libraries, while ROCm's libraries are still evolving. He also notes that ROCm lags CUDA on partnerships: CUDA has partnerships with the hyperscalers.
4. He thinks $Amazon (AMZN.US)$ AWS's Trainium chip sits, performance-wise, between the $NVDA H100 and the $NVDA A100. He really likes the H100 for LLMs; with the introduction of its H100-based P5 instances, AWS has reduced training costs by 40%.
5. He sees a few bottlenecks for the industry, including $NVDA:
- Advanced nodes, with issues around yield, defect density, and overall process stability.
- Material limitations, with silicon now reaching its physical limits.
- Heat dissipation. He notes that $NVDA is having difficulty balancing power consumption with performance.
- Interconnect latency. While $NVDA has developed NVLink and $AMD has Infinity Fabric, interconnect latency is still an unresolved issue.
6. In his view, the most exciting development in the industry is optical interconnects, which are being explored as a replacement for traditional copper-based interconnects because they offer higher bandwidth and lower latency.
FumooFu: Where is this ex-NVIDIA employee working now?