
Meta reports that its H100 training cluster suffers a failure once every 3 hours

$NVIDIA(NVDA.US)$ While training its Llama 3 large language model, Meta found that it was suffering frequent H100 GPU failures. During training on 16,384 H100 80GB GPUs, unexpected component failures occurred on average about once every 3 hours across the whole cluster, not on any single GPU. More than half of these failures were attributed to the GPUs or memory.
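To put the cluster-wide figure in per-device terms, here is a rough back-of-the-envelope sketch. It assumes failures are spread evenly and independently across all 16,384 GPUs and that each failure counts against a single device. Those are simplifying assumptions of mine, not something Meta stated. The function name `per_device_hours_between_failures` is made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def per_device_hours_between_failures(num_devices: int,
                                      cluster_hours_between_failures: float) -> float:
    """Average device-hours accumulated per failure.

    Assumption: failures are evenly and independently distributed across
    devices, so the cluster logs num_devices device-hours for every
    wall-clock hour it runs.
    """
    return num_devices * cluster_hours_between_failures


gpus = 16_384             # from the post
cluster_interval_h = 3.0  # one failure roughly every 3 hours, cluster-wide

device_hours = per_device_hours_between_failures(gpus, cluster_interval_h)
print(f"{device_hours:,.0f} device-hours per failure")
print(f"about {device_hours / HOURS_PER_YEAR:.1f} years per device")
```

Under these assumptions, one failure every 3 hours across the cluster works out to 49,152 device-hours per failure, or roughly 5.6 years of continuous running per device. This counts every component failure, not only the GPU or memory ones, so it is a ballpark figure and not a reliability rating for the H100 itself.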