Kimihiko
commented on a stock · Dec 19, 2024 06:51
Apple partners with Nvidia to speed up token generation
Apple (NASDAQ: AAPL), the world's largest company, and Nvidia (NASDAQ: NVDA) announced today that they have collaborated to speed up large language model (LLM) inference on Nvidia GPUs using a method called Recurrent Drafter (ReDrafter).

ReDrafter uses a recurrent neural network (RNN) draft model and "combines beam search with dynamic tree attention to speed up LLM token generation by up to 3.5 tokens per generation step for open source models, surpassing the performance of conventional speculative decoding techniques," Apple said in a blog post today.

Apple worked with Nvidia to integrate ReDrafter into Nvidia's TensorRT-LLM. According to Nvidia, this makes the technique accessible to the broader developer community.

"When we benchmarked a production model with tens of billions of parameters on NVIDIA GPUs, using the NVIDIA TensorRT-LLM inference acceleration framework with ReDrafter, we saw a 2.7x speed-up in tokens generated per second with greedy decoding," Apple said. "These benchmark results show that this technology can significantly reduce the latency users may experience while also reducing the number of GPUs used and the power consumed."

"With this collaboration between Nvidia and Apple, TensorRT-LLM is more powerful and flexible, and the LLM community can innovate more sophisticated models and easily deploy them with TensorRT-LLM to achieve unmatched performance on NVIDIA GPUs," Nvidia said.
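For readers wondering how one generation step can produce more than one token: below is a minimal sketch of plain greedy speculative decoding (draft, then verify), the general idea ReDrafter builds on. It is not Apple's or Nvidia's implementation; it leaves out the RNN draft head, beam search, and dynamic tree attention entirely. The names `toy_target`, `toy_draft`, and `speculative_decode` are hypothetical stand-ins, and the "models" are simple functions over integer tokens.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target-model step per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[Token],
                       max_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Draft k tokens cheaply, then let the target model verify them.

    Returns the tokens and the number of target verification steps.
    A real system verifies all k positions in one batched forward pass;
    this sketch loops over them but counts the round as a single step.
    """
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new:
        # 1. The small draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target model checks the proposal position by position.
        steps += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)       # draft guessed right, keep it
            else:
                accepted.append(expected)  # first mismatch: use the target's token
                break
        else:
            # Every drafted token matched, so the target adds one more token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:len(prompt) + max_new], steps


# Hypothetical toy models: the target counts upward mod 10,
# and the draft agrees except right after token 6.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 6 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, steps = speculative_decode(toy_target, toy_draft, [0], max_new=20)
    print("same output as greedy:", tokens == greedy_decode(toy_target, [0], 20))
    print("target steps used:", steps, "for 20 new tokens")
```

The output always matches ordinary greedy decoding with the target model, because every kept token is one the target itself would have chosen. The speed-up comes from needing fewer target steps whenever the draft guesses well.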
パーマン6号 : The strongest tag team is born
くどうのぶ : Strong.