Kimihiko commented on a stock · Dec 19, 2024 06:51

Apple partners with Nvidia to speed up token generation

Apple (NASDAQ: AAPL), the world's largest company, and Nvidia (NASDAQ: NVDA) announced today that they have collaborated to speed up large language model (LLM) inference on Nvidia GPUs using a technique called Recurrent Drafter (ReDrafter).
ReDrafter uses a recurrent neural network (RNN) draft model and "combines beam search and dynamic tree attention to speed up LLM token generation by up to 3.5 tokens per generation step for open source models, surpassing the performance of prior speculative decoding techniques," Apple said in a blog post today.
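ReDrafter builds on speculative decoding: a cheap draft model proposes several tokens, and the large model checks them in a single pass, keeping the longest run that matches what it would have generated on its own. The sketch here shows only that basic greedy draft-and-verify loop, assuming toy integer "models". It does not implement ReDrafter's RNN drafter, beam search, or dynamic tree attention, and every name in it (`target`, `draft`, `speculative_decode`, `greedy_decode`) is hypothetical.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from the token sequence so far to the next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns the sequence and the number of verification steps."""
    seq = list(prompt)
    steps = 0
    while len(seq) - len(prompt) < n_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(seq)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. The target checks every proposed position. On a GPU this is one
        #    batched forward pass, which is where the speed-up comes from.
        steps += 1
        accepted: List[int] = []
        ctx = list(seq)
        for tok in proposal:
            expected = target(ctx)
            if expected != tok:
                accepted.append(expected)  # first mismatch: keep the target's token and stop
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # all drafts accepted: one bonus token from the same pass

        seq.extend(accepted)
    return seq[: len(prompt) + n_new], steps


def target(seq: List[int]) -> int:
    # Hypothetical "large" model: a deterministic next-token rule.
    return (seq[-1] * 3 + len(seq)) % 10


def draft(seq: List[int]) -> int:
    # Hypothetical cheap drafter: agrees with target except when the context length is a multiple of 4.
    tok = target(seq)
    return tok if len(seq) % 4 != 0 else (tok + 1) % 10


out, steps = speculative_decode(target, draft, [1], 20)
assert out == greedy_decode(target, [1], 20)
print(steps)  # 5
```

With this toy drafter, the loop produces exactly the same 20 tokens as plain greedy decoding but needs only 5 target verification steps instead of 20. Because every kept token must equal the target's greedy choice, output quality is unchanged; the gain comes purely from accepting several tokens per target call, which is why a more accurate drafter (such as ReDrafter's RNN) yields more tokens per step.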
Apple worked with Nvidia to integrate ReDrafter into Nvidia's TensorRT-LLM framework. According to Nvidia, this will make the technique accessible to the broader developer community.
"When we benchmarked a generative model with tens of billions of parameters on NVIDIA GPUs using the NVIDIA TensorRT-LLM inference acceleration framework with ReDrafter, we saw a 2.7x speed-up in generated tokens per second for greedy decoding," Apple said. "These benchmark results indicate that this technology could significantly reduce the latency users may experience, while also using fewer GPUs and consuming less power."
"With this collaboration between Nvidia and Apple, TensorRT-LLM is more powerful and flexible, enabling the LLM community to innovate more sophisticated models and easily deploy them with TensorRT-LLM to achieve unmatched performance on NVIDIA GPUs," Nvidia said.

Disclaimer: Community is offered by Moomoo Technologies Inc. and is for educational purposes only.

パーマン6号 : The strongest tag team is born

くどうのぶ : Strong.

Kimihiko

Nikko Securities, HSBC Securities: worked at securities firms, including helping establish two securities companies