
Going viral: can AI newcomer Groq really crush Nvidia's GPUs?

Gelonghui Finance ·  Feb 20 21:47

Speed is a double-edged sword

Groq has set the AI community ablaze and is all over the Internet!

Recently, Groq has sparked widespread discussion. Its large model can output 750 tokens per second, 18 times faster than GPT-3.5, and its self-developed LPU is claimed to run inference 10 times faster than Nvidia GPUs.


Surprisingly fast

Groq's name is pronounced much like Grok, Musk's large model. Founded in 2016, the company positions itself as an artificial intelligence solutions provider.

Groq exploded in popularity mainly because of its extremely fast processing speed. According to media reports, the company's chip delivers inference 10 times faster than Nvidia GPUs, at only one-tenth the cost.

Its large model runs at nearly 500 tokens per second, crushing ChatGPT-3.5's roughly 40 tokens per second.

At its peak, Groq running Llama 2 7B can even reach 750 tokens per second, 18 times GPT-3.5's rate.
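These speedup claims are easy to sanity-check from the token rates quoted above; a minimal sketch using only the article's own figures:

```python
# Sanity-check the quoted speedups against the article's token rates.
gpt35_tps = 40       # ChatGPT-3.5, ~40 tokens/second
groq_tps = 500       # Groq's typical rate, nearly 500 tokens/second
groq_peak_tps = 750  # Groq's peak rate with Llama 2 7B

print(f"Typical speedup: {groq_tps / gpt35_tps:.1f}x")    # ~12.5x
print(f"Peak speedup: {groq_peak_tps / gpt35_tps:.1f}x")  # ~18.8x, matching the "18 times" claim
```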


Eight members of Groq's founding team came from Google's early TPU core design team, yet Groq did not build a TPU, GPU, or CPU; instead, it developed its own Language Processing Unit (LPU).


According to Groq's official website, Meta AI's Llama 2 70B running on the Groq LPU inference engine outperforms all other cloud-based inference providers, with an 18-fold increase in throughput.


Can it replace Nvidia?

However, speed is not the only decisive factor in AI development. Even as Groq trends, voices of doubt have emerged.

First, Groq only seems cheap: its LPU card has just 230 MB of memory yet costs over $20,000.

Some netizens calculated that the Nvidia H100 should be about 11 times more cost-effective than Groq.


More importantly, the Groq LPU carries no high-bandwidth memory (HBM) at all. Instead, it is equipped with only a small block of ultra-high-speed static random-access memory (SRAM), which is about 20 times faster than HBM3.


This also means that running a single AI model requires far more Groq LPUs than Nvidia H200s.

Also, according to a Groq employee, Groq's LLM runs on hundreds of chips.
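The "hundreds of chips" figure is consistent with a back-of-envelope memory calculation. A rough sketch, where the model and precision (Llama 2 70B at FP16) are illustrative assumptions rather than figures from Groq:

```python
import math

# Back-of-envelope: how many 230 MB LPUs are needed just to hold a model's weights?
# Assumed example: Llama 2 70B at FP16 (2 bytes per parameter) -> ~140 GB of weights.
params = 70e9
bytes_per_param = 2                            # FP16 precision (assumption)
weights_gb = params * bytes_per_param / 1e9    # 140 GB

sram_per_lpu_gb = 0.23                         # 230 MB of on-chip SRAM per LPU card

chips_needed = math.ceil(weights_gb / sram_per_lpu_gb)
print(f"Weights: {weights_gb:.0f} GB -> at least {chips_needed} LPUs")  # ~609 chips
```

This counts only the weights and ignores activations and the KV cache, so the real number would be higher still; the rough scale matches the "hundreds of chips" report.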


Regarding this, Yao Jinxin, a chip expert at Tencent Technology, believes Groq's chips cannot currently replace Nvidia's.

In his view, speed is Groq's double-edged sword. Groq's architecture pairs small memory with large compute, so the limited amount of content each chip processes meets extremely high computing power, making it very fast.

On the other hand, Groq's extreme speed rests on the very limited throughput of a single card; matching the throughput of an H100 requires many more cards.

He also noted that the Groq architecture has application scenarios where it shines, particularly workloads that require frequent data movement.

Disclaimer: This content is for informational and educational purposes only and does not constitute a recommendation or endorsement of any specific investment or investment strategy.