A ChatGPT moment for open-source large models? The highly anticipated Llama 3 405B is about to be released.

wallstreetcn · Jul 22 23:46

Analysts suggest that Llama 3 405B is not just another step up in AI capability but also "a potential ChatGPT moment" for open-source AI. In leaked benchmark results, Meta Llama 3.1 outperformed GPT-4o on multiple tests, including GSM8K and Hellaswag.

After a long wait, Llama 3 405B, originally scheduled for release on the 23rd, is about to arrive.

As the top model in the Llama 3 series, the 405B version has 405 billion parameters and is one of the largest open-source models to date.

Late last night, evaluation data for Llama 3.1-405B leaked from Meta. Some netizens expect a Llama 3.1-70B version to be released at the same time, noting that "(leaking models early) is an old Meta tradition; it happened once already with last year's Llama model."

Some analysts believe that Llama 3 405B is not just another step up in AI capability: for open-source AI, "this is a potential ChatGPT moment," in which state-of-the-art AI is truly democratized and placed directly in the hands of developers.

Three predictions for the upcoming announcement of Llama 3 405B

Analysts have predicted the highlights of the upcoming Llama 3 405B announcement from three perspectives: data quality, model ecosystem, and API solutions.

First, Llama 3 405B may transform data quality for specialized models.

For developers focused on building specialized AI models, obtaining high-quality training data is a long-standing challenge. Smaller expert models (1-10B parameters) often rely on distillation, using the outputs of larger models to augment their training datasets. However, the use of such data from closed-source giants such as OpenAI is tightly restricted, which limits commercial applications.

Llama 3 405B arrives at just the right moment. As an open-source giant that rivals proprietary models, it gives developers a new foundation for creating rich, unrestricted datasets. Developers would be free to use outputs distilled from Llama 3 405B to train niche models, greatly accelerating innovation and deployment cycles in specialized fields. The development of high-performance, fine-tuned models that are both powerful and consistent with open-source ethics is expected to surge.
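The distillation workflow described above can be sketched roughly as follows. This is a minimal illustration, not Meta's or any vendor's pipeline: `teacher` is a hypothetical stand-in for a call to a large model such as Llama 3 405B, and the prompt/completion record layout is an assumption chosen for simplicity.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_distillation_set(
    teacher: Callable[[str], str],
    prompts: Iterable[str],
    min_chars: int = 1,
) -> List[Dict[str, str]]:
    """Collect (prompt, completion) pairs from a teacher model.

    `teacher` is a hypothetical callable wrapping the large model.
    Empty or duplicate prompts and too-short completions are skipped.
    """
    records: List[Dict[str, str]] = []
    seen = set()
    for prompt in prompts:
        prompt = prompt.strip()
        if not prompt or prompt in seen:
            continue
        seen.add(prompt)
        completion = teacher(prompt).strip()
        if len(completion) < min_chars:
            continue
        records.append({"prompt": prompt, "completion": completion})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, a common fine-tuning input format."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def toy_teacher(prompt: str) -> str:
    """Toy teacher for demonstration only; a real one would query the model."""
    return prompt.upper()


dataset = build_distillation_set(toy_teacher, ["summarize x", "summarize x", "  ", "translate y"])
print(to_jsonl(dataset))
```

The resulting JSONL file would then serve as supervised fine-tuning data for the smaller student model; filtering and deduplication are usually far more elaborate in practice.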

Second, Llama 3 405B will give rise to a new model ecosystem, from foundation models to combinations of experts.

The release of Llama 3 405B may redefine the architecture of AI systems. The model's sheer scale (405 billion parameters) may suggest a one-size-fits-all solution, but its real power lies in its integration into a tiered system of models. This approach resonates particularly with developers who work with AI at different scales.

A shift is expected towards a more dynamic model ecosystem, in which Llama 3 405B acts as the backbone and is supported by small and medium-sized models. These systems may use techniques such as speculative decoding, in which less complex models handle most of the processing and the 405B model is called on only when needed for verification and error correction. This not only maximizes efficiency but also opens up new avenues for optimizing computing resources and response times in real-time applications, especially when running on SambaNova RDUs optimized for these tasks.
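As a rough illustration of the speculative decoding idea mentioned above, here is a toy greedy variant. Everything in it is an assumption made for demonstration: `draft` and `target` are hypothetical next-token functions standing in for a small model and the 405B model, tokens are plain integers, and the verification loop calls the target once per position where a real system would score all drafted positions in a single batched forward pass. Note that in the standard formulation the large model verifies every drafted batch; the savings come from accepting several tokens per verification pass.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_decode(
    draft: NextToken,
    target: NextToken,
    prompt: List[Token],
    max_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding.

    Returns the generated tokens and the number of verification rounds.
    The output is identical to what `target` alone would generate greedily.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        budget = min(k, max_new - (len(out) - len(prompt)))
        # 1. The cheap draft model proposes up to `budget` tokens.
        proposal: List[Token] = []
        for _ in range(budget):
            proposal.append(draft(out + proposal))
        # 2. The target model checks the proposal position by position.
        rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if expected == tok:
                accepted.append(tok)       # draft was right, keep it
            else:
                accepted.append(expected)  # target's correction
                break                      # discard the rest of the draft
        out.extend(accepted)
    return out[len(prompt):], rounds


def toy_target(ctx: List[Token]) -> Token:
    """Toy 'large' model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Toy 'small' model: agrees with the target except after 3 or 7."""
    return 0 if ctx[-1] % 4 == 3 else (ctx[-1] + 1) % 10


tokens, rounds = speculative_decode(toy_draft, toy_target, [0], max_new=8, k=4)
print(tokens, rounds)
```

In this toy run, eight tokens are produced in two verification rounds instead of eight sequential large-model steps. Production implementations typically also emit one extra target token when the whole draft is accepted and use probabilistic acceptance for sampling; both are omitted here.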

Finally, Llama 3 405B will set off a race for the most efficient API.

With great power comes great responsibility: for Llama 3 405B, deployment is a major challenge. Developers and organizations need to handle the model's complexity and operational requirements with care. AI cloud providers will compete to offer the most efficient and cost-effective API solutions for deploying Llama 3 405B.
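To give a sense of scale for the deployment challenge, the following back-of-the-envelope sketch estimates the memory needed just to hold 405 billion weights at common numeric precisions. The 80 GB accelerator size is an assumption for illustration, and the estimate covers weights only: KV cache, activations, and runtime overhead would add to it.

```python
import math

PARAMS = 405_000_000_000  # parameter count stated in the article

# Bytes per parameter at common precisions (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: int, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def min_accelerators(params: int, dtype: str, mem_gb: float) -> int:
    """Lower bound on device count, ignoring KV cache and activations."""
    return math.ceil(weight_memory_gb(params, dtype) / mem_gb)


for dtype in BYTES_PER_PARAM:
    gb = weight_memory_gb(PARAMS, dtype)
    # 80 GB per device is a hypothetical accelerator size.
    print(f"{dtype}: {gb:.1f} GB weights, >= {min_accelerators(PARAMS, dtype, 80)} devices")
```

At 16-bit precision the weights alone come to 810 GB, which is why serving a model of this size generally means sharding it across many accelerators or quantizing it.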

This situation provides a unique opportunity for developers to interact with different platforms and compare how various APIs handle such a large model. The winners in this field will be those who can provide APIs that not only manage computing loads effectively but also do not sacrifice the accuracy of the model or disproportionately increase its carbon footprint.

In short, Llama 3 405B is not just another tool in the AI arsenal; it represents a fundamental shift towards open, scalable, and efficient AI development. Analysts believe its arrival will open up new horizons for users, whether they are fine-tuning niche models, building complex AI systems, or optimizing deployment strategies.

How do netizens view this?

Netizens posted on the LocalLLaMA subreddit, sharing information about the 405-billion-parameter Meta Llama 3.1. Judging by the model's results on several key AI benchmarks, its performance surpasses that of the current leader, OpenAI's GPT-4o, which would mark the first time an open-source model has beaten the most advanced closed-source LLM.

As shown by benchmark tests, Meta Llama 3.1 outperforms GPT-4o in multiple tests such as GSM8K, Hellaswag, boolq, MMLU-humanities, MMLU-other, MMLU-stem, and winograd, but lags behind GPT-4o in HumanEval and MMLU-social sciences.

Ethan Mollick, an associate professor at the Wharton School of the University of Pennsylvania, wrote:

If these statistics are true, then a top AI model will become freely available to everyone starting this week.
Governments, organizations, and companies in every country in the world will have access to the same AI capabilities as everyone else. This will be interesting.

Some highlights of the Llama 3.1 model were summarized by netizens:

The model was trained on 15T+ tokens from publicly available sources, with a pre-training data cutoff of December 2023;
Fine-tuning data includes publicly available instruction fine-tuning datasets (different from Llama 3) and 15 million synthetic samples;
The model supports multiple languages, including English, French, German, Hindi, Italian, Portuguese, Spanish and Thai.

Some netizens said this is the first time an open-source model has surpassed closed-source models such as GPT-4o and Claude 3.5 Sonnet, reaching state-of-the-art (SOTA) results on multiple benchmarks.

