A new era of AI begins! OpenAI unveils a reasoning model that “thinks through the logic of a problem”

cls.cn · Sep 12 19:30

① The OpenAI o1 model (the “Strawberry” model) marks a new level of AI capability on complex reasoning tasks; ② by changing how the model behaves, o1 can substantially improve answer quality while avoiding some structural flaws of earlier models; ③ OpenAI's first releases are the o1-preview and o1-mini models.

Cailian Press, September 13 (Editor: Shi Zhengcheng) At around 1 a.m. Beijing time on Friday, the AI era reached a new starting point: large models capable of general, complex reasoning have finally taken the stage.

OpenAI announced on its official website that it has begun rolling out the OpenAI o1 preview model to all subscribers. This is the long-anticipated “Strawberry” model. OpenAI said that on complex reasoning tasks the new model represents a new level of AI capability, so it is worth resetting the counter to 1 and giving it a name distinct from the “GPT-4” series.

The defining feature of a reasoning model is that it spends more time thinking before it answers, much as a person works through a problem. Earlier large models worked by learning patterns in large datasets in order to predict the next word in a sequence; strictly speaking, they did not truly understand the question.

(The model's “thinking” process is clearly visible. Source: OpenAI)

Cognition will leap to “the level of a science PhD student”

OpenAI has previously explained that GPT-4, released in 2023, is roughly at the intelligence level of a high school student, while GPT-5 would complete AI's growth from “high school student to PhD.” The o1 model is a key step along that path.

Compared with existing large models such as GPT-4o, OpenAI o1 can solve harder reasoning problems while addressing some structural flaws of earlier models.

For example, the new model can correctly count how many “r”s there are in “strawberry.”
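For reference, the correct answer is three. A deterministic character count gets this right trivially, as the short Python sketch below shows; `count_letter` is a hypothetical helper written for illustration.

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a single letter in a word, ignoring case."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # 3
```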

The model is also more methodical when answering programming questions: before writing any code, it thinks through the entire solution and only then outputs the code.

Take, for example, a poem-writing task with preset constraints (such as requiring the last word of the second line to end in “i”). GPT-4o, which “just picks up the pen and writes,” does produce an answer, but it often satisfies only some of the conditions and does not correct itself. This means the model has to hit the right answer on its first attempt, or it will inevitably get it wrong. The o1 model, by contrast, keeps trying, checking, and refining its answer, which significantly improves the accuracy and quality of the output.
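The difference can be sketched as a toy analogy in Python: a one-shot writer returns its first draft, while a draft-check-revise loop tests each draft against the constraint and keeps going until one passes. The drafts are hard-coded, hypothetical stand-ins for model outputs, and the helper names are illustrative; this is not how o1 works internally, which OpenAI has not disclosed in detail.

```python
import string

# Hypothetical drafts standing in for successive model outputs.
DRAFTS = [
    "Morning light on silver streams\nA traveler rests beneath the tree",
    "Morning light on silver streams\nA traveler sips his cup of chai",
]

def second_line_ok(poem: str) -> bool:
    """Constraint check: the last word of the second line must end in 'i'."""
    lines = [line for line in poem.splitlines() if line.strip()]
    if len(lines) < 2:
        return False
    last_word = lines[1].split()[-1].strip(string.punctuation)
    return last_word.lower().endswith("i")

def one_shot(drafts):
    """Emit the first draft with no self-check."""
    return drafts[0]

def draft_check_revise(drafts):
    """Check each draft and move on to the next until one satisfies the constraint."""
    for draft in drafts:
        if second_line_ok(draft):
            return draft
    return None  # no draft passed

print(second_line_ok(one_shot(DRAFTS)))                 # False
print(draft_check_revise(DRAFTS).splitlines()[1])       # A traveler sips his cup of chai
```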

Interestingly, if you expand the AI's thinking process, you will see it say things like “I'm wondering whether this approach will work” and “Ah, I'm running out of time, I need to give an answer quickly.” OpenAI confirmed that what is shown is not the raw chain of thought but a “model-generated summary,” and the company openly acknowledged that preserving its “competitive advantage” is one of the reasons.

Jerry Tworek, OpenAI's research lead, said the training behind the o1 model is fundamentally different from that of earlier products. Previous GPT models were designed to imitate patterns in their training data, whereas o1 was trained to solve problems on its own. During reinforcement learning, a system of rewards and penalties is used to “teach” the model to work through problems with a “chain of thought,” much as humans learn to break down and analyze problems.
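To make the idea of “rewards and penalties” concrete, here is a toy policy-gradient sketch in Python. It is not OpenAI's training method, which the company has not published. A softmax “policy” chooses between two hypothetical strategies, and the rewards (+1 for a correct answer, -1 as a penalty for a wrong one) are assumed values for illustration only. Each update follows the exact gradient of expected reward, so probability shifts toward whichever strategy earns reward.

```python
import math

# Hypothetical strategies and assumed rewards, for illustration only.
ACTIONS = ["answer_immediately", "reason_step_by_step"]
REWARDS = {"answer_immediately": -1.0, "reason_step_by_step": 1.0}

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps: int = 200, lr: float = 0.5) -> dict:
    logits = [0.0, 0.0]  # start with no preference between strategies
    rewards = [REWARDS[a] for a in ACTIONS]
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact policy gradient for a softmax policy: dJ/d(theta_a) = pi_a * (r_a - J).
        # Rewarded actions gain probability; penalized actions lose it.
        logits = [t + lr * p * (r - expected) for t, p, r in zip(logits, probs, rewards)]
    return dict(zip(ACTIONS, softmax(logits)))

print(train())  # probability of "reason_step_by_step" approaches 1
```

Real reinforcement learning works from sampled, noisy rewards rather than rewards known in advance; the exact-expectation update here just keeps the toy deterministic.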

In OpenAI's tests, the o1 model scored 83% on a qualifying exam for the International Mathematical Olympiad, while GPT-4o correctly solved only 13% of the problems. In the Codeforces programming competition, o1 reached the 89th percentile, versus the 11th percentile for GPT-4o.

(The chart shows that the o1-preview model's capabilities are a notch below those of the full o1 release.)

OpenAI said its tests show that, in its next update, the model performs at a level similar to PhD students on challenging benchmarks in physics, chemistry, and biology.

Time to talk about shortcomings and limitations

Understandably, an AI model that thinks problems through on its own is a welcome upgrade for programmers, creative workers, and almost anyone working in a science-related field, but the new model also has limitations.

First, the OpenAI o1 model is not (at least for now) multimodal, and it performs worse than other models at answering factual questions. For image interaction, general-knowledge Q&A, and web search, GPT-4o therefore remains the better choice. That said, OpenAI has made it clear that it will add web browsing, file upload, and image upload to the model in the future.

The other problem is cost, and the cost is steep. o1-preview is priced at $15 per million input tokens and $60 per million output tokens, 3 and 4 times the price of GPT-4o, respectively. One million tokens is roughly equivalent to 750,000 English words.
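The pricing arithmetic can be checked with a small Python sketch. The o1-preview prices are those reported above; the GPT-4o prices are back-calculated from the stated 3x and 4x ratios (which gives $5 and $15 per million tokens), and the 0.75 words-per-token factor follows the article's rule of thumb. The token counts in the example are hypothetical.

```python
# Prices in US dollars per one million tokens.
O1_PREVIEW = {"input": 15.00, "output": 60.00}       # as reported in the article
GPT_4O = {"input": 15.00 / 3, "output": 60.00 / 4}   # implied by the "3x and 4x" comparison

def request_cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at per-million-token prices."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000

def tokens_to_words(tokens: int, words_per_token: float = 0.75) -> float:
    """Rough English word count: about 750,000 words per million tokens."""
    return tokens * words_per_token

# Hypothetical workload: 2,000 input tokens and 8,000 output tokens.
print(f"o1-preview: ${request_cost(O1_PREVIEW, 2_000, 8_000):.2f}")  # $0.51
print(f"GPT-4o:     ${request_cost(GPT_4O, 2_000, 8_000):.2f}")      # $0.13
print(tokens_to_words(1_000_000))                                     # 750000.0
```

Output-heavy workloads feel the 4x multiplier most. Note also that, per OpenAI's API documentation, the o1 models' hidden reasoning tokens are billed as output tokens, so real output counts can exceed the length of the visible answer.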

Alongside o1-preview, OpenAI also launched the o1-mini model. It is faster and 80% cheaper, and is suited to scenarios that require reasoning but not broad knowledge of the world.
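The article does not list o1-mini's per-token prices, but applying its “80% cheaper” figure to the o1-preview prices yields them. A minimal Python sketch of that arithmetic, assuming the discount applies equally to input and output tokens:

```python
O1_PREVIEW = {"input": 15.00, "output": 60.00}  # USD per million tokens
DISCOUNT = 0.80  # "80% cheaper" means paying 20% of the o1-preview price

def discounted(prices: dict, discount: float) -> dict:
    """Apply a fractional discount to each price, rounded to whole cents."""
    return {kind: round(price * (1 - discount), 2) for kind, price in prices.items()}

O1_MINI = discounted(O1_PREVIEW, DISCOUNT)
print(O1_MINI)  # {'input': 3.0, 'output': 12.0}
```

Rounding to cents absorbs floating-point noise from `1 - 0.80`, which is not exactly 0.2 in binary.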

Judging from how sparingly OpenAI is rationing access, the reasoning model probably consumes a great deal of computing power. The company announced that ChatGPT subscribers can access both new models starting September 12, but o1-preview is currently limited to 30 messages per week and o1-mini to 50.

ChatGPT Enterprise and Edu users will get access to both models starting next week. Developers who have reached API usage tier 5 can start using both models immediately, with a rate limit of 20 requests per minute. OpenAI plans to make o1-mini available to free users in the future, but has not given a timeline.
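For developers, a limit of 20 requests per minute is easiest to respect with client-side throttling. The Python sketch below is a generic sliding-window limiter, not part of any OpenAI SDK; the class name and parameters are hypothetical, and the injectable `clock` and `sleep` functions exist only so it can be tested without real waiting. Call `acquire()` before each API request.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any rolling `period`-second window."""

    def __init__(self, max_calls=20, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls still inside the window

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest call in the window expires, then re-check.
            self.sleep(self.period - (now - self.calls[0]))

limiter = SlidingWindowLimiter()  # 20 calls per 60 seconds by default
# for prompt in prompts:
#     limiter.acquire()
#     ...send the request with the client library of your choice...
```

A sliding window is stricter than a fixed per-minute counter, so it will not burst 40 requests across a minute boundary; the server's own accounting may differ in detail.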

Editor/Somer

Disclaimer: This content is for informational and educational purposes only and does not constitute a recommendation or endorsement of any specific investment or investment strategy.
