
A new era of AI begins: OpenAI's reasoning model that “thinks through how to solve a problem” arrives

cls.cn ·  Sep 12 19:30

① OpenAI's o1 model (the “Strawberry” model) marks a new level of AI capability on complex reasoning tasks; ② by changing how the model behaves, the new model improves response quality while avoiding some mechanical flaws; ③ OpenAI is launching the o1-preview and o1-mini models first.

Cailian Press, September 13 (Editor Shi Zhengcheng): At around 1 a.m. Beijing time on Friday, the AI era got a fresh start: large models capable of general, complex reasoning finally took center stage.

OpenAI announced on its official website that it has begun rolling out the o1-preview model to all subscribers. This is the long-anticipated “Strawberry” model. OpenAI said that for complex reasoning tasks the new model represents a new level of AI capability, which is why it is resetting the counter to 1 and giving the model a name distinct from the “GPT-4” series.

The defining trait of a reasoning model is that it spends more time thinking before it answers, much as a person works through a problem. Earlier large models worked by learning patterns in large datasets and predicting the next word in a sequence; strictly speaking, they did not really understand the question.

(Clearly perceptible “thinking” process, source: OpenAI)

Cognitive ability will leap to “the level of a science PhD student”

OpenAI has previously said that GPT-4, released in 2023, is roughly at the intelligence level of a high school student, while GPT-5 would complete AI's growth from “high school student to PhD.” The o1 model is a key step along that path.

Compared with existing large models such as GPT-4o, OpenAI o1 can solve harder reasoning problems while fixing mechanical flaws seen in earlier models.

For example, the new model can correctly count how many “r”s there are in “strawberry.”
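For reference, the ground truth in this test is trivial to compute in ordinary code, which is part of why the failure of earlier models on it drew so much attention. A minimal sketch (the helper name `count_letter` is only illustrative):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


# The example from the article: the letter "r" in "strawberry".
print(count_letter("strawberry", "r"))
```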

The model is also more methodical when answering programming questions: before writing any code, it thinks through the whole approach and only then outputs the code.

Take a poem-writing task with preset constraints (for example, the last word of the second line must end with “i”). GPT-4o, which starts writing immediately, does produce an answer, but it often satisfies only some of the conditions and does not correct itself. In other words, that kind of model has to land on the right answer on its first attempt, or the output will contain an error. The o1 model, by contrast, repeatedly tries, checks, and polishes its answer, which significantly improves the accuracy and quality of the result.
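The difference between answering in one pass and trying, checking, and retrying can be illustrated with a toy sketch. This is not how o1 works internally (OpenAI has not published that); the draft poems, the constraint checker, and every function name below are hypothetical and exist only to show the control flow.

```python
from typing import Callable, Iterable, Optional


def second_line_ends_with_i(poem: str) -> bool:
    """Hypothetical constraint: the last word of the second line ends with 'i'."""
    lines = [ln.strip() for ln in poem.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    words = lines[1].rstrip(".,;:!?").split()
    return bool(words) and words[-1].lower().endswith("i")


def single_pass(drafts: Iterable[str]) -> str:
    """Return the first draft unchecked, like a model that cannot revise."""
    return next(iter(drafts))


def try_and_check(drafts: Iterable[str], check: Callable[[str], bool]) -> Optional[str]:
    """Return the first draft that passes the check, or None if none does."""
    for draft in drafts:
        if check(draft):
            return draft
    return None


# Stand-ins for successive attempts by a model (made up for this example).
DRAFTS = [
    "Snow on the pine\nQuiet falls the night",
    "Snow on the pine\nWe ride home by taxi",
]

first = single_pass(DRAFTS)
revised = try_and_check(DRAFTS, second_line_ends_with_i)
print(second_line_ends_with_i(first))  # the unchecked first draft misses the constraint
print(revised)
```

The single-pass path is stuck with whatever the first attempt happened to be, while the checking loop can discard a draft that misses a condition and move on to the next one.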

Interestingly, when you expand the model's thinking process, it shows remarks such as “I'm thinking about whether this is okay to do” and “ah, I don't have much time, I need to give an answer as soon as possible.” OpenAI confirmed that what is displayed is not the raw chain of thought but a “model-generated summary,” and the company openly acknowledges that maintaining a “competitive advantage” is one of the reasons.

Jerry Tworek, a research lead at OpenAI, said that the training behind o1 is fundamentally different from that of earlier products. Earlier GPT models were designed to mimic patterns in their training data, whereas o1 was trained to solve problems on its own. During reinforcement learning, rewards and penalties are used to “teach” the model to work through problems with a “chain of thought,” much as humans learn to break problems down and analyze them.

According to tests, o1 scored 83% on a qualifying exam for the International Mathematical Olympiad, while GPT-4o correctly solved only 13% of the problems. In Codeforces programming competitions, o1 reached the 89th percentile, while GPT-4o reached only the 11th.

(The chart shows that the preview version of o1 performs somewhat below the full version)

OpenAI said that, according to its tests, the next update of the model performs on par with PhD students on challenging benchmarks in physics, chemistry, and biology.

Now for the shortcomings and limitations

Understandably, a model that can reason for itself is a useful upgrade for programmers, creative workers, and almost anyone in a science-related profession, but the new model also has limitations.

First, the o1 model is not (at least for now) a multimodal model, and it is weaker than other models at answering factual questions. For image interaction, general-knowledge questions, and web search, GPT-4o remains the better choice. OpenAI has said it will add features such as web browsing and file and image upload to this model in the future.

The other problem is price: it is very expensive. o1-preview costs $15 per million input tokens and $60 per million output tokens, which is 3 times and 4 times the price of GPT-4o, respectively. One million tokens is roughly equivalent to 750,000 English words.
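To make the pricing concrete, here is a rough cost sketch. The o1-preview prices are the ones quoted above. The GPT-4o prices are not stated directly and are derived here from the stated 3x and 4x ratios. The request size in the example is an arbitrary assumption, and the function names are illustrative.

```python
# Prices in US dollars per one million tokens.
O1_PREVIEW_INPUT_PER_M = 15.00
O1_PREVIEW_OUTPUT_PER_M = 60.00

# Assumption: derived from the article's "3 and 4 times GPT-4o" ratios.
GPT4O_INPUT_PER_M = O1_PREVIEW_INPUT_PER_M / 3
GPT4O_OUTPUT_PER_M = O1_PREVIEW_OUTPUT_PER_M / 4

# Rule of thumb from the article: 1,000,000 tokens is about 750,000 English words.
WORDS_PER_TOKEN = 0.75


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request given token counts and per-million prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def approx_words(tokens: int) -> float:
    """Approximate number of English words for a token count."""
    return tokens * WORDS_PER_TOKEN


# Hypothetical request: 2,000 input tokens and 1,000 output tokens.
o1_cost = request_cost(2_000, 1_000, O1_PREVIEW_INPUT_PER_M, O1_PREVIEW_OUTPUT_PER_M)
gpt4o_cost = request_cost(2_000, 1_000, GPT4O_INPUT_PER_M, GPT4O_OUTPUT_PER_M)
print(f"o1-preview: ${o1_cost:.3f}, GPT-4o: ${gpt4o_cost:.3f}")
```

Because output tokens carry the larger multiplier, the gap between the two models widens as responses get longer.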

Alongside o1-preview, OpenAI also launched o1-mini, a faster model that is 80% cheaper and suited to scenarios that require reasoning but not broad world knowledge.

Also, judging by how sparingly OpenAI is rationing access, this reasoning model likely consumes a great deal of computing power. The company announced that ChatGPT subscribers can use both new models from September 12, but o1-preview is currently limited to 30 messages per week and o1-mini to 50.

ChatGPT Enterprise and Edu users will get access to both models starting next week. Developers who have reached API usage tier 5 can start using both models immediately, with a rate limit of 20 requests per minute. OpenAI plans to make o1-mini available to free users in the future, but there is no timeline yet.
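A developer working under a limit of 20 requests per minute might throttle calls on the client side. The following is a minimal sliding-window sketch under that assumption. The class `SlidingWindowLimiter` is hypothetical, not part of any OpenAI SDK, and timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` sends in any `window_seconds` span."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._sent = deque()  # timestamps of recorded requests, oldest first

    def wait_time(self, now: float) -> float:
        """Seconds to wait before another request may be sent at time `now`."""
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # The window is full: wait until the oldest request ages out.
        return self.window_seconds - (now - self._sent[0])

    def record(self, now: float) -> None:
        """Record that a request was sent at time `now`."""
        self._sent.append(now)
```

In real use, the caller would pass `time.monotonic()` as `now`, sleep for the returned wait time, send the request, and then call `record`.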

Editor/Somer

Disclaimer: This content is for informational and educational purposes only and does not constitute a recommendation or endorsement of any specific investment or investment strategy.