
Sora's Rival! Meta's Most Powerful Immersive AI Media Model Arrives, with a 30-Billion-Parameter Movie Gen Video Model

wallstreetcn ·  Oct 4 13:10

Meta says Movie Gen is its "most advanced and immersive storytelling suite of models," trained on licensed and publicly available data. It can generate videos at 16 frames per second, up to 16 seconds long; a 13-billion-parameter model handles audio generation; and in human evaluations, Movie Gen's video generation achieved a net win rate of 8.2 over Sora. Meta gave no release date, but Zuckerberg said it will come to Instagram next year.

Author of this article: Li Dan.


OpenAI's Sora faces a formidable opponent: Meta has unveiled Movie Gen, which it calls its most advanced media foundation model.

Meta describes Movie Gen as the company's breakthrough generative AI research for media, spanning modalities including image, video, and audio. With nothing more than a text prompt, users can create custom videos and sounds, edit existing videos, and turn personal images into unique videos. In human evaluations, Movie Gen outperforms comparable models in the industry on these tasks.

Meta introduces Movie Gen as its most advanced and immersive storytelling suite of models. It combines the company's first wave of generative AI media research, the Make-A-Scene series of models that enabled creation of images, audio, video, and 3D animation, with its second wave, the diffusion-based Llama Image foundation models, which enabled higher-quality image and video generation as well as image editing.

Videos up to 16 seconds long, a 13-billion-parameter audio generation model, and a net win rate of 8.2 over Sora in human evaluations of video generation.

In summary, Movie Gen has four main functions: video generation, personalized video generation, precise video editing, and audio generation.

For video generation, Meta explains that users need only provide a text prompt: Movie Gen uses a joint model optimized for both text-to-image and text-to-video to create high-definition, high-quality images and videos. Movie Gen's video model has 30 billion parameters, and this transformer model can generate videos up to 16 seconds long at 16 frames per second.

Meta says these models can reason about object motion, interactions between subjects and objects, and camera movement, and that they learn a wide range of concepts well enough to understand plausible motion, making them the most advanced models in their category. To showcase the feature, Meta presented several roughly 10-second video clips, including a baby hippo swimming in the style of "Moo Deng," the bouncy pygmy hippopotamus that has gone viral on social media.


Wallstreetcn noted that, judging purely by maximum video length, Movie Gen falls short of Sora, which OpenAI unveiled in February this year; what amazed the industry about Sora was its ability to create natural-looking videos up to 60 seconds long. Compared with Emu Video, the video model Meta announced last November, however, Movie Gen is a clear step forward: Emu Video could only generate videos up to 4 seconds long at 16 frames per second.
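As a quick sanity check on the clip lengths above, the per-clip frame budgets for Meta's two models follow directly from the stated rates. Note that the article gives Sora's duration but not its frame rate, so only its length is listed; all names and numbers below come from the text above:

```python
FPS = 16  # frame rate cited for both of Meta's models

# Maximum clip length in seconds, per the article
max_seconds = {"Movie Gen": 16, "Emu Video": 4, "Sora": 60}

for name, seconds in max_seconds.items():
    if name == "Sora":
        # Sora's frame rate is not stated in the article
        print(f"{name}: up to {seconds}s")
    else:
        print(f"{name}: up to {seconds}s = {seconds * FPS} frames at {FPS} fps")
```

So Movie Gen's 16-second clips amount to 256 frames per video, four times Emu Video's 64-frame ceiling.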

Beyond generating natural-looking videos directly, Movie Gen also excels at personalized video. Meta says it has extended the foundation models above to support personalized video generation: given an image of a person plus a text prompt, Movie Gen can generate a video featuring the person from the reference image with visual details that match the prompt. Meta states the model achieves state-of-the-art results at creating personalized videos that preserve a person's identity and motion.

In one video Meta demonstrated, a user provides a photo of a girl and the text prompt "a female DJ in a pink vest spinning records, accompanied by a cheetah," and the model generates a video of a DJ who resembles the girl in the photo spinning records, cheetah at her side.


For precise video editing, Meta said Movie Gen uses a variant of the same foundation model that takes a user's video and a text prompt and performs the requested task precisely to produce the desired output. It combines video generation with advanced image editing, handling both localized edits, such as adding, removing, or replacing elements, and global changes, such as modifying the background or style. Unlike traditional tools that demand professional skills, or generative tools that lack precision, Movie Gen preserves the original content and edits only the relevant pixels.

In one example Meta provided, a user asks for a penguin to be dressed in the style of Britain's Victorian era, and Movie Gen renders the penguin wearing a red lace-trimmed dress.


For audio generation, Meta said it trained a 13-billion-parameter audio generation model that takes a video and an optional text prompt and produces high-quality, high-fidelity audio up to 45 seconds long, including ambient sound, Foley sound effects, and instrumental background music, all synchronized with the video content. Meta also introduced an audio-extension technique that can generate coherent audio for videos of any length, achieving state-of-the-art performance overall in audio quality, video-to-audio alignment, and text-to-audio alignment.
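The article does not describe how Meta's audio-extension technique works internally. A common generic approach to stitching fixed-length generated audio segments into arbitrarily long tracks is to overlap consecutive segments and crossfade the seams; the sketch below is purely illustrative and is not Meta's method:

```python
def crossfade_concat(chunks, overlap):
    """Join audio chunks (lists of float samples), linearly
    crossfading each seam over `overlap` samples so segment
    boundaries blend smoothly instead of clicking."""
    out = list(chunks[0])
    for chunk in chunks[1:]:
        head, tail = chunk[:overlap], chunk[overlap:]
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # fade-in weight ramps toward 1
            out[-overlap + i] = out[-overlap + i] * (1 - w) + head[i] * w
        out.extend(tail)
    return out

# Two 4-sample chunks of constant amplitude merge into a seamless
# 6-sample signal (4 + 4 - 2 overlapping samples).
print(crossfade_concat([[1.0] * 4, [1.0] * 4], overlap=2))
```

A real system would generate each chunk conditioned on the tail of the previous one; the crossfade here only illustrates why overlapping windows keep long outputs coherent.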

In one example from Meta, the model generates the sound of an ATV engine revving, set against guitar music; in another, orchestral music plays over the sound of rustling leaves and snapping twigs.


Meta also presented A/B test results for the four capabilities above. The net positive win rate in the chart indicates how strongly human evaluators preferred Movie Gen's output over that of competing models such as Sora. For direct video generation, Movie Gen's net win rate over Sora reaches 8.2.


Trained on licensed and publicly available data, with no firm release date, though Zuckerberg says it will come to Instagram next year.

What data was Movie Gen trained on? Meta's statement gives no specifics, saying only: "We trained these models on licensed and publicly available datasets."

Some commentators point out that, for generative AI tools, the provenance of training data and what gets scraped from the internet remain contentious issues, and the public rarely learns which text, video, or audio clips went into any given large model.

Other commentators note that Meta calls its training dataset "proprietary/commercially sensitive" and provides no details, so one can only speculate that it includes large amounts of video from Instagram and Facebook, some content from Meta's partners, and plenty of other inadequately protected, that is, "publicly available," content.

As for timing, Meta's Friday announcement did not say when Movie Gen will be released to the public, offering only a vague "possible future release." OpenAI, for its part, announced Sora in February this year but has still not made it generally available, nor has it revealed a planned release date.

Meta CEO Mark Zuckerberg, however, said Movie Gen will launch on Meta's social platform Instagram next year. He posted a Movie Gen-generated video on his personal Instagram account showing himself on a leg-press machine. As he works out, the background keeps changing: first a futuristic neon-lit gym, then him training in gladiator armor, then him pressing a burning machine of pure gold, and finally him leg-pressing a box of chicken nuggets surrounded by fries.

Zuckerberg added that Meta's new Movie Gen AI model can create and edit videos, and that every day is a leg day; the model will launch on Instagram next year.

On the social platform X, Meta officially announced and demonstrated Movie Gen. Among the most-liked replies to the post, netizens are already urging Meta to release the model, and some users ask whether everyone will get a chance to try it.
Disclaimer: This content is for informational and educational purposes only and does not constitute a recommendation or endorsement of any specific investment or investment strategy.