Google has significantly updated its generative AI, launching the Veo 2 video model and the latest version of Imagen 3.

Author: Zhao Yuhua

Source: Hard AI.

Google's flagship AI research lab, Google DeepMind, significantly upgraded its AI-driven content generation tools on Monday, launching the Veo 2 video generation model and the enhanced Imagen 3 image model, challenging OpenAI's leading position in AI image and video generation. Google stated that these updates are expected to profoundly transform creative workflows, offering video and image creators a higher degree of realism and customization.

Google introduced Veo 2, its video generation tool, which can create high-quality videos across diverse themes and styles. Google said in a blog post that the model excels at realism, capturing details such as human expressions and cinematic effects. Its improved understanding of physics and cinematography lets users generate striking content, including tracking shots and wide-angle compositions.

Veo 2 is also familiar with cinematic language: users can request a particular style, specify shots, and suggest cinematic effects, and Veo 2 will render them at resolutions up to 4K and lengths extending to several minutes. Requests such as "low-angle tracking shot through the center of the scene" or "close-up of a scientist observing through a microscope" can all be realized by Veo 2. A prompt mentioning an "18mm lens" tells Veo 2 to generate a wide-angle shot; a request for "shallow depth of field" blurs the background to emphasize the subject. A minimal sketch of how such a prompt might be assembled follows.
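To make the prompting idea concrete, here is a small, hypothetical Python sketch of how a cinematic prompt of this kind might be assembled and packaged for a video generation call. The names `VideoRequest` and `build_cinematic_prompt` are invented for this example and are not part of any published Google API.

```python
# Illustrative sketch only: the class, function, and parameter names below are
# hypothetical placeholders, not part of any published Google SDK for Veo 2.

from dataclasses import dataclass


@dataclass
class VideoRequest:
    prompt: str           # natural-language description, including cinematic terms
    resolution: str       # e.g. "720p" in VideoFX today
    duration_seconds: int


def build_cinematic_prompt(subject: str, shot: str, lens: str, depth_of_field: str) -> str:
    """Compose a prompt using the cinematic vocabulary Veo 2 is said to understand."""
    return (
        f"{shot} of {subject}, {lens} lens, {depth_of_field} depth of field, "
        "cinematic lighting, photorealistic"
    )


request = VideoRequest(
    prompt=build_cinematic_prompt(
        subject="a scientist observing through a microscope",
        shot="low-angle tracking shot",
        lens="18mm",                # wide-angle framing
        depth_of_field="shallow",   # blurred background to emphasize the subject
    ),
    resolution="720p",       # current VideoFX limit mentioned in the article
    duration_seconds=8,      # current VideoFX limit mentioned in the article
)

print(request.prompt)
```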

It is worth noting that this resolution is four times that of OpenAI's Sora model, and the maximum clip length is more than six times longer.

However, these advantages are still theoretical for now: in Google's experimental video creation tool, VideoFX, videos generated by Veo 2 are limited to 720p resolution and 8 seconds in length. (By comparison, Sora's maximum output is 1080p, with clips of up to 20 seconds.)

Google stated that while video generation models often invent unwanted details, such as extra fingers or unexpected objects, Veo 2 is more realistic in this respect and makes such errors less frequently.

Additionally, the videos generated by Veo 2 include an invisible SynthID watermark to mark them as AI-generated content, thereby reducing the risk of misuse or misattribution.

Eli Collins, Vice President of Product at DeepMind, told the media that as the model becomes ready for large-scale use, Google will provide Veo 2 through its Vertex AI developer platform.

In the coming months, Google will continue to iterate based on user feedback and work to integrate Veo 2's updated capabilities into relevant applications across the Google ecosystem... More updates are expected to be shared next year.

Developers and creators can currently access the tool through Google Labs, and it is expected to be integrated more widely into platforms such as YouTube Shorts in 2025.

At the same time, the Imagen 3 model has been improved in image composition and detail accuracy, supporting styles ranging from realistic to abstract, with richer textures and more faithful adherence to user prompts.

Currently, Imagen 3 has launched in over 100 countries through Google Labs' ImageFX tool, allowing global users to experiment with its cutting-edge features.

In addition, Google has launched Whisk, a creative tool that combines Gemini's visual analysis with Imagen 3's image generation. Users can input images to generate detailed textual descriptions, remix styles, or design personalized works such as digital plush toys or enamel pins.

Google explained that Whisk pairs the Imagen 3 model with Gemini's visual understanding and captioning capabilities: Gemini automatically generates a detailed textual description of a user's image and passes that description to Imagen 3, which lets users remix subjects, scenes, and styles in interesting new ways. A sketch of this caption-then-generate flow appears below.
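The workflow described above amounts to a two-stage pipeline: a vision-language model captions the input images, and an image model regenerates from the combined text. Below is a minimal Python sketch of that flow; `describe_image` and `generate_image` are invented placeholders standing in for the Gemini and Imagen 3 steps, not real API calls.

```python
# Hypothetical sketch of the caption-then-generate flow described above.
# describe_image() stands in for Gemini's image-to-text step and generate_image()
# for Imagen 3's text-to-image step; neither is a real API call.


def describe_image(image_path: str) -> str:
    """Placeholder for Gemini: return a detailed caption of the input image."""
    return f"a detailed description of the contents of {image_path}"


def generate_image(prompt: str) -> bytes:
    """Placeholder for Imagen 3: return image bytes generated from the prompt."""
    return prompt.encode("utf-8")  # stand-in payload


def remix(subject_image: str, scene_image: str, style_image: str) -> bytes:
    # 1. Gemini-style step: turn each source image into text.
    subject = describe_image(subject_image)
    scene = describe_image(scene_image)
    style = describe_image(style_image)

    # 2. Combine the descriptions into a single remix prompt.
    prompt = f"{subject}, placed in {scene}, rendered in the style of {style}"

    # 3. Imagen-3-style step: generate the final image from the combined prompt.
    return generate_image(prompt)


result = remix("dog.jpg", "beach.jpg", "enamel_pin.jpg")
print(result[:60])
```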

This article is from the WeChat public account "YingAI."

Disclaimer: This content is for informational and educational purposes only and does not constitute a recommendation or endorsement of any specific investment or investment strategy.