Google has significantly updated its generative AI, launching the Veo 2 video model and the latest version of Imagen 3.

Author: Zhao Yuhua

Source: Hard AI.

Google's flagship AI research lab, Google DeepMind, significantly upgraded its AI-driven content generation tools on Monday, launching the Veo 2 video generation model and the enhanced Imagen 3 image model, challenging OpenAI's leading position in AI image and video generation. Google stated that these updates are expected to profoundly transform creative workflows, offering video and image creators a higher degree of realism and customization.

Google introduced Veo 2, its video generation tool, which can create high-quality videos across diverse themes and styles. Google said in a blog post that the model excels at realism, capturing details such as human expressions and cinematic effects. Its improved understanding of physics and cinematography lets users generate striking content, including tracking shots and wide-angle compositions.

Veo 2 is also familiar with cinematic language: users can request a particular style, specify shots, and suggest cinematic effects, and Veo 2 will render them at resolutions up to 4K and lengths extending to several minutes. Requests such as "low-angle tracking shot through the center of the scene" or "close-up of a scientist observing through a microscope" can all be realized by Veo 2. A prompt mentioning an "18mm lens" tells Veo 2 to generate a wide-angle shot; a request for "shallow depth of field" blurs the background to emphasize the subject. A minimal sketch of how such a prompt might be assembled follows.
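To make the prompting idea concrete, here is a small, hypothetical Python sketch of how a cinematic prompt of this kind might be assembled and packaged for a video generation call. The names `VideoRequest` and `build_cinematic_prompt` are invented for this example and are not part of any published Google API.

```python
# Illustrative sketch only: the class, function, and parameter names below are
# hypothetical placeholders, not part of any published Google SDK for Veo 2.

from dataclasses import dataclass


@dataclass
class VideoRequest:
    prompt: str           # natural-language description, including cinematic terms
    resolution: str       # e.g. "720p" in VideoFX today
    duration_seconds: int


def build_cinematic_prompt(subject: str, shot: str, lens: str, depth_of_field: str) -> str:
    """Compose a prompt using the cinematic vocabulary Veo 2 is said to understand."""
    return (
        f"{shot} of {subject}, {lens} lens, {depth_of_field} depth of field, "
        "cinematic lighting, photorealistic"
    )


request = VideoRequest(
    prompt=build_cinematic_prompt(
        subject="a scientist observing through a microscope",
        shot="low-angle tracking shot",
        lens="18mm",                # wide-angle framing
        depth_of_field="shallow",   # blurred background to emphasize the subject
    ),
    resolution="720p",       # current VideoFX limit mentioned in the article
    duration_seconds=8,      # current VideoFX limit mentioned in the article
)

print(request.prompt)
```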

It is worth noting that this resolution is four times that of OpenAI's Sora model, and the maximum clip length is more than six times longer.

However, these advantages are still theoretical for now: in Google's experimental video creation tool, VideoFX, videos generated by Veo 2 are limited to 720p resolution and 8 seconds in length. (By comparison, Sora's maximum output is 1080p, with clips of up to 20 seconds.)

Google stated that while video generation models often invent unwanted details, such as extra fingers or unexpected objects, Veo 2 is more realistic in this respect and makes such errors less frequently.

Additionally, the videos generated by Veo 2 include an invisible SynthID watermark to mark them as AI-generated content, thereby reducing the risk of misuse or misattribution.

Eli Collins, Vice President of Product at DeepMind, told the media that as the model becomes ready for large-scale use, Google will provide Veo 2 through its Vertex AI developer platform.

In the coming months, Google will continue to iterate based on user feedback and work to integrate Veo 2's updated capabilities into relevant applications across the Google ecosystem... More updates are expected to be shared next year.

Developers and creators can currently access the tool through Google Labs, and it is expected to be integrated more widely into platforms such as YouTube Shorts in 2025.

At the same time, the Imagen 3 model has been improved in image composition and detail accuracy, supporting styles ranging from realistic to abstract, with richer textures and more faithful adherence to user prompts.

Currently, Imagen 3 has launched in over 100 countries through Google Labs' ImageFX tool, allowing global users to experiment with its cutting-edge features.

In addition, Google has launched Whisk, a creative tool that combines Gemini's visual analysis with Imagen 3's image generation. Users can input images to generate detailed textual descriptions, remix styles, or design personalized works such as digital plush toys or enamel pins.

Google explained that Whisk pairs the Imagen 3 model with Gemini's visual understanding and captioning capabilities: Gemini automatically generates a detailed textual description of a user's image and passes that description to Imagen 3, which lets users remix subjects, scenes, and styles in interesting new ways. A sketch of this caption-then-generate flow appears below.
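The workflow described above amounts to a two-stage pipeline: a vision-language model captions the input images, and an image model regenerates from the combined text. Below is a minimal Python sketch of that flow; `describe_image` and `generate_image` are invented placeholders standing in for the Gemini and Imagen 3 steps, not real API calls.

```python
# Hypothetical sketch of the caption-then-generate flow described above.
# describe_image() stands in for Gemini's image-to-text step and generate_image()
# for Imagen 3's text-to-image step; neither is a real API call.


def describe_image(image_path: str) -> str:
    """Placeholder for Gemini: return a detailed caption of the input image."""
    return f"a detailed description of the contents of {image_path}"


def generate_image(prompt: str) -> bytes:
    """Placeholder for Imagen 3: return image bytes generated from the prompt."""
    return prompt.encode("utf-8")  # stand-in payload


def remix(subject_image: str, scene_image: str, style_image: str) -> bytes:
    # 1. Gemini-style step: turn each source image into text.
    subject = describe_image(subject_image)
    scene = describe_image(scene_image)
    style = describe_image(style_image)

    # 2. Combine the descriptions into a single remix prompt.
    prompt = f"{subject}, placed in {scene}, rendered in the style of {style}"

    # 3. Imagen-3-style step: generate the final image from the combined prompt.
    return generate_image(prompt)


result = remix("dog.jpg", "beach.jpg", "enamel_pin.jpg")
print(result[:60])
```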

This article is from the WeChat public account "YingAI."

Disclaimer: This content is for informational and educational purposes only and does not constitute a recommendation or endorsement of any specific investment or investment strategy.