
AGI Milestone: OpenAI Sora, from Text and Images to Video

lencx 浮之静 2024-03-10

Sora

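Sora[1] is an advanced text-to-video AI model developed by OpenAI that creates realistic and imaginative video scenes from text instructions. It was developed with the aim of teaching AI to understand and simulate the physical world in motion, in order to help solve problems that require real-world interaction. Sora can generate high-quality videos up to one minute long while closely following the user's prompt.

Sora is currently available to red teamers, who assess potential risks and harms, and to visual artists, designers, and filmmakers, whose feedback will help improve the model so that it better serves the creative industry. By working with people outside the company and gathering their feedback, OpenAI hopes to share its research progress and give the public a sense of the AI capabilities on the horizon.

The model understands what the user asks for and can generate detailed videos with complex scenes, multiple characters, specific types of motion, and background detail. Sora has a deep understanding of language, and from a text prompt it can create characters that express vivid emotions as well as multiple shots that stay consistent with one another. Even so, it still has limitations in simulating complex physical scenes and in understanding specific cause-and-effect relationships.

To ensure safe use, OpenAI plans several safety measures before making Sora available in its products: working with red-team experts to test the model, building tools to detect misleading content (for example, by including C2PA metadata), and reusing the safety methods already developed for DALL·E 3 (DALL·E 3 System Card[2]). OpenAI will also work with policymakers, educators, and artists around the world to encourage positive uses of the new technology and to guard against misuse.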

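On the technical side, Sora combines a diffusion model with a Transformer architecture. It can generate entire videos or extend existing ones, and it keeps a subject consistent even when the subject temporarily leaves the frame. The model represents videos and images as collections of small units of data called patches, which improves its ability to handle visual data of different durations, resolutions, and aspect ratios. Sora builds on the DALL·E and GPT models and uses a recaptioning technique that produces highly descriptive captions, so that it follows the user's text instructions more faithfully. In addition, Sora can generate a video from a still image, extend an existing video, or fill in missing frames.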
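To make the idea of patches concrete, here is a minimal sketch in plain Python. It is only an illustration: the function name `patchify`, the toy `make_video` helper, and the patch sizes are assumptions made for this example, and Sora's actual patching (which reportedly operates on a compressed latent representation) has not been published in this detail. The point is that clips with different durations and aspect ratios simply yield different numbers of equally sized patches.

```python
from itertools import product


def patchify(video, pt, ph, pw):
    """Split a video (nested lists indexed [frame][row][col]) into flattened
    spacetime patches covering pt frames x ph rows x pw columns each."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    # Walk the top-left-front corner of every patch in (time, row, col) order.
    for t0, y0, x0 in product(range(0, T, pt), range(0, H, ph), range(0, W, pw)):
        patches.append([
            video[t][y][x]
            for t in range(t0, t0 + pt)
            for y in range(y0, y0 + ph)
            for x in range(x0, x0 + pw)
        ])
    return patches


def make_video(T, H, W):
    """Hypothetical toy data: every 'pixel' is a unique integer."""
    return [[[(t * H + y) * W + x for x in range(W)] for y in range(H)]
            for t in range(T)]


# A wide 4-frame clip and a tall 2-frame clip share one patch size.
wide = patchify(make_video(4, 4, 8), pt=2, ph=2, pw=2)
tall = patchify(make_video(2, 8, 4), pt=2, ph=2, pw=2)
print(len(wide), len(wide[0]))  # 16 8
print(len(tall), len(tall[0]))  # 8 8
```

Both clips end up as sequences of 8-value patches; only the sequence length differs, which is the kind of variable-length input a Transformer handles naturally.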

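OpenAI regards the development of Sora as an important milestone in the ability to understand and simulate the real world, and as a significant step toward artificial general intelligence (AGI).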

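Why It Is Disruptive

OpenAI Sora is more than a creative tool. It is in effect a sophisticated simulation system that can model real or imagined worlds. By learning to render scenes correctly, simulate physical behavior, reason over long time horizons, and understand what a scene means, it creates realistic 3D scenes and animation. The process involves heavy mathematics, such as denoising data and applying gradient-based methods to improve the model's predictions.

For example, when Sora is asked for a video of two pirate ships battling in a cup of coffee, it has to produce realistic 3D models, animate them according to physical rules, simulate the dynamics of the liquid, and apply advanced rendering to reach photorealism. Even though the scene does not exist in reality, Sora applies the physical rules we expect and creates a convincing simulated environment.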

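By adding more input modalities and conditions, Sora aims to become a comprehensive data-driven tool that may eventually replace graphics production pipelines that today require a great deal of manual work.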
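The "denoising" mentioned above can be illustrated with a toy example. The sketch below uses the standard noise-prediction formulation from the diffusion literature; it is an assumption that Sora works along these lines, since OpenAI has not published its exact training objective. The names `add_noise`, `denoise`, and `alpha_bar` are chosen for this example only. In a real model, the noise estimate would come from a neural network trained by gradient descent; here the true noise is passed in directly to show that a perfect estimate recovers the clean signal.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Forward diffusion: blend a clean signal with Gaussian noise.
    alpha_bar in (0, 1] controls how much of the clean signal survives."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def denoise(xt, eps_hat, alpha_bar):
    """Invert the blend above, given an estimate eps_hat of the added noise."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, eps_hat)]


rng = random.Random(0)
clean = [0.0, 0.5, 1.0, 0.5]
noisy, true_noise = add_noise(clean, alpha_bar=0.5, rng=rng)

# Stand-in for a trained network: we "predict" the noise perfectly.
restored = denoise(noisy, true_noise, alpha_bar=0.5)
print(max(abs(a - b) for a, b in zip(clean, restored)) < 1e-9)  # True
```

Training a diffusion model amounts to making the network's noise prediction as close as possible to `true_noise`, which is where the gradient mathematics comes in.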

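📌 Pirate video analysis

@DrJimFan[3] gave Sora a very high assessment. His tweet reads:

If you think OpenAI Sora is a creative toy like DALL-E, think again. Sora is a data-driven physics engine. It is a simulation of many worlds, real or fantastical. The simulator learns intricate rendering, "intuitive" physics, long-horizon reasoning, and semantic grounding, all through some denoising and gradient math. I would not be surprised at all if Sora were trained on lots of synthetic data generated with Unreal Engine 5. It has to be!

Let's break down the following video. Prompt: "Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee."

  • The simulator instantiates two exquisite 3D assets: pirate ships with different decorations. Sora has to solve text-to-3D implicitly in its latent space.

  • The 3D objects are animated consistently as they sail and avoid each other's paths.

  • The fluid dynamics of the coffee, even the foam that forms around the ships. Fluid simulation is an entire subfield of computer graphics that traditionally requires very complex algorithms and equations.

  • Photorealism, almost as if rendered with ray tracing.

  • The simulator takes into account the small size of the cup compared with an ocean and applies tilt-shift photography to give a "miniature" feel.

  • The semantics of the scene do not exist in the real world, yet the engine still implements the correct physical rules we expect.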

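Video Demos

The video prompts published by OpenAI cover a diverse and engaging range of scenes. I have grouped them roughly by composition into realistic scenes (built around people, animals, cities, and natural scenery), sci-fi scenes (blending the real and the imagined), animated scenes (cartoon style), and micro scenes (close-up detail).

Realistic Scenes

Built around people, animals, city streets, and natural scenery, with rich interaction and motion.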

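  1. A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage.

  2. A bustling, snowy Tokyo; the camera follows several people enjoying the snowy weather and shopping at nearby stalls.

  3. A Chinese Lunar New Year celebration video featuring a Chinese dragon.

  4. A homemade video of the people of Lagos, Nigeria in 2056, shot with a mobile phone camera.

  5. A movie trailer about the adventures of a 30-year-old spaceman wearing a red wool knitted motorcycle helmet.

  6. A close-up of a gray-haired, bearded man in his 60s, deep in thought about the history of the universe at a café in Paris.

  7. A cat wakes its sleeping owner to demand breakfast.

  8. The camera rotates around a large stack of vintage televisions showing different programs, set inside a large museum gallery in New York.

  9. A tour of an art gallery with beautiful works of art in many styles.

  10. The Glenfinnan Viaduct, a historic railway bridge in Scotland, with a steam train crossing it.

  11. Reflections in the window of a train traveling through the Tokyo suburbs.

  12. The camera follows a white vintage SUV speeding up a steep dirt road surrounded by pine trees.

  13. Historical footage of California during the gold rush.

  14. A drone circles a beautiful historic church built on the rocks of the Amalfi Coast.

  15. An aerial view of Santorini during the blue hour, showcasing the stunning architecture of its white domed buildings.

  16. A drone view of waves crashing against the rugged cliffs at Garay Point beach in Big Sur.

  17. Borneo wildlife on the Kinabatangan River.

  18. Several giant woolly mammoths cross a snowy field, with snow-covered trees and snowy mountains in the background.

  19. Facing the colorful buildings of Burano, Italy, an adorable Dalmatian looks out of a ground-floor window.

  20. A Samoyed and a Golden Retriever romp through a futuristic neon city at night.

  21. A corgi films itself on the tropical island of Maui.

  22. A litter of golden retriever puppies plays in the snow, their heads popping out of it.

  23. A white and orange cat darts happily through a dense garden.

  24. A large orange octopus rests on the ocean floor amid sandy and rocky terrain.

  25. A close-up of a Victoria crowned pigeon, showing its striking blue plumage and red chest.

  26. A close-up of a chameleon, showing its striking color-changing ability.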

Full Prompts

📌 Prompt 1

A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

📌 Prompt 2

Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.

📌 Prompt 3

A Chinese Lunar New Year celebration video with Chinese Dragon.

📌 Prompt 4

A beautiful homemade video showing the people of Lagos, Nigeria in the year 2056. Shot with a mobile phone camera.

📌 Prompt 5

A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.

📌 Prompt 6

An extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt , he wears a brown beret and glasses and has a very professorial appearance, and the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film.

📌 Prompt 7

A cat waking up its sleeping owner demanding breakfast. The owner tries to ignore the cat, but the cat tries new tactics and finally the owner pulls out a secret stash of treats from under the pillow to hold the cat off a little longer.

📌 Prompt 8

The camera rotates around a large stack of vintage televisions all showing different programs — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery.

📌 Prompt 9

Tour of an art gallery with many beautiful works of art in different styles.

📌 Prompt 10

The Glenfinnan Viaduct is a historic railway bridge in Scotland, UK, that crosses over the west highland line between the towns of Mallaig and Fort William. It is a stunning sight as a steam train leaves the bridge, traveling over the arch-covered viaduct. The landscape is dotted with lush greenery and rocky mountains, creating a picturesque backdrop for the train journey. The sky is blue and the sun is shining, making for a beautiful day to explore this majestic spot.

📌 Prompt 11

Reflections in the window of a train traveling through the Tokyo suburbs.

📌 Prompt 12

The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.

📌 Prompt 13

Historical footage of California during the gold rush.

📌 Prompt 14

A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.

📌 Prompt 15

Aerial view of Santorini during the blue hour, showcasing the stunning architecture of white Cycladic buildings with blue domes. The caldera views are breathtaking, and the lighting creates a beautiful, serene atmosphere.

📌 Prompt 16

Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.

📌 Prompt 17

Borneo wildlife on the Kinabatangan River

📌 Prompt 18

Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.

📌 Prompt 19

The camera directly faces colorful buildings in burano italy. An adorable dalmation looks through a window on a building on the ground floor. Many people are walking and cycling along the canal streets in front of the buildings.

📌 Prompt 20

A Samoyed and a Golden Retriever dog are playfully romping through a futuristic neon city at night. The neon lights emitted from the nearby buildings glistens off of their fur.

📌 Prompt 21

A corgi vlogging itself in tropical Maui.

📌 Prompt 22

A litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in.

📌 Prompt 23

A white and orange tabby cat is seen happily darting through a dense garden, as if chasing something. Its eyes are wide and happy as it jogs forward, scanning the branches, flowers, and leaves as it walks. The path is narrow as it makes its way between all the plants. the scene is captured from a ground-level angle, following the cat closely, giving a low and intimate perspective. The image is cinematic with warm tones and a grainy texture. The scattered daylight between the leaves and plants above creates a warm contrast, accentuating the cat’s orange fur. The shot is clear and sharp, with a shallow depth of field.

📌 Prompt 24

A large orange octopus is seen resting on the bottom of the ocean floor, blending in with the sandy and rocky terrain. Its tentacles are spread out around its body, and its eyes are closed. The octopus is unaware of a king crab that is crawling towards it from behind a rock, its claws raised and ready to attack. The crab is brown and spiny, with long legs and antennae. The scene is captured from a wide angle, showing the vastness and depth of the ocean. The water is clear and blue, with rays of sunlight filtering through. The shot is sharp and crisp, with a high dynamic range. The octopus and the crab are in focus, while the background is slightly blurred, creating a depth of field effect.

📌 Prompt 25

This close-up shot of a Victoria crowned pigeon showcases its striking blue plumage and red chest. Its crest is made of delicate, lacy feathers, while its eye is a striking red color. The bird’s head is tilted slightly to the side, giving the impression of it looking regal and majestic. The background is blurred, drawing attention to the bird’s striking appearance.

📌 Prompt 26

This close-up shot of a chameleon showcases its striking color changing capabilities. The background is blurred, drawing attention to the animal’s striking appearance.

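Sci-Fi Scenes

These scenes range from science fiction to natural beauty and from the surreal to handcrafted art, covering a wide spread of themes and creative styles. Each prompt opens a distinct story space and invites viewers to imagine and explore a different world.

  1. The story of a robot's life in a cyberpunk setting. The scene blends futurism with technological dystopia and explores a robot's role and experiences in a highly advanced technological society.

  2. New York City submerged like Atlantis, with fish, whales, sea turtles, and sharks swimming through its streets. With spectacular visuals and a fantastical reworking of a familiar cityscape, the scene presents a wondrous underwater world.

  3. A flock of paper airplanes flutters through a dense jungle, weaving around the trees like migrating birds. By placing an unconventional object in a natural setting, the scene creates a poetic, dreamlike image.

  4. A young man in his 20s sits on a cloud in the sky, reading a book. The scene carries themes of calm and escape, and its surreal setting underlines the imagination and freedom that reading brings.

  5. A gorgeously rendered papercraft world of a coral reef, full of colorful fish and sea creatures. Through fine papercraft detail, the scene shows the beauty and diversity of the underwater world and prompts reflection on natural beauty and environmental protection.

Full Prompts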

📌 Prompt 1

The story of a robot’s life in a cyberpunk setting.

📌 Prompt 2

New York City submerged like Atlantis. Fish, whales, sea turtles and sharks swim through the streets of New York.

📌 Prompt 3

A flock of paper airplanes flutters through a dense jungle, weaving around trees as if they were migrating birds.

📌 Prompt 4

A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.

📌 Prompt 5

A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.

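Animated Scenes

These scenes span an adventure in an enchanted forest, animals having fun, a mysterious creature's curious exploration, a spectacular natural phenomenon, and emotional bonds between animals. They cover a wide range of themes and art styles, and they are engaging and imaginative.

  1. A small, round, fluffy creature with big, expressive eyes explores a vibrant enchanted forest. The creature is a whimsical blend of a rabbit and a squirrel, with soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. The forest is full of magical elements: glowing flowers that change color, trees with purple and silver leaves, and small floating lights that resemble fireflies. The creature plays with a group of tiny, fairy-like beings dancing around a mushroom ring, then looks up in awe at a large tree that seems to be the heart of the forest.

  2. An adorable, happy otter in a yellow life jacket stands confidently on a surfboard, riding turquoise tropical waters near lush tropical islands, in a 3D digital render art style.

  3. A close-up animated scene of a short, fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood is one of wonder and curiosity as the monster gazes at the flame with wide eyes and an open mouth. Its pose and expression convey innocence and curiosity, as if it were exploring the world around it for the first time. Warm colors and dramatic lighting add to the cozy atmosphere of the image.

  4. A cartoon kangaroo dances at a disco.

  5. A giant cloud in the shape of a man looms over the earth, and the cloud man shoots lightning bolts down at it.

  6. A beautiful silhouette animation of a wolf howling alone in the moonlight until it finds its pack.

Full Prompts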

📌 Prompt 1

3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant, enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. The forest is alive with magical elements: flowers that glow and change colors, trees with leaves in shades of purple and silver, and small floating lights that resemble fireflies. The creature stops to interact playfully with a group of tiny, fairy-like beings dancing around a mushroom ring. The creature looks up in awe at a large, glowing tree that seems to be the heart of the forest.

📌 Prompt 2

An adorable happy otter confidently stands on a surfboard wearing a yellow lifejacket, riding along turquoise tropical waters near lush tropical islands, 3D digital render art style.

📌 Prompt 3

Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.

📌 Prompt 4

A cartoon kangaroo disco dances.

📌 Prompt 5

A giant, towering cloud in the shape of a man looms over the earth. The cloud man shoots lighting bolts down to the earth.

📌 Prompt 6

A beautiful silhouette animation shows a wolf howling at the moon, feeling lonely, until it finds its pack.

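Micro Scenes

These scenes use different artistic and cinematographic techniques to present themes from the realistic to the fantastical, and from miniature worlds to grand natural landscapes. Each stirs the viewer's imagination in its own way.

  1. A photorealistic close-up video of two pirate ships battling inside a cup of coffee. The scene draws viewers in with its odd premise and high visual realism, placing a traditional sea battle in a tiny, unexpected setting.

  2. An extreme close-up of a 24-year-old woman's eye blinking in Marrakech, Morocco, during magic hour: a cinematic film shot on 70mm, with depth of field, vivid colors, and a cinematic feel.

  3. A tilt-shift shot of a construction site filled with workers, equipment, and heavy machinery. The technique creates a distinctive visual effect that makes a real scene look like a miniature model.

  4. A petri dish with a bamboo forest growing inside it and tiny red pandas running around. This imaginative scene combines elements of nature with the idea of a laboratory experiment to create a charming and surprising miniature world.

  5. A close-up of a glass sphere with a zen garden inside. A small dwarf in the sphere rakes the sand and creates patterns in it. The scene captures the spirit of a zen garden in fine, calm detail while adding a touch of fantasy.

  6. A stop-motion animation of a flower growing out of the windowsill of a suburban house. This simple, beautiful scene uses stop motion to bring the wonder of natural growth to life and evokes growth and new life.

Full Prompts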

📌 Prompt 1

Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.

📌 Prompt 2

Extreme close up of a 24 year old woman’s eye blinking, standing in Marrakech during magic hour, cinematic film shot in 70mm, depth of field, vivid colors, cinematic.

📌 Prompt 3

Tiltshift of a construction site filled with workers, equipment, and heavy machinery.

📌 Prompt 4

A petri dish with a bamboo forest growing within it that has tiny red pandas running around.

📌 Prompt 5

A close up view of a glass sphere that has a zen garden within it. There is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand.

📌 Prompt 6

A stop motion animation of a flower growing out of the windowsill of a suburban house.

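Model Weaknesses

The current model has weaknesses. It may struggle to accurately simulate the physics of a complex scene and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward the cookie may show no bite mark. The model may also confuse the spatial details of a prompt, for example mixing up left and right, and may struggle with precise descriptions of events that unfold over time, such as following a specific camera trajectory.

The prompts and weaknesses below summarize some of the problems Sora can run into when generating specific scenes, in areas such as motion, number of entities, physical accuracy, and complex interactions:

  • Scene: A step-printing scene of a person running, shot cinematically on 35mm film.

    • Weakness: Sora sometimes creates physically implausible motion.

  • Scene: Five gray wolf pups frolic and chase each other on a remote gravel road surrounded by grass. The pups run about, chase and nip at each other, and play.

    • Weakness: Animals or people can appear spontaneously, especially in scenes containing many entities.

  • Scene: A basketball goes through the hoop and then explodes.

    • Weakness: An example of inaccurate physical modeling and object "morphing."

  • Scene: Archeologists discover a generic plastic chair in the desert and carefully excavate and dust it.

    • Weakness: In this example, Sora fails to model the chair as a rigid object, leading to inaccurate physical interactions.

  • Scene: A grandmother with neatly combed gray hair stands behind a wooden dining table that holds a colorful birthday cake with many candles. Her expression is one of pure joy and happiness, with a happy glow in her eyes. She leans forward and blows out the candles with a gentle puff. The cake has pink frosting and colored sprinkles, and the candles go out. She wears a light blue blouse with a floral pattern, and several laughing friends and family members sit at the table, out of focus. The scene is captured with a cinematic look, showing a 3/4 view of the grandmother and the dining room. Warm tones and soft lighting enhance the mood.

    • Weakness: Simulating complex interactions between objects and multiple characters is often challenging for the model and sometimes leads to amusing results.

Full Prompts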

😅 1

Prompt: Step-printing scene of a person running, cinematic film shot in 35mm.

Weakness: Sora sometimes creates physically implausible motion.

😅 2

Prompt: Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing.

Weakness: Animals or people can spontaneously appear, especially in scenes containing many entities.

😅 3

Prompt: Basketball through hoop then explodes.

Weakness: An example of inaccurate physical modeling and unnatural object “morphing.”

😅 4

Prompt: Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care.

Weakness: In this example, Sora fails to model the chair as a rigid object, leading to inaccurate physical interactions.

😅 5

Prompt: A grandmother with neatly combed grey hair stands behind a colorful birthday cake with numerous candles at a wood dining room table, expression is one of pure joy and happiness, with a happy glow in her eye. She leans forward and blows out the candles with a gentle puff, the cake has pink frosting and sprinkles and the candles cease to flicker, the grandmother wears a light blue blouse adorned with floral patterns, several happy friends and family sitting at the table can be seen celebrating, out of focus. The scene is beautifully captured, cinematic, showing a 3/4 view of the grandmother and the dining room. Warm color tones and soft lighting enhance the mood..

Weakness: Simulating complex interactions between objects and multiple characters is often challenging for the model, sometimes resulting in humorous generations.

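User-Generated Videos

This part shares some videos that users generated with Sora, spanning themes such as races, adventure, magic, and future cities.

  1. A bicycle race on the ocean with different animals as the athletes riding the bicycles, captured from a drone's view.

  2. A futuristic drone race at sunset on Mars.

  3. A cooking lesson for homemade gnocchi hosted by a grandmother who is a social media influencer, set in a rustic Tuscan kitchen with cinematic lighting.

  4. Two golden retrievers podcasting on a mountaintop.

  5. A half-duck, half-dragon creature flies through a beautiful sunset with a hamster in adventure gear on its back.

  6. A street-level tour of a futuristic city that lives in harmony with nature while also being cyberpunk and high-tech. The city is clean, with advanced futuristic trams, beautiful fountains, giant holograms everywhere, and robots all over. The video shows a human tour guide from the future showing a group of aliens the coolest and most glorious city humans are capable of building.

  7. A wizard in a pointed hat and a blue robe embroidered with white stars casts a lightning spell with one hand and holds an ancient book in the other.

Full Prompts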

📌 Prompt 1

A bicycle race on ocean with different animals as athletes riding the bicycles with drone camera view

📌 Prompt 2

a futuristic drone race at sunset on the planet mars

📌 Prompt 3

A instructional cooking session for homemade gnocchi hosted by a grandmother social media influencer set in a rustic Tuscan country kitchen with cinematic lighting

📌 Prompt 4

Two golden retrievers podcasting on top of a mountain

📌 Prompt 5

A half duck half dragon flies through a beautiful sunset with a hamster dressed in adventure gear on its back

📌 Prompt 6

A street-level tour through a futuristic city which in harmony with nature and also simultaneously cyperpunk / high-tech.

The city should be clean, with advanced futuristic trams, beautiful fountains, giant holograms everywhere, and robots all over.

Have the video be of a human tour guide from the future showing a group of extraterrestial aliens the coolest and most glorious city that humans are capable of building.

📌 Prompt 7

a wizard wearing a pointed hat and a blue robe with white stars casting a spell that shoots lightning from his hand and holding an old tome in his other hand

C2PA

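The Coalition for Content Provenance and Authenticity (C2PA[4]) is a joint project that develops technical standards for certifying the source and history of media content. It was established through a collaboration among industry leaders including Adobe, Arm, Intel, Microsoft, and Truepic, as a project of the Joint Development Foundation, a nonprofit based in Washington. C2PA aims to address the challenge of information authenticity in the digital age through technical means. By developing and promoting technical specifications for content provenance and authenticity, it works toward a more transparent and trustworthy environment for creating, publishing, and sharing digital media.

C2PA unifies the efforts of the Adobe-led Content Authenticity Initiative (CAI[5]) and Project Origin[6], led by Microsoft and the BBC. CAI focuses on systems that provide context and history for digital media, while Project Origin tackles disinformation in the digital news ecosystem. Through this collaboration, C2PA aims to create a unified framework that lets publishers, creators, and consumers trace the origin of different types of media.

On January 26, 2022, C2PA held a major event (Digital Content Provenance Event: January 26, 2022[7]) that brought together policymakers, academics, and industry leaders to discuss the future of responsible digital media creation, publication, and sharing. The event highlighted the challenges facing digital media and the role C2PA could play in promoting media authenticity and transparency.

The C2PA specification and related documents are publicly available. They give the details of the technical specification, including how to implement the standard to verify the source and history of media content, and are meant to help people understand and adopt it, which in turn improves the authenticity and trustworthiness of media content. C2PA also encourages the community to submit feedback through GitHub issues[8] so that the specifications can be continually refined and updated.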

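OpenAI's Implementation of the C2PA Standard

The C2PA standard is an open technical specification that embeds metadata in media so that its origin and related information can be verified. It is not limited to AI-generated images: camera manufacturers, news organizations, and others have also adopted it to prove the source and history of media content.

OpenAI embeds C2PA metadata in images produced by the DALL·E 3 model and in images generated through ChatGPT, so users can verify whether an image was created with OpenAI's tools. Verification can be done on a dedicated website. Note, however, that the metadata can be removed accidentally or deliberately (for example, by taking a screenshot), so an image without the metadata may still have been generated with OpenAI's technology.

At present only images generated by the DALL·E 3 model carry C2PA metadata; text and speech generated through ChatGPT or the OpenAI API do not. Adding the metadata has a relatively small effect on file size in some cases and a larger one in others: a PNG generated through the API may grow by only about 3%, while a WebP generated through ChatGPT may grow by as much as 32%.

The embedded C2PA metadata gives each generated image a signature indicating that it was produced by the DALL·E 3 model. Images generated through ChatGPT also include an additional manifest stating that the content was created with ChatGPT, so they carry a two-part provenance record. This practice is regarded as a key way to increase the credibility of digital information.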

Initial metadata manifest indicating that the image was created with DALL·E 3
Initial metadata manifest indicating that the image was created with ChatGPT

Learn more: C2PA in DALL·E 3[9]
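The caveat noted earlier, that a missing manifest proves nothing, can be illustrated with a small presence check. The sketch below walks the chunks of a PNG file and looks for a chunk of type `caBX`, which, to my understanding of the C2PA specification, is where a manifest store is embedded in PNG files. The helper names (`png_chunk_types`, `has_c2pa_chunk`, `make_chunk`) and the synthetic byte strings are assumptions made for this example. It only detects that such a chunk exists; it does not parse the manifest or validate any signature, and it does not cover WebP files, which use a different container.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data):
    """Return the chunk type names of a PNG byte string, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        types.append(ctype.decode("latin-1"))
        pos += 8 + length + 4
    return types


def has_c2pa_chunk(data):
    """True if the PNG contains a 'caBX' chunk (assumed C2PA manifest store)."""
    return "caBX" in png_chunk_types(data)


def make_chunk(ctype, payload):
    """Build a well-formed PNG chunk; used here only to create test data."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


# Synthetic files: the 'caBX' payload is a placeholder, not a real manifest.
header = make_chunk(b"IHDR", bytes(13))
end = make_chunk(b"IEND", b"")
plain_png = PNG_SIGNATURE + header + end
tagged_png = PNG_SIGNATURE + header + make_chunk(b"caBX", b"placeholder") + end

print(has_c2pa_chunk(tagged_png))  # True
print(has_c2pa_chunk(plain_png))   # False
```

A screenshot or re-encode produces a file like `plain_png`: structurally valid, but with the provenance chunk gone. Real verification should rely on a C2PA-aware tool that validates the signed manifest.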

References

[1] Sora: https://openai.com/sora

[2] DALL·E 3 System Card: https://cdn.openai.com/papers/DALL_E_3_System_Card.pdf

[3] @DrJimFan: https://twitter.com/DrJimFan

[4] C2PA: https://c2pa.org

[5] CAI: https://contentauthenticity.org

[6] Project Origin: https://www.originproject.info

[7] Digital Content Provenance Event: January 26, 2022: https://c2pa.org/jan-2022_event

[8] GitHub issues: https://github.com/c2pa-org/specifications/issues

[9] C2PA in DALL·E 3: https://help.openai.com/en/articles/8912793-c2pa-in-dall-e-3
