
Hands-on review | Midjourney V6 is live: you can now prompt it in plain language

Xiaolong (晓龙), AI科技评论
2024-09-06

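TL;DR: Midjourney released V6 on December 21, 2023. It is the third model the Midjourney team has trained from scratch. No benchmark comparison between V5.2 and V6 exists yet, so most of the improvements in this review were judged by eye.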



1
What's new in Midjourney V6

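According to Midjourney's announcement, the key updates are:

“More accurate understanding of prompts, support for longer and more detailed natural-language descriptions, and image details that are more consistent with real-world logic”

“New subtle and creative upscaler modes, which double the output image resolution”

“Text rendering has improved substantially, and simple text can now be drawn accurately”

In one sentence: the latest Midjourney wants you to talk to it in plain language, and in return it gives you images closer to what you imagined.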

Midjourney's official announcement

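In the announcement, the team stresses that prompting in the new version differs greatly from before, and that users should stop using meaningless “junk words” such as “photorealistic, 4K, 8K”. It is best to relearn how to “cast spells”: users who have long been in the habit of writing tag-style prompts can now dust off their English writing skills.

This does not mean tag-style keywords are ruled out entirely. A high Stylize setting still suits tag-style keywords for generating “strongly stylized art images”.

For example, when asked to “depict” Shenzhen's Ping An Finance Centre, it still returns a showy image.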



2
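Getting started with Midjourney V6

Switching to Midjourney V6 in practice

In the /settings command, you can now switch to the V6 [ALPHA] model.

Let's run the new version of Midjourney. First, a quick look at how V6 now handles detail.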

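The cat-feeding photo our finance colleague sent me, next to my AI cat-feeding image from today

I asked our finance colleague for an everyday photo of her son, Wang Xiaopang (王小胖), and then used the information in that photo for AI drawing.

Upload the photo of Wang Xiaopang to any image host. The most convenient way is to send it in a Discord chat, which quickly gives you a hosted image link to hand to Midjourney as a reference for what you have in mind (the image prompt is now ready).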

In V6, you simply describe the picture in your head on top of the reference image.
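The workflow above comes down to a fixed order inside one prompt: the image link first, then the plain-language description, then the parameters. Below is a minimal Python sketch of that assembly. The function name `build_prompt`, the example URL, and the description text are hypothetical and are not the ones used in this test. Midjourney is operated through Discord, so the script only produces the string you would paste after /imagine.

```python
def build_prompt(description, image_urls=(), version="6.0", stylize=None):
    """Assemble a Midjourney-style prompt: image URLs, then text, then parameters."""
    description = " ".join(description.split())  # collapse stray whitespace and newlines
    if not description:
        raise ValueError("description must not be empty")
    parts = list(image_urls)          # reference images go first
    parts.append(description)         # then the plain-language description
    parts.append(f"--v {version}")    # parameters come last
    if stylize is not None:
        if not 0 <= stylize <= 1000:  # --stylize accepts values from 0 to 1000
            raise ValueError("stylize must be between 0 and 1000")
        parts.append(f"--s {stylize}")
    return " ".join(parts)


# Hypothetical link, standing in for the one Discord gives you after the upload.
reference = "https://example.com/wang-xiaopang.jpg"
# Hypothetical description; the article does not list the exact text it used.
prompt = build_prompt(
    "A photo of a cat being fed at home, taken casually with a phone",
    image_urls=[reference],
)
print(prompt)
```

Keeping the description as a full sentence rather than a list of tags matches the V6 guidance quoted earlier in this article.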

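The resulting image no longer reads as “AI at first glance”.

With clear wording and a thorough description of the image details, the gap between the AI image and a straight-out-of-camera photo all but disappears.

For the first time, text can be added to an AI image clearly and completely

Testing Midjourney V6's text generation on its own

From a designer's point of view, I generated two sets of AItechtalk logos on the spot. Whether the client would sign off on them is impossible to say, but the two sets took only 90 seconds in total.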

  • prompt:a logo for "AI Tech Talk" --v 6.0 --s 100 Variations (Strong)

AItechtalk logo

AItechtalk logo

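According to Midjourney's current official guidance, text is best drawn at a low stylize value. In my tests so far, --s 100 gave the most accurate text.

With a higher stylize value, text accuracy drops noticeably and spelling errors appear.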

  • prompt:A monitor displaying the text "AItechtalk" in clear, bold font. --v 6.0 --s 1000
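The low-versus-high stylize comparison can be repeated for any prompt by generating the same text at several --s values and checking the lettering by eye. A small Python sketch, assuming you paste each line into Discord yourself; the helper name `stylize_sweep` and the default value list are hypothetical, while the 0 to 1000 range is the one Midjourney documents for --stylize.

```python
def stylize_sweep(base_prompt, values=(0, 100, 250, 500, 1000), version="6.0"):
    """Return one prompt string per stylize value, for side-by-side comparison."""
    prompts = []
    for value in values:
        if not 0 <= value <= 1000:  # --stylize accepts values from 0 to 1000
            raise ValueError(f"stylize value out of range: {value}")
        prompts.append(f"{base_prompt} --v {version} --s {value}")
    return prompts


# The text to be drawn stays inside double quotes, as in the prompts in this article.
base = 'A monitor displaying the text "AItechtalk" in clear, bold font.'
for line in stylize_sweep(base, values=(100, 1000)):
    print(line)
```

Only the stylize value changes between lines, so any difference in spelling accuracy can be attributed to that setting rather than to the wording.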

Text drawn at a low stylize value also stays accurate in more complex compositions.
Drawing an Aitechtalk concept bar.
  • prompt:The facade of "AItechtalk" is a captivating blend of old-world charm and futuristic allure, set in a cozy corner of the street. The exterior is striking, with a sleek, modern sign bearing the bar's name in luminescent letters that glow against the night sky. The entrance is framed by a pair of high-tech, frosted glass doors, etched with subtle, digital patterns that hint at the technological theme inside. As patrons approach, they notice smart panels displaying scrolling text of the latest tech news and snippets of code, engaging the tech-savvy crowd. The windows are tinted but occasionally flicker with the silhouettes of people and the ambient, changing lights from within, suggesting a lively and dynamic environment inside. The overall look is minimalist yet intriguing, inviting those with a curiosity for technology and a love for the night to step into a world where innovation meets relaxation.
Drawing an Aitechtalk concept café.
  • prompt:Picture a stylish and sophisticated café, its large, bold sign reading "AI Tech Talk" making a statement in sleek, modern lettering against the backdrop of an elegantly designed façade. The café exudes a sense of upscale yet welcoming ambiance, attracting a clientele of tech aficionados and casual coffee lovers alike. The exterior is tastefully decorated, perhaps with a combination of natural wood and industrial materials, reflecting a blend of warmth and innovation. Large windows offer a transparent view into the interior, where the lighting is cozy and inviting, casting a soft glow on the chic furniture and decor. The "AI Tech Talk" sign is not just a name but a declaration of the café's theme, possibly featuring an element of technology like a digital display or interactive component that hints at the artificial intelligence focus within. Inside, the café might feature artwork or installations related to technology and AI, creating a stimulating environment for conversation and contemplation. The seating is comfortable and arranged to encourage both intimate gatherings and larger group discussions, with perhaps a special corner or stage area for tech talks, workshops, or presentations. The overall atmosphere is one of refined taste and intellectual curiosity, making "AI Tech Talk" a destination for those who appreciate the finer things in life and have a keen interest in the future of technology.

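The Aitechtalk concept café

Across several tests, Midjourney V6 did well at drawing English text, rendering lettering in scenes from simple to complex. It cannot yet handle text in other languages, Chinese characters included: what you type is a “spell”, and what it draws is a “talisman”.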
  • prompt:Picture a stylish and sophisticated café, its large, bold sign reading "AI科技评论" making a statement in sleek, modern lettering against the backdrop of an elegantly designed façade. The café exudes a sense of upscale yet welcoming ambiance, attracting a clientele of tech aficionados and casual coffee lovers alike. The exterior is tastefully decorated, perhaps with a combination of natural wood and industrial materials, reflecting a blend of warmth and innovation. Large windows offer a transparent view into the interior, where the lighting is cozy and inviting, casting a soft glow on the chic furniture and decor. The "AI科技评论" sign is not just a name but a declaration of the café's theme, possibly featuring an element of technology like a digital display or interactive component that hints at the artificial intelligence focus within. Inside, the café might feature artwork or installations related to technology and AI, creating a stimulating environment for conversation and contemplation. The seating is comfortable and arranged to encourage both intimate gatherings and larger group discussions, with perhaps a special corner or stage area for tech talks, workshops, or presentations. The overall atmosphere is one of refined taste and intellectual curiosity, making "AI Tech Talk" a destination for those who appreciate the finer things in life and have a keen interest in the future of technology.
Even an AI is not entirely even-handed when it comes to text. If you simply enter the name of a world-famous brand, V6 turns in a near-perfect answer.
  • prompt:cocacola
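Note that the phrase is “near-perfect”. If you want to find what is wrong, pick up the Starbucks cup at hand and compare it with the image below to spot the flaws.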
  • prompt:Starbucks
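Then there is the KFC colonel, smiling without showing his teeth.
  • prompt:KFC
It is hard to say where the line between text and image lies in the AI's eyes. The smarter it gets, the blurrier that line becomes.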

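Midjourney V5.2 vs. Midjourney V6

V6 is the third model Midjourney has trained from scratch. In the side-by-side comparisons below, its composition, color and lighting detail, and rendering of physical materials are all better than those of V5.2.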
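1. prompt: Yangzhou Fried Rice

Yangzhou fried rice

2. prompt: lady Photo booth

Woman's photo-booth portrait

3. prompt: chinese lady Photo booth

Chinese woman's photo-booth portrait

4. prompt: Girl at the window.

Woman at the window

5. prompt: Girl with hair blowing in the wind.

Woman with long hair blowing in the wind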

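Tell a story so the AI understands you

Midjourney V6 can now take a short essay of 350 words or more as a detailed description, and it returns images that are closer to real photographs. You can specify, for example, every item of clothing a person wears, every gesture and movement, the assumed logic of how the photo was taken, and every structural detail of the composition.
In short, if you are good at telling stories, you can generate images by telling a story, then use those images to show your story to more people.
1. You can describe yourself resting at an earthquake search-and-rescue site, taking a casual snapshot with your phone from where you sit, and praying that everyone is safe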
  • prompt:I captured a photograph while at the earthquake scene, where I am seated on the rubble. In the lens's frame, only my feet are visible, surrounded by the debris and remnants of the disaster. This perspective offers a personal glimpse into the aftermath, focusing on the point where I am physically connected to the scene, amidst the devastation.

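May our compatriots affected by the earthquake return to normal life soon

2. You can play in the snow in Northeast China in December, with a photo that captures the kind of happy moment it takes a thousand shutter clicks to catch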
  • prompt:Envision a photograph you've captured of a snowy landscape in China's Northeast, the scene filled with the serenity and intensity of a heavy snowfall. In the image, your arm is extended towards the camera, your palm open and facing upwards as delicate snowflakes drift down from the grey, cloud-filled sky, landing softly on your skin. Each snowflake is unique, perhaps visible in detail against the contrasting backdrop of your glove or bare hand.Beyond your hand, the scene opens up to a vast expanse of a winter wonderland. Snow blankets everything in sight, covering trees, fields, and structures in a pristine white layer. The snow continues to fall heavily, blurring the lines between sky and land, creating a sense of quiet isolation and beauty. The world seems hushed and still, except for the dance of the snowflakes.

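At the end of 2023, southern tourists (nicknamed “little southern potatoes”) all went north to see the snow

3. This is just a dream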
  • prompt:Imagine a photograph you've taken, capturing a tender and intimate moment between you and your girlfriend. She is walking ahead of you, her black hair cascading down her back, a symbol of grace and movement. The focus of the image is her back, as she moves forward, perhaps slightly turned to the side, unaware or coyly acknowledging the camera. Your hand extends into the frame, reaching forward to gently grasp her hand, a gesture of connection and affection. The viewer can see only her back and your hand, creating a sense of closeness and companionship. There's a contrast in the image between the movement suggested by her walking and the stillness of the hand-holding moment, capturing the dynamic of your relationship in a single, frozen frame. The background might be softly blurred, emphasizing the focus on the two of you, with the details of the surrounding environment fading into the periphery. The photo tells a story of a shared journey, a moment of tenderness, and a personal connection that speaks louder than words. It's a snapshot of life, love, and the simple, yet profound act of walking together, hand in hand.

Quick, take her hand!

4. A crucial interception in front of the goal
  • prompt:A soccer player's first-person perspective as they swiftly approach the goal, maneuver past defenders, and take a powerful shot, sending the ball arching into the top corner of the net.

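December 23, 2023: the postponed Premier League match between Manchester City and Brentford

5. A blonde woman showing four emotions: joy, anger, sorrow, and happiness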
  • prompt:The image features a 25-year-old woman with golden long hair against a pure white background. She is depicted displaying a spectrum of emotions:(happiness,) (anger)(sorrow)(joy) . Each emotion is vividly portrayed through her expressive facial features and body language, with a particular focus on the subtleties of her eyes and the flow of her hair. The stark white background accentuates her figure and the rich, dynamic expressions that cross her face, capturing the essence of each feeling.

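The four expressions: joy, anger, sorrow, and happiness

6. Imagine you took a group photo with a camera, and specify everything from the overall composition to the subjects' ages and clothing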
  • prompt:35mm film still, two-shot of a 50 year old black man with a grey beard wearing a brown jacket and red scarf standing next to a 20 year old white woman wearing a navy blue and cream houndstooth coat and black knit beanie. They are walking down the middle of the street at midnight, illuminated by the soft orange glow of the street lights --ar 7:5 --style raw --v 6.0

Two people exchanging a glance, captured on 35mm film

7. On the streets of Akihabara, Japan
  • prompt:In the image, imagine a person with a stout figure walking through Akihabara, Tokyo's bustling district known for its electronic stores and pop culture. The individual is wearing casual, comfortable clothing, perhaps adorned with vibrant anime graphics, reflective of the area's otaku culture. They might be seen carrying shopping bags filled with gadgets, manga, or anime merchandise, looking content and absorbed in the lively atmosphere of the streets lined with colorful signs and bustling with fellow enthusiasts. The surrounding scenery is a vivid array of neon lights and posters, emblematic of Akihabara's unique vibe.

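This is how the author of this article pictures themselves

Of course, Midjourney V6 is by no means perfect, and it cannot even be said to have begun replacing human work. Of the thousand-plus cases from my experiments, this is the only one I would describe as “uncomfortable”: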
  • prompt:In the scene, two adorable children, a boy and a girl, are playing by a riverside, completely immersed in their joyful activity. They are covered in mud from head to toe, a testament to their uninhibited exploration and fun. The boy, with a mischievous glint in his eyes, is in the midst of splashing in a shallow puddle, his laughter echoing in the air. Beside him, the girl, with a bright, carefree smile, is shaping the mud with her small hands, perhaps building castles or imaginary shapes. Their clothing is simple and casual, suited for play, and now adorned with the natural art of the riverside. The background captures the gentle flow of the river and the soft glow of the late afternoon sun, casting a warm, golden light on the scene, highlighting their youthful exuberance and the simple joys of childhood.

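Sharp details, sharp reflections on the river surface, and lighting dialed all the way up

In the first image returned, most details met expectations, so I went straight on to rework it, hoping to add more facial detail and steering specifically toward “Chinese children”. That is where things went wrong.

Stereotypical flat faces and squinting eyes

I will leave it there. These are AI-generated images, and we are only discussing the technical side.
Midjourney V6 raises the sense of interaction between AI and people to a new level, and the coherence and quality of its output set a new milestone, but it is still far from ready for production use.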

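Complex scenes with many people tend to produce illogical proportions

Most of the satisfying improvements in Midjourney V6 are concentrated in close shots and close-ups. Its handling of long-shot compositions, facial-expression detail, and some color combinations leaves much to be desired.
Beyond that, head-to-body proportions in complex scenes, specific actions, hand interactions, and the difference between shaking hands and holding hands are all unsatisfactory. The handling of differences between ethnicities also falls short for now.

Hands and limbs have always been a weak spot for AI models

Still, Midjourney deserves credit: in the half year since the previous version, V5.2, was released, it has again turned in a high-scoring paper. It is just that on the road of AI, the line for full marks keeps rising geometrically.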
