GPT-4o Image Generation Launches, Free to Use Starting Today: We Pitted It Against Chinese Text-to-Image Apps

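March 25: Sam Altman, co-founder and CEO of OpenAI, unveiled GPT-4o image generation in a livestream, giving the multimodal GPT-4o model image generation, an important piece it had been missing.

GPT-4o image generation follows instructions to produce more accurate images. OpenAI has also connected it to the model's built-in knowledge base, so it can generate and edit images for users based on that knowledge or on the conversation context.

Starting today, GPT-4o image generation is rolling out as the default image generator in ChatGPT to Plus, Pro, Team, and free users.

You can try these capabilities now by opening ChatGPT, although regular users get only 3 tries per day.

API access for developers to generate images with GPT-4o will roll out in the coming weeks.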

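Judging from the examples OpenAI has shown and demonstrated:

GPT-4o image generation handles text very well. In these examples it reproduces the requested text 100% accurately and places it where specified, and, like episodes of a series, it keeps rendering text accurately while changing the characters' actions from one image to the next.

GPT-4o can follow detailed prompts, handling as many as 10-20 distinct objects in a single image.

GPT-4o also performs well at generating photorealistic images.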

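At the same time, OpenAI itself acknowledges: "Our model isn't perfect. We're aware of multiple limitations at the moment, which we will work to address through model improvements after the initial launch."

Current limitations of GPT-4o image generation include hallucinations; improper cropping; difficulty rendering non-Latin scripts, whose characters may come out inaccurate; and requests to edit a specific part of a generated image (such as a typo) that do not always work and may also change other parts of the image in unrequested ways or introduce more errors.

In addition, the model struggles to keep edits to user-uploaded faces consistent, though this is expected to be fixed within a week.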

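How do China's current text-to-image apps compare with GPT-4o when given the same prompts?

First, a look at a few of the GPT-4o image generation demo examples:

Example 1: Text handling in images

Enter the following text in ChatGPT: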

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer’s reflection.

The text reads:

(Left)

“Transfer between Modalities:

Suppose we directly model

p(text, pixels, sound) [equation]

with one big autoregressive transformer.

Pros:

* image generation augmented with vast world knowledge

* next-level text rendering

* native in-context learning

* unified post-training stack

Cons:

* varying bit-rate across modalities

* compute not adaptive”

(Right)

“Fixes:

* model compressed representations

* compose autoregressive prior with a powerful decoder”

On the bottom right of the board, she draws a diagram:

“tokens -> [transformer] -> [diffusion] -> pixels”

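In the end, as the image below shows, the text on the whiteboard in GPT-4o's generated image is completely accurate!

It can also continue the scene like a series, rendering text accurately while changing what the characters are doing.

Enter the following prompt in ChatGPT: selfie view of the photographer, as she turns around to high five him

In the image GPT-4o generated, the man matches the man whose reflection appears in the whiteboard in the first image.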

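Example 2: Ask GPT-4o to generate a menu. Besides the dishes, prices, and descriptions to include, the prompt also requires the generated image to carry the restaurant's name, its main highlights, and the menu's style.

Enter the following prompt in ChatGPT: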

I'm opening a traditional concept restaurant in Marin called Haein. It focuses on Korean food cooked with organic, farm-fresh ingredients, with a rotating menu based on what's seasonal. I want you to design an image - a menu incorporating the following menu items - lean into the traditional/rustic style while keeping it feeling upscale and sleek. Please also include illustrations of each dish in an elegant, peter rabbit style. Make sure all the text is rendered correctly, with a white background.

(Top)

Doenjang Jjigae (Fermented Soybean Stew) – $18 House-made doenjang with local mushrooms, tofu, and seasonal vegetables served with rice.

Galbi Jjim (Braised Short Ribs) – $34 Slow-braised local grass-fed beef ribs with pear and black garlic glaze, seasonal root vegetables, and jujube.

Grilled Seasonal Fish – Market Price ($22-$30) Whole or fillet of local, sustainable fish grilled over charcoal, served with perilla leaf ssam and house-made sauces.

Bibimbap – $19 Heirloom rice with a rotating selection of farm-fresh vegetables, house-fermented gochujang, and pasture-raised egg.

Bossam (Heritage Pork Wraps) – $28 Slow-cooked pork belly with napa cabbage wraps, oyster kimchi, perilla, and seasonal condiments.

(Bottom) Dessert & Drinks Seasonal Makgeolli (Rice Wine) – $12/glass

Rotating flavors based on seasonal fruits and flowers (persimmon, citrus, elderflower, etc.).

Hoddeok (Korean Sweet Pancake) – $9 Pan-fried cinnamon-stuffed pancake with black sesame ice cream.

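The menu generated by GPT-4o is shown below:

Example 3: A test of GPT-4o's ability to follow detailed prompts and handle as many as 10-20 distinct objects.

Enter the following prompt in ChatGPT: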

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here’s the list:

1. a blue star

2. red triangle

3. green square

4. pink circle

5. orange hourglass

6. purple infinity sign

7. black and white polka dot bowtie

8. tiedye “42”

9. an orange cat wearing a black baseball cap

10. a map with a treasure chest

11. a pair of googly eyes

12. a thumbs up emoji

13. a pair of scissors

14. a blue and white giraffe

15. the word “OpenAI” written in cursive

16. a rainbow-colored lightning bolt
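
The "left to right, top to bottom" instruction means the 16 items fill the grid in row-major order. As a minimal sketch for checking a generated image against the prompt, the expected cell of each item can be computed as below. The helper names `cell_for` and `expected_grid` are our own illustrations and are not part of the original article or of any OpenAI tool.

```python
# Hypothetical helpers (assumptions for illustration only): map the prompt's
# 1-based item numbers onto a 4 x 4 grid filled left to right, top to bottom.
ROWS, COLS = 4, 4

ITEMS = [
    "a blue star",
    "red triangle",
    "green square",
    "pink circle",
    "orange hourglass",
    "purple infinity sign",
    "black and white polka dot bowtie",
    'tiedye "42"',
    "an orange cat wearing a black baseball cap",
    "a map with a treasure chest",
    "a pair of googly eyes",
    "a thumbs up emoji",
    "a pair of scissors",
    "a blue and white giraffe",
    'the word "OpenAI" written in cursive',
    "a rainbow-colored lightning bolt",
]


def cell_for(item_number):
    """Return the 1-based (row, column) of a 1-based item number."""
    if not 1 <= item_number <= ROWS * COLS:
        raise ValueError("item number must be between 1 and 16")
    # Row-major order: every COLS items start a new row.
    row, col = divmod(item_number - 1, COLS)
    return row + 1, col + 1


def expected_grid(items):
    """Split the flat item list into ROWS rows of COLS items each."""
    if len(items) != ROWS * COLS:
        raise ValueError("expected exactly 16 items")
    return [items[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


if __name__ == "__main__":
    for number, item in enumerate(ITEMS, start=1):
        row, col = cell_for(number)
        print(f"{number:2d}. row {row}, column {col}: {item}")
```

Under this reading, item 8 (tiedye "42") belongs at the end of the second row and item 9 (the orange cat) at the start of the third, which makes it easy to spot whether a model skipped, merged, or reordered objects.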

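The image generated by GPT-4o is shown below:

Finally, how do China's current text-to-image apps fare when given the prompts above?

Here we used the Example 3 prompt to test Wenxin Yiyan (ERNIE Bot, running the Wenxin 4.5 model) and the Doubao app.

One of the 4 images generated by Wenxin Yiyan (Wenxin 4.5 model)

One of the 4 images generated by Doubao

For now, judging by this test, there is still a gap between these apps and GPT-4o.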
