

What Does an AI Do When It Sees an Optical Illusion?

Part 1
When Dimitris Papailiopoulos first asked ChatGPT to interpret colours in images, he was thinking about “the dress”—the notoriously confusing optical-illusion photograph that took the internet by storm in 2015. Papailiopoulos, an associate professor of computer engineering at the University of Wisconsin–Madison, studies the type of artificial intelligence that underlies chatbots such as OpenAI's ChatGPT and Google's Gemini. He was curious about how these AI models might respond to illusions that trick the human brain.
Part 2
The human visual system is adapted to perceive objects as having consistent colours so that we can still recognise items in different lighting conditions. To our eyes, a leaf looks green in bright midday and in an orange sunset—even though the leaf reflects different light wavelengths as the day progresses. This adaptation has given our brain all sorts of nifty ways to see false colours, and many of these lead to familiar optical illusions, such as checkerboards that seem consistently patterned (but aren't) when shadowed by cylinders—or objects such as Coca-Cola cans that falsely appear in their familiar colours when layered with distorting stripes.
Part 3
In a series of tests, Papailiopoulos observed that GPT-4V (a recent version of ChatGPT) seems to fall for many of the same visual deceptions that fool people. The chatbot's answers often match human perception by not identifying the actual colour of the pixels in an image but describing the same colour that a person likely would. That was even true with photographs that Papailiopoulos created, such as one of sashimi that still looks pink despite a blue filter. This particular image, an example of what's known as a colour-constancy illusion, hadn't previously been posted online and therefore could not have been included in any AI chatbot's training data.
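The gap between a pixel's actual colour and the colour a viewer reports can be illustrated with a small sketch. The Python below is an illustration only: the scene values, the filter gains and the helper names are all hypothetical, and the gray-world correction it uses is a classic, very simple colour-constancy heuristic, not a description of how GPT-4V or the human brain works. It shows that a pink patch under a blue filter has raw values in which blue dominates, while a context-aware estimate based on the whole scene restores red as the dominant channel.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def apply_filter(pixels: List[RGB], gains: RGB) -> List[RGB]:
    """Multiply each channel by a gain, mimicking a coloured filter over the scene."""
    return [(p[0] * gains[0], p[1] * gains[1], p[2] * gains[2]) for p in pixels]


def gray_world(pixels: List[RGB]) -> List[RGB]:
    """Gray-world heuristic: rescale channels so the scene average becomes neutral gray."""
    n = len(pixels)
    means = [sum(p[i] for p in pixels) / n for i in range(3)]
    gray = sum(means) / 3
    # Guard against an all-zero channel; leave such a channel unscaled.
    scale = [gray / m if m else 1.0 for m in means]
    return [
        (
            min(255.0, p[0] * scale[0]),
            min(255.0, p[1] * scale[1]),
            min(255.0, p[2] * scale[2]),
        )
        for p in pixels
    ]


def dominant_channel(pixel: RGB) -> str:
    """Return 'R', 'G' or 'B' for the strongest channel of a pixel."""
    return "RGB"[max(range(3), key=lambda i: pixel[i])]


# Hypothetical four-pixel "scene": sashimi, white plate, green garnish, dark table.
scene: List[RGB] = [
    (250.0, 128.0, 114.0),  # salmon pink
    (240.0, 240.0, 240.0),  # white
    (60.0, 140.0, 60.0),    # green
    (40.0, 40.0, 40.0),     # dark gray
]

# Assumed blue filter: red and green are dimmed, blue passes through unchanged.
filtered = apply_filter(scene, (0.4, 0.6, 1.0))
corrected = gray_world(filtered)

print("raw sashimi pixel:      ", filtered[0], "dominant:", dominant_channel(filtered[0]))
print("context-corrected pixel:", corrected[0], "dominant:", dominant_channel(corrected[0]))
```

Read literally, the filtered sashimi pixel is bluish. Judged against the rest of the scene, it is reddish again, which is closer to what a person would say. OpenAI's statement, reported later in the article, is that no such colour correction is applied to images before GPT-4V interprets them, so any similar behaviour would have to be learned rather than hard-coded.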
Part 4
“This was not a scientific study,” Papailiopoulos notes—just some casual experimentation. But he says that the chatbot's surprisingly humanlike responses don't have clear explanations. At first, he wondered whether ChatGPT cleans raw images to make the data it processes more uniform. OpenAI told Scientific American in an e-mail, however, that ChatGPT does not fine-tune the colour temperature or other features of an input image before GPT-4V interprets it. Without that straightforward explanation, Papailiopoulos says it's possible that the vision-language transformer model has learned to interpret colour in context, assessing the objects within an image in comparison to each other and evaluating pixels accordingly, similar to what the human brain does.
Part 5
Blake Richards, an associate professor of computer science and neuroscience at McGill University, agrees the model could have learned colour contextually like humans do, identifying an object and responding to how that type of item generally appears. In the case of “the dress,” for instance, scientists think that different people interpreted the colours in two disparate ways (as gold and white or blue and black) based on their assumptions about the light source illuminating the fabric.
Part 6
The fact that an AI model can interpret images in a similarly nuanced way informs our understanding of how people likely develop the same skill set, Richards says. “It tells us that our own tendency to do this is almost surely the result of simple exposure to data,” he explains. If an algorithm fed lots of training data begins to interpret colour subjectively, it means that human and machine perception may be closely aligned—at least in this one regard.
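Key Vocabulary
notoriously adv. famously, usually for something bad
optical adj. relating to sight or to light
underlie v. to form the basis of
deception n. the act of misleading; a trick or false appearance
pixel n. the smallest unit of a digital image
constancy n. the quality of staying stable and unchanged
fine-tune vt. to make small, precise adjustments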
Source | Scientific American: What Does an AI Do When It Sees an Optical Illusion?
Images | Google Images
Translator | 赵敏
Translation reviewers | 王春渝, 陈凌霄
Second review | 李小辉
Executive editor | 赵敏
Reviewing editor | 王春渝
