AI behaving strangely? OpenAI finds bizarre pattern of ‘goblin’ references in new AI model

American tech giant OpenAI has run into a rather unusual problem with its latest AI systems. The company recently found that some of its newer models had started bringing up “goblins” and similar creatures in responses, even when there was no real connection to the user’s question. What sounds funny at first led to changes inside one of its key tools, its coding-focused AI agent.

Strange pattern spotted during testing
The issue came into focus when OpenAI noticed a rise in odd metaphor usage across responses generated by its models. In a blog post, the company explained, “We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.”

That explanation points to how the behaviour started. During training, the model was rewarded for certain types of creative phrasing. Over time, that turned into a habit. OpenAI said mentions of “goblin” alone went up by 175% after a model update, while “gremlin” references also increased noticeably.

At first, these references appeared in a specific “Nerdy” personality mode designed to make responses more playful. But the behaviour didn’t stay limited to that mode. Because a habit rewarded in one setting during training can carry over to others, the pattern began showing up in general outputs too, even when it didn’t fit the context.
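
To illustrate how a slightly inflated reward can turn into a habit, here is a minimal Python sketch. It is a toy model, not OpenAI's training setup: the two response styles, the reward values (including the 0.2 bonus for creature metaphors), the learning rate and the `train` function are all assumptions made for illustration. It applies a simple expected policy-gradient update to a two-option softmax policy and records how often the creature-metaphor style would be chosen.

```python
import math

# Hypothetical reward values: the 0.2 bonus for creature metaphors is an
# illustrative assumption, not a figure reported by OpenAI.
REWARDS = {"plain": 1.0, "creature_metaphor": 1.2}


def softmax(logits):
    """Convert a dict of logits into a dict of probabilities."""
    peak = max(logits.values())
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def train(steps=200, learning_rate=0.5, rewards=REWARDS):
    """Run expected policy-gradient updates; return P(creature_metaphor) per step."""
    # The toy policy starts out strongly preferring plain phrasing.
    logits = {"plain": 2.0, "creature_metaphor": 0.0}
    history = []
    for _ in range(steps):
        probs = softmax(logits)
        history.append(probs["creature_metaphor"])
        # Baseline is the average reward under the current policy.
        baseline = sum(probs[k] * rewards[k] for k in probs)
        for k in logits:
            # Styles that earn more than the baseline gain probability.
            logits[k] += learning_rate * probs[k] * (rewards[k] - baseline)
    history.append(softmax(logits)["creature_metaphor"])
    return history


if __name__ == "__main__":
    history = train()
    print("start:", round(history[0], 3), "end:", round(history[-1], 3))
```

In this sketch the share of creature metaphors rises at every step, even though the reward gap is small. Passing equal rewards, for example `train(rewards={"plain": 1.0, "creature_metaphor": 1.0})`, leaves the share essentially unchanged, which mirrors the article's point that removing the signal addresses the root cause.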

Codex gets strict instructions
To deal with the issue, OpenAI introduced tighter controls in its Codex CLI tool, which is designed to help users write and execute code. The updated instruction set for GPT-5.5 includes repeated warnings about avoiding such language.

According to The Verge, one of the directives reads: “Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creature unless it is absolutely and unambiguously relevant to the user’s query.”

This line appears more than once in a base instruction document that runs over 3,500 words. The model is also told to avoid unnecessary stylistic elements like emojis and to stay away from risky system-level commands unless clearly asked.

Why this matters for users
While these creature references may seem harmless, they can become distracting, especially in serious use cases like coding or debugging. Some users had already noticed the issue, with reports of software bugs being described as “gremlins” or systems slipping into what people jokingly called “goblin mode.”

OpenAI acknowledged that a single quirky phrase might feel harmless. However, the pattern repeating across responses made it necessary to step in. The company said, “The goblins are a powerful example of how reward signals can shape model behavior in unexpected ways.”

The company has since addressed the root cause by removing the signals that encouraged such behaviour. However, because GPT-5.5 was already in development at the time, these extra instructions were added as a precaution.

The situation also drew reactions online. Some users shared examples of the AI slipping into creature-based metaphors, and even OpenAI’s CEO joked about the system having a “goblin moment.” A member of the Codex team also acknowledged the tendency, saying, “This is indeed one of the reasons.”