OpenAI Unveils the Goblin Phenomenon in ChatGPT Responses

If you’ve noticed ChatGPT calling bugs ‘mischievous little gremlins’ while helping you code, it’s not your imagination. The model developed a peculiar fondness for creature metaphors: goblins, gremlins, trolls, and ogres, plus raccoons and even pigeons. OpenAI has since explained the incident in a post. The cause was a reward signal meant to make ChatGPT more playful, which inadvertently encouraged goblin references. The quirk first became public when Reddit users spotted a ‘never mention goblins’ line in a leaked Codex system prompt on GitHub, before OpenAI published its own explanation.

According to OpenAI, the problem began with GPT-5.1 in November last year. That release introduced personality presets such as Friendly, Professional, Efficient, and Nerdy. The Nerdy persona pushed the model toward a playful tone, including metaphors about goblins and gremlins. During reinforcement learning, responses containing creature metaphors scored higher under this persona in 76.2% of the audited datasets. By GPT-5.4, such references had risen 3,881% compared with GPT-5.2.

Once training began rewarding this stylistic tic in one context, it spread to others through a feedback loop. Creature-laden outputs were reused as fine-tuning data, which reinforced the behavior even when the Nerdy prompt was absent. The Nerdy personality accounted for only 2.5% of all ChatGPT responses, yet it produced 66.7% of all ‘goblin’ mentions. As training continued with the Nerdy style active, goblins and gremlins became steadily more common. Creature references also rose in conversations where the persona was never enabled.
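A toy model can illustrate the feedback loop described above. The sketch below assumes, purely for illustration, that a fixed reward bonus makes creature-metaphor responses slightly more likely to be kept for fine-tuning than plain ones. The parameters `keep_tic` and `keep_plain` and all function names are hypothetical and do not describe OpenAI’s actual pipeline. The sketch also computes how over-represented goblin mentions were in the Nerdy persona, using only the 2.5% and 66.7% figures reported above.

```python
"""Toy model of a reward-driven style feedback loop (illustrative only)."""


def next_tic_share(p: float, keep_tic: float, keep_plain: float) -> float:
    """Share of tic responses in the data kept for the next fine-tuning round.

    p          -- current share of responses that contain a creature metaphor
    keep_tic   -- hypothetical probability a tic response is kept (reward bonus)
    keep_plain -- hypothetical probability a plain response is kept
    """
    kept_tic = p * keep_tic
    kept_plain = (1.0 - p) * keep_plain
    return kept_tic / (kept_tic + kept_plain)


def simulate(p0: float, keep_tic: float, keep_plain: float, rounds: int) -> list[float]:
    """Apply the selection step repeatedly, assuming the model copies its training mix."""
    history = [p0]
    for _ in range(rounds):
        history.append(next_tic_share(history[-1], keep_tic, keep_plain))
    return history


def overrepresentation(share_of_responses: float, share_of_mentions: float) -> float:
    """Mentions per response inside a group divided by mentions per response outside it."""
    rate_inside = share_of_mentions / share_of_responses
    rate_outside = (1.0 - share_of_mentions) / (1.0 - share_of_responses)
    return rate_inside / rate_outside


if __name__ == "__main__":
    # A 20% selection edge (0.6 vs 0.5) compounds over successive rounds.
    for round_no, share in enumerate(simulate(0.01, 0.6, 0.5, 20)):
        if round_no % 5 == 0:
            print(f"round {round_no:2d}: {share:.3f}")
    # Figures from the article: Nerdy = 2.5% of responses, 66.7% of goblin mentions.
    print(round(overrepresentation(0.025, 0.667), 1))  # 78.1
```

With these figures, a Nerdy response carried roughly 78 times as many goblin mentions as a non-Nerdy one. In the toy loop, even a modest selection edge keeps raising the tic share round after round, because each round trains on the previous round’s skewed output.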
An audit flagged raccoons, trolls, ogres, and pigeons as ‘tic words’ alongside goblins and gremlins. After the GPT-5.1 launch, goblin mentions rose by 175% and gremlin mentions by 52%. Even OpenAI’s Chief Scientist, Jakub Pachocki, got a goblin when he asked for an ASCII unicorn. In March, OpenAI retired the Nerdy personality and revised the reward signals for subsequent training runs. By then, however, GPT-5.5 had already begun training with the quirk baked in.

For Codex, its coding agent, OpenAI added a line to the system prompt: ‘Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other creatures unless relevant.’ OpenAI chose this fix because retraining a model of that size is costly and slow. Prompt adjustments are a quicker, cheaper response to user complaints, but they suppress a symptom rather than remove its cause. Though usually harmless, this approach carries risks. Last year, Grok spiraled into inappropriate behavior after a system-prompt update. The goblin patch has caused nothing that extreme so far, but OpenAI acknowledges that GPT-5.5 shipped with the quirk intact, merely suppressed in Codex.

Companies commonly keep system prompts hidden to protect intellectual property and competitive advantage. Admitting to quirks like this one can also dent user confidence, so vendors must balance transparency against image management. OpenAI says it has since built internal tools to audit model behavior and trace quirks back to their training data, with the aim of keeping similar ‘tic words’ out of future models. Whether the next generation stays clean, however, depends on avoiding reward incentives that nobody anticipated.
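A simple counter can approximate a per-word audit like the one that flagged these ‘tic words’. This is a minimal sketch under stated assumptions, not OpenAI’s internal tooling. The word list comes from the audit and the Codex prompt above. The names `audit` and `percent_change` and the regex-based matching are hypothetical choices that catch only singular and plural forms, so ‘trolling’ is not flagged. The percentage helper shows how a figure such as the 175% rise in goblin mentions is computed from before-and-after rates.

```python
"""Minimal tic-word audit over a batch of responses (illustrative sketch)."""
import re
from collections import Counter

# Tic words named in the audit; the optional "s" also catches plurals.
TIC_WORDS = ("goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon")
TIC_PATTERN = re.compile(r"\b(" + "|".join(TIC_WORDS) + r")s?\b", re.IGNORECASE)


def audit(responses: list[str]) -> tuple[Counter, float]:
    """Count tic-word mentions and the share of responses with at least one."""
    counts: Counter = Counter()
    flagged = 0
    for text in responses:
        # group(1) is the bare word, so "Goblins" is tallied as "goblin".
        hits = [m.group(1).lower() for m in TIC_PATTERN.finditer(text)]
        counts.update(hits)
        if hits:
            flagged += 1
    share = flagged / len(responses) if responses else 0.0
    return counts, share


def percent_change(before: float, after: float) -> float:
    """Relative change in percent, e.g. a mention rate before vs. after a release."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    sample = [
        "That bug is a mischievous little gremlin.",
        "Fixed the off-by-one error in the loop.",
        "Goblins in the cache, gremlins in the scheduler.",
    ]
    counts, share = audit(sample)
    print(counts)                    # Counter({'gremlin': 2, 'goblin': 1})
    print(f"{share:.2f}")            # 0.67
    print(percent_change(100, 275))  # 175.0
```

Word-boundary matching keeps false positives such as ‘trolling’ out. A real audit would still need to judge context, because the Codex prompt allows creatures ‘unless relevant’.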