11 Jun 2026

NIST study questions limits of fixed AI safety guardrails

A new study from the US National Institute of Standards and Technology argues that no finite set of AI safety rules can fully protect AI systems from all possible adversarial prompts, highlighting the need for continuous security management.

A researcher at the US National Institute of Standards and Technology has published a study suggesting that fixed AI guardrails cannot provide complete protection against adaptive prompt-based attacks.

The paper, authored by NIST senior scientist Apostol Vassilev and published in IEEE Security & Privacy, applies concepts related to Kurt Gödel’s incompleteness theorems to AI security. It argues that no finite set of safety rules can be universally effective against every possible adversarial prompt that may be developed over time.

According to the study, attackers can continuously devise new prompt combinations and techniques that fall outside predefined safety controls. As a result, AI systems cannot rely solely on static guardrails as a comprehensive defence mechanism.

NIST emphasised that the findings do not imply that AI systems cannot be secured. Instead, the study supports a security model based on continuous testing, monitoring, and adaptation rather than one-time implementation of safeguards.

The paper recommends ongoing red-team exercises to identify vulnerabilities, regular updates to safety mechanisms, and operational resilience measures that limit the consequences of successful attacks. These measures include monitoring for misuse, detecting anomalous behaviour, and improving recovery capabilities after incidents.

The study also argues that AI security should be approached in a manner similar to broader cybersecurity practices, with the objective to reducing risk and raising the cost of attacks rather than eliminating vulnerabilities entirely.

The study concludes that organisations deploying AI systems should not expect guardrails to provide permanent protection. Instead, NIST recommends treating AI security as an ongoing process that includes continuous testing, updating safeguards, and monitoring for new forms of adversarial behaviour.