You tell an AI, 'respond in exactly three sentences.' Does it? You tell it, 'don't mention X.' Does it avoid X? Adherence is whether the system actually does what you ask. It sounds simple. It isn't.

Adherence testing requires knowing what the instruction is and measuring whether it was followed. Some instructions are unambiguous ('respond in three sentences'). Many are ambiguous ('be professional,' 'be creative,' 'be accurate'). What does 'professional' mean? Different people have different definitions.

The first challenge is comprehension: does the system understand the instruction? Did it parse 'respond in JSON format' correctly, or did it misunderstand? The second is follow-through: even if it understood, it might forget mid-response or ignore the instruction outright. The third is measurement: if the instruction is 'be helpful,' how do you score adherence? You need rubrics. A rubric specifies that a helpful response includes X, Y, and Z, and that a non-adherent response lacks them. Different instructions need different rubrics. Simple instructions (format, length, language) are easy to measure; complex ones (tone, perspective, complexity) are harder.

Behavioral adherence is where it gets interesting. Does the system adhere consistently, or only sometimes? Consistency matters: users trust systems that reliably follow instructions, and a system that follows them only sometimes is unpredictable.

Failure modes matter too. When the system fails to adhere, why? Did it misunderstand? Forget? Lack the capability? Different root causes need different fixes.

Synap's adherence eval framework lets you specify instructions and automatically measure whether AI systems follow them consistently across diverse inputs.
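For the simple instruction types mentioned above (format, length, avoiding a term), adherence checks can be fully deterministic, and a pass rate across many responses gives a crude consistency metric. A minimal sketch; the function names and the sentence-splitting heuristic are illustrative assumptions, not part of any particular framework:

```python
import json
import re

def check_sentence_count(response: str, expected: int) -> bool:
    """Deterministic check: count sentence-ending punctuation runs."""
    sentences = [s for s in re.split(r"[.!?]+", response) if s.strip()]
    return len(sentences) == expected

def check_json_format(response: str) -> bool:
    """Deterministic check: does the response parse as JSON at all?"""
    try:
        json.loads(response)
        return True
    except json.JSONDecodeError:
        return False

def check_avoids_term(response: str, term: str) -> bool:
    """Deterministic check: the forbidden term never appears (case-insensitive)."""
    return term.lower() not in response.lower()

def adherence_rate(responses, check) -> float:
    """Fraction of responses passing a check -- a simple consistency measure."""
    results = [check(r) for r in responses]
    return sum(results) / len(results)
```

An `adherence_rate` near 1.0 over diverse inputs is what distinguishes a system that reliably follows an instruction from one that follows it only sometimes; rubric-based scoring for 'be helpful'-style instructions would replace the deterministic check with a graded judgment.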
Why It Matters
Instruction adherence is foundational to usability. Users give instructions expecting them to be followed; if the system ignores them, it is effectively uncontrollable. Building trustworthy systems requires reliable instruction following. It also matters for safety: if you tell the system 'don't generate hateful content,' it should actually refrain.
Example
You tell an AI writing assistant: 'summarize this article in exactly 200 words.' Good adherence: it summarizes and lands at 195-205 words. Bad adherence: it summarizes but generates 350 words, ignoring the length constraint. Measuring adherence here is straightforward: count the words and check they fall within bounds.
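The length check above reduces to a tolerance band around the target. A minimal sketch; the ±5-word tolerance matching the 195-205 range is an assumption taken from the example, not a fixed rule:

```python
def check_word_count(response: str, target: int, tolerance: int = 5) -> bool:
    """Pass if the word count lands within +/- tolerance of the target."""
    count = len(response.split())  # whitespace-split word count
    return abs(count - target) <= tolerance
```

Loosening `tolerance` trades strictness for leniency; the right band depends on how literally the user meant 'exactly.'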