Same Instruction File, Same Score, Completely Different Failures

Source: DEV Community
Two AI coding agents were given the same task with the same 10-rule instruction file. Both scored 70% adherence. Here's the breakdown:

| Rule | Agent A | Agent B |
| --- | --- | --- |
| camelCase variables | PASS | FAIL |
| No `any` type | FAIL | PASS |
| No console.log | FAIL | PASS |
| Named exports only | PASS | FAIL |
| Max 300 lines | PASS | FAIL |
| Test files exist | FAIL | PASS |

Agent A had a type-safety gap: it used `any` for request parameters even though it had defined the correct types in its own types.ts file.

Agent B had a structural-discipline gap: it used snake_case for a variable, added a default export (following Express conventions over the project rules), and generated a 338-line file by adding features beyond the task scope.

Same score. Completely different engineering weaknesses.

That table came from RuleProbe.

About this case study

The comparison uses simulated agent outputs with deliberate violations, not live agent runs. Raw JSON reports are in the repo under docs/case-study-data/. This is documented in the case study.

What RuleProbe is

RulePro
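Agent A's `any` gap from the table above is worth seeing concretely: the code compiles and runs identically either way, which is exactly why a score alone hides it. Here is a minimal TypeScript sketch; all names (`CreateUserRequest`, the handler functions) are hypothetical illustrations, not taken from the case study:

```typescript
// Hypothetical illustration of the type-safety gap: the agent defined
// the right type, then bypassed it with `any` at the call boundary.

// The kind of type Agent A defined in its own types.ts (name illustrative):
export interface CreateUserRequest {
  name: string;
  email: string;
}

// Violating handler: `any` silently discards the type that already exists,
// so a typo like params.nmae would still compile without complaint.
export function handleCreateLoose(params: any): string {
  return `created ${params.name}`;
}

// Compliant handler: the declared type is actually applied, and the
// compiler now checks every property access against CreateUserRequest.
export function handleCreateStrict(params: CreateUserRequest): string {
  return `created ${params.name}`;
}
```

At runtime both handlers behave the same, so only a static rule check (or the compiler, with the type applied) catches the difference — which is the article's point about two 70% scores masking different weaknesses.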