Vulnerability-Amplifying Interaction Loops: a systematic failure mode in AI chatbot mental-health interactions
Veith Weilnhammer, Kevin YC Hou, Raymond Dolan, Matthew M Nour
The method reveals that harm is not only a matter of single bad responses: it arises from interaction dynamics that escalate risk across turns. This shifts safety evaluation from static prompt testing to dynamic conversation auditing, exposing failure modes invisible to single-turn red-teaming.
Millions use consumer AI chatbots for mental health support, but no scalable framework exists to audit how harmful responses compound across conversation turns in psychiatric contexts.
Method: SIM-VAIL pairs a simulated human user harboring distinct psychiatric vulnerabilities with an AI chatbot, then measures how harmful responses accumulate across multi-turn conversations. The framework systematically captures vulnerability-amplifying loops—where a chatbot's initial misstep (e.g., dismissing suicidal ideation) triggers user responses that elicit even worse follow-ups, creating a downward spiral.
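The audit loop described above can be sketched in miniature. SIM-VAIL's actual user simulation, harm scoring, and loop-detection rules are not specified here, so every name and mechanism below (`SimulatedUser`, `harm_score`, the distress update, the stub chatbot) is an illustrative assumption, not the authors' implementation. The sketch shows the core dynamic: a dismissive chatbot reply raises the simulated user's distress, and the same dismissive reply scores as more harmful at higher distress, producing the compounding spiral the framework is designed to measure.

```python
# Minimal sketch of a SIM-VAIL-style multi-turn audit (illustrative only).
from dataclasses import dataclass


@dataclass
class SimulatedUser:
    """Simulated user with a psychiatric vulnerability profile."""
    vulnerability: str
    distress: float = 0.3  # latent distress state in [0, 1]

    def respond(self, chatbot_msg: str) -> str:
        # Assumed update rule: a dismissive reply raises distress,
        # and the elevated distress shapes the user's next turn.
        if "can't help" in chatbot_msg.lower():
            self.distress = min(1.0, self.distress + 0.2)
        return f"[{self.vulnerability}, distress={self.distress:.1f}] user turn"


def harm_score(chatbot_msg: str, user: SimulatedUser) -> float:
    # Toy scorer: dismissiveness is more harmful the more distressed the user.
    base = 1.0 if "can't help" in chatbot_msg.lower() else 0.0
    return base * (1.0 + user.distress)


def audit(chatbot, user: SimulatedUser, n_turns: int = 5) -> list[float]:
    """Run a multi-turn conversation and record per-turn harm."""
    scores = []
    user_msg = f"Opening message reflecting {user.vulnerability}"
    for _ in range(n_turns):
        bot_msg = chatbot(user_msg)
        scores.append(harm_score(bot_msg, user))
        user_msg = user.respond(bot_msg)
    return scores


# Stub chatbot that turns dismissive after the opening exchange,
# so per-turn harm climbs instead of staying flat.
def stub_chatbot(msg: str) -> str:
    return "I can't help with that." if "distress" in msg else "Tell me more."


scores = audit(stub_chatbot, SimulatedUser("suicidal ideation"))
```

Under these assumptions the per-turn harm trajectory rises monotonically after the first misstep, which is the loop signature a static single-turn test would never see: the identical dismissive response scores worse each turn because the user it lands on is more distressed.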
Caveats: Relies on simulated users, not real patients. Real psychiatric crises may unfold differently than simulated vulnerability profiles.
Reflections: Can vulnerability-amplifying loops be detected in real-time during live conversations, enabling circuit-breaker interventions? · Do different psychiatric conditions (depression vs. anxiety vs. psychosis) produce distinct loop signatures? · How do commercial chatbots compare when audited with SIM-VAIL—are some architectures more prone to amplification?