VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification
Jungjae Lee, Dongjae Lee, Chihun Choi, Youngmin Im, Jaeyoung Wi, Kihong Heo, Sangeun Oh, Sunjae Lee, Insik Shin
Stop relying on LLM self-checking for GUI automation. Integrate formal verification as a gatekeeper layer. Best for high-stakes mobile workflows where a single wrong tap compounds errors.
LLM-based mobile GUI agents automate tasks via natural language, but their probabilistic nature makes them unreliable—they execute actions that don't match user intent, breaking workflows.
Method: VeriSafe Agent translates natural language instructions into formal specifications, then uses rule-based verification to check each agent action before execution. Tested on 300 instructions across 18 mobile apps using GPT-4o, it achieved 94.33%-98.33% accuracy in verifying actions—outperforming existing LLM-based verification by 16.33%-30.00%—and increased task completion rates by 90%-130%.
Caveats: Tested only on mobile apps. Desktop GUI agents or web automation may require different formalization approaches.
Reflections: Can autoformalization scale to ambiguous instructions where user intent itself is underspecified? · How does verification latency affect user experience in real-time automation scenarios? · What happens when the formal specification itself misinterprets user intent?