Major
Computer Science
Anticipated Graduation Year
2026
Access Type
Open Access
Abstract
TLA+ is a programming language used by companies like Amazon and Microsoft to mathematically prove a system is correct via specifications. We evaluate 30 LLMs on 205 public TLA+ specifications, achieving only 8.6% semantic correctness. Model size doesn't predict quality, DeepSeek r1:8B outperforms it's 70B variant. ChatTLA+, our fine-tuned GPT-OSS-20B model, uses TLC model checking as a path toward reliable formal specification synthesis through reinforcement learning.
Faculty Mentors & Instructors
Konstantin Laufer, PhD, Computer Science; Mohammed Abuhamed, PhD, Computer Science; Taining Wang, PhD, Computer Science; Arslan Bisharat, PhD Student, Computer Science; Brian Ortiz, Masters, Computer Science
Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
ChatTLA+ - Can LLMs Write Correct TLA+ Specifications?
TLA+ is a programming language used by companies like Amazon and Microsoft to mathematically prove a system is correct via specifications. We evaluate 30 LLMs on 205 public TLA+ specifications, achieving only 8.6% semantic correctness. Model size doesn't predict quality, DeepSeek r1:8B outperforms it's 70B variant. ChatTLA+, our fine-tuned GPT-OSS-20B model, uses TLC model checking as a path toward reliable formal specification synthesis through reinforcement learning.