Presenter Information

Major

Computer Science

Anticipated Graduation Year

2026

Access Type

Open Access

Abstract

TLA+ is a programming language used by companies like Amazon and Microsoft to mathematically prove a system is correct via specifications. We evaluate 30 LLMs on 205 public TLA+ specifications, achieving only 8.6% semantic correctness. Model size doesn't predict quality, DeepSeek r1:8B outperforms it's 70B variant. ChatTLA+, our fine-tuned GPT-OSS-20B model, uses TLC model checking as a path toward reliable formal specification synthesis through reinforcement learning.

Faculty Mentors & Instructors

Konstantin Laufer, PhD, Computer Science; Mohammed Abuhamed, PhD, Computer Science; Taining Wang, PhD, Computer Science; Arslan Bisharat, PhD Student, Computer Science; Brian Ortiz, Masters, Computer Science

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.

Share

COinS
 

ChatTLA+ - Can LLMs Write Correct TLA+ Specifications?

TLA+ is a programming language used by companies like Amazon and Microsoft to mathematically prove a system is correct via specifications. We evaluate 30 LLMs on 205 public TLA+ specifications, achieving only 8.6% semantic correctness. Model size doesn't predict quality, DeepSeek r1:8B outperforms it's 70B variant. ChatTLA+, our fine-tuned GPT-OSS-20B model, uses TLC model checking as a path toward reliable formal specification synthesis through reinforcement learning.