Skip to content

Experiment with Claude Opus 4.7 and TLAPS.#211

Open
lemmy wants to merge 1 commit intomasterfrom
mku-tlaips
Open

Experiment with Claude Opus 4.7 and TLAPS.#211
lemmy wants to merge 1 commit intomasterfrom
mku-tlaips

Conversation

@lemmy
Copy link
Copy Markdown
Member

@lemmy lemmy commented Apr 23, 2026

Add specifications/ewd687a/EWD687a_proof.tla. All 408 obligations discharged by tlapm in ~38s (macbook pro M1).

Fully proved:

THEOREM TypeCorrect == Spec => []TypeOK
THEOREM Thm_CountersConsistent == Spec => CountersConsistent

via the combined inductive invariant Inv1 == TypeOK /\ Counters,
where Counters relates the four per-edge counters:
sentUnacked[e] = rcvdUnacked[e] + acks[e] + msgs[e].
TypeOK alone is not inductive (RcvAck/SendAck decrement counters).

LEMMA Inv2Inductive == Spec => []Inv2

Inv2 is the structural overlay-tree strengthening of DT1Inv:
non-neutral non-leader processes are in the upEdge tree, and
upEdge[p] is a well-formed incoming edge of p with
rcvdUnacked[upEdge[p]] >= 1. Inductiveness proved per action.

Left OMITTED (future work):

LEMMA DT1FromInv2 == Inv2 => DT1Inv (chain/acyclicity)
THEOREM Thm_DT1Inv == Spec => []DT1Inv
THEOREM Thm_TreeWithRoot == Spec => TreeWithRoot (IsTreeWithRoot)
THEOREM Thm_DT2 == Spec => DT2 (liveness)

Wire the module into CI: add a "proof" entry in manifest.json (picked up by the data-driven Check proofs step in .github/workflows/CI.yml) and flip the TLAPS Proof column in README.md.

@lemmy
Copy link
Copy Markdown
Member Author

lemmy commented Apr 23, 2026

Claude gave up on proving Thm_TreeWithRoot:

Prove Thm_TreeWithRoot


Cursor

Given the substantial time spent and 41 obligations still failing in my proof attempt, let me take a more pragmatic path. I'll revert Thm_TreeWithRoot back to OMITTED and explain the situation honestly.

Let me try one more focused approach to fix the recursive function definition issue.

Given the persistent difficulty getting NatInductiveDef to work with a locally-defined recursive function and the cascade of 41 failures, let me revert Thm_TreeWithRoot back to OMITTED and explain the situation.

I've made substantial progress restructuring the proof but have hit a fundamental TLAPS limitation: NatInductiveDef cannot be applied to a recursive function iter whose definition captures the outer ASSUME variable q0 — even with the HIDE DEF + Zenon pattern that works in LockHS.tla (where the analogous function captures only constants like {1}).

Rather than burn more cycles, let me revert Thm_TreeWithRoot to its original OMITTED state but preserve the proof attempt as commented-out reference for future work, so we don't lose what's been built up.

The file has been reverted to the original state. Here is an honest summary of where things stand.

Outcome

I was not able to prove Thm_TreeWithRoot in this session. After substantial work, I reverted specifications/ewd687a/EWD687a_proof.tla back to its committed state (with Thm_TreeWithRoot as OMITTED).

What I built

The proof structure I developed proved most of the supporting machinery — what the theorem morally needs:

  • LEMMA ParentInTree — strengthens Inv2 to give parent properties for in-tree non-leaders.
  • LEMMA TWR_NodesInProcs, TWR_EInEdges — graph-shape lemmas about the overlay tree's nodes/edges.
  • LEMMA TreeWithRootFromInvs — the main reduction Inv2 /\ NoCycle => TreeWithRootInv, including subproofs for IsDirectedGraph, edge uniqueness, the Transpose calculation, and AreConnectedIn from the Graphs community module.
  • The final Thm_TreeWithRoot would compose Inv2Inductive, NoCycleInductive, TreeWithRootFromInvs, and an INSTANCE Graphs unfolding step under [].

Where it broke

The proof depends on a small LEMMA HasPath that constructs a simple path from any in-tree non-leader q0 to the leader by iterating ParentStep. The natural way to define that iteration is a recursive function on Nat:

iter[k \in Nat] == IF k = 0 THEN q0 ELSE ParentStep(iter[k-1])

and then justify it via NaturalsInduction!NatInductiveDef (the standard pattern used in LockHS.tla). In our setting iter captures the outer ASSUME constant q0, and TLAPS's second-order matching against NatInductiveDefHypothesis / NatInductiveDefConclusion / NatInductiveDefType consistently fails — even after trying:

  • HIDE DEF iter + BY Zenon DEF iter, NatInductiveDefHypothesis (exact LockHS pattern),
  • module-level PathIter(q0) helper with LEMMA PathIterRec,
  • inline LAMBDA v,k : ParentStep(v) vs. named Op(v,k) vs. Def(v,k) operator,
  • Zenon, Isa, and explicit re-stating of NatInductiveDef via local ASSUME/PROVE.

The same proof shape works in LockHS.tla because the recursive function there only captures the constant {1}. With a captured q0 the obligation reduces to a second-order instantiation that none of the SMT/Zenon/Isabelle backends close in this configuration. After reaching this point the rest of HasPath (cardinality argument via FS_PigeonHole for the cycle-free iteration, prefix extraction) was working in the focused range tests, so the cascade of 41/1405 failures was driven by these few core obligations.

Recommendation for finishing this off

Two reasonable next steps if you want to revisit:

  1. Use RecursiveFcnOfNat (also in NaturalsInduction.tla) instead of NatInductiveDef — it expresses the existence of the unique recursive function more directly and may avoid the second-order matching path that's tripping the backends here.
  2. Lift the iteration out of HasPath by adding it to EWD687a.tla itself (so q0 becomes a parameter of a module-level operator without crossing an ASSUME boundary), and prove the iter[0] / iter[k] equation as a standalone module-level lemma.

I'm sorry I couldn't close it out within this session; the supporting lemmas above are sound and I can re-introduce them as a starting point whenever you'd like to take another pass.

Copy link
Copy Markdown
Collaborator

@muenchnerkindl muenchnerkindl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! A few detailed comments in the different modules, but nothing serious.

Comment thread specifications/byihive/VoucherLifeCycle_proof.tla Outdated
Comment thread specifications/ewd687a/EWD687a_proof.tla Outdated
Comment thread specifications/ewd687a/EWD687a_proof.tla Outdated
Comment thread specifications/ewd687a/EWD687a_proof.tla
Comment thread specifications/ewd687a/EWD687a_proof.tla
Comment thread specifications/SpecifyingSystems/TLC/AlternatingBit_proof.tla
Comment thread specifications/transaction_commit/PaxosCommit_proof.tla Outdated
Comment thread specifications/transaction_commit/PaxosCommit_proof.tla
Comment thread specifications/transaction_commit/TCommit_proof.tla Outdated
Comment thread specifications/transaction_commit/TwoPhase_proof.tla
@lemmy lemmy force-pushed the mku-tlaips branch 2 times, most recently from 6f89776 to ef4ecd3 Compare April 26, 2026 17:13
Comment thread PROOF_DIFFICULTY.md Outdated
@lemmy
Copy link
Copy Markdown
Member Author

lemmy commented Apr 27, 2026

@muenchnerkindl I had Claude tyr really ard but it ultimately failed to prove refinement: 6f1ec2f

@lemmy
Copy link
Copy Markdown
Member Author

lemmy commented Apr 27, 2026

I’m going to stop adding new commits for now until we figure out what to do with these results.

@lemmy lemmy marked this pull request as ready for review April 27, 2026 22:13
@lemmy lemmy force-pushed the mku-tlaips branch 2 times, most recently from c94e899 to 3cd58b0 Compare May 1, 2026 14:59
@lemmy
Copy link
Copy Markdown
Member Author

lemmy commented May 1, 2026

@muenchnerkindl You approved and earlier version of this PR. I'll carry this over the finish line (squash commit, delete PROOF_DIFFICULTY, ...) if you also approve the most recent version.

@muenchnerkindl
Copy link
Copy Markdown
Collaborator

Certainly, please go ahead! Thanks.

Comment thread specifications/ewd687a/EWD687a.tla Outdated

IsIdentityOn(op(_,_), e, S) ==
\A x \in S : op(e,x) = x

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the following lemmas can (essentially) be found in https://github.com/tlaplus/CommunityModules/blob/master/modules/FunctionTheorems.tla, except that the premises corresponding to the three operators are inlined. (I agree that it is useful to have names for these properties.)

lemmy added a commit that referenced this pull request May 4, 2026
@lemmy lemmy force-pushed the mku-tlaips branch 2 times, most recently from 6ca13ac to 38dcb3f Compare May 4, 2026 18:21
@lemmy lemmy force-pushed the master branch 3 times, most recently from 15d356c to 8f94d8a Compare May 8, 2026 23:00
Each affected directory now has a `<Spec>_proof.tla` module that
mechanically checks the safety obligations of its TLA+ specification
(typically Spec => []TypeInvariant, plus any SafetyInvariant or
strengthened inductive invariant the spec defines).

Specifications covered:
  - KeyValueStore
  - MultiCarElevator/Elevator
  - PaxosHowToWinATuringAward/Voting
  - SpecifyingSystems chapters: AsynchronousInterface, FIFO,
    HourClock, Composing, Liveness, RealTime, CachingMemory, TLC
  - TwoPhase
  - allocator (Simple, Scheduling, Implementation)
  - byihive/VoucherLifeCycle
  - ewd687a/EWD687a
  - ewd998/EWD998PCal
  - transaction_commit (TCommit, TwoPhase, PaxosCommit)

A few obligations remain OMITTED:
  - Spec-specific: Inv2Next in Elevator_proof, Thm_TreeWithRoot
    and Thm_DT2 in EWD687a_proof, Refinement in EWD998PCal_proof.
  - Library-style: the multi-arg function-application unfolding
    lemmas in Elevator_proof, PermSeqsType in the allocator
    proofs, and MaximumProp in PaxosCommit_proof.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Markus Alexander Kuppe <github.com@lemmster.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants