fix: skip storer-being-paid-now checks for replication receipts #130
Pull request overview
This PR introduces an explicit verification context to differentiate live client PUT payment verification from replication/repair receipt verification, so replication receipts aren’t rejected by “storer-being-paid-now” checks that become invalid as the network evolves (record counts grow, close-group churn, DHT changes).
Changes:
- Add `VerificationContext { ClientPut, Replication }` and thread it through `PaymentVerifier::verify_payment` into EVM and merkle verification paths.
- Under `Replication`, skip the present-tense checks (own-quote price freshness, local recipient, merkle candidate closeness) while keeping receipt authenticity + on-chain settlement checks.
- Update PUT and replication handlers to pass the appropriate context; add unit tests validating the context-gated behavior.
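The context gating listed above can be sketched roughly as follows. This is an illustrative sketch only, not the code in `src/payment/verifier.rs`: `CheckResults`, `Rejection`, `verify_payment_sketch`, and the order in which checks run are all assumptions made for the example. Only the enum name and its two variants come from the PR.

```rust
/// The situation in which a payment proof is being verified (named in the PR).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum VerificationContext {
    ClientPut,
    Replication,
}

/// Hypothetical: pre-computed outcomes of the individual checks.
#[derive(Clone, Copy)]
struct CheckResults {
    authentic: bool,        // structure, content binding, key bindings, signatures
    settled_on_chain: bool, // on-chain settlement lookup
    own_quote_fresh: bool,  // present-tense check
    local_recipient: bool,  // present-tense check
    candidates_close: bool, // present-tense check (merkle pool)
}

/// Hypothetical rejection reasons.
#[derive(Debug, PartialEq, Eq)]
enum Rejection {
    StaleOwnQuote,
    NotLocalRecipient,
    CandidatesNotClose,
    NotAuthentic,
    NotSettled,
}

/// Assumed shape of the gate: present-tense checks run only for ClientPut,
/// authenticity and settlement checks run in both contexts.
fn verify_payment_sketch(r: CheckResults, ctx: VerificationContext) -> Result<(), Rejection> {
    if ctx == VerificationContext::ClientPut {
        if !r.own_quote_fresh {
            return Err(Rejection::StaleOwnQuote);
        }
        if !r.local_recipient {
            return Err(Rejection::NotLocalRecipient);
        }
        if !r.candidates_close {
            return Err(Rejection::CandidatesNotClose);
        }
    }
    if !r.authentic {
        return Err(Rejection::NotAuthentic);
    }
    if !r.settled_on_chain {
        return Err(Rejection::NotSettled);
    }
    Ok(())
}

fn main() {
    // An old but genuine receipt: authentic and settled, yet stale by today's view.
    let old_receipt = CheckResults {
        authentic: true,
        settled_on_chain: true,
        own_quote_fresh: false,
        local_recipient: false,
        candidates_close: false,
    };
    println!("{:?}", verify_payment_sketch(old_receipt, VerificationContext::ClientPut));
    println!("{:?}", verify_payment_sketch(old_receipt, VerificationContext::Replication));
}
```

Passing the context as an explicit argument keeps the skip visible at every call site, which is presumably why the PR threads it through rather than inferring it inside the verifier.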
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/storage/handler.rs | Passes VerificationContext::ClientPut for client PUT payment verification. |
| src/replication/mod.rs | Passes VerificationContext::Replication for replication PoP verification. |
| src/payment/verifier.rs | Adds VerificationContext, gates specific checks by context, and adds context-specific tests. |
| src/payment/mod.rs | Re-exports VerificationContext from the payment module. |
A proof-of-payment presented during replication is a receipt for a sale that already closed, but the verifier ran the full client-PUT check set against it. Two of those checks interrogate the present and therefore guarantee false rejections for replicated records: the own-quote price-freshness gate (record counts only grow, so every receipt's quoted price eventually drops below the verifier's live floor) and the local-recipient check (close groups churn, so a post-churn member receiving a record via replication was never a payee on the original receipt). The merkle candidate-closeness check has the same shape: it validates the winner pool against the live DHT, but the pool was sampled from the DHT of the original sale.

On DEV-01 (2026-06-05) this rejected nearly 100% of replication proof-of-payment transfers within an hour of launch: 4M+ "PoP verification error ... stale" rejections at ~300k/hour, records pinned below target redundancy, close-group record counts diverging 150x (75..=11,231 per service), and a permanent ~500 MB/s fleet-wide re-offer storm (~25 TB egress in 16h). The divergence the rejections caused is also what made the client-PUT freshness gate (fixed for the heterogeneous-neighbour case in #127) keep firing: the two failure modes fed each other.

Introduce VerificationContext { ClientPut, Replication } and thread it through verify_payment. Under Replication the verifier skips only the storer-being-paid-now checks (own-quote freshness, local recipient, merkle candidate closeness). Every receipt-authenticity check still runs in both contexts: quote structure, content binding to the exact address, peer-ID/pub-key bindings, ML-DSA signatures, and the on-chain settlement lookup. A record cannot be admitted via replication without an authentic, settled payment for that record.
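To illustrate why a freshness gate of this kind must eventually reject every old receipt, here is a toy model. The pricing curve `quote_price` and the half-of-live-price floor in `own_quote_is_fresh` are hypothetical stand-ins, not the node's real pricing or gate; the only property borrowed from the text above is that the quoted price rises as the record count grows.

```rust
/// Hypothetical pricing curve (an assumption for illustration only):
/// the price a node quotes rises with the number of records it stores.
fn quote_price(records: u64) -> u64 {
    10 + records
}

/// Hypothetical freshness gate: a quote is accepted only while it is at
/// least half of the price the verifier would quote right now.
fn own_quote_is_fresh(quoted_price: u64, live_records: u64) -> bool {
    let live_floor = quote_price(live_records) / 2;
    quoted_price >= live_floor
}

fn main() {
    // The receipt was issued when the node held 100 records.
    let quoted = quote_price(100);
    // The same receipt is re-checked as the record count keeps growing.
    for live_records in [100, 200, 300, 1_000] {
        println!(
            "live records = {live_records}: fresh = {}",
            own_quote_is_fresh(quoted, live_records)
        );
    }
}
```

Because the quoted price is frozen at sale time while the floor tracks the growing record count, the gate flips from accept to reject at some point for any monotone curve; only the moment differs.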
The verified-XorName cache is context-aware to match: each entry records whether its verification ran the full client-PUT check set, a Replication-verified entry satisfies later replication lookups (re-offers of the same key are routine) but never a later ClientPut fast-path, and a full ClientPut verification upgrades the entry without ever being downgraded back. Without this, a replication receipt would let a later proof-less client PUT bypass the context-gated checks via the cache.

Deliberate trade-off (documented on the enum): skipping the recipient and closeness checks for replication admits receipts from self-dealing payers who settle the median payment to their own wallet on-chain. The client-PUT path still rejects such pools, replication admission still requires responsibility for the key, and the abuse costs a settled on-chain payment per chunk; closing it properly belongs in quote issuance / payment policy rather than in the replication hot path, where the equivalent defence provably destroys the network's ability to heal.

Call sites: the chunk PUT handler passes ClientPut (behaviour unchanged); the fresh-offer and paid-notify replication handlers pass Replication.

Test results: payment::verifier 66/66 (5 new context tests: stale own quote and non-recipient receipts pass the gated checks under Replication, failing at the later binding/signature stage; content mismatch rejected under both contexts; duplicate-candidate merkle pool rejected under ClientPut but past the closeness check under Replication; Replication-verified cache entry does not satisfy a ClientPut fast-path, upgrades on full verification, never downgrades), replication 230/230, storage 29/29. cargo clippy --all-targets clean.

Note: config::tests::test_bootstrap_peers_discover_env_var fails on machines with a real ~/.config/ant/bootstrap_peers.toml (pre-existing on main, unrelated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Force-pushed from ecee0a6 to 5f8e81d.
Code review: mostly good, one payment-cache concern + minor doc nit
Thanks for the testnet forensics and the clear context write-up; this is a real
Thanks — these were already addressed in 5f8e81d (the review ran against the pre-amend revision, ecee0a6).
Current branch state: payment::verifier 66/66 (5 context tests), replication 230/230, storage 29/29, clippy clean.
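Summary
A proof-of-payment presented during replication is a receipt for a sale that already closed, but the verifier ran the full client-PUT check set against it. Two of those checks interrogate the present and therefore guarantee false rejections for replicated records:
- the own-quote price-freshness gate: record counts only grow, so every receipt's quoted price eventually drops below the verifier's live floor;
- the local-recipient check: close groups churn, so a post-churn member receiving a record via replication was never a payee on the original receipt.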
On DEV-01 (2026-06-05) this rejected nearly 100% of replication PoP transfers within an hour of launch: 4M+ `PoP verification error ... stale` rejections at ~300k/hour, records pinned below target redundancy, close-group record counts diverging 150× (75..=11,231 per service), and a permanent ~500 MB/s fleet-wide re-offer storm (~25 TB egress in 16h). The divergence the rejections caused is also what kept the client-PUT freshness gate firing; the two failure modes fed each other.
Changes
- `VerificationContext { ClientPut, Replication }` threaded through `verify_payment` → `verify_evm_payment` / `verify_merkle_payment`.
- Under `Replication`, skip only the storer-being-paid-now checks: own-quote freshness, local recipient, merkle candidate closeness.
- The verified-`XorName` cache is context-aware to match (raised by Copilot review): each entry records whether its verification ran the full client-PUT check set. A `Replication`-verified entry satisfies later replication lookups (re-offers of the same key are routine) but never a later `ClientPut` fast-path; a full `ClientPut` verification upgrades the entry and is never downgraded.
- Call sites: chunk PUT handler → `ClientPut` (behaviour unchanged); fresh-offer and paid-notify replication handlers → `Replication`.
Test plan
- `cargo test --lib payment::verifier`: 66/66, including 5 new context tests:
  - stale own quote: rejected under `ClientPut`, passes the gate under `Replication` (fails at the later peer-binding stage, proving the skip);
  - non-recipient receipt: rejected under `ClientPut`, passes the gate under `Replication` (fails at the later signature stage);
  - content mismatch: rejected under both contexts;
  - duplicate-candidate merkle pool: rejected by the closeness check under `ClientPut`, proceeds past it under `Replication`;
  - `Replication`-verified cache entry: satisfies later replication lookups but never a `ClientPut` fast-path; upgrades on full verification, never downgrades.
- `cargo test --lib replication`: 230/230; `cargo test --lib storage`: 29/29.
- `cargo clippy --all-targets` clean; `cargo fmt` applied.
- `config::tests::test_bootstrap_peers_discover_env_var` fails on machines with a real `~/.config/ant/bootstrap_peers.toml` (pre-existing on `main`, unrelated to this change).

🤖 Generated with Claude Code
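The cache rule exercised by the last context test can be sketched as below. This is a minimal model under stated assumptions, not the PR's implementation: `VerifiedCache`, its `record` and `satisfies` methods, the `HashMap` backing store, and the 32-byte `XorName` alias are all hypothetical. What it takes from the description is the rule itself: a `Replication`-verified entry never satisfies a `ClientPut` lookup, and a full verification upgrades an entry that is then never downgraded.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum VerificationContext {
    ClientPut,
    Replication,
}

/// Assumption for the sketch: a record address is a 32-byte name.
type XorName = [u8; 32];

/// Hypothetical cache: maps each verified name to a flag recording whether
/// the verification ran the full client-PUT check set.
#[derive(Default)]
struct VerifiedCache {
    entries: HashMap<XorName, bool>,
}

impl VerifiedCache {
    /// Record a successful verification. The flag can only go from
    /// false to true, so an entry is upgraded but never downgraded.
    fn record(&mut self, name: XorName, ctx: VerificationContext) {
        let full = ctx == VerificationContext::ClientPut;
        *self.entries.entry(name).or_insert(false) |= full;
    }

    /// Can a cached entry stand in for a fresh verification in `ctx`?
    fn satisfies(&self, name: &XorName, ctx: VerificationContext) -> bool {
        match (self.entries.get(name), ctx) {
            (None, _) => false,
            // Any verified entry is good enough for a replication lookup.
            (Some(_), VerificationContext::Replication) => true,
            // A client PUT needs an entry that passed the full check set.
            (Some(full), VerificationContext::ClientPut) => *full,
        }
    }
}

fn main() {
    let mut cache = VerifiedCache::default();
    let key: XorName = [7u8; 32];

    cache.record(key, VerificationContext::Replication);
    println!("after replication, ClientPut hit: {}", cache.satisfies(&key, VerificationContext::ClientPut));

    cache.record(key, VerificationContext::ClientPut);
    cache.record(key, VerificationContext::Replication); // must not downgrade
    println!("after upgrade, ClientPut hit: {}", cache.satisfies(&key, VerificationContext::ClientPut));
}
```

Storing a single monotone flag per key keeps the upgrade-only rule trivial to check: the only write is an OR, so no sequence of calls can turn a full entry back into a replication-only one.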