Fix soundness issues in translated benchmark statements#9
Open
MantasBaksys wants to merge 1 commit into
Open
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR updates a collection of Dafny lemma statements in the dataset to correct formalization issues such as operator precedence, missing preconditions, type mismatches, and incorrect expected values. It also extends definitions.dfy with a new zmod_inv axiom and tightens the spec for divisors.
Changes:
- Fixed mathematical formalizations across many
valid/andtest/lemmas (parenthesization, types, preconditions, corrected RHS values). - Refactored
divisorsaxiom to require positivity and added a newzmod_invaxiom indefinitions.dfy. - Adjusted
mathd_numbertheory_764to use modular arithmetic viazmod_invrather than real-valued fractions.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| dataset/definitions.dfy | Tightened divisors postcondition to positive divisors; added zmod_inv axiom. |
| dataset/valid/mathd_numbertheory_403.dfy | Use filter to exclude 198 itself when summing proper divisors. |
| dataset/valid/mathd_numbertheory_202.dfy | Added parentheses to fix operator precedence around %. |
| dataset/valid/mathd_numbertheory_155.dfy | Corrected expected count from 52 to 48. |
| dataset/valid/algebra_amgm_prod1toneq1_sum1tongeqn.dfy | Added nonnegativity precondition on a. |
| dataset/test/numbertheory_notequiv2i2jasqbsqdiv8.dfy | Reformulated to a closed statement with universally quantified a, b. |
| dataset/test/mathd_numbertheory_764.dfy | Switched to integer/zmod_inv formulation and removed unused parameter. |
| dataset/test/mathd_numbertheory_728.dfy | Corrected expected remainder from 0 to 3. |
| dataset/test/mathd_numbertheory_618.dfy | Added n > 0 precondition. |
| dataset/test/mathd_numbertheory_227.dfy | Changed x, y to real to make division meaningful. |
| dataset/test/mathd_algebra_276.dfy | Corrected target expression to a*b + b. |
| dataset/test/imo_2007_p6.dfy | Added nonnegativity precondition; fixed grouping of summand. |
| dataset/test/imo_1985_p6.dfy | Fixed parenthesization in recursive definition. |
| dataset/test/imo_1981_p6.dfy | Corrected exponent expression f(4)(y)+3. |
| dataset/test/imo_1962_p2.dfy | Corrected RHS to sqrt(31.0)/8.0. |
| dataset/test/amc12b_2020_p2.dfy | Switched integer division to real-valued arithmetic. |
| dataset/test/amc12a_2020_p15.dfy | Replaced Complex.Real.abs with Complex.norm. |
| dataset/test/amc12_2000_p1.dfy | Added distinctness precondition on i, m, o. |
| dataset/test/algebra_amgm_sum1toneqn_prod1tonleq1.dfy | Added nonnegativity precondition on a. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -280,4 +280,8 @@ function {: verify false} of_digits(b: nat, xs: seq<nat>): nat | |||
| ensures forall n :: of_digits(b, [n]) == n | |||
|
|
|||
Comment on lines
+285
to
+286
| function {:axiom} zmod_inv(k: int, p: nat): (r: int) | ||
| ensures p > 1 ==> 0 <= r < (p as int) |
| requires 7 <= p | ||
| ensures Real.sum(set k: int | 1 <= k <= p-2, (k: int) => if k % p == 0 then 0.0 else 1.0/((k % p) as real) * 1.0/((k % p) as real + 1.0)) == 2.0 | ||
| {} No newline at end of file | ||
| ensures Int.sum(set k: int | 1 <= k <= p-2, (k: int) => zmod_inv(k, p) * zmod_inv(k + 1, p)) % (p as int) == 2 |
| lemma mathd_numbertheory_403() | ||
| ensures Int.sum(divisors(198), k => k) == 270 | ||
| {} No newline at end of file | ||
| ensures Int.sum(filter(k => k < 198, divisors(198)), k => k) == 270 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Addresses #8 by correcting benchmark statements that were unsound or mistranslated from the Lean miniF2F sources. The fixes fall into three groups: expression translation bugs, missing domain hypotheses, and number-theory API mismatches between Lean and this Dafny library.
Issue #8 Checklist
Translation and Parsing Bugs
dataset/test/imo_1962_p2.dfy: fixes the endpoint from1 - sqrt(31/8)to1 - sqrt(31)/8, matching the Lean expression1 - Real.sqrt 31 / 8.dataset/test/imo_1985_p6.dfy: restores the recurrence parentheses sof(n+1)(x) = f(n)(x) * (f(n)(x) + 1/n)rather thanf(n)(x)^2 + 1/n.dataset/test/imo_1981_p6.dfy: moves+ 3out of the argument tof; the exponent isf(4)(y) + 3, notf(4)(y+3).dataset/valid/mathd_numbertheory_202.dfy: applies% 10to the whole sum(19^19 + 99^99), not only to99^99.dataset/test/mathd_algebra_276.dfy: corrects the requested result toa * b + b == 12, matching the problem statement and Lean source.Missing Domain Assumptions
dataset/test/algebra_amgm_sum1toneqn_prod1tonleq1.dfy: adds nonnegativity for the sequence entries, reflecting LeanNNReal.dataset/valid/algebra_amgm_prod1toneq1_sum1tongeqn.dfy: adds the same nonnegativity assumption from LeanNNReal.dataset/test/imo_2007_p6.dfy: adds nonnegativity for the finite sequence entries, again reflecting LeanNNReal.dataset/test/mathd_numbertheory_618.dfy: adds the missingn > 0hypothesis, which removes then = 0counterexample and matches the Lean source.dataset/test/amc12_2000_p1.dfy: adds pairwise distinctness fori,m, ando, matching both the Lean formalization and the official AMC problem statement.Arithmetic Domain Corrections
dataset/test/mathd_numbertheory_227.dfy: changes the cup-volume equation to use real division while keepingnintegral. The old natural-number division admitted false examples such as integer quotients rounding to zero.dataset/test/amc12b_2020_p2.dfy: changes the expression to real arithmetic. The old integer division made the product evaluate incorrectly.Divisors and Modular Arithmetic
dataset/definitions.dfy: changesdivisorsto model positive divisors, matching LeanNat.divisorsrather than signed integer divisors.dataset/test/mathd_numbertheory_427.dfy: is fixed by the positive-divisor semantics above; the divisor sums now correspond to LeanNat.divisors.dataset/valid/mathd_numbertheory_543.dfy: is also fixed by the positive-divisor semantics above.dataset/valid/mathd_numbertheory_403.dfy: uses a proper-divisor filter because the Lean source usesNat.properDivisors 198, which excludes198itself.dataset/definitions.dfyanddataset/test/mathd_numbertheory_764.dfy: add and usezmod_invto represent inverses inZMod p; the old translation used real reciprocals, which is not the same theorem.Corrected Lean-Source Alignment
Several items in the issue were described as errors inherited from Lean, but the Kaiyu/Seed Lean sources contain the corrected statements. This PR aligns Dafny with those sources:
dataset/test/numbertheory_notequiv2i2jasqbsqdiv8.dfy: changes the theorem to negate the universal equivalence, rather than asserting that the equivalence fails for every fixed paira,b.dataset/test/mathd_numbertheory_728.dfy: changes the result from0to3modulo7.dataset/valid/mathd_numbertheory_155.dfy: changes the count from52to48for three-digit integers congruent to7 mod 19.Complex Norm
dataset/test/amc12a_2020_p15.dfy: replacesComplex.Real.abs, which does not exist in the local API, withComplex.norm, matching the Lean complex norm notation.