Skip to content

Fix soundness issues in translated benchmark statements#9

Open
MantasBaksys wants to merge 1 commit into
dafny-lang:mainfrom
MantasBaksys:mantas-fix-issue-8
Open

Fix soundness issues in translated benchmark statements#9
MantasBaksys wants to merge 1 commit into
dafny-lang:mainfrom
MantasBaksys:mantas-fix-issue-8

Conversation

@MantasBaksys

Copy link
Copy Markdown
Contributor

Summary

Addresses #8 by correcting benchmark statements that were unsound or mistranslated from the Lean miniF2F sources. The fixes fall into three groups: expression translation bugs, missing domain hypotheses, and number-theory API mismatches between Lean and this Dafny library.

Issue #8 Checklist

Translation and Parsing Bugs

  • dataset/test/imo_1962_p2.dfy: fixes the endpoint from 1 - sqrt(31/8) to 1 - sqrt(31)/8, matching the Lean expression 1 - Real.sqrt 31 / 8.
  • dataset/test/imo_1985_p6.dfy: restores the recurrence parentheses so f(n+1)(x) = f(n)(x) * (f(n)(x) + 1/n) rather than f(n)(x)^2 + 1/n.
  • dataset/test/imo_1981_p6.dfy: moves + 3 out of the argument to f; the exponent is f(4)(y) + 3, not f(4)(y+3).
  • dataset/valid/mathd_numbertheory_202.dfy: applies % 10 to the whole sum (19^19 + 99^99), not only to 99^99.
  • dataset/test/mathd_algebra_276.dfy: corrects the requested result to a * b + b == 12, matching the problem statement and Lean source.

Missing Domain Assumptions

  • dataset/test/algebra_amgm_sum1toneqn_prod1tonleq1.dfy: adds nonnegativity for the sequence entries, reflecting Lean NNReal.
  • dataset/valid/algebra_amgm_prod1toneq1_sum1tongeqn.dfy: adds the same nonnegativity assumption from Lean NNReal.
  • dataset/test/imo_2007_p6.dfy: adds nonnegativity for the finite sequence entries, again reflecting Lean NNReal.
  • dataset/test/mathd_numbertheory_618.dfy: adds the missing n > 0 hypothesis, which removes the n = 0 counterexample and matches the Lean source.
  • dataset/test/amc12_2000_p1.dfy: adds pairwise distinctness for i, m, and o, matching both the Lean formalization and the official AMC problem statement.

Arithmetic Domain Corrections

  • dataset/test/mathd_numbertheory_227.dfy: changes the cup-volume equation to use real division while keeping n integral. The old natural-number division admitted false examples such as integer quotients rounding to zero.
  • dataset/test/amc12b_2020_p2.dfy: changes the expression to real arithmetic. The old integer division made the product evaluate incorrectly.

Divisors and Modular Arithmetic

  • dataset/definitions.dfy: changes divisors to model positive divisors, matching Lean Nat.divisors rather than signed integer divisors.
  • dataset/test/mathd_numbertheory_427.dfy: is fixed by the positive-divisor semantics above; the divisor sums now correspond to Lean Nat.divisors.
  • dataset/valid/mathd_numbertheory_543.dfy: is also fixed by the positive-divisor semantics above.
  • dataset/valid/mathd_numbertheory_403.dfy: uses a proper-divisor filter because the Lean source uses Nat.properDivisors 198, which excludes 198 itself.
  • dataset/definitions.dfy and dataset/test/mathd_numbertheory_764.dfy: add and use zmod_inv to represent inverses in ZMod p; the old translation used real reciprocals, which is not the same theorem.

Corrected Lean-Source Alignment

Several items in the issue were described as errors inherited from Lean, but the Kaiyu/Seed Lean sources contain the corrected statements. This PR aligns Dafny with those sources:

  • dataset/test/numbertheory_notequiv2i2jasqbsqdiv8.dfy: changes the theorem to negate the universal equivalence, rather than asserting that the equivalence fails for every fixed pair a,b.
  • dataset/test/mathd_numbertheory_728.dfy: changes the result from 0 to 3 modulo 7.
  • dataset/valid/mathd_numbertheory_155.dfy: changes the count from 52 to 48 for three-digit integers congruent to 7 mod 19.

Complex Norm

  • dataset/test/amc12a_2020_p15.dfy: replaces Complex.Real.abs, which does not exist in the local API, with Complex.norm, matching the Lean complex norm notation.

Copilot AI review requested due to automatic review settings May 27, 2026 15:30

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates a collection of Dafny lemma statements in the dataset to correct formalization issues such as operator precedence, missing preconditions, type mismatches, and incorrect expected values. It also extends definitions.dfy with a new zmod_inv axiom and tightens the spec for divisors.

Changes:

  • Fixed mathematical formalizations across many valid/ and test/ lemmas (parenthesization, types, preconditions, corrected RHS values).
  • Refactored divisors axiom to require positivity and added a new zmod_inv axiom in definitions.dfy.
  • Adjusted mathd_numbertheory_764 to use modular arithmetic via zmod_inv rather than real-valued fractions.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
dataset/definitions.dfy Tightened divisors postcondition to positive divisors; added zmod_inv axiom.
dataset/valid/mathd_numbertheory_403.dfy Use filter to exclude 198 itself when summing proper divisors.
dataset/valid/mathd_numbertheory_202.dfy Added parentheses to fix operator precedence around %.
dataset/valid/mathd_numbertheory_155.dfy Corrected expected count from 52 to 48.
dataset/valid/algebra_amgm_prod1toneq1_sum1tongeqn.dfy Added nonnegativity precondition on a.
dataset/test/numbertheory_notequiv2i2jasqbsqdiv8.dfy Reformulated to a closed statement with universally quantified a, b.
dataset/test/mathd_numbertheory_764.dfy Switched to integer/zmod_inv formulation and removed unused parameter.
dataset/test/mathd_numbertheory_728.dfy Corrected expected remainder from 0 to 3.
dataset/test/mathd_numbertheory_618.dfy Added n > 0 precondition.
dataset/test/mathd_numbertheory_227.dfy Changed x, y to real to make division meaningful.
dataset/test/mathd_algebra_276.dfy Corrected target expression to a*b + b.
dataset/test/imo_2007_p6.dfy Added nonnegativity precondition; fixed grouping of summand.
dataset/test/imo_1985_p6.dfy Fixed parenthesization in recursive definition.
dataset/test/imo_1981_p6.dfy Corrected exponent expression f(4)(y)+3.
dataset/test/imo_1962_p2.dfy Corrected RHS to sqrt(31.0)/8.0.
dataset/test/amc12b_2020_p2.dfy Switched integer division to real-valued arithmetic.
dataset/test/amc12a_2020_p15.dfy Replaced Complex.Real.abs with Complex.norm.
dataset/test/amc12_2000_p1.dfy Added distinctness precondition on i, m, o.
dataset/test/algebra_amgm_sum1toneqn_prod1tonleq1.dfy Added nonnegativity precondition on a.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread dataset/definitions.dfy
@@ -280,4 +280,8 @@ function {: verify false} of_digits(b: nat, xs: seq<nat>): nat
ensures forall n :: of_digits(b, [n]) == n

Comment thread dataset/definitions.dfy
Comment on lines +285 to +286
function {:axiom} zmod_inv(k: int, p: nat): (r: int)
ensures p > 1 ==> 0 <= r < (p as int)
requires 7 <= p
ensures Real.sum(set k: int | 1 <= k <= p-2, (k: int) => if k % p == 0 then 0.0 else 1.0/((k % p) as real) * 1.0/((k % p) as real + 1.0)) == 2.0
{} No newline at end of file
ensures Int.sum(set k: int | 1 <= k <= p-2, (k: int) => zmod_inv(k, p) * zmod_inv(k + 1, p)) % (p as int) == 2
lemma mathd_numbertheory_403()
ensures Int.sum(divisors(198), k => k) == 270
{} No newline at end of file
ensures Int.sum(filter(k => k < 198, divisors(198)), k => k) == 270
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants