Finalize humatheque extraction prompt and task constants#19

Open
gegedenice wants to merge 5 commits into davanstrien:main from gegedenice:codex/adapt-code-for-metadata-extraction-and-evaluation-w92eo2
Conversation

@gegedenice

No description provided.

@davanstrien

Owner

Hi @gegedenice, thanks for this, and sorry it sat so long!

The benchmark core has moved quite a bit since April (a few merged PRs touching run.py/cli.py), so this branch now conflicts and wouldn't apply cleanly as-is.

The bigger question is that the PR is closely tailored to the Humathèque thesis task, whereas ocr-bench core tries to stay task-general. So I don't think it should merge into the core modules in this shape.

IMO there's a really good idea in here I'd love to keep: the deterministic, reference-based scorer (exact / list / fuzzy fields) as an alternative to the LLM-as-judge path. That generalises well and it's something the benchmark doesn't have yet.
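To make the idea concrete, here is a rough sketch of what a deterministic, reference-based scorer with exact / list / fuzzy fields could look like. It is not the code from this PR or anything in ocr-bench: the function names, the normalisation, the use of set-based F1 for list fields, and `difflib` for fuzzy matching are all assumptions picked for illustration, as are the field names in the usage example.

```python
from difflib import SequenceMatcher


def _norm(text):
    """Lowercase and collapse whitespace (an assumed normalisation)."""
    return " ".join(str(text).lower().split())


def score_exact(pred, ref):
    """1.0 if the normalised strings are identical, else 0.0."""
    return 1.0 if _norm(pred) == _norm(ref) else 0.0


def score_list(pred, ref):
    """Order-insensitive F1 over normalised list items."""
    p = {_norm(x) for x in pred}
    r = {_norm(x) for x in ref}
    if not p and not r:
        return 1.0  # both empty counts as agreement
    overlap = len(p & r)
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def score_fuzzy(pred, ref):
    """Character-level similarity in [0, 1] via difflib."""
    return SequenceMatcher(None, _norm(pred), _norm(ref)).ratio()


SCORERS = {"exact": score_exact, "list": score_list, "fuzzy": score_fuzzy}


def score_record(pred, ref, field_kinds):
    """Score one prediction against its reference, field by field.

    field_kinds maps a field name to "exact", "list" or "fuzzy".
    A field missing from the prediction is scored as empty.
    """
    scores = {}
    for name, kind in field_kinds.items():
        default = [] if kind == "list" else ""
        scores[name] = SCORERS[kind](pred.get(name, default), ref[name])
    return scores


# Hypothetical fields, for illustration only.
field_kinds = {"title": "fuzzy", "year": "exact", "keywords": "list"}
pred = {"title": "A Study of OCR", "year": "2021", "keywords": ["ocr", "theses"]}
ref = {"title": "A study of OCR.", "year": "2021", "keywords": ["OCR", "metadata"]}
scores = score_record(pred, ref, field_kinds)
```

Because every score comes from string comparison against a reference, the same inputs always give the same numbers, which is the main contrast with the LLM-as-judge path.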

Two ways forward, whichever suits you:

  • a smaller, config-driven version where the task specifics (dataset, fields, vocab) live in config rather than core, plus a short note on what it does and how you checked it; or
  • keep the Humathèque setup as a worked example/recipe rather than in core.
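As a sketch of the first option, the task specifics could sit in a small config that core code loads and validates without knowing anything about the task. Everything here is hypothetical: the JSON layout, the `TaskConfig` and `load_task_config` names, the placeholder dataset id, and the field and vocab values are assumptions, not an existing ocr-bench format.

```python
import json
from dataclasses import dataclass

ALLOWED_KINDS = {"exact", "list", "fuzzy"}


@dataclass(frozen=True)
class TaskConfig:
    dataset: str  # dataset identifier, opaque to core code
    fields: dict  # field name -> match kind
    vocab: dict   # field name -> frozenset of allowed values


def load_task_config(text):
    """Parse and validate a JSON task config (hypothetical layout)."""
    raw = json.loads(text)
    fields = dict(raw["fields"])
    bad = {k: v for k, v in fields.items() if v not in ALLOWED_KINDS}
    if bad:
        raise ValueError(f"unknown match kinds: {bad}")
    vocab = {k: frozenset(v) for k, v in raw.get("vocab", {}).items()}
    unknown = set(vocab) - set(fields)
    if unknown:
        raise ValueError(f"vocab given for undeclared fields: {sorted(unknown)}")
    return TaskConfig(dataset=raw["dataset"], fields=fields, vocab=vocab)


# Placeholder values, for illustration only.
EXAMPLE_CONFIG = """
{
  "dataset": "example-org/example-theses",
  "fields": {"title": "fuzzy", "year": "exact", "doc_type": "exact", "keywords": "list"},
  "vocab": {"doc_type": ["thesis", "dissertation"]}
}
"""

config = load_task_config(EXAMPLE_CONFIG)
```

With this split, a different task would only need a different config file, and the Humathèque values would live alongside the recipe rather than in the core modules.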

Let me know what you think.

2 participants