Port HybridGym benchmarks to Harbor #724

Description

@neubig

Context

The HybridGym benchmarks (hybridgym_funclocalize, hybridgym_funcgen, hybridgym_depsearch, and hybridgym_issuelocalize) do not yet appear to have corresponding Harbor registry datasets or adapters. They should be ported as one benchmark family while keeping per-task scoring behavior intact.

Proposed scope

  • Create Harbor adapters/tasks for each HybridGym variant.
  • Preserve prompts, environment setup, and evaluator semantics for each variant.
  • Validate parity separately for function localization, function generation, dependency search, and issue localization.
  • Document any shared adapter code and per-variant differences.
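
One possible way to organize the shared adapter code and per-variant differences is a small table of variant specs, sketched below. Only the four variant names come from this issue; `VariantSpec`, `build_specs`, the `task_dir` layout, and the evaluator labels are hypothetical placeholders and do not reflect Harbor's actual adapter API.

```python
from dataclasses import dataclass

# Variant names are taken from the issue text; everything else is illustrative.
HYBRIDGYM_VARIANTS = (
    "hybridgym_funclocalize",
    "hybridgym_funcgen",
    "hybridgym_depsearch",
    "hybridgym_issuelocalize",
)


@dataclass(frozen=True)
class VariantSpec:
    """Per-variant settings that a shared adapter could read (hypothetical)."""
    name: str       # benchmark identifier from the issue
    task_dir: str   # assumed output directory for generated tasks
    evaluator: str  # assumed label for the variant-specific evaluator


def build_specs(root: str = "datasets") -> dict[str, VariantSpec]:
    """Build one spec per variant so shared code stays in a single place."""
    specs = {}
    for name in HYBRIDGYM_VARIANTS:
        short = name.removeprefix("hybridgym_")
        specs[name] = VariantSpec(
            name=name,
            task_dir=f"{root}/{name}",
            evaluator=f"{short}_eval",
        )
    return specs


if __name__ == "__main__":
    for spec in build_specs().values():
        print(spec.name, spec.task_dir, spec.evaluator)
```

Keeping the differences in data rather than in four separate code paths would make it easier to document what is shared and what is variant-specific.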

Acceptance criteria

  • All four HybridGym variants can be run through Harbor.
  • Per-variant parity results are recorded against the current OpenHands benchmark workflows.
  • OpenHands wrappers can delegate HybridGym execution to Harbor.
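
The per-variant parity criterion could be recorded with a check along these lines. This is a minimal sketch: the issue does not define a parity threshold, so the `tolerance` value, the `parity_report` helper, and the score dictionaries are assumptions for illustration only.

```python
def parity_report(baseline, harbor, tolerance=0.01):
    """Compare per-variant scores from two runs.

    baseline:  variant name -> score from the current OpenHands workflow
    harbor:    variant name -> score from the Harbor run
    tolerance: assumed maximum absolute difference that still counts as parity
    """
    report = {}
    for variant in sorted(baseline):
        if variant not in harbor:
            # The variant has not been run through Harbor yet.
            report[variant] = "missing"
            continue
        delta = abs(baseline[variant] - harbor[variant])
        report[variant] = "ok" if delta <= tolerance else "mismatch"
    return report


if __name__ == "__main__":
    # Placeholder numbers, not real benchmark results.
    openhands_scores = {"hybridgym_funcgen": 0.50, "hybridgym_depsearch": 0.40}
    harbor_scores = {"hybridgym_funcgen": 0.505}
    print(parity_report(openhands_scores, harbor_scores))
```

Reporting each variant separately keeps a regression in one variant from being hidden by an aggregate score.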
