[SYNPY-1857] When creating RecordSets, the UpsertKeys are the first columns#1409

Open
andrewelamb wants to merge 2 commits into develop from SYNPY-1857

andrewelamb commented Jun 19, 2026

Problem:

When creating a RecordSet via create_record_based_metadata_task, the CSV template columns followed the order in which the properties were extracted from the schema. As a result, the upsert key columns, which serve as the row identifiers in the Grid curation UI, could appear anywhere in the template rather than as the leftmost columns. Curators expect the columns that identify each row to lead the template for readability and ease of data entry.

Additionally, there was no validation that the supplied upsert_keys actually correspond to columns defined in the schema. An upsert key that did not match any schema property would silently produce a template that could not function as intended, with no clear error to the user.

Solution:

  • Added a _reorder_columns_with_upsert_keys_first helper that moves the upsert key columns to the front of the template DataFrame. The relative order of the upsert keys is preserved as provided by the caller, and the remaining columns keep their original order. Upsert keys not present among the columns are skipped, and empty DataFrames are handled gracefully.
  • In create_record_based_metadata_task, after the schema properties are extracted, the template is now validated and reordered:
    • If any of the requested upsert_keys are not found among the schema properties, a ValueError is raised naming the missing keys. This gives the user a clear, actionable error instead of a silently malformed template.
    • The template columns are then reordered so the upsert keys appear first, in the given order.
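
The validate-then-reorder flow described above can be sketched roughly as follows. This is not the code from this PR: the function names `validate_upsert_keys` and `reorder_with_upsert_keys_first`, the error message wording, and the sample property names are all assumptions made for illustration, and the sketch works on plain lists of column names rather than a DataFrame so that it stays dependency-free.

```python
from typing import Iterable, List


def validate_upsert_keys(schema_properties: Iterable[str], upsert_keys: Iterable[str]) -> None:
    """Raise ValueError naming any upsert key that is not a schema property.

    Hypothetical stand-in for the check added to create_record_based_metadata_task.
    """
    available = set(schema_properties)
    missing = [key for key in upsert_keys if key not in available]
    if missing:
        # The real error message in the PR may be worded differently.
        raise ValueError(f"Upsert keys not found in schema properties: {missing}")


def reorder_with_upsert_keys_first(columns: Iterable[str], upsert_keys: Iterable[str]) -> List[str]:
    """Return the column names with the upsert keys first, in caller order.

    Keys that are not among the columns are skipped; the remaining columns
    keep their original relative order. An empty column list yields [].
    """
    columns = list(columns)
    available = set(columns)
    # dict.fromkeys drops repeated keys while keeping first-seen order.
    leading = [key for key in dict.fromkeys(upsert_keys) if key in available]
    leading_set = set(leading)
    trailing = [col for col in columns if col not in leading_set]
    return leading + trailing


# Illustrative property names, not taken from any real schema.
schema_properties = ["age", "specimen_id", "tissue", "patient_id"]
upsert_keys = ["patient_id", "specimen_id"]

validate_upsert_keys(schema_properties, upsert_keys)
ordered = reorder_with_upsert_keys_first(schema_properties, upsert_keys)
print(ordered)  # ['patient_id', 'specimen_id', 'age', 'tissue']
```

With a pandas DataFrame template, the same ordered list could be applied by selecting columns, for example `template = template[ordered]`.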

Testing:

Added unit tests in tests/unit/synapseclient/extensions/unit_test_curator.py:

  • test_create_record_based_metadata_task_reorders_upsert_keys_first — verifies the generated template has the upsert keys as the leftmost columns in the provided order.
  • test_create_record_based_metadata_task_raises_for_missing_upsert_keys — verifies a ValueError naming the missing key is raised when an upsert key is absent from the schema properties.
  • test_reorder_columns_with_upsert_keys_first — parametrized cases covering moving keys to the front, preserving provided key order, skipping keys missing from columns, and handling an empty DataFrame.
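
As a rough picture of what the parametrized cases cover, the four scenarios can be written as a small table of inputs and expected outputs. The `reorder` function below is a simplified stand-in that works on column-name lists, not the PR's DataFrame helper, and the case data is invented for illustration; in the actual test file a table like this would typically be fed to `pytest.mark.parametrize`.

```python
def reorder(columns, upsert_keys):
    # Simplified stand-in for the helper under test: column names only.
    leading = [key for key in upsert_keys if key in columns]
    return leading + [col for col in columns if col not in leading]


# (description, columns, upsert_keys, expected column order)
CASES = [
    ("moves keys to the front", ["a", "b", "c"], ["c"], ["c", "a", "b"]),
    ("preserves provided key order", ["a", "b", "c"], ["c", "b"], ["c", "b", "a"]),
    ("skips keys missing from columns", ["a", "b"], ["z", "b"], ["b", "a"]),
    ("handles empty input", [], ["a"], []),
]


def failing_cases():
    """Return the descriptions of any cases whose result differs from expected."""
    return [
        description
        for description, columns, keys, expected in CASES
        if reorder(columns, keys) != expected
    ]


failures = failing_cases()
print(failures or "all cases pass")
```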

@andrewelamb andrewelamb requested a review from a team as a code owner June 19, 2026 15:26
@andrewelamb andrewelamb marked this pull request as draft June 19, 2026 15:26
@andrewelamb andrewelamb marked this pull request as ready for review June 19, 2026 15:34