[SYNPY-1857] When creating RecordSets, the UpsertKeys are the first columns#1409
Open
andrewelamb wants to merge 2 commits into
Open
[SYNPY-1857] When creating RecordSets, the UpsertKeys are the first columns#1409andrewelamb wants to merge 2 commits into
andrewelamb wants to merge 2 commits into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem:
When creating a RecordSet via
create_record_based_metadata_task, the CSV template columns were generated directly from the order in which the schema properties were extracted from the schema. This meant the upsert key columns — which serve as the row identifiers in the Grid curation UI — could appear anywhere in the template rather than as the leftmost columns. Curators expect the columns that identify each row to lead the template for readability and ease of data entry.Additionally, there was no validation that the supplied
upsert_keysactually correspond to columns defined in the schema. An upsert key that did not match any schema property would silently produce a template that could not function as intended, with no clear error to the user.Solution:
_reorder_columns_with_upsert_keys_firsthelper that moves the upsert key columns to the front of the template DataFrame. The relative order of the upsert keys is preserved as provided by the caller, and the remaining columns keep their original order. Upsert keys not present among the columns are skipped, and empty DataFrames are handled gracefully.create_record_based_metadata_task, after the schema properties are extracted, the template is now validated and reordered:upsert_keysare not found among the schema properties, aValueErroris raised naming the missing keys. This gives the user a clear, actionable error instead of a silently malformed template.Testing:
Added unit tests in
tests/unit/synapseclient/extensions/unit_test_curator.py:test_create_record_based_metadata_task_reorders_upsert_keys_first— verifies the generated template has the upsert keys as the leftmost columns in the provided order.test_create_record_based_metadata_task_raises_for_missing_upsert_keys— verifies aValueErrornaming the missing key is raised when an upsert key is absent from the schema properties.test_reorder_columns_with_upsert_keys_first— parametrized cases covering moving keys to the front, preserving provided key order, skipping keys missing from columns, and handling an empty DataFrame.