fix(wandb): log different val datasets separately in wandb#2526

Open
grgkovac wants to merge 3 commits into NVIDIA-NeMo:main from grgkovac:grgkovac/fix/wandb_multiple_val_datasets

@grgkovac

Contributor

What does this PR do?

Fixes a bug when logging multiple validation datasets to wandb. Every dataset writes to Charts/val_loss, so the last validation dataset overwrites the previous ones. This fix logs each validation dataset in its own section. See the screenshots below.
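To illustrate the overwrite, here is a minimal sketch that models one logging step as a plain dict. It is a simplified stand-in rather than wandb's internals; the `log_step` helper, the loss values, and the `<name>/val_loss` key format are assumptions made for illustration.

```python
# Simplified model of a metrics logger: one dict of key -> value per step.
# Illustration only; this is not how wandb is implemented internally.

def log_step(entries):
    """Merge (key, value) pairs for one step; a repeated key keeps the last value."""
    step_metrics = {}
    for key, value in entries:
        step_metrics[key] = value
    return step_metrics

# Made-up losses for the two dummy validation datasets.
losses = [("dummy_numbers", 1.25), ("dummy_letters", 0.75)]

# Before: every dataset writes to the same key, so only the last one survives.
shared = log_step(("val_loss", loss) for _, loss in losses)

# After: each dataset gets its own key (assumed format), so both values are kept.
separate = log_step((f"{name}/val_loss", loss) for name, loss in losses)

print(shared)
print(separate)
```

With a shared key, only the value from the dataset logged last remains, which matches the "before" screenshot described under Additional Information.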

Changelog

  • Lines L1541-1545 in nemo_automodel/recipes/llm/train_ft.py
  • L1541: if val_name == "default", only one dataset is used, without a specific name (validation_dataset: in the yaml), so logging is the same as before.
  • L1544-1545: otherwise, get val_name and use it as the section name.
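The branching described in the changelog can be sketched roughly as follows. This is not the code from train_ft.py: the helper name `val_metric_key` and the exact `<val_name>/<metric>` key format are assumptions. It relies on the wandb convention that the part of a metric key before "/" becomes the section name in the UI.

```python
def val_metric_key(val_name: str, metric: str = "val_loss") -> str:
    """Return the key under which a validation metric is logged.

    Hypothetical helper: the name and the "<val_name>/<metric>" format are
    assumptions for illustration, not the actual change in train_ft.py.
    """
    if val_name == "default":
        # Single unnamed dataset: keep the old key, which lands in "Charts".
        return metric
    # Named dataset: the prefix before "/" is used as the section name.
    return f"{val_name}/{metric}"


print(val_metric_key("default"))
print(val_metric_key("dummy_letters"))
```

Keeping the unprefixed key for "default" means existing single-dataset runs and dashboards would not change.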

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests? not needed
  • Did you add or update any necessary documentation? not needed

Additional Information

Two dummy val datasets are used for testing: dummy_numbers and dummy_letters.

After the fix: two sections, one for dummy_letters and one for dummy_numbers

image

Before the fix: a single val_loss chart in Charts (dummy_letters is logged under the name val_loss; dummy_numbers is not logged)

image

After the fix, one val dataset with a custom name (validation_dataset_dummy_letters: in yaml): logged in a separate section

image

After the fix, one val dataset with the default name (validation_dataset: in yaml): val_loss in Charts, as before

image
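The four cases above suggest how a yaml key relates to the section name. A rough sketch of that mapping, assuming the dataset name is whatever follows the `validation_dataset_` prefix (the helper `val_name_from_config_key` is hypothetical and may not match how the recipe actually parses the config):

```python
PREFIX = "validation_dataset"


def val_name_from_config_key(key: str) -> str:
    """Map a yaml key to a validation dataset name.

    Assumption for illustration: "validation_dataset" maps to "default" and
    "validation_dataset_<name>" maps to "<name>".
    """
    if key == PREFIX:
        return "default"
    suffix = key[len(PREFIX) + 1:]
    if key.startswith(PREFIX + "_") and suffix:
        return suffix
    raise ValueError(f"not a validation dataset key: {key!r}")


print(val_name_from_config_key("validation_dataset"))
print(val_name_from_config_key("validation_dataset_dummy_letters"))
```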

Signed-off-by: Grgur Kovac <grgur.kovac@noxtua.com>
@grgkovac grgkovac requested a review from a team as a code owner June 11, 2026 17:49
@grgkovac grgkovac changed the title from "log different val datasets separately in wandb" to "fix(wandb): log different val datasets separately in wandb" Jun 11, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 11, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label Jun 13, 2026
@akoumpa

akoumpa commented Jun 15, 2026

Contributor

/ok to test 1db9ea9

@grgkovac

Contributor Author

@akoumpa the L2_HF_PEFT failure looks unrelated to this PR. Do you mind rerunning the tests?

@akoumpa

akoumpa commented Jun 17, 2026

Contributor

/ok to test bfbe695

@akoumpa

akoumpa commented Jun 17, 2026

Contributor

Thanks @grgkovac; restarted CI now.

Labels

community-request, waiting-on-maintainers

3 participants