fix(wandb): log different val datasets separately in wandb by grgkovac · Pull Request #2526 · NVIDIA-NeMo/Automodel

grgkovac · 2026-06-11T17:49:12Z

What does this PR do ?

Fixes a bug when logging multiple val datasets to wandb. All of them write to Charts/val_loss, so the last val dataset overwrites the previous ones. This fix logs each val dataset in its own section. See screenshots below.

Changelog

Lines L1541-1545 in nemo_automodel/recipes/llm/train_ft.py:
L1541: if val_name == "default", only one dataset is used, without a specific name (validation_dataset: in the yaml), so logging is the same as before.
L1544-1545: get val_name and use it as the section name.
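The naming rule described in the changelog can be sketched as below. This is a minimal illustration, not the code from train_ft.py: the helper names `val_metric_key` and `build_val_log` are hypothetical, and the exact key format `<val_name>/val_loss` is an assumption based on wandb's convention of grouping metrics into a section named after the part of the key before the slash.

```python
from typing import Dict


def val_metric_key(val_name: str, metric: str = "val_loss") -> str:
    """Return the key under which a validation metric is logged.

    Hypothetical helper: "default" keeps the old un-prefixed key, any other
    name becomes a prefix so the metric gets its own section.
    """
    if val_name == "default":
        return metric
    return f"{val_name}/{metric}"


def build_val_log(losses: Dict[str, float]) -> Dict[str, float]:
    """Map {val_name: loss} to a dict of distinct keys suitable for logging."""
    return {val_metric_key(name): loss for name, loss in losses.items()}


# Two named datasets no longer collide on a single "val_loss" key.
log = build_val_log({"dummy_letters": 1.25, "dummy_numbers": 0.75})
print(log)
```

A dict built this way could then be passed to the logger in a single call; with the old behaviour both entries would have shared the key val_loss and only the last one would have survived.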

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests? not needed
Did you add or update any necessary documentation? not needed

Additional Information

Two dummy val datasets are used for testing: dummy_numbers and dummy_letters.

After the fix: two sections, one for dummy_letters and one for dummy_numbers.

Before the fix: a single val_loss in Charts (dummy_letters logged under the name val_loss, dummy_numbers not logged).

After the fix, one val dataset with a custom name (validation_dataset_dummy_letters: in the yaml): separate section.

After the fix, one val dataset with the default name (validation_dataset: in the yaml): val_loss in Charts, as before.
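The relationship between the yaml keys and the dataset names in the cases above can be sketched as follows. This is an assumption inferred from the two keys mentioned in this description (validation_dataset: and validation_dataset_dummy_letters:); the function `val_name_from_key` is hypothetical and is not claimed to match how the recipe parses its config.

```python
PREFIX = "validation_dataset"


def val_name_from_key(key: str) -> str:
    """Derive a val dataset name from a yaml key (illustrative assumption).

    "validation_dataset"        -> "default"
    "validation_dataset_<name>" -> "<name>"
    """
    if key == PREFIX:
        return "default"
    if key.startswith(PREFIX + "_"):
        return key[len(PREFIX) + 1:]
    raise ValueError(f"not a validation dataset key: {key!r}")


for key in ("validation_dataset", "validation_dataset_dummy_letters"):
    print(key, "->", val_name_from_key(key))
```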

Signed-off-by: Grgur Kovac <grgur.kovac@noxtua.com>

copy-pr-bot · 2026-06-11T17:50:04Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

akoumpa · 2026-06-15T16:50:52Z

/ok to test 1db9ea9

grgkovac · 2026-06-17T15:46:13Z

@akoumpa the L2_HF_PEFT failure looks unrelated to this PR. Do you mind rerunning the tests?

akoumpa · 2026-06-17T19:34:23Z

/ok to test bfbe695

akoumpa · 2026-06-17T19:34:42Z

Thanks @grgkovac; restarted CI now.

log different val datasets separately in wandb

9b1ab0e

Signed-off-by: Grgur Kovac <grgur.kovac@noxtua.com>

grgkovac requested a review from a team as a code owner June 11, 2026 17:49

grgkovac changed the title ~~log different val datasets separately in wandb~~ fix(wandb): log different val datasets separately in wandb Jun 11, 2026

github-actions Bot added the community-request label Jun 11, 2026

svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label Jun 13, 2026

Merge branch 'main' into grgkovac/fix/wandb_multiple_val_datasets

1db9ea9

copy-pr-bot Bot temporarily deployed to nemo-ci June 15, 2026 16:51 Inactive

copy-pr-bot Bot temporarily deployed to public June 15, 2026 16:51 Inactive

copy-pr-bot Bot temporarily deployed to test June 15, 2026 16:51 Inactive

copy-pr-bot Bot temporarily deployed to public June 15, 2026 16:53 Inactive

copy-pr-bot Bot temporarily deployed to public June 15, 2026 16:54 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci June 15, 2026 16:55 Inactive

svcnvidia-nemo-ci removed the waiting-on-maintainers Waiting on maintainers to respond label Jun 15, 2026

Merge branch 'main' into grgkovac/fix/wandb_multiple_val_datasets

bfbe695

copy-pr-bot Bot temporarily deployed to test June 17, 2026 19:34 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci June 17, 2026 19:34 Inactive

copy-pr-bot Bot temporarily deployed to public June 17, 2026 19:35 Inactive

copy-pr-bot Bot temporarily deployed to public June 17, 2026 19:37 Inactive

copy-pr-bot Bot temporarily deployed to public June 17, 2026 19:38 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci June 17, 2026 19:40 Inactive

svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label Jun 19, 2026

grgkovac wants to merge 3 commits into NVIDIA-NeMo:main from grgkovac:grgkovac/fix/wandb_multiple_val_datasets

3 participants
