[Proposal] Motion-adaptive temporal frame sampler for training dataset

## Summary
I've implemented a motion-adaptive temporal frame sampler as an opt-in
alternative to the `RandomUniformSampler` in `training/dataset/vos_sampler.py`.
It replaces the uniform stride with a frame budget allocated in proportion
to motion density, so more frames are selected from high-motion intervals
during training.

## Problem
The current `RandomUniformSampler` treats all temporal positions as equally
informative. Analysis of 15 DAVIS-2017 sequences shows that this
systematically under-samples high-motion transitions, which are exactly the
frames where object appearance changes most rapidly and boundary learning
is most critical.

## Solution
`AdaptiveTemporalSampler` (`sam2/utils/adaptive_sampler.py`):
- Scores frames via a lightweight L1 pixel difference (computed on every 4th frame)
- Allocates `budget_ratio` of the frame budget to high-motion regions
- Falls back to uniform sampling on any exception
- Fully backward-compatible: opt-in via `sampler_cfg` in the config
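
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `sam2/utils/adaptive_sampler.py`: it uses plain Python lists instead of PyTorch tensors so it runs without dependencies, it is deterministic rather than random, and it spends the motion budget on the top-scoring frames. The helper names (`uniform_indices`, `motion_scores`, `adaptive_indices`) and the default values are assumptions made for the example.

```python
from typing import List, Sequence

Frame = Sequence[float]  # flattened grayscale pixels; stands in for a tensor


def uniform_indices(num_total: int, num_frames: int) -> List[int]:
    """Evenly spaced frame indices over [0, num_total - 1]."""
    num_frames = min(num_frames, num_total)
    if num_frames <= 0:
        return []
    if num_frames == 1:
        return [0]
    step = (num_total - 1) / (num_frames - 1)
    return [int(i * step + 0.5) for i in range(num_frames)]


def motion_scores(frames: Sequence[Frame], stride: int = 4) -> List[float]:
    """Mean L1 pixel difference between every `stride`-th frame.

    Each frame inherits the largest difference of the intervals it touches.
    """
    n = len(frames)
    anchors = list(range(0, n, stride))
    if anchors[-1] != n - 1:
        anchors.append(n - 1)  # make sure the tail of the clip is scored
    scores = [0.0] * n
    for a, b in zip(anchors, anchors[1:]):
        diff = sum(abs(x - y) for x, y in zip(frames[a], frames[b])) / len(frames[a])
        for t in range(a, b + 1):
            scores[t] = max(scores[t], diff)
    return scores


def adaptive_indices(
    frames: Sequence[Frame], num_frames: int = 8, budget_ratio: float = 0.5
) -> List[int]:
    """Spend `budget_ratio` of the budget on high-motion frames, the rest uniformly."""
    n = len(frames)
    try:
        if n < num_frames:
            raise ValueError("clip shorter than the frame budget")
        scores = motion_scores(frames)
        if sum(scores) <= 0:
            raise ValueError("no motion detected")
        k_motion = min(num_frames, max(0, round(budget_ratio * num_frames)))
        # Highest-scoring frames first; ties keep temporal order (stable sort).
        ranked = sorted(range(n), key=lambda t: scores[t], reverse=True)
        chosen = set(ranked[:k_motion])
        # Fill the remaining budget with evenly spaced frames not yet chosen.
        rest = [t for t in range(n) if t not in chosen]
        chosen.update(rest[i] for i in uniform_indices(len(rest), num_frames - k_motion))
        return sorted(chosen)
    except Exception:
        # Mirrors the "fall back to uniform sampling on any exception" rule.
        return uniform_indices(n, num_frames)
```

In this sketch, a clip with no measurable motion, or one whose frames cannot be scored, ends up with the same indices that `uniform_indices` would return, which is the behaviour the fallback bullet describes.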

## Measured Results (15 DAVIS-2017 val sequences, 8 frames/clip)

| Metric                       | Uniform | Adaptive | Delta  |
|------------------------------|---------|----------|--------|
| Mean high-motion coverage    | 0.122   | 0.127    | +0.005 |
| High-motion coverage (bear)  | 0.12    | 0.25     | +0.13  |
| High-motion coverage (boat)  | 0.10    | 0.19     | +0.09  |
| Frames per clip              | 8       | 8        | 0      |

5 of 15 sequences show clear improvement; averaged over all 15, the gain is
small (+0.005). Sequences with evenly distributed motion (bus, car-shadow)
correctly receive near-uniform selection: the sampler adapts to clip content
rather than forcing dense sampling everywhere.
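
The table does not spell out how "high-motion coverage" is computed. One plausible reading, assumed here purely for illustration, is the fraction of selected frames whose motion score exceeds a threshold. The function name `high_motion_coverage` and its `threshold` argument are hypothetical and may not match the analysis tooling.

```python
from typing import Sequence


def high_motion_coverage(
    selected: Sequence[int], scores: Sequence[float], threshold: float
) -> float:
    """Fraction of selected frame indices whose motion score exceeds `threshold`."""
    if not selected:
        return 0.0
    hits = sum(1 for t in selected if scores[t] > threshold)
    return hits / len(selected)
```

Under this reading, an 8-frame clip with one high-motion frame scores 1/8 and one with two scores 2/8, which is at least consistent with the bear row, though the issue itself does not confirm the definition.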

Full retraining would be needed to measure any downstream J&F improvement,
which is outside the scope of this proposal.

## Implementation Status
- [x] `AdaptiveTemporalSampler` implemented (pure PyTorch + PIL, no OpenCV)
- [x] Integrated into `vos_sampler.py` (backward-compatible)
- [x] Hydra config support (`sampler_cfg.type: adaptive`)
- [x] 8 unit tests passing (CPU-only)
- [x] Motion analysis tooling added

Happy to open a PR if this direction is welcome. I can also run an ablation
over different `motion_threshold` values if that would help the review.
