Cross-subject transfer learning via CrossSubjectEvaluation (calibration_size)#1093
bruAristimunha wants to merge 3 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da2b30015c
    md["X_target_labeled" if labeled else "X_target_unlabeled"] = X[calib]
    if labeled:
        md["y_target_labeled"] = y[calib]
Reject labeled 100% calibration to avoid leaking test labels
When calibration_size=1.0 is combined with labeled=True, TransferSplitter makes calib identical to test, so this block routes y[test] as y_target_labeled before the same trials are scored. Any target-aware estimator that requests y_target_labeled can train on the labels of every evaluated sample, making the reported score invalid; please disallow this combination or avoid passing labels when the calibration slice overlaps the test slice.
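One way to disallow the combination is a small guard run before the splitter is built. The sketch below is only an illustration of that suggestion: `validate_calibration` is a hypothetical helper name, not something that exists in the PR, and the accepted range `[0, 1]` is an assumption.

```python
def validate_calibration(calibration_size: float, labeled: bool) -> None:
    """Hypothetical guard: reject settings where scored trials leak their labels."""
    # Assumption: calibration_size is a fraction of the held-out group.
    if not 0.0 <= calibration_size <= 1.0:
        raise ValueError(
            f"calibration_size must be in [0, 1], got {calibration_size}"
        )
    # With the full held-out group used for labeled calibration, every test
    # label would reach the estimator as y_target_labeled before scoring.
    if labeled and calibration_size >= 1.0:
        raise ValueError(
            "calibration_size=1.0 with labeled=True would pass the test labels "
            "to the estimator as y_target_labeled"
        )
```

Unlabeled 100% calibration stays allowed here, since only `X` of the held-out group is exposed in that case.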
…aluation
Run target-aware transfer protocols through the existing CrossSubjectEvaluation,
instead of a dedicated evaluation engine or a separate transfer module.
- CrossSubjectEvaluation gains calibration_size (+ calibration_labeled): when
> 0 it wraps its CrossSubjectSplitter in TransferSplitter, so each fold yields
(train, calibration, test).
- base.py consumes the split with `train, *cal, test` (parallel + serial paths);
the held-out calibration slice is routed RAW to the pipeline steps that
request it via sklearn metadata routing (subjects, X_target_unlabeled /
X_target_labeled). Plain pipelines request nothing and are unaffected -- the
estimator owns the target representation, so there is no transform-through-steps.
- TransferSplitter is the single generic transfer splitter (subject / session /
dataset).
Removes moabb/evaluations/transfer.py and CrossSubjectTransferSplitter.
Usage:
CrossSubjectEvaluation(..., calibration_size=0.2).process(pipelines)
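The wrapping and the `train, *cal, test` unpacking described above can be sketched with plain index lists. This is a simplified illustration, not the PR's `TransferSplitter`: the function names are hypothetical, and it assumes the test slice is whatever remains of the held-out group after the calibration slice is carved off (restricted to `0 < calibration_size < 1`).

```python
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple


def with_calibration(
    base_splits: Iterable[Tuple[Sequence[int], Sequence[int]]],
    calibration_size: float,
) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Turn (train, test) splits into (train, calibration, test) splits."""
    if not 0.0 < calibration_size < 1.0:
        raise ValueError("this sketch only handles 0 < calibration_size < 1")
    for train, held_out in base_splits:
        # Carve the first fraction of the held-out group off as calibration.
        n_calib = int(calibration_size * len(held_out))
        yield list(train), list(held_out[:n_calib]), list(held_out[n_calib:])


def unpack_split(split: tuple) -> Tuple[list, Optional[list], list]:
    """Accept both 2-tuples and 3-tuples, as `train, *cal, test` does."""
    train, *cal, test = split
    calib = cal[0] if cal else None  # None means: no calibration slice
    return train, calib, test
```

Because the star-unpacking tolerates both tuple lengths, the same fold loop can serve plain and transfer splits without a separate code path.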
da2b300 to 2b06658
- calibration_size / calibration_labeled now ride cv_kwargs instead of bespoke
CrossSubjectEvaluation __init__ params: read from self.cv_kwargs and stripped
before the inner CV. No __init__ override.
- Expose cv_class as a documented option (like WithinSessionEvaluation); it
composes with calibration.
- base.py / serial evaluate() read calibration_labeled from cv_kwargs.
Numerically identical to CrossSubjectTargetAwareEvaluation.process() on
BNCI2014_004 (max abs score diff 0.0) when the target-aware estimator computes
covariances of the raw target and declares matching fit/transform metadata requests.
Usage:
CrossSubjectEvaluation(..., cv_kwargs={"calibration_size": 0.2}).process(pipelines)
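Reading the calibration options from `cv_kwargs` and stripping them before the inner CV could look roughly like the sketch below. The helper name `split_cv_kwargs` and the defaults (`0.0` and `False`) are assumptions made for illustration; the commit does not state them.

```python
from typing import Any, Dict, Optional, Tuple


def split_cv_kwargs(
    cv_kwargs: Optional[Dict[str, Any]],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate calibration options from the kwargs meant for the inner CV."""
    remaining = dict(cv_kwargs or {})  # copy, so the caller's dict is untouched
    calibration = {
        # Assumed defaults: no calibration slice, unlabeled.
        "calibration_size": remaining.pop("calibration_size", 0.0),
        "calibration_labeled": remaining.pop("calibration_labeled", False),
    }
    return calibration, remaining
```

Working on a copy keeps the evaluation's stored `cv_kwargs` intact, so repeated calls see the same options.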
Move the transfer calibration into the splitter so CrossSubjectEvaluation's
_create_splitter is the plain original (no .get / .pop / wrapper).
- CrossSubjectSplitter gains a calibration_size param: yields (train, calib, test)
when > 0, otherwise the usual (train, test). Removes TransferSplitter.
- _resolve_cv now always merges self.cv_kwargs over the defaults (a latent fix),
so calibration_size flows via cv_kwargs with the default cv_class too.
- Drop calibration_labeled: _evaluate_fold offers all transfer kwargs (subjects,
X_target_unlabeled, X_target_labeled, y_target_labeled) and metadata routing
(consumes) keeps only what the estimator requested, so the estimator's
set_fit_request decides labeled vs unlabeled.
Numerically identical to CrossSubjectTargetAwareEvaluation.process() on
BNCI2014_004 (max abs score diff 0.0).
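The cv_kwargs merge and the offer-everything routing described in this commit can be illustrated without scikit-learn. The sketch below is a plain-Python stand-in, not the library's metadata-routing API: `resolve_cv_kwargs`, `keep_requested`, and the example default `n_splits` are hypothetical names chosen for the example.

```python
from typing import Any, Dict, Iterable, Optional


def resolve_cv_kwargs(
    defaults: Dict[str, Any], cv_kwargs: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Merge user cv_kwargs over the defaults; user values win on conflict."""
    return {**defaults, **(cv_kwargs or {})}


def keep_requested(
    offered: Dict[str, Any], requested: Iterable[str]
) -> Dict[str, Any]:
    """Keep only the offered fit kwargs that the estimator asked for."""
    wanted = set(requested)
    return {name: value for name, value in offered.items() if name in wanted}


# Every transfer kwarg is offered on each fold (placeholder values here).
offered = {
    "subjects": [1, 1, 2, 2],
    "X_target_unlabeled": "X[calib]",
    "X_target_labeled": "X[calib]",
    "y_target_labeled": "y[calib]",
}
unlabeled_fit_kwargs = keep_requested(offered, {"subjects", "X_target_unlabeled"})
plain_fit_kwargs = keep_requested(offered, set())  # plain pipeline: nothing routed
```

With this shape the labeled/unlabeled choice lives entirely in what the estimator requests, which is why a separate `calibration_labeled` flag becomes redundant.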
I will need more time to analyze your code on Monday. Here are my initial comments:
For example:
would lead in the RPA transformer to:
I am not sure how this can be handled. As a reminder, we can also set up a Teams meeting; some of these details will be easier to discuss live.
Addresses #1077. The split-based alternative to the dedicated evaluation engine in #1091, following the direction discussed in #1077: target-aware transfer learning runs through the existing `CrossSubjectEvaluation` with a few lines of change, with no new evaluation class and no separate transfer module.
Usage
What it adds
- `TransferSplitter(base_splitter, calibration_size)` (`splitters.py`): a generic wrapper that carves the first `calibration_size` fraction off the held-out group of any leave-one-group-out splitter, yielding `(train, calibration, test)`. Gives subject-, session-, and dataset-transfer from one mechanism.
- `CrossSubjectEvaluation` gains `calibration_size` (+ `calibration_labeled`); when `> 0`, `_create_splitter` wraps its `CrossSubjectSplitter` in `TransferSplitter`.
- `base.py` consumes the split with `train, *cal, test` (parallel + serial paths). The held-out calibration slice is routed raw to the pipeline steps that request it via scikit-learn metadata routing (`set_fit_request`): `subjects`, `X_target_unlabeled` / `X_target_labeled`. Plain pipelines request nothing → `{}` → the fit is unchanged.
Design notes
- `base.py` does not transform the calibration slice through the pipeline (no transform-through-steps). The transfer estimator owns the target representation. This is what keeps the change minimal.
- `subjects`/target data to estimators.
Verification
Full evaluation + splitter test suites pass. Integration test (`test_cross_subject_calibration_*`): a target-aware step receives the routed `subjects` + non-empty `X_target_unlabeled` on every fold, and a plain pipeline runs untouched at `calibration_size=0.5`.
cc @toncho11: concrete, minimal counter-proposal to #1091; an existing target-aware estimator (e.g. RPA) drops in by declaring `set_fit_request(subjects=True, X_target_unlabeled=True, ...)`.