Chore/remove cdi#2394

Draft
Isteb4k wants to merge 105 commits into main from chore/remove-cdi

@Isteb4k Isteb4k commented May 22, 2026

Contributor

Description

Why do we need it, and what problem does it solve?

What is the expected result?

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: core
type: fix
summary: TODO

Isteb4k added 3 commits May 19, 2026 22:49
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
# Conflicts:
#	build/components/versions.yml

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
@Isteb4k Isteb4k added this to the v1.9.0 milestone May 22, 2026
Isteb4k added 5 commits May 22, 2026 13:12
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
…f config

The `importPackageImages` template emits entries that belong to the `import:`
section (they reference other images, not git sources). They were being
appended directly after the `git:` block, producing config where each
`- image:` / `before:` entry was parsed as an extra field of the git mount,
causing werf to reject it with `unknown fields: image, before`.

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
…ixture

After the CDI-as-operator removal, the disk importer is provided by the
standalone `virtual-disk-importer` image. Its camelCase name
`virtualDiskImporter` is referenced from
`templates/virtualization-controller/_helpers.tpl` via
`helm_lib_module_image`, so the kubeconform fixture must define a digest
for it. Without it, `helm template` fails with
`Image virtualization.virtualDiskImporter has no digest`.

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
…DI cleanup

The CDI removal commit added a `replace` directive in
images/virtualization-artifact/go.mod that pointed at a vendored copy of
the upstream API package under
images/cdi-artifact/containerized-data-importer/staging/..., but the
follow-up cleanup removed that entire `staging/` tree. As a result
`go mod download` (and every downstream `go build`/`go test`) failed
with: open .../staging/.../containerized-data-importer-api/go.mod: no
such file or directory.

Drop the dangling local replace so we resolve
kubevirt.io/containerized-data-importer-api v1.63.1 from the Go module
proxy, and restore the `fmt` import in cmd/virtualization-controller
that was accidentally dropped alongside the temporary debug print
removed by the same commit (the file still uses fmt.Sprintf).

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
@Isteb4k Isteb4k self-assigned this May 22, 2026
Isteb4k added 20 commits May 22, 2026 14:59
The changes that removed CDI and refactored VirtualDisk data sources to
the step-based pattern left a number of issues that golangci-lint v2.11
catches. Fix them in one pass:

- Drop unused helpers in pkg/controller/vd/internal/source/sources.go
  (setPhaseCondition*, getNodePlacement, retryPeriod,
  setQuotaExceededPhaseCondition, isStorageClassWFFC, the unused
  SupplementsCleaner / CleanUpSupplements / Cleaner). The step pattern
  handles these cases internally now.
- Drop the unused DiskService.isImmediateBindingMode helper.
- Auto-fix the gci/gofumpt/QF1008 issues in service/errors.go,
  storageprofile/storageprofile_controller.go, and the new VirtualDisk
  source tests.
- Replace the deprecated reconcile.Result.Requeue with RequeueAfter in
  Create{Importer,Uploader}Step, WaitForPVCImportStep, and the matching
  *_test.go assertions.
- Simplify VirtualImage reconcilePVCImportFromDVCR (its bool-result was
  always true) and inline the call sites in vi http/registry/upload.
- Cleanup gocritic findings in vd internal watchers (unlambda Pod
  watcher map func, singleCaseSwitch -> if in PVC watcher).
- test/e2e: drop the dangling local replace of
  containerized-data-importer-api (the staging tree was removed) so
  typecheck succeeds, and lowercase a few error strings in observer
  predicates flagged by ST1005.
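
The Requeue -> RequeueAfter replacement above can be sketched as
follows. This is a minimal illustration, not the code from this PR:
`Result` is a local stand-in for controller-runtime's reconcile.Result,
and `waitForImport` and the 2-second `pollInterval` are hypothetical
names and values chosen for the example.

```go
package main

import "time"

// Result is a local stand-in for controller-runtime's reconcile.Result,
// reduced to the single field this sketch needs.
type Result struct {
	RequeueAfter time.Duration
}

// pollInterval is an assumed value used only for illustration.
const pollInterval = 2 * time.Second

// waitForImport is a hypothetical step body. Instead of setting the
// deprecated Requeue flag, it asks for a requeue after an explicit delay
// while the import is still running.
func waitForImport(done bool) Result {
	if done {
		// Nothing left to wait for: return the zero Result.
		return Result{}
	}
	return Result{RequeueAfter: pollInterval}
}
```

Test assertions then compare RequeueAfter against the expected delay
rather than checking a boolean flag.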

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
…ub to drop CVEs

Trivy reports five github.com/docker/docker CVEs against the
cdi-importer binary (CVE-2026-34040, CVE-2026-33997, CVE-2026-41567,
CVE-2026-42306, CVE-2026-41568). docker/docker is only pulled into CDI
transitively via github.com/containers/image/v5/manifest, which uses
exactly one symbol — github.com/docker/docker/api/types/versions — for
manifest version comparison.

Mirror the workaround already in place in deckhouse/3p-containerized-data-importer:
keep a minimal staging/src/github.com/docker/docker stub that only
provides api/types/versions and an empty registry package, and replace
the upstream module with it. go mod tidy correspondingly prunes the
otel/grpc/genproto/containerd transitive dependencies that were only
brought in by the full docker/docker module.

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
The mocks regenerated in the previous chat were produced with a
different moq version that aliased k8s.io/api/storage/v1 as
storagev1. moq v0.5.3 (pinned in Taskfile.init.yaml) leaves the import
unaliased and uses *v1.StorageClass throughout, which is what the
"check auto-generated files are up-to-date" CI step expects. Run
`task controller:dev:gogenerate` and commit the resulting diff.

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Move the vendored CDI importer code from
images/cdi-artifact/containerized-data-importer/ one level up to
images/cdi-artifact/ to remove the redundant nested directory. The
former unpack-bundle.sh and .gitignore tailored for the wrapper layout
are no longer needed. Werf and mount-point paths are updated to point
at the flattened tree.

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
cdi-importer now emits kubevirt_cdi_import_progress_total covering
both internal phases: TransferScratch (download from DVCR to the
scratch PVC) maps to 0-49% and qemu-img Convert (scratch to target
PVC) maps to 50-100%.

The virtualization-controller picks up both registry_progress and
kubevirt_cdi_import_progress_total metrics and applies a per-source
scale when computing vd/vi progress:

  - ObjectRef CVI/VI imports (cdi-importer is the sole pipeline)
    surface raw 0-100%.
  - HTTP, Registry and Upload imports (DVCR-fed) surface cdi-importer
    progress as 50-100%, after the dvcr-importer/uploader phase has
    already filled 0-50%.

WaitForPVCImportStep now requeues every 2 seconds while an import is
in progress so disk/image status reflects up-to-date percentages.
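
The two mappings described above can be sketched with integer
arithmetic. This is an illustration under assumptions, not the
controller's code: the type and function names (`importPhase`,
`sourceKind`, `importerProgress`, `surfacedProgress`) are hypothetical,
and rounding is plain integer division.

```go
package main

// importPhase identifies the internal cdi-importer phase (hypothetical type).
type importPhase int

const (
	phaseTransferScratch importPhase = iota // download from DVCR to the scratch PVC
	phaseConvert                            // qemu-img convert, scratch to target PVC
)

// sourceKind groups data sources by pipeline shape (hypothetical type).
type sourceKind int

const (
	sourceObjectRef sourceKind = iota // cdi-importer is the sole pipeline
	sourceDVCRFed                     // HTTP, Registry and Upload imports
)

// clamp keeps a percentage inside 0..100.
func clamp(pct int) int {
	if pct < 0 {
		return 0
	}
	if pct > 100 {
		return 100
	}
	return pct
}

// importerProgress maps per-phase progress onto the combined metric:
// TransferScratch covers 0-49 and Convert covers 50-100.
func importerProgress(phase importPhase, pct int) int {
	pct = clamp(pct)
	if phase == phaseTransferScratch {
		return pct * 49 / 100
	}
	return 50 + pct*50/100
}

// surfacedProgress applies the per-source scale: ObjectRef imports show
// the raw value, DVCR-fed imports show it squeezed into 50-100.
func surfacedProgress(kind sourceKind, importerPct int) int {
	importerPct = clamp(importerPct)
	if kind == sourceObjectRef {
		return importerPct
	}
	return 50 + importerPct/2
}
```

For example, an importer value of 40 surfaces as 40 for an ObjectRef
import and as 70 for an HTTP import.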

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
After the importer pod name was shortened to "d8v-<prefix>-importer-<uid>"
both the dvcr-importer Pod (started by importer_service.go for HTTP,
Registry and Upload sources) and the cdi-importer Pod (started by
disk_import_service.go to copy from DVCR to the target PVC) ended up
sharing the same name via sup.ImporterPod().

The collision caused EnsureSupplementPVCImport to fetch the already
Completed dvcr-importer Pod, treat it as the cdi-importer Pod, observe
phase Succeeded and call cleanupPVCImport on it. cleanupPVCImport
deletes the pod by name, but the dvcr-importer Pod still carries the
vi-/vd-protection finalizer, so it was stuck in Terminating and the
controller looped on "Waiting for supplements to be terminated".

Introduce a separate name template ("d8v-<prefix>-pvc-importer-<uid>")
and a new Generator.PVCImporterPod() helper. Switch the disk-import
service to use it so the two phases never share a Pod name.
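
A minimal sketch of the two name templates, assuming a simplified
generator: the `generator` struct and its fields are hypothetical
stand-ins for supplements.Generator, and only the two name formats are
taken from the description above.

```go
package main

import "fmt"

// generator is a hypothetical, simplified stand-in for
// supplements.Generator; the real type carries more state.
type generator struct {
	prefix string
	uid    string
}

// ImporterPod names the dvcr-importer Pod.
func (g generator) ImporterPod() string {
	return fmt.Sprintf("d8v-%s-importer-%s", g.prefix, g.uid)
}

// PVCImporterPod names the cdi-importer Pod that copies from DVCR to the
// target PVC, so it can never collide with ImporterPod.
func (g generator) PVCImporterPod() string {
	return fmt.Sprintf("d8v-%s-pvc-importer-%s", g.prefix, g.uid)
}
```

With prefix "vd" and uid "1234" the helpers return
"d8v-vd-importer-1234" and "d8v-vd-pvc-importer-1234".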

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
DiskService.CleanUpSupplements built a bare target PVC (no
annotations) and passed it to cleanupPVCImport. cleanupPVCImport
read the pod name from target.Annotations[AnnPVCImportPod] and fell
back to target.Name when the annotation was empty - i.e. it tried
to delete a pod sharing the target PVC's name. The actual
cdi-importer pod is named d8v-<prefix>-pvc-importer-<uid>, so the
delete was a silent no-op and the pod survived VD cleanup.

The orphan pod kept its volume mount on the target PVC, which
prevented kubernetes.io/pvc-protection from finalising the PVC and
in turn blocked the vd-cleanup finalizer, leaving deleted
VirtualDisks stuck in Terminating forever.

Thread the supplements.Generator through cleanupPVCImport and use
sup.PVCImporterPod() as the fallback pod name so cleanup targets
the correct pod whether or not the target PVC carries the
AnnPVCImportPod annotation.
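
The fallback can be sketched like this. Everything here is a simplified
assumption: `pvc`, `podNamer`, `fixedNamer` and `importPodName` are
hypothetical, and the annotation key is a placeholder because the real
value of AnnPVCImportPod is not shown in this PR description.

```go
package main

// annPVCImportPod is a placeholder key; the real AnnPVCImportPod value
// is not given in the source.
const annPVCImportPod = "example.invalid/pvc-import-pod"

// pvc is a reduced stand-in for a PersistentVolumeClaim object.
type pvc struct {
	Name        string
	Annotations map[string]string
}

// podNamer is the part of the generator that cleanup depends on.
type podNamer interface {
	PVCImporterPod() string
}

// fixedNamer is a trivial podNamer used to exercise the sketch.
type fixedNamer string

func (n fixedNamer) PVCImporterPod() string { return string(n) }

// importPodName prefers the annotation and otherwise falls back to the
// generated pod name, never to the PVC's own name.
func importPodName(target pvc, sup podNamer) string {
	// Reading from a nil map is safe in Go and yields "".
	if name := target.Annotations[annPVCImportPod]; name != "" {
		return name
	}
	return sup.PVCImporterPod()
}
```

A bare target PVC without annotations now resolves to the generated pod
name, so the delete call reaches the pod that actually exists.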

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
…rvice.Protect

The OwnerReference and the {vd,vi}-protection finalizer that disk import PVCs
need to participate in the controller-driven cleanup must be present from the
moment the PVC is created. Previously the finalizer was added later via
DiskService.Protect, opening a window where a PVC could be deleted before the
controller had stamped its protection on it.

Move the finalizer next to the OwnerReference inside StartPVCImport,
StartSupplementPVCImport and makePVCCloneTarget so every disk import target PVC
carries them at creation. Drop DiskService.Protect entirely along with all of
its callers in the VD ready step and the VI source handlers (http, registry,
upload, object_ref*, sources). Regenerate the VD source mocks and update the
VD source unit tests so they stop wiring the now-removed ProtectFunc.
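
The idea of stamping protection at creation can be sketched as follows.
The types are local stand-ins for Kubernetes object metadata, and
`newImportTargetPVC` and the finalizer string used in the usage note are
hypothetical; the full finalizer names are not spelled out in this
description.

```go
package main

// ownerRef is a reduced stand-in for metav1.OwnerReference.
type ownerRef struct {
	Kind string
	Name string
	UID  string
}

// pvcMeta is a reduced stand-in for the metadata of a PVC.
type pvcMeta struct {
	Name            string
	OwnerReferences []ownerRef
	Finalizers      []string
}

// newImportTargetPVC builds the target PVC metadata with both the owner
// reference and the protection finalizer already present, so there is no
// window between creation and a later "protect" call.
func newImportTargetPVC(name string, owner ownerRef, finalizer string) pvcMeta {
	return pvcMeta{
		Name:            name,
		OwnerReferences: []ownerRef{owner},
		Finalizers:      []string{finalizer},
	}
}
```

A caller would pass the owning VirtualDisk or VirtualImage and the
matching protection finalizer, for example
newImportTargetPVC("target", owner, "example.invalid/vd-protection").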

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
The dvcr-importer and uploader pods are already created with the
resource-quota-overrides.deckhouse.io/ignore=true label so they don't get
charged against namespace quotas. Apply the same label, at creation time, to
the scratch PVC and the cdi-importer pod that DiskService spawns to populate
target PVCs.

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Introduce PersistentVolumeClaimService as the single entry point for filling
a target PVC. Its Import method picks the strategy (smart clone via
VolumeSnapshot, CSI clone via dataSource, or host-assisted via cdi-importer
pod) and provisions every helper resource (scratch PVC, cdi-importer pod,
DVCR auth/CA copies, VolumeSnapshot, etc.) with ownerRef, finalizer, and
the resource-quota-overrides.deckhouse.io/ignore label set at creation.

DiskService is slimmed down: StartPVCImport/EnsurePVCImport and clone
helpers are removed and replaced by disk.PersistentVolumeClaim().Import.
The VirtualDisk reconciler step builds the target PVC descriptor and hands
it off to the service; data sources (HTTP, Registry, Upload, ObjectRef
CVI/VI) and the VirtualImage source plumbing are updated accordingly.
Tests and mocks are regenerated to match the new contract.
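
A rough sketch of how a strategy choice of this kind might look. The
selection criteria and their order are assumptions made for the example;
this description names the three strategies but not the rules Import
uses to choose between them. All identifiers below are hypothetical.

```go
package main

// strategy names one way of filling the target PVC.
type strategy string

const (
	strategySnapshotClone strategy = "snapshot-clone" // smart clone via VolumeSnapshot
	strategyCSIClone      strategy = "csi-clone"      // CSI clone via dataSource
	strategyHostAssisted  strategy = "host-assisted"  // cdi-importer pod
)

// capabilities is an assumed summary of what the source and storage
// backend allow; the real service may look at different inputs.
type capabilities struct {
	SourceIsPVC       bool // the data already lives in a PVC
	SnapshotSupported bool // a usable VolumeSnapshotClass exists
	CSICloneSupported bool // the driver can clone via dataSource
}

// pickStrategy prefers the cheapest storage-side clone and falls back to
// copying the data with an importer pod. The ordering is an assumption.
func pickStrategy(c capabilities) strategy {
	switch {
	case c.SourceIsPVC && c.SnapshotSupported:
		return strategySnapshotClone
	case c.SourceIsPVC && c.CSICloneSupported:
		return strategyCSIClone
	default:
		return strategyHostAssisted
	}
}
```

Keeping the decision in one function means callers only describe the
target and source, and never create helper resources themselves.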

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Isteb4k added 13 commits June 23, 2026 22:09
Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Allow VirtualImages on PVC to use filesystem-capable storage classes and align e2e expectations with the resolved storage mode. Also remove obsolete VirtualMachineRestore resource wiring from generated rules and RBAC.

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Boot ISO VirtualImages with a blank target disk and assert the installer framebuffer is visible, so regressions do not pass on Running alone.

Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
The "Remove CDI DataVolume dependency" commit accidentally removed the
virtualmachinerestores resource from the snapshot.kubevirt.io group
rules. VirtualMachineRestore belongs to the KubeVirt snapshot subsystem,
not CDI, and must be internalized so virt-operator can deploy it as an
internal CRD and virt-api can watch it through the proxy.

Without these rules, virt-operator looped trying to recreate the raw
CRD on every reconcile, and virt-api failed to start due to a forbidden
error watching virtualmachinerestores.snapshot.kubevirt.io.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
@Isteb4k Isteb4k force-pushed the chore/remove-cdi branch 2 times, most recently from 797f88c to c4390ad Compare June 26, 2026 12:59
Isteb4k and others added 4 commits June 29, 2026 15:15
The d8-snapshot-controller webhook rejects VolumeSnapshots where
VolumeSnapshotClassName is set but does not match the class annotated on
the StorageClass. Our code was selecting a class by matching the CSI
provisioner name, which could pick the wrong class when multiple
VolumeSnapshotClasses share the same driver.

Drop the explicit VolumeSnapshotClassName from clone snapshots in both
the PVC service (ensureCloneSnapshot) and the populator controller
(ensureSnapshot). The webhook will fill in the correct class from the
StorageClass annotation. The capability check in canSnapshotClone
(snapshotClassForProvisioner) is retained to avoid attempting snapshot
cloning when no class exists at all.
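
A minimal sketch of the split between the capability check and the
snapshot spec. The types are local stand-ins rather than the
external-snapshotter API, and the function bodies are assumptions; only
the names canSnapshotClone and VolumeSnapshotClassName come from the
text above.

```go
package main

// snapshotSpec is a reduced stand-in for a VolumeSnapshot spec.
type snapshotSpec struct {
	SourcePVCName string
	// VolumeSnapshotClassName stays nil so the admission webhook can
	// fill in the class annotated on the StorageClass.
	VolumeSnapshotClassName *string
}

// canSnapshotClone reports whether at least one snapshot class exists
// for the provisioner. classDrivers maps class name to CSI driver name.
func canSnapshotClone(provisioner string, classDrivers map[string]string) bool {
	for _, driver := range classDrivers {
		if driver == provisioner {
			return true
		}
	}
	return false
}

// cloneSnapshotSpec builds the spec without choosing a class, which
// avoids picking the wrong one when several classes share a driver.
func cloneSnapshotSpec(sourcePVC string) snapshotSpec {
	return snapshotSpec{SourcePVCName: sourcePVC}
}
```

The check answers "is snapshot cloning possible at all", while the
choice of class is left to the webhook.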

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
When a test was interrupted (Ctrl+C), namespaces were left Active
because It-nodes blocked in obs.WaitFor with context.Background().
Ginkgo could not cancel the wait, so the user had to press Ctrl+C a
second time, killing the process before DeferCleanup could run.

Two changes fix this:

1. framework.After now accepts context.Context. DeferCleanup(f.After)
   lets Ginkgo inject an interruptible SpecContext, so cleanup polling
   stops immediately on interrupt instead of blocking for LongTimeout.

2. VirtualDiskCreation and VirtualImageCreation test nodes (BeforeEach,
   BeforeAll, It) now accept context.Context from Ginkgo. The shared
   ctx = context.Background() closure variable is removed. On the first
   Ctrl+C, Ginkgo cancels the SpecContext, obs.WaitFor returns, and
   DeferCleanup runs cleanly.
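
The effect of passing an interruptible context into a polling wait can
be sketched with the standard library alone. `waitFor` and
`demoInterrupt` are hypothetical helpers for illustration; they are not
obs.WaitFor, and Ginkgo is not involved here.

```go
package main

import (
	"context"
	"time"
)

// waitFor polls cond until it returns true or ctx is done. With
// context.Background() the ctx.Done() branch can never fire, which is
// the blocking behaviour described above.
func waitFor(ctx context.Context, interval time.Duration, cond func() bool) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if cond() {
			return nil
		}
		select {
		case <-ctx.Done():
			// Interrupted: return promptly so cleanup can run.
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// demoInterrupt simulates an interrupt: the context is cancelled while
// the condition is still false, so waitFor returns the context error
// instead of blocking until a timeout.
func demoInterrupt() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return waitFor(ctx, time.Millisecond, func() bool { return false })
}
```

In the tests, the context passed to waitFor would be the one Ginkgo
injects into the node function rather than a shared background context.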

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
…tifact from DMT license

After main's #2548 (allow names up to Kubernetes limit), MaxVirtualImageNameLen
was removed from the validate package. Remove the now-undefined reference from
vi_webhook.go.

Exclude images/cdi-artifact from DMT license check: these are upstream CDI files
with CDI Authors copyright, not Flant JSC.

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Isteb4k Isteb4k force-pushed the chore/remove-cdi branch from 0117038 to 77bef32 Compare June 29, 2026 18:11
Merges fix/volume-umount-retries (PR deckhouse/3p-kubevirt#132) into
v1.6.2-virtualization. Tags v1.6.2-v12n.50 and
staging/src/kubevirt.io/api/v1.6.2-v12n.50 created.

Update versions.yml to track v1.6.2-virtualization branch.

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Isteb4k Isteb4k force-pushed the chore/remove-cdi branch from c811a5f to 3259f98 Compare June 29, 2026 18:16
Isteb4k and others added 8 commits June 29, 2026 20:31
This reverts commit 3259f98.

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Refs deckhouse/3p-kubevirt#137 (fix: volume umount retries).
Update versions.yml to track v1.6.2-virtualization branch.

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ns.yml

Refs deckhouse/3p-kubevirt#137.

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove double blank line in populator_controller.go (gci)
- Propagate context through saveTestCaseDump call chain in e2e framework (contextcheck)

Signed-off-by: Dmitry Rakitin <dmitry.rakitin@flant.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…stem

Importer pods for VirtualImage/ClusterVirtualImage sourced from a
VirtualImage-on-PVC were unconditionally using VolumeDevices (block
device mount). If the source PVC has volumeMode: Filesystem, kubelet
rejects the pod with FailedMount, causing a 5-minute timeout.

The fix mirrors the existing pattern in object_ref_vd.go: look up the
source PVC's VolumeMode at pod-creation time and apply the matching
source settings (Filesystem vs Block).
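
A sketch of choosing the source attachment from the PVC's volume mode.
The types, paths and the `attachmentFor` helper are hypothetical; the
only Kubernetes behaviour relied on is that an unset volumeMode means
Filesystem.

```go
package main

// volumeMode mirrors the two values of a PVC's spec.volumeMode.
type volumeMode string

const (
	volumeModeFilesystem volumeMode = "Filesystem"
	volumeModeBlock      volumeMode = "Block"
)

// Hypothetical paths inside the importer container.
const (
	sourceMountPath  = "/mnt/source"
	sourceDevicePath = "/dev/source"
)

// sourceAttachment says how the source PVC is exposed to the importer.
type sourceAttachment struct {
	AsBlockDevice bool   // true: use volumeDevices, false: use volumeMounts
	Path          string // device path or mount path
}

// attachmentFor picks volumeDevices only for Block PVCs. A nil mode is
// treated as Filesystem, matching the Kubernetes default.
func attachmentFor(mode *volumeMode) sourceAttachment {
	if mode != nil && *mode == volumeModeBlock {
		return sourceAttachment{AsBlockDevice: true, Path: sourceDevicePath}
	}
	return sourceAttachment{Path: sourceMountPath}
}
```

Looking the mode up when the pod is built avoids creating a pod that
kubelet can only reject later.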

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ention

LINSTOR thin pool lock contention can stall all storage writes on a node
for over a minute without surfacing any error, making progress-rate checks
unreliable. Disable the time-budget enforcement until the underlying storage
issue is resolved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep CDI-free implementation in upload.go (steptaker-based).
Keep fix/volume-umount-retries kubevirt branch reference in versions.yml
(no CDI component reference).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>