runtime: Refactor factory/hypervisor save/restore paths for clearer s…#458

Draft
Camelron wants to merge 4 commits into cameronbaird/upstream/runtime-go-clh-templating from cameronbaird/upstream/runtime-go-clh-templating-rearchitect

@Camelron

Draft PR for comparison with the current working branch upstream, cameronbaird/upstream/runtime-go-clh-templating.

Remove from HypervisorConfig:

  • BootToBeTemplate bool
  • BootFromTemplate bool
  • DevicesStatePath string (QEMU-specific; snapshot dir now carries this)
  • CheckTemplateConfig() validation method

Retain but repurpose:

  • MemoryPath string → subsumed by FileBackedMemory.Path

Add to HypervisorConfig:

  • FileBackedMemory *FileBackedMemoryConfig
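
As a rough illustration of the field changes listed above, the sketch below shows one possible shape for the new config. Only `Path` and `Shared` are named in this description; the `memoryMode` helper, the example path, the reading of a nil pointer as "not file-backed", and the reading of `Shared: false` as a private mapping are assumptions made for the example, not the actual implementation.

```go
package main

import "fmt"

// FileBackedMemoryConfig is a hypothetical shape for the new field; only
// Path and Shared are named in the PR description.
type FileBackedMemoryConfig struct {
	Path   string // file that backs guest memory
	Shared bool   // true: shared mapping; false: private mapping (assumed meaning)
}

// HypervisorConfig is trimmed to the single field relevant here.
type HypervisorConfig struct {
	// Assumption: a nil pointer means guest memory is not file-backed.
	FileBackedMemory *FileBackedMemoryConfig
}

// memoryMode is an illustrative helper (not part of the PR) that replaces
// checks on the removed BootToBeTemplate/BootFromTemplate booleans.
func memoryMode(c HypervisorConfig) string {
	switch {
	case c.FileBackedMemory == nil:
		return "anonymous"
	case c.FileBackedMemory.Shared:
		return "file-backed, shared"
	default:
		return "file-backed, private"
	}
}

func main() {
	// The path below is a placeholder, not a path used by the runtime.
	tmpl := HypervisorConfig{FileBackedMemory: &FileBackedMemoryConfig{
		Path:   "/path/to/template/memory",
		Shared: true,
	}}
	fmt.Println(memoryMode(HypervisorConfig{}))
	fmt.Println(memoryMode(tmpl))
}
```

With this shape, the hypervisor no longer needs to know whether it is building or consuming a template; it only sees how memory should be backed.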

Other files that reference BootToBeTemplate/BootFromTemplate and need updating:

  • factory/template/template_linux.go — sets the flags; rewrite to use FileBackedMemory
  • factory/factory_linux.go — resets the flags; remove
  • persist.go — serializes/deserializes the flags; update
  • vm.go — skips timestamp when BootFromTemplate; replace with restore-aware check
  • hypervisor.go — field definitions + CheckTemplateConfig(); remove/replace
  • qemu_amd64.go — memory flag logic referencing template booleans

The template factory (factory/template/template_linux.go) is the main consumer. It currently:

  1. Sets BootToBeTemplate=true, calls CreateVM + StartVM + Pause + Save
  2. Sets BootFromTemplate=true, calls CreateVM + StartVM (which internally restores)

After refactor:

  1. Sets FileBackedMemory = {Path: ..., Shared: true}, calls CreateVM + StartVM + PauseVM + SaveVM(templateDir), then applies factory-side shared→false patching to the saved config (CLH only; QEMU needs no patching)
  2. Sets FileBackedMemory = {Path: ..., Shared: false}, calls CreateVM + RestoreVM(ctx, templateDir) + ResumeVM + ReseedRNG + SyncTime

Summary

The hypervisor provides three generic primitives: SaveVM(dir), RestoreVM(ctx, dir), ResumeVM(ctx). The caller decides:

  • Where to save/restore
  • Whether memory is shared or private (via FileBackedMemoryConfig)
  • What post-restore housekeeping to do (reseed, sync time, or nothing)
  • Any post-save config modifications (shared→false patching for templates)
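
To make the last bullet concrete, here is a minimal sketch of a caller-side post-save patch that flips every `shared` flag to false in a saved JSON config. The JSON layout in `main` is invented for illustration and is not claimed to match the snapshot format of any hypervisor; `setSharedFalse` and `patchSnapshotConfig` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// setSharedFalse walks a decoded JSON value and sets every boolean "shared"
// key to false. It returns how many keys were changed from true.
func setSharedFalse(v any) int {
	changed := 0
	switch t := v.(type) {
	case map[string]any:
		for k, child := range t {
			if b, ok := child.(bool); ok && k == "shared" {
				if b {
					t[k] = false
					changed++
				}
				continue
			}
			changed += setSharedFalse(child)
		}
	case []any:
		for _, child := range t {
			changed += setSharedFalse(child)
		}
	}
	return changed
}

// patchSnapshotConfig returns the patched document and the number of flags flipped.
func patchSnapshotConfig(in []byte) ([]byte, int, error) {
	var doc any
	if err := json.Unmarshal(in, &doc); err != nil {
		return nil, 0, err
	}
	n := setSharedFalse(doc)
	out, err := json.Marshal(doc)
	return out, n, err
}

func main() {
	// Hypothetical saved config; field names are assumptions for this example.
	saved := []byte(`{"memory":{"zones":[{"id":"mem0","shared":true}]}}`)
	out, n, err := patchSnapshotConfig(saved)
	if err != nil {
		panic(err)
	}
	fmt.Println(n, string(out))
}
```

Keeping this step in the factory rather than in `SaveVM` is what lets the hypervisor primitive stay generic: a caller that wants a plain snapshot simply skips the patch.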

Camelron added 4 commits June 9, 2026 22:32
Add support for VM Template factory on the clh path.

In order to support snapshot/restore-based VM templating,
the following changes were needed:
1. For clh.go, implement SaveVM, PauseVM, restoreVM, ResumeVM
2. Remove initrd config check for VM Templating path. The
        root disk image (when using image mode) is created in memory
        and therefore captured in the VM snapshot.
3. Truncate the memory file to the size of the VM at factory VM
        create time. This allows CLH to use the memory file
        as the backing for the template VM memory, allowing O(1)
        snapshot times.
4. CLH uses memory zones as backing for its memory on the template paths
5. Update StartVM in CLH to use the restore path when template is
        configured and available

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>

Add k8s-vm-templating-test.bats, which exercises pod creation
with the factory initialized on the target node.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>

Previously, when a k8s pod started with enable_template=true, the runtime:

1. Tried NewFactory with fetchOnly=true.
2. When that failed (because template.Fetch could not find the artifacts),
	retried with fetchOnly=false. This creates a direct factory, which
	builds the template from scratch (paying a full pod sandbox boot)
	and then restores from it, so boot times are strictly worse
	on this path.

Now, even when enable_template=true, we no longer force a direct factory.
Instead, we fall back to the standard sandbox boot path.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
@Camelron force-pushed the cameronbaird/upstream/runtime-go-clh-templating-rearchitect branch from d775385 to a910ff5 on June 10, 2026 22:01
@Camelron force-pushed the cameronbaird/upstream/runtime-go-clh-templating branch 8 times, most recently from 02c125e to 96f2eaa on June 16, 2026 18:40