feat: Add RKE2 as alternative Kubernetes distribution for service cluster #4566

Open
j0hnL wants to merge 3 commits into pub/q2_dev_archieve from feature/rke2-service-cluster

@j0hnL j0hnL commented May 26, 2026

Summary

Add support for deploying RKE2 (Rancher Kubernetes Engine 2) as an alternative to vanilla Kubernetes (kubeadm) for the service cluster. The distribution is selected by listing service_rke2 (instead of service_k8s) in software_config.json and configuring the new service_rke2_k8s_cluster block in omnia_config.yml.

How to use

  1. In input/software_config.json, replace service_k8s with service_rke2 in the softwares list and set its "version" (e.g. 1.35.1)
  2. In input/omnia_config.yml, uncomment and configure the service_rke2_k8s_cluster block with deployment: true (only one of service_k8s_cluster / service_rke2_k8s_cluster may have deployment: true)
  3. Ensure the cluster has an entry in high_availability_config.yml — the kube-vip VIP is required for cluster join
  4. Run the pipeline as normal (local_repo.yml, build_image_x86_64.yml, discovery.yml)
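
Step 1 can be sketched in Python as follows. This is only an illustration, not code from this PR: it assumes each entry in the softwares list is an object with "name" and "version" keys, and the helper name switch_to_rke2 and the sample values are hypothetical.

```python
import json


def switch_to_rke2(config: dict, version: str) -> dict:
    """Return a copy of a software_config-style dict with service_k8s
    replaced by service_rke2 (entry layout is an assumption)."""
    updated = dict(config)
    softwares = []
    for entry in config.get("softwares", []):
        if entry.get("name") == "service_k8s":
            # Build a new entry so the caller's data is not mutated.
            entry = {**entry, "name": "service_rke2", "version": version}
        softwares.append(entry)
    updated["softwares"] = softwares
    return updated


# Placeholder input; the real file contains more fields and entries.
example = {"softwares": [{"name": "service_k8s", "version": "0.0.0"}]}
result = switch_to_rke2(example, "1.35.1")
print(json.dumps(result, indent=2))
```

The function returns a new dict rather than editing in place, so the original configuration can still be compared against the result before anything is written back to input/software_config.json.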

Configuration changes

  • New service_rke2_k8s_cluster block in omnia_config.yml with schema validation in omnia_config.json (supported CNIs: calico, canal, cilium, flannel)
  • Validation rejects service_k8s and service_rke2 being present together
  • rke2 (stable/1.35 channel) and rke2-common RPM repositories added to local_repo_config.yml
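
The mutual-exclusion and CNI rules above could be checked along these lines. This is a rough sketch rather than the logic in common_validation.py: the function name validate_distribution is hypothetical, and it assumes each cluster block is a list of dicts and that the RKE2 block reuses the k8s_cni key.

```python
SUPPORTED_RKE2_CNIS = {"calico", "canal", "cilium", "flannel"}


def validate_distribution(software_names, omnia_config):
    """Return a list of error strings; an empty list means the input passed."""
    errors = []
    if "service_k8s" in software_names and "service_rke2" in software_names:
        errors.append("service_k8s and service_rke2 cannot both be listed")

    def deployed(block):
        # Clusters in the named block that have deployment: true.
        clusters = omnia_config.get(block) or []
        return [c for c in clusters if c.get("deployment") is True]

    k8s = deployed("service_k8s_cluster")
    rke2 = deployed("service_rke2_k8s_cluster")
    if k8s and rke2:
        errors.append("only one cluster block may set deployment: true")

    for cluster in rke2:
        cni = cluster.get("k8s_cni")  # assumed key name for the RKE2 block
        if cni not in SUPPORTED_RKE2_CNIS:
            errors.append(f"unsupported RKE2 CNI: {cni!r}")
    return errors


print(validate_distribution(
    ["service_rke2"],
    {"service_rke2_k8s_cluster": [{"deployment": True, "k8s_cni": "cilium"}]},
))
```

Collecting errors in a list instead of raising on the first failure lets a user see every configuration problem in a single run.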

Pipeline integration

  • provision.yml: enable service_k8s tag when service_rke2 is in software_config.json
  • include_software_config.yml / validate_software_config_json.yml: detect service_rke2, set support facts, handle arch and version
  • common_validation.py: validate service_rke2 cluster configuration
  • image_package_collector.py: select service_rke2.json for image builds
  • k8s_config/main.yml: branch NFS setup on k8s_distro; create_rke2_config_nfs.yml reads NFS share details from storage_config.yml mounts
  • configure_cloud_init_group.yml: select RKE2-specific cloud-init templates
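
As an illustration of the branching described above, the package-file selection might look like the sketch below. It is not the code in image_package_collector.py; the function name is hypothetical, and service_k8s.json is an assumed file name for the kubeadm path.

```python
def select_service_package_file(software_names):
    """Pick the package definition file for the service cluster image."""
    if "service_rke2" in software_names:
        return "service_rke2.json"
    if "service_k8s" in software_names:
        # Assumed name of the existing kubeadm package file.
        return "service_k8s.json"
    # No service cluster requested.
    return None


print(select_service_package_file(["service_rke2"]))
```

Because validation already rejects listing both entries together, the order of the two checks only matters for input that has not been validated.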

New files

  • 3 RKE2 cloud-init templates (first server, additional server, agent)
  • create_rke2_config_nfs.yml for RKE2-specific NFS setup
  • service_rke2.json package definitions for RHEL 10.0 x86_64

Key differences from kubeadm path

  • RKE2 uses built-in containerd (no CRI-O); RKE2 manages the CNI lifecycle
  • Token-based cluster join via the kube-vip VIP on port 9345 (RKE2 supervisor API)
  • kube-vip runs as a kubelet static pod (/var/lib/rancher/rke2/agent/pod-manifests/) with the kubeconfig mounted at /etc/kubernetes/admin.conf
  • Air-gapped system images: official RKE2 image tarballs (core + selected CNI, v1.35.1+rke2r1) are downloaded by local_repo, staged on the NFS share, and auto-imported from /var/lib/rancher/rke2/agent/images/ on every node
  • registries.yaml mirrors public registries to the Pulp mirror for all other images
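
To make the token-based join concrete, here is a sketch of how an RKE2 config.yaml for an additional server could be rendered. It is not the cloud-init template from this PR. The keys server, token, cni, cluster-cidr and service-cidr are standard RKE2 configuration options; the function name, the VIP address and the token value are placeholders.

```python
import json


def render_rke2_join_config(vip, token, cni, cluster_cidr, service_cidr):
    """Render config.yaml content for a server joining through the VIP."""
    values = {
        "server": f"https://{vip}:9345",  # RKE2 supervisor API port
        "token": token,
        "cni": cni,
        "cluster-cidr": cluster_cidr,
        "service-cidr": service_cidr,
    }
    # JSON strings are valid YAML double-quoted scalars, so json.dumps
    # gives safe quoting without needing a YAML library.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in values.items())


join_config = render_rke2_join_config(
    vip="192.0.2.10",             # placeholder VIP (documentation range)
    token="example-token",        # placeholder, not a real token
    cni="calico",
    cluster_cidr="10.233.64.0/18",
    service_cidr="10.233.0.0/18",
)
print(join_config)
```

Agent nodes would typically need only the server and token entries. The CIDR values are included here because joining servers must match the first server's settings.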

Backward compatibility

Existing kubeadm deployment is completely unaffected when service_k8s is selected (the default). All branching logic only activates for service_rke2.

@j0hnL j0hnL force-pushed the feature/rke2-service-cluster branch 3 times, most recently from f513dc2 to 11db64e Compare May 26, 2026 19:22
@j0hnL j0hnL force-pushed the feature/rke2-service-cluster branch 2 times, most recently from 0759230 to b4bff18 Compare May 27, 2026 14:29
@j0hnL j0hnL changed the base branch from main to pub/q2_dev May 27, 2026 14:29
@j0hnL j0hnL force-pushed the feature/rke2-service-cluster branch from b4bff18 to ea1db21 Compare May 27, 2026 15:03
@sujit-jadhav sujit-jadhav requested a review from snarthan May 27, 2026 15:40

@sujit-jadhav sujit-jadhav left a comment

input/omnia_config.yml

Can we segregate the blocks by k8s distribution?

service_k8s_cluster:
  - cluster_name: service_cluster
    deployment: true
    etcd_on_local_disk: false
    k8s_distro: "kubeadm"
    k8s_cni: "calico"
    pod_external_ip_range: "172.16.107.170-172.16.107.200"
    k8s_service_addresses: "10.233.0.0/18"
    k8s_pod_network_cidr: "10.233.64.0/18"
    nfs_storage_name: "nfs_k8s"
    k8s_crio_storage_size: "20G"
    csi_powerscale_driver_secret_file_path: ""
    csi_powerscale_driver_values_file_path: ""

We can add a new block, service_rke2_k8s_cluster, and give it its own distribution-specific settings.

I will ask the team to rename the existing service_k8s_cluster to service_upstream_k8s_cluster.

In the same way, we can add service_charmed_k8s_cluster for Canonical.

service_upstream_k8s_cluster
service_rke2_k8s_cluster
service_charmed_k8s_cluster

"cluster": [
{ "package": "docker.io/library/busybox", "type": "image", "tag": "1.36" },
{ "package": "firewalld", "type": "rpm", "repo_name": "x86_64_baseos" },
{ "package": "python3-firewall", "type": "rpm", "repo_name": "x86_64_baseos" },

software_config.json needs to be updated to add service_rke2, and input validation needs to be added for the same.

@j0hnL j0hnL force-pushed the feature/rke2-service-cluster branch from ea1db21 to 7cb6f0c Compare May 29, 2026 20:02
@j0hnL j0hnL requested a review from sujit-jadhav May 29, 2026 20:12
…ster

Add support for deploying RKE2 (Rancher Kubernetes Engine 2) as an alternative
to vanilla Kubernetes (kubeadm) for the service cluster. Users can choose the
distribution via the new k8s_distro field in omnia_config.yml.

Configuration changes:
- Add k8s_distro field to service_k8s_cluster in omnia_config.yml (default: kubeadm)
- Update omnia_config.json schema to validate k8s_distro enum (kubeadm/rke2)
- Expand k8s_cni enum to include canal, cilium (RKE2-supported CNIs)

Pipeline integration:
- discovery.yml: Enable service_k8s tag when service_rke2 is in software_config.json
- include_software_config.yml: Detect service_rke2 and set service_rke2_support fact
- validate_software_config_json.yml: Handle service_rke2 arch and version detection
- common_validation.py: Accept service_rke2 for cluster validation
- image_package_collector.py: Dynamically select service_rke2.json for image builds
- k8s_config/main.yml: Branch NFS setup based on k8s_distro
- configure_cloud_init_group.yml: Select RKE2-specific templates when k8s_distro=rke2

New files:
- 3 RKE2 cloud-init templates (first server, additional server, agent)
- create_rke2_config_nfs.yml for RKE2-specific NFS directory setup
- service_rke2.json package definitions for RHEL 10.0 x86_64

Key differences from kubeadm path:
- RKE2 uses built-in containerd (no CRI-O)
- RKE2 manages CNI lifecycle (calico/canal/cilium/flannel)
- Token-based cluster join instead of kubeadm certificates
- kube-vip deployed as RKE2 static pod manifest
- RKE2 registries.yaml for Pulp mirror integration
- Port 9345 for RKE2 supervisor API

Existing kubeadm deployment is completely unaffected when k8s_distro=kubeadm (default).

Signed-off-by: John Lockman <j.lockman@dell.com>
@j0hnL j0hnL force-pushed the feature/rke2-service-cluster branch from 7cb6f0c to f2ba1dc Compare June 2, 2026 15:02
@j0hnL j0hnL requested a review from abhishek-sa1 June 2, 2026 16:17
j0hnL and others added 2 commits June 10, 2026 19:49
- Read NFS share details from storage_config.yml 'mounts' (nfs_client_params
  no longer exists); use k8s_nfs_server_path in cloud-init fstab and
  nfs-client-provisioner values
- Add rke2 (stable/1.35) and rke2-common RPM repos to local_repo_config.yml;
  source rke2-selinux from rke2-common
- Set cluster-cidr/service-cidr on joining servers to avoid critical
  configuration value mismatch
- Stage official RKE2 airgap image tarballs (core + CNI, v1.35.1+rke2r1) on
  the NFS share and import via /var/lib/rancher/rke2/agent/images so system
  images work offline
- Deploy kube-vip as a kubelet static pod in agent/pod-manifests (not the
  server/manifests AddOn dir), mount kubeconfig at /etc/kubernetes/admin.conf,
  and place it after rke2.yaml exists to avoid hostPath dir race

Signed-off-by: John Lockman <j.lockman@dell.com>
- Fix validation guard: use 'service_rke2' in tag_names (not 'service_k8s') when populating service_rke2_k8s_cluster for validation
- Fix airgap image download: url used .tar.gz extension but files are .tar.zst
- Add pod_external_ip_range and k8s_pod_network_cidr to required fields in omnia_config.json schema
- Fix set -e placement: move to top of inline shell block in additional control plane cloud-init
- Fix heredoc indentation: config.yaml content was over-indented by 2 spaces producing invalid YAML
- Fix omnia_config.yml example: deployment should be true, not false
- Move RKE2 repo comments inline with the two new repo entries in local_repo_config.yml
- Add arch validation: error if service_rke2 is configured with non-x86_64 arch
- Remove redundant inline comments across cloud-init templates and image_package_collector.py
- Remove unused HELM_VERSION variable from install-helm.sh in both CP templates
- Fix typo: 'nfs sever' → 'nfs server' in create_rke2_config_nfs.yml task name

Signed-off-by: John Lockman <jlockman3@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@j0hnL j0hnL force-pushed the feature/rke2-service-cluster branch from f606d62 to 174c1cb Compare June 16, 2026 14:53