Hej KubeVirt

Hej KubeVirt #

Harsha Krishna, 2024-04-29

Earlier this month we announced that we are moving to KubeVirt. We had detailed how the new system would work.

Thanks to the efforts of Emil and Pierre, the move to KubeVirt is almost complete. We are in the process of moving the final apps which are running using the old virtual machine system to the new version.

Goodbye to CloudStack #

The first cluster was setup with few users. We were using Apache CloudStack, to orchestrate both Kubernetes clusters as well as independent virtual machines and to administer them. We used this design as we needed a way to administer the distribution of graphic cards, which were bound to a given virtual machine.

While this worked for a while, the increase in number of users in the cloud and the demand for the least available resource, the GPU, also increased. We needed a better way to distribute them.

While CloudStack is good for a very large data-centers type setup, it requires significant effort to maintain. CloudStack has limited community and still lacks some of the features for our requirements, which is based on donated hardware (We don’t always know what we get :smiley face: ). It uses an extensive combination of virtual networks and system virtual machines in order to orchestrate various clusters and VMs across different centers. Things really came to head when we had the incident in March which required an immediate shutdown following a failed upgrade. At this time the entire cluster became unstable.

Hello KubeVirt #

Fortunately, Emil was testing a new method to create VMs within a Kubernetes cluster. This allowed us to run the entire infrastructure as pure Kubernetes system. It also has the added advantage of running different loads as scheduled jobs with or without GPUs. This allows for quicker rotation of GPUs and possibly fairer sharing.

The new system will use Rancher to manage individual Kubernetes clusters. As users you will notice some minor changes to the UI when you log into your cloud console. However, thanks to Pierre and Emil, a lot of the complexity is now under the hood.

What to expect and timeline #

The upgrade to the servers are almost complete. Soon we will retire the CloudStack instance. We thank users for their patience. Any new system being deployed will already use the new system. We will complete and finalise the rollout in 2 weeks.

NOTE: that older VMs created in CloudStack will be deleted. If your VM is running and you need to make backups of your data or configurations, we request you to do so within this week. There will be no effect on your Kubernetes instances.

Please feel free to ping us on Discord for any information and we thank you for everyone’s continued support.