
Mark virtual machines as tainted and pick untainted machines from the vm pool

Vaishant Kameswaran requested to merge pool-lock-1 into main

Add migration file to add the tainted_at column to the vm table

Mark virtual machines as tainted

At a certain stage in a virtual machine's lifecycle, it may become tainted with user data.

Once we start the hypervisor for the first time, we can't know when the user logs in and starts processing data. So from that moment on, a virtual machine that wasn't created for a VM pool is considered tainted. Virtual machines created for a pool are considered tainted once they are picked from the pool, because customers can't access them before then. If a virtual machine hasn't been marked as tainted before it is destroyed, we mark it as tainted during the destroy process.
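The taint rules above can be sketched as a small state update. This is a minimal illustration, not the actual implementation: the `Vm` class and the hook names `on_hypervisor_started`, `on_picked_from_pool`, and `on_destroy` are hypothetical; only the `tainted_at` column name comes from this merge request.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Vm:
    # Hypothetical model holding only the fields the taint rules need.
    pool_id: Optional[str] = None
    tainted_at: Optional[datetime] = None

    @property
    def tainted(self) -> bool:
        return self.tainted_at is not None

    def taint(self) -> None:
        # Keep the first taint timestamp; later calls are no-ops.
        if self.tainted_at is None:
            self.tainted_at = datetime.now(timezone.utc)


def on_hypervisor_started(vm: Vm) -> None:
    # A VM not created for a pool may hold user data once the hypervisor runs.
    if vm.pool_id is None:
        vm.taint()


def on_picked_from_pool(vm: Vm) -> None:
    # Pool VMs become reachable by customers only after they are picked.
    vm.taint()


def on_destroy(vm: Vm) -> None:
    # Safety net: any VM not yet tainted is tainted during destroy.
    vm.taint()
```

Making `taint()` idempotent means every hook can call it unconditionally, and `tainted_at` always records the earliest moment user data may have appeared.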

Once a virtual machine is tainted, we have to assume it may contain irreplaceable and private data.

We can build various pieces of logic on top of the taint value.

For instance, we discussed some 'recreate' logic for virtual machines. We can easily recreate untainted virtual machines, but we need to be careful with tainted ones.

Furthermore, the taint value can be used in our pool logic: we can pick only untainted virtual machines from the pool.

Use tainted value to pick a vm from the pool

We don't disassociate a virtual machine from the pool when it's picked, because we aim to keep a fixed number of virtual machines provisioned by the pool. Instead, we join the VM dataset with the runner table to identify idle virtual machines, an approach that previously led to an incident: when I changed the runner to stop waiting for its VM to be destroyed, the same virtual machines were picked by two different runners.

In addition, we lock all virtual machines in the pool while picking one. This inadvertently also locks the already-picked virtual machines that are in use by runners, leading to deadlocks. We shouldn't lock virtual machines that have already been picked.

We can use the tainted value to pick an idle virtual machine from the pool. We can also use the updated row count to handle concurrent picks by multiple threads: only the thread whose update actually marks the virtual machine as tainted gets it. This approach lets us avoid locking the virtual machines table entirely.

Use provisioned at instead of display state to pick a vm from the pool

The 'display_state' function shows the current state of a virtual machine to the customer, and in some cases it is calculated dynamically. The 'display' prefix signals that we shouldn't use it in our backend logic. 'provisioned_at' is a more reliable indicator of whether the virtual machine has been provisioned.
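Combining both commits, the pick condition can rely only on stored timestamps. This is an illustrative sketch: the `PoolVm` class and `pickable` helper are hypothetical, and the SQL in the comment is the assumed equivalent filter, not the query from the merge request.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class PoolVm:
    # Hypothetical fields for illustration only.
    id: str
    provisioned_at: Optional[datetime] = None
    tainted_at: Optional[datetime] = None


def pickable(vms: List[PoolVm]) -> List[PoolVm]:
    # Assumed SQL equivalent: WHERE provisioned_at IS NOT NULL AND tainted_at IS NULL
    # Uses stored timestamps instead of a customer-facing, computed display state.
    return [vm for vm in vms if vm.provisioned_at is not None and vm.tainted_at is None]
```

A VM still being provisioned has no `provisioned_at`, and a VM already handed out has a `tainted_at`, so both are excluded without consulting `display_state`.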
