What do you do when somebody comes to live next door and spoils the whole neighborhood? They leave, you leave or you learn to live with it.
What does that have to do with cloud computing? I will give you an example.
This site is currently hosted on an Amazon EC2 virtual machine (VM). In all likelihood, there are other VMs hosted on the same physical machine. Let’s call these other VMs neighbors.
Over the past few weeks this site was really slow on several occasions, for extended periods of around 20 minutes. Closer inspection revealed that a lot of the CPU time was reported as ‘stolen’. See here for more explanation on stolen CPU time. Watching this over a longer period with tools such as New Relic led to the hypothesis that another VM on the same physical machine is running CPU-bound batch jobs. That is a bad neighbor.
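If you want to check for stolen CPU time yourself without a monitoring service, here is a minimal sketch. It assumes a Linux guest, where the first line of `/proc/stat` lists cumulative CPU counters and the eighth value is the steal counter. The function names are my own, made up for illustration; this is not how New Relic or any other tool necessarily computes it.

```python
def parse_cpu_line(line):
    """Return (total_ticks, steal_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Column order: user nice system idle iowait irq softirq steal [guest guest_nice]
    steal = values[7] if len(values) > 7 else 0
    # Guest time is already counted inside user/nice, so sum only the first eight.
    total = sum(values[:8])
    return total, steal


def steal_percent(before, after):
    """Percentage of CPU time reported as stolen between two samples of the cpu line."""
    total0, steal0 = parse_cpu_line(before)
    total1, steal1 = parse_cpu_line(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 100.0 * (steal1 - steal0) / delta_total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate cpu line (Linux only)."""
    with open(path) as f:
        return f.readline()
```

To use it, call `read_cpu_line()` twice, some seconds apart, and pass both results to `steal_percent`. A value that stays high for minutes while your own load is modest is consistent with a busy neighbor, although it does not prove it.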
I don’t want to live with this, and I cannot force this neighbor out. So I have to move. This turned out to be easier than I thought. On Amazon EC2 my VM is running with EBS storage, which means its ‘hard disk’ exists even if the VM is gone. The VM instance also has a so-called Elastic IP address, which is independent of the physical machine.
So I stopped the VM and started it again. It then landed on a different physical machine with different neighbors. Users saw only a brief interruption, nothing worse than what we were going through anyway.
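The stop and start can also be scripted. The sketch below is one way to do it, not the exact procedure I followed: it assumes the boto3 library and an EBS-backed instance, and `relocate_instance` is a hypothetical helper name. It takes the EC2 client as a parameter, and it does not re-associate an Elastic IP, should your setup need that.

```python
def relocate_instance(ec2, instance_id):
    """Stop and then start an EBS-backed instance so EC2 can place it on another host.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    A plain reboot would not do: the instance has to be fully stopped first.
    """
    ids = [instance_id]
    ec2.stop_instances(InstanceIds=ids)
    # Block until the instance has really stopped before starting it again.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=ids)
    ec2.start_instances(InstanceIds=ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=ids)
```

With credentials configured, a call would look like `relocate_instance(boto3.client("ec2"), "i-0123456789abcdef0")`, where the instance ID is a placeholder. There is no guarantee that the new host is quieter, so it is worth checking the stolen CPU time again afterwards.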
With proper load balancing, users would not notice it at all. Netflix is reportedly doing the same thing: killing VMs that have landed in bad neighborhoods.