A cluster is a logical grouping of identical hosts that enables live migration—the ability to move running virtual machines between hosts without any downtime. This single capability is what makes CloudStack (and modern cloud infrastructure) so powerful.
What is a Cluster?
A cluster is:
- Logical — a software grouping, not physical hardware
- Homogeneous — all hosts must run the same hypervisor (all KVM, all Hyper-V, etc.)
- Self-managed — CloudStack automatically schedules instances and migrations within the cluster
- Migratable — instances can move between hosts without stopping
Where Clusters Fit
Region
└─ Zone (Typically = 1 Datacenter)
└─ Pod (Usually = 1 Physical Rack)
├─ Cluster 1 (All KVM)
│ ├─ Host 1, Host 2, Host 3
│ └─ Primary Storage
│
├─ Cluster 2 (All Hyper-V)
│ ├─ Host 4, Host 5
│ └─ Primary Storage
│
└─ Cluster 3 (All vSphere)
├─ Host 6, Host 7
└─ Primary Storage
Notice: multiple clusters can exist in one pod, but each runs a different hypervisor type. You can’t mix KVM and Hyper-V in the same cluster.
Why Homogeneity Matters: Live Migration
Live migration means moving a running instance from one host to another while it keeps running. The instance experiences maybe 50 milliseconds of pause—users don’t notice.
How Live Migration Works
Step 1: Copy instance memory while it runs on Host 1
Step 2: Copy changed memory pages again
Step 3: Brief pause (50ms), copy final state
Step 4: Instance resumes on Host 2
Result: Instance moved with imperceptible downtime
Why All Hosts Must Be Identical
When you migrate an instance, you’re copying its CPU state, memory, and storage information. All three must be compatible:
CPU State (hypervisor-specific):
├─ KVM format → KVM host (works)
├─ KVM format → Hyper-V host (fails - different format)
└─ Hyper-V format → KVM host (fails - incompatible)
Memory Layout (architecture-dependent):
├─ x86-64 → x86-64 (works)
├─ x86-64 → ARM (fails - completely different)
└─ Must be identical to preserve memory addresses
Storage Commands (hypervisor-specific):
├─ KVM uses libvirt
├─ Hyper-V uses PowerShell
└─ They speak different languages
If hosts aren’t identical, live migration fails. That’s why CloudStack enforces the homogeneity rule.
Real Impact: Zero-Downtime Maintenance
Before live migration (traditional approach):
Shutdown all instances: 5 minutes
Update and reboot host: 15 minutes
Boot all instances: 10 minutes
Total downtime: 30 minutes
With live migration:
Live-migrate instances to other hosts: 5 minutes
Update and reboot host: 15 minutes
Total downtime: 0 minutes
Your infrastructure can be maintained without anyone noticing.
When Live Migration Saves You
Host failure: Instance automatically moves to healthy host before it crashes.
Load balancing: CloudStack detects overloaded host and live-migrates instances to balance load.
Disaster recovery: If a zone goes down, instances failover to another zone with zero downtime.
Planned maintenance: Update kernels, patch security issues, replace hardware—all without stopping instances.
The Business Impact
Without Live Migration:
├─ 95% uptime SLA (36.5 hours down/year)
└─ Customers worry about reliability
With Live Migration:
├─ 99.9% uptime SLA (8.76 hours down/year)
├─ Or 99.99% uptime SLA (52 minutes down/year)
└─ Customers trust your platform
Key Takeaway
Clusters aren’t just organizational units—they’re the enabler of live migration. Because all hosts in a cluster are identical, instances can move freely between them while running. This transforms cloud infrastructure from fragile (vulnerable to any host failure) into resilient (self-healing through automatic migration).
The homogeneity requirement is the price of a powerful capability that makes your infrastructure bulletproof.