How to Monitor KVM Virtual Machines
KVM (Kernel-based Virtual Machine) creates a unique monitoring challenge that sits between bare metal server monitoring and container orchestration. Unlike monitoring a single server, where you track one operating system, or containers, where you monitor lightweight processes, KVM introduces a hypervisor layer that manages multiple complete virtual machines, each with its own operating system, storage, and network configuration.
The complexity comes from needing visibility into both the host system running KVM and the guest virtual machines, while understanding how resource allocation, storage pools, and network bridges affect overall performance. A KVM host might show healthy CPU and memory utilization while individual VMs suffer from resource contention, storage bottlenecks, or network performance issues that standard Linux monitoring tools simply can't detect.
This guide walks you through implementing comprehensive KVM monitoring using native Linux tools and libvirt commands, giving you hypervisor-level visibility without requiring expensive enterprise solutions or complex monitoring frameworks.
What Makes KVM Monitoring Different
Traditional server monitoring focuses on system-level metrics: CPU usage, memory consumption, disk I/O, and network throughput. These metrics tell you how the physical hardware is performing, but they don't reveal what's happening inside your virtual machines or how the hypervisor is managing resources between them.
KVM monitoring requires understanding several distinct layers. The host system provides the foundation, running the KVM hypervisor and managing physical resources. The hypervisor layer handles resource allocation, scheduling, and isolation between VMs. Each guest VM operates as a complete system with its own resource usage patterns. Storage pools manage disk allocation across multiple VMs, and virtual networks handle communication between guests and external systems.
Consider a scenario where your host system shows 60% CPU utilization across 8 cores, but one VM is experiencing severe performance issues. Standard monitoring might suggest you have plenty of capacity, but KVM-specific tools would reveal that this particular VM is CPU-starved due to poor resource allocation or CPU pinning configuration.
Storage presents another complexity layer. While traditional monitoring tracks filesystem usage and disk I/O, KVM environments use storage pools that can span multiple physical devices, support different formats (qcow2, raw, LVM), and handle features like snapshots and live migration. A storage pool might appear healthy from a filesystem perspective while suffering from performance degradation that affects all VMs using that pool.
Essential KVM Monitoring Commands
The virsh command serves as your primary interface for KVM monitoring and management. Start with basic VM status monitoring using virsh list --all to see all virtual machines and their current states. This command shows you which VMs are running, paused, or shut off, providing the foundation for any monitoring setup.
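For example (the VM names and IDs below are illustrative; the exact column layout varies slightly between libvirt versions):
virsh list --all
 Id   Name      State
------------------------
 1    web-vm    running
 2    db-vm     running
 -    test-vm   shut off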
For real-time resource monitoring, virt-top provides a top-like interface specifically designed for virtualized environments. Install it using your distribution's package manager, then run it to see CPU, memory, and I/O usage across all running VMs:
sudo apt install virt-top # Ubuntu/Debian
sudo yum install virt-top # CentOS/RHEL
virt-top
The virsh domstats command provides detailed performance statistics for running VMs. Use virsh domstats --cpu-total --balloon --block --interface to collect comprehensive metrics for all running domains (libvirt groups memory statistics under --balloon and network statistics under --interface). This command outputs structured key=value data that's perfect for parsing in monitoring scripts.
Monitor individual VM performance using domain-specific commands. virsh dominfo vm-name shows configuration details and current resource allocation, while virsh domstats vm-name provides real-time performance data for that specific VM.
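For instance, a single sample for one VM looks roughly like this (web-vm and the numbers are illustrative; cpu.time is cumulative nanoseconds and balloon values are in KiB):
virsh domstats --cpu-total --balloon web-vm
Domain: 'web-vm'
  cpu.time=2180461135876
  cpu.user=1201000000000
  cpu.system=340000000000
  balloon.current=4194304
  balloon.maximum=4194304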
Storage pool monitoring uses virsh pool-list --all to show all storage pools and their states. Get detailed pool information with virsh pool-info pool-name, which shows capacity, allocation, and available space. For storage volumes within pools, use virsh vol-list pool-name to see all volumes and their sizes.
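A typical pool check might look like this (pool name and sizes are illustrative; some fields omitted):
virsh pool-info default
Name:           default
State:          running
Capacity:       931.51 GiB
Allocation:     540.10 GiB
Available:      391.41 GiB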
Network monitoring involves checking virtual network status with virsh net-list --all and getting detailed network information using virsh net-info network-name. For VM-specific network statistics, virsh domifstat vm-name interface shows network I/O counters for specific network interfaces.
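In practice you list the interfaces first, then query one by name (web-vm and vnet0 are illustrative):
virsh domiflist web-vm
virsh domifstat web-vm vnet0
vnet0 rx_bytes 982351223
vnet0 rx_packets 1250342
vnet0 tx_bytes 471123987
vnet0 tx_errs 0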
Setting Up Automated KVM Metrics Collection
Create a comprehensive KVM monitoring script that collects hypervisor and VM metrics systematically. Start with a shell script that gathers basic VM information and resource usage:
#!/bin/bash
# kvm-metrics-collector.sh
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
METRICS_DIR="/var/log/kvm-metrics"
mkdir -p "$METRICS_DIR"
# Collect VM list and states
echo "[$TIMESTAMP] Collecting VM states..." >> "$METRICS_DIR/vm-states.log"
virsh list --all >> "$METRICS_DIR/vm-states.log"
# Collect detailed stats for running VMs
for VM in $(virsh list --state-running --name); do
    echo "[$TIMESTAMP] $VM stats:" >> "$METRICS_DIR/vm-performance.log"
    virsh domstats --cpu-total --balloon --block --interface "$VM" >> "$METRICS_DIR/vm-performance.log"
done
# Collect storage pool information
echo "[$TIMESTAMP] Storage pool status:" >> "$METRICS_DIR/storage-pools.log"
for POOL in $(virsh pool-list --name); do
    virsh pool-info "$POOL" >> "$METRICS_DIR/storage-pools.log"
done
Make the script executable and test it manually before automating. The script creates structured logs that you can parse for specific metrics or integrate with log aggregation systems.
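A quick manual test might look like this (the /usr/local/bin path matches the systemd unit below):
sudo install -m 0755 kvm-metrics-collector.sh /usr/local/bin/kvm-metrics-collector.sh
sudo /usr/local/bin/kvm-metrics-collector.sh
ls -l /var/log/kvm-metrics/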
Set up automated collection using systemd timers, which provide more reliable scheduling than cron for system-level tasks. Create a systemd service file at /etc/systemd/system/kvm-metrics.service:
[Unit]
Description=KVM Metrics Collection
After=libvirtd.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/kvm-metrics-collector.sh
User=root
Create the corresponding timer file at /etc/systemd/system/kvm-metrics.timer:
[Unit]
Description=Run KVM metrics collection every 5 minutes
Requires=kvm-metrics.service
[Timer]
OnCalendar=*:0/5
Persistent=true
[Install]
WantedBy=timers.target
Enable and start the timer with systemctl enable kvm-metrics.timer && systemctl start kvm-metrics.timer. Check timer status using systemctl status kvm-metrics.timer and view recent executions with journalctl -u kvm-metrics.service.
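The same steps in sequence, with a daemon-reload added so systemd picks up the new unit files:
sudo systemctl daemon-reload
sudo systemctl enable --now kvm-metrics.timer
systemctl list-timers kvm-metrics.timer
journalctl -u kvm-metrics.service -n 20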
For Prometheus integration, create a custom exporter that converts KVM metrics to Prometheus format. This Python script parses virsh output and exposes metrics on a web endpoint:
#!/usr/bin/env python3
# kvm-prometheus-exporter.py
import subprocess
from http.server import HTTPServer, BaseHTTPRequestHandler

class KVMMetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/metrics':
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            metrics = self.collect_kvm_metrics()
            self.wfile.write(metrics.encode())
        else:
            self.send_response(404)
            self.end_headers()

    def collect_kvm_metrics(self):
        metrics = []
        # Get the names of all running VMs
        result = subprocess.run(['virsh', 'list', '--state-running', '--name'],
                                capture_output=True, text=True)
        for vm_name in result.stdout.strip().split('\n'):
            if vm_name:
                # Get CPU and balloon (memory) stats for this VM
                stats_result = subprocess.run(['virsh', 'domstats', '--cpu-total',
                                               '--balloon', vm_name],
                                              capture_output=True, text=True)
                for line in stats_result.stdout.split('\n'):
                    line = line.strip()
                    if line.startswith('cpu.time='):
                        # Cumulative CPU time in nanoseconds
                        cpu_time = line.split('=')[1]
                        metrics.append(f'kvm_cpu_time_total{{vm="{vm_name}"}} {cpu_time}')
                    elif line.startswith('balloon.current='):
                        # balloon.current is reported in KiB; convert to bytes
                        memory = int(line.split('=')[1]) * 1024
                        metrics.append(f'kvm_memory_actual_bytes{{vm="{vm_name}"}} {memory}')
        return '\n'.join(metrics) + '\n'

if __name__ == '__main__':
    server = HTTPServer(('localhost', 9177), KVMMetricsHandler)
    server.serve_forever()
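To try it, start the exporter and scrape the endpoint by hand (the sample output values are illustrative), then point Prometheus at port 9177:
python3 kvm-prometheus-exporter.py &
curl http://localhost:9177/metrics
kvm_cpu_time_total{vm="web-vm"} 2180461135876
kvm_memory_actual_bytes{vm="web-vm"} 4294967296
A matching scrape configuration in prometheus.yml might look like this (job name and interval are illustrative):
scrape_configs:
  - job_name: 'kvm'
    scrape_interval: 30s
    static_configs:
      - targets: ['localhost:9177']
Monitoring VM Performance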
One of KVM's advantages is the ability to monitor guest VM performance from the hypervisor level without installing monitoring agents inside each VM. This approach reduces overhead and works regardless of the guest operating system.
Use virsh domblkstat vm-name to monitor disk I/O performance for specific VMs. This command shows read/write operations, bytes transferred, and I/O timing information. For continuous monitoring, create a script that samples these statistics at regular intervals:
#!/bin/bash
# Monitor disk I/O for a specific VM
VM_NAME="$1"
INTERVAL=5
while true; do
    echo "$(date): Disk stats for $VM_NAME"
    virsh domblkstat "$VM_NAME"
    echo "---"
    sleep "$INTERVAL"
done
Network performance monitoring uses virsh domifstat vm-name interface to track network I/O. First, identify the VM's network interfaces using virsh domiflist vm-name, then monitor specific interfaces. Network statistics include packets transmitted/received, bytes transferred, and error counts.
Memory balloon monitoring helps optimize memory allocation across VMs. The memory balloon driver allows the hypervisor to reclaim unused memory from guests dynamically. Monitor balloon statistics using virsh domstats --balloon vm-name to see current balloon size and memory pressure indicators.
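For example (web-vm is illustrative; which balloon.* fields appear depends on the guest's balloon driver and libvirt version, and values are in KiB):
virsh domstats --balloon web-vm
Domain: 'web-vm'
  balloon.current=4194304
  balloon.maximum=8388608
  balloon.unused=1523604
  balloon.available=4038392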
CPU performance monitoring involves tracking CPU time, utilization, and pinning effectiveness. Use virsh vcpuinfo vm-name to see virtual CPU mapping to physical CPUs and current utilization. For VMs with CPU pinning configured, verify that the pinning is working effectively by monitoring CPU usage on specific cores using tools like htop or mpstat.
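A pinning spot-check might look like this (web-vm and the core numbers are illustrative; mpstat is part of the sysstat package):
virsh vcpuinfo web-vm
VCPU:           0
CPU:            2
State:          running
CPU time:       1951.1s
CPU Affinity:   --y-----
mpstat -P 2 5 3
If the affinity mask shows the expected core but mpstat reports that core idle while the VM complains of CPU pressure, the pinning configuration deserves a second look.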
Create a comprehensive VM performance monitoring script that samples multiple metrics:
#!/bin/bash
# vm-performance-monitor.sh
VM_NAME="$1"
DURATION="${2:-300}"  # Default 5 minutes
INTERVAL="${3:-10}"   # Default 10 seconds
echo "Monitoring $VM_NAME for $DURATION seconds..."
END_TIME=$(($(date +%s) + DURATION))
while [ $(date +%s) -lt $END_TIME ]; do
    TIMESTAMP=$(date '+%H:%M:%S')
    echo "[$TIMESTAMP] === VM Performance Stats ==="
    # CPU stats
    echo "CPU Usage:"
    virsh domstats --cpu-total "$VM_NAME" | grep cpu.time
    # Memory stats (libvirt reports these under the balloon group, in KiB)
    echo "Memory Usage:"
    virsh domstats --balloon "$VM_NAME" | grep -E "(balloon.current|balloon.unused)"
    # Disk I/O
    echo "Disk I/O:"
    virsh domblkstat "$VM_NAME" | head -4
    # Network I/O (assuming first interface)
    INTERFACE=$(virsh domiflist "$VM_NAME" | awk 'NR==3 {print $1}')
    if [ -n "$INTERFACE" ]; then
        echo "Network I/O ($INTERFACE):"
        virsh domifstat "$VM_NAME" "$INTERFACE" | head -4
    fi
    echo ""
    sleep "$INTERVAL"
done
Integrating KVM Metrics with Existing Monitoring Stacks
You probably already have monitoring infrastructure in place, so integrating KVM metrics into existing systems provides better value than building separate monitoring solutions. The key is creating exporters or collectors that translate KVM-specific metrics into formats your monitoring stack understands.
For Prometheus environments, extend the node_exporter with custom KVM metrics using the textfile collector. Create a script that generates KVM metrics in Prometheus format and writes them to the textfile directory:
#!/bin/bash
# kvm-textfile-exporter.sh
TEXTFILE_DIR="/var/lib/node_exporter/textfile_collector"
TEMP_FILE="$TEXTFILE_DIR/kvm_metrics.prom.tmp"
FINAL_FILE="$TEXTFILE_DIR/kvm_metrics.prom"
# Ensure directory exists
mkdir -p "$TEXTFILE_DIR"
# Start with empty metrics file
echo "# HELP kvm_vm_state VM state (1=running, 0=not running)" > "$TEMP_FILE"
echo "# TYPE kvm_vm_state gauge" >> "$TEMP_FILE"
# Collect VM states
for VM in $(virsh list --all --name); do
    if [ -n "$VM" ]; then
        STATE=$(virsh domstate "$VM")
        if [ "$STATE" = "running" ]; then
            echo "kvm_vm_state{vm=\"$VM\"} 1" >> "$TEMP_FILE"
        else
            echo "kvm_vm_state{vm=\"$VM\"} 0" >> "$TEMP_FILE"
        fi
    fi
done
# Add storage pool metrics
echo "# HELP kvm_pool_capacity_bytes Storage pool total capacity" >> "$TEMP_FILE"
echo "# TYPE kvm_pool_capacity_bytes gauge" >> "$TEMP_FILE"
echo "# HELP kvm_pool_allocation_bytes Storage pool current allocation" >> "$TEMP_FILE"
echo "# TYPE kvm_pool_allocation_bytes gauge" >> "$TEMP_FILE"
for POOL in $(virsh pool-list --name); do
    if [ -n "$POOL" ]; then
        # --bytes keeps the sizes in raw bytes instead of human-readable GiB
        CAPACITY=$(virsh pool-info --bytes "$POOL" | grep Capacity | awk '{print $2}' | sed 's/[^0-9]//g')
        ALLOCATION=$(virsh pool-info --bytes "$POOL" | grep Allocation | awk '{print $2}' | sed 's/[^0-9]//g')
        if [ -n "$CAPACITY" ]; then
            echo "kvm_pool_capacity_bytes{pool=\"$POOL\"} $CAPACITY" >> "$TEMP_FILE"
        fi
        if [ -n "$ALLOCATION" ]; then
            echo "kvm_pool_allocation_bytes{pool=\"$POOL\"} $ALLOCATION" >> "$TEMP_FILE"
        fi
    fi
done
# Atomically move the file
mv "$TEMP_FILE" "$FINAL_FILE"
Run this script every minute via cron or systemd timer. The node_exporter will automatically pick up the metrics and make them available to Prometheus, provided it runs with --collector.textfile.directory pointing at the same directory.
For Grafana dashboards, create KVM-specific panels that visualize hypervisor health and VM performance. Key metrics to display include VM states over time, resource utilization per VM, storage pool capacity trends, and hypervisor-level resource allocation. Use Grafana's template variables to create dynamic dashboards that allow filtering by VM name or storage pool.
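A few starting-point PromQL queries for such panels, built on the metric names from the textfile exporter above:
count(kvm_vm_state == 1)                              # running VMs per host
kvm_pool_allocation_bytes / kvm_pool_capacity_bytes   # pool fill ratio (0-1)
changes(kvm_vm_state[1h])                             # VM state flaps in the last hour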
Set up alerting rules that understand KVM-specific failure modes. Unlike container orchestration where failed containers restart automatically, VM failures often require manual intervention. Create alerts for VM state changes, storage pool capacity thresholds, and resource contention between VMs:
# prometheus-kvm-alerts.yml
groups:
  - name: kvm-alerts
    rules:
      - alert: KVM_VM_Down
        expr: kvm_vm_state == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "VM {{ $labels.vm }} is not running"
      - alert: KVM_Storage_Pool_Full
        expr: (kvm_pool_allocation_bytes / kvm_pool_capacity_bytes) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Storage pool {{ $labels.pool }} is {{ $value | humanizePercentage }} full"
Common KVM Performance Issues
KVM performance problems often manifest as symptoms that seem unrelated to virtualization. A web application might respond slowly, a database might show high query times, or users might report general sluggishness. The key is knowing how to trace these symptoms back to hypervisor-level issues.
CPU performance problems in KVM environments typically involve resource contention, poor CPU pinning, or inadequate CPU allocation. Start troubleshooting with virsh vcpuinfo vm-name to see how virtual CPUs map to physical cores. If multiple VMs are pinned to the same physical cores, you'll see contention during high-load periods.
Use virsh schedinfo vm-name to examine CPU scheduling parameters. The CPU shares value determines how much CPU time a VM gets relative to other VMs when resources are constrained. VMs with default shares (1024) compete equally, but you can adjust these values based on workload priorities.
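For example, inspecting and then doubling a VM's relative priority might look like this (web-vm and the value are illustrative; --live applies the change to the running domain):
virsh schedinfo web-vm
Scheduler      : posix
cpu_shares     : 1024
virsh schedinfo web-vm --set cpu_shares=2048 --live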
Memory performance issues often stem from overcommitment or ineffective memory ballooning. Check memory allocation using virsh dominfo vm-name to see configured memory versus actual usage. If the sum of all VM memory allocations exceeds physical RAM, you're overcommitted and likely experiencing swap pressure.
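A minimal sketch of that check, summing each defined VM's configured maximum memory against host RAM (virsh dominfo and /proc/meminfo both report KiB):
#!/bin/bash
# Rough overcommit check: configured VM memory vs physical RAM (both in KiB)
TOTAL_VM_KIB=0
for VM in $(virsh list --all --name); do
    [ -n "$VM" ] || continue
    MEM=$(virsh dominfo "$VM" | awk '/^Max memory/ {print $3}')
    TOTAL_VM_KIB=$((TOTAL_VM_KIB + MEM))
done
HOST_KIB=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
echo "Configured VM memory: ${TOTAL_VM_KIB} KiB / Host RAM: ${HOST_KIB} KiB"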
Monitor memory balloon effectiveness using virsh domstats --balloon vm-name. If balloon statistics show high memory pressure or frequent balloon adjustments, the VM might need more dedicated memory allocation.
Storage performance problems require examining both the storage pool level and individual VM disk I/O patterns. Use virsh domblkstat vm-name to identify VMs with high I/O rates, then correlate this with storage pool performance using iotop or iostat on the host system.
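For example, sample the same window from both layers (web-vm and vda are illustrative; run virsh domblklist first to find the real disk targets):
virsh domblklist web-vm
virsh domblkstat web-vm vda
iostat -x 5 3
If the VM shows high read/write counts while iostat shows the backing device saturated (high %util and await), the bottleneck is the shared storage rather than the guest.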
Check storage pool configuration with virsh pool-dumpxml pool-name to verify the underlying storage configuration. Pools backed by spinning disks will show different performance characteristics than SSD-backed pools, and LVM-based pools behave differently than file-based storage.
Network performance issues in KVM environments often involve bridge configuration, bandwidth limits, or virtual network topology problems. Start with virsh domiflist vm-name to see network interface configuration, then check bridge status using ip link show or bridge link (the older brctl show still works where installed).
Monitor network I/O patterns using virsh domifstat vm-name interface and look for error counters or dropped packets. High error rates often indicate bridge configuration problems or physical network interface issues on the host.
Production KVM Monitoring Automation
Production KVM environments require automated monitoring that scales with your infrastructure and integrates with existing operational workflows. This means automated VM discovery, configuration management for monitoring components, and integration with alerting and incident response systems.
Implement automated VM discovery using libvirt events and hooks. Create a script that monitors for VM lifecycle events and automatically configures monitoring for new VMs:
#!/bin/bash
# kvm-monitoring-hook.sh
# Place in /etc/libvirt/hooks/qemu
GUEST_NAME="$1"
OPERATION="$2"
case "$OPERATION" in
    "started")
        echo "$(date): VM $GUEST_NAME started, adding to monitoring" >> /var/log/kvm-monitoring.log
        # Add VM to monitoring configuration
        /usr/local/bin/add-vm-monitoring.sh "$GUEST_NAME"
        ;;
    "stopped")
        echo "$(date): VM $GUEST_NAME stopped, updating monitoring" >> /var/log/kvm-monitoring.log
        # Update monitoring to reflect stopped state
        /usr/local/bin/update-vm-monitoring.sh "$GUEST_NAME" "stopped"
        ;;
esac
Make the hook executable and ensure it has appropriate permissions. Libvirt will call this script automatically when VMs start, stop, or change states.
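Installation might look like this; note libvirt's documented caveat that hook scripts must not call back into libvirt (for example via virsh), because the daemon blocks while the hook runs:
sudo mkdir -p /etc/libvirt/hooks
sudo install -m 0755 kvm-monitoring-hook.sh /etc/libvirt/hooks/qemu
sudo systemctl restart libvirtd   # libvirt detects new hooks at daemon startup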
For configuration management, use tools like Ansible to deploy and maintain KVM monitoring configurations across multiple hypervisor hosts. Create playbooks that install monitoring scripts, configure systemd timers, and set up log rotation:
# ansible-playbook kvm-monitoring.yml
---
- hosts: kvm_hosts
  become: yes
  tasks:
    - name: Install KVM monitoring scripts
      copy:
        src: "{{ item }}"
        dest: /usr/local/bin/
        mode: '0755'
      with_items:
        - kvm-metrics-collector.sh
        - kvm-textfile-exporter.sh
    - name: Configure systemd timer for metrics collection
      template:
        src: kvm-metrics.timer.j2
        dest: /etc/systemd/system/kvm-metrics.timer
      notify: restart kvm-metrics timer
    - name: Enable KVM metrics collection
      systemd:
        name: kvm-metrics.timer
        enabled: yes
        state: started
  handlers:
    - name: restart kvm-metrics timer
      systemd:
        name: kvm-metrics.timer
        state: restarted
        daemon_reload: yes
Log aggregation becomes critical in multi-host KVM environments. Configure rsyslog or journald to forward KVM-related logs to a central logging system. Create custom log parsing rules that extract meaningful events from libvirt and QEMU logs:
# /etc/rsyslog.d/50-kvm.conf
# Forward KVM-related logs to central server
if $programname == 'libvirtd' then @@logserver.example.com:514
if $programname startswith 'qemu' then @@logserver.example.com:514
# Local logging with rotation
$template KVMLogFormat,"/var/log/kvm/%programname%.log"
if $programname == 'libvirtd' then ?KVMLogFormat
if $programname startswith 'qemu' then ?KVMLogFormat
Implement backup monitoring for VM snapshots and exports. Create scripts that verify backup completion, test restore procedures, and alert on backup failures. This is particularly important for KVM environments where backup strategies often involve live snapshots or VM exports that can fail silently.
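As a starting point, a minimal sketch that counts snapshots per VM, so a failed or missing snapshot shows up as an unexpected count (the alert wiring is left to your monitoring stack):
#!/bin/bash
# verify-vm-snapshots.sh -- snapshot inventory sketch
for VM in $(virsh list --all --name); do
    [ -n "$VM" ] || continue
    # Count this VM's snapshots by name; grep -c . ignores blank lines
    COUNT=$(virsh snapshot-list "$VM" --name 2>/dev/null | grep -c .)
    echo "$(date '+%F %T') $VM: $COUNT snapshot(s)"
done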
While building KVM monitoring with native Linux tools provides deep visibility and control, managing all these scripts, timers, and integrations can become complex as your virtualization infrastructure grows. Fivenines offers ready-made KVM monitoring that automatically discovers your virtual machines, tracks hypervisor performance, and integrates seamlessly with your existing infrastructure, letting you focus on managing your VMs rather than building monitoring tools.