Play with Libvirt/KVM
Create the VMs
Follow the IBM KVM installation guide to install kvm and libvirt. I play on a virtual CentOS 7. (There is also a developer-version guide.)
# check that Intel VT or AMD-V hardware-assisted virtualization is supported
grep -E 'vmx|svm' /proc/cpuinfo
# ensure you are not running Xen kernel
uname -a # a version like '2.6.18-164.el5xen' means a Xen kernel is running
# install packages
yum install -y kvm virt-manager libvirt libvirt-python python-virtinst virt-install qemu-kvm
# start libvirt
service libvirtd start
chkconfig libvirtd on
Download a cirros image to use. qemu-img is a versatile tool for inspecting and converting disk images.
wget --no-check-certificate https://download.cirros-cloud.net/0.3.2/cirros-0.3.2-x86_64-disk.img
qemu-img info cirros-0.3.2-x86_64-disk.img # show image format (qcow2)
Create the cirros guest. Note that ~/cirros-0.3.2-x86_64-disk.img will be modified and used directly as the VM’s disk file; its owner is changed to qemu:qemu.
# install seabios to prevent "qemu: could not load PC BIOS 'bios-256k.bin'" error
yum install seabios-bin
# create and start cirros
virt-install --connect=qemu:///system --name=cirros --ram=512 --vcpus=1 --disk path=cirros-0.3.2-x86_64-disk.img,format=qcow2 --import --network network:default --vnc
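If you would rather keep the downloaded image pristine, a hedged alternative is a qcow2 overlay that uses the original as a read-only backing file, so writes land in the overlay (cirros-overlay.qcow2 is a name I made up):
# create an overlay on top of the original image (the backing file must stay in place)
qemu-img create -f qcow2 -b cirros-0.3.2-x86_64-disk.img cirros-overlay.qcow2
# then point virt-install's --disk path= at cirros-overlay.qcow2 instead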
The configuration lives at /etc/libvirt/qemu/cirros.xml. Change it using:
ls /etc/libvirt/qemu/cirros.xml
virsh edit cirros
To find the kvm/qemu process and its command-line arguments:
$ ps -ef|grep -E "kvm|qemu"
qemu 13354 1 8 05:26 ? 00:01:06 /usr/bin/qemu-system-x86_64 -name cirros -S -machine pc-i440fx-2.0,accel=tcg,usb=off -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid ffd5da2b-61fc-49ad-8007-95e6f6ea9fc0 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/cirros.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/root/cirros-0.3.2-x86_64-disk.img,if=none,id=drive-ide0-0-0,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=23,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:3c:f7:ba,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
root 14341 27909 0 05:39 pts/0 00:00:00 grep --color=auto -E kvm|qemu
To find the VNC display:
$ virsh vncdisplay cirros
127.0.0.1:0
Repeating the above steps, I created another VM ‘cirros2’, using ~/cirros-0.3.2-x86_64-disk.img.1 as its disk file.
$ virsh list
Id Name State
----------------------------------------------------
8 cirros running
13 cirros2 running
Create VM, the XML way
A libvirt KVM VM is defined purely by its XML under /etc/libvirt/qemu/, including the disk file path. The XML format is documented here. To create a new VM from an XML file:
# get cirros-0.3.2-x86_64-disk.img.2
wget --no-check-certificate https://download.cirros-cloud.net/0.3.2/cirros-0.3.2-x86_64-disk.img
cp /etc/libvirt/qemu/cirros2.xml cirros3.xml
vim cirros3.xml
... # change the name to cirros3, delete the uuid, change devices.disk.source[file], replace the last 3 bytes of the MAC address
virsh define cirros3.xml
ll /etc/libvirt/qemu/cirros3.xml
virsh start cirros3
After this, cirros3 is successfully started.
$ virsh list
Id Name State
----------------------------------------------------
8 cirros running
13 cirros2 running
14 cirros3 running
Libvirt in Openstack
If you boot a VM from an image (rather than from a volume) in Openstack, nova will GET the image file from glance and place it in the local file system ‘/var/lib/nova/instances/_base’ of the compute node. Refer to https://lists.launchpad.net/openstack/msg08074.html.
# on an openstack compute node
$ ll -h /var/lib/nova/instances/_base
total 16G
-rw-r--r-- 1 nova qemu 40G Nov 3 08:38 5dca4e25ea2410ac6c0615581e875f62a294b8db
-rw-r--r-- 1 nova qemu 20G Nov 3 08:38 68f3e3d4d52bf185741ec5dc374ed664e3f93797
-rwxr-xr-x 1 nova qemu 2.2G Nov 3 08:38 7a2c028dc0908c780fb56ad35527cd50ac1ad661
...
The disk of a VM is stored on the compute node’s local file system under ‘/var/lib/nova/instances/’. Each folder corresponds to a VM, with its openstack id as the folder name.
# on the compute node
$ ll -h /var/lib/nova/instances/
032588cd-cbfa-4fe6-b627-f1e6d1fe6ddc 9042822a-d3f9-44cd-9694-4f45e5259257
1419c8c0-0d86-4b19-84b2-cdf9136d3aa6 _base
List the content of one of the folders.
# on the compute node
$ ll -h /var/lib/nova/instances/032588cd-cbfa-4fe6-b627-f1e6d1fe6ddc/
total 207M
-rw-rw---- 1 qemu qemu 21K Sep 9 04:04 console.log
-rw-r--r-- 1 qemu qemu 207M Nov 3 08:55 disk
-rw-r--r-- 1 nova nova 1.6K Sep 9 04:03 libvirt.xml
The libvirt definition xml file is shown below. The instance has id ‘032588cd-cbfa-4fe6-b627-f1e6d1fe6ddc’ in openstack, and name ‘instance-0002a893’ in libvirt. Access the VM through ‘https://
# on the compute node
$ less /var/lib/nova/instances/032588cd-cbfa-4fe6-b627-f1e6d1fe6ddc/libvirt.xml
<domain type="kvm">
<uuid>032588cd-cbfa-4fe6-b627-f1e6d1fe6ddc</uuid>
<name>instance-0002a893</name>
<memory>4194304</memory>
<vcpu>2</vcpu>
<sysinfo type="smbios">
<system>
<entry name="manufacturer">Red Hat Inc.</entry>
<entry name="product">OpenStack Nova</entry>
<entry name="version">2013.2.3-140612233026_ocp</entry>
<entry name="serial">8de7977e-195d-47ba-8ed2-19f5ddb728fc</entry>
<entry name="uuid">03e588cd-cbfa-4fe6-b626-f1e6d2fe6ddc</entry>
</system>
...
There is also a VM definition file in libvirt’s /etc/libvirt/qemu/. The two files look alike, with small differences.
# on the compute node
$ less /etc/libvirt/qemu/instance-0002a893.xml
<domain type='kvm'>
<name>instance-0002a893</name>
<uuid>03e588cd-cbfa-4fe6-b626-f1e6d2fe6ddc</uuid>
<memory unit='KiB'>4194304</memory>
<currentMemory unit='KiB'>4194304</currentMemory>
<vcpu placement='static'>2</vcpu>
<sysinfo type='smbios'>
<system>
<entry name='manufacturer'>Red Hat Inc.</entry>
<entry name='product'>OpenStack Nova</entry>
<entry name='version'>2013.2.3-140612233026_ocp</entry>
<entry name='serial'>8de7977e-195d-47ba-8ed2-19f5ddb728fc</entry>
<entry name='uuid'>03e588cd-cbfa-4fe6-b626-f1e6d2fe6ddc</entry>
</system>
...
$ diff /var/lib/nova/instances/032588cd-cbfa-4fe6-b627-f1e6d1fe6ddc/libvirt.xml /etc/libvirt/qemu/instance-0002a893.xml
Play with Network
Usually a hypervisor gives you two network options to connect VMs to the outside world: NAT mode and bridge mode.
The NAT Mode
In NAT mode we connect the VM to a bridge (yes, NAT mode also uses a bridge), and use SNAT to translate the VMs’ private IPs to the host’s address. A DHCP server runs on the bridge, from which the VMs get their IP addresses. I.e., VMs get private IPs. The bridge itself consumes one private IP address.
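In libvirt this whole NAT setup is captured by the ‘default’ network definition; on a stock install, dumping it shows roughly the following (abbreviated sketch):
$ virsh net-dumpxml default
<network>
  <name>default</name>
  <forward mode='nat'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
    </dhcp>
  </ip>
</network>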
This NAT mode is what libvirt/kvm uses by default. Now let’s dive in.
First, run ifconfig on the host:
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.224.147.166 netmask 255.255.255.0 broadcast 10.224.147.255
inet6 fe80::f816:3eff:fe98:559f prefixlen 64 scopeid 0x20<link>
ether fa:16:3e:98:55:9f txqueuelen 1000 (Ethernet)
RX packets 1925933 bytes 400908368 (382.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 967162 bytes 269767657 (257.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 0 (Local Loopback)
RX packets 1666 bytes 129707 (126.6 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1666 bytes 129707 (126.6 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
ether fe:54:00:17:1e:56 txqueuelen 0 (Ethernet)
RX packets 944 bytes 88204 (86.1 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1008 bytes 110279 (107.6 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vnet0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::fc54:ff:fe3c:f7ba prefixlen 64 scopeid 0x20<link>
ether fe:54:00:3c:f7:ba txqueuelen 500 (Ethernet)
RX packets 548 bytes 63284 (61.8 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 10604 bytes 596604 (582.6 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vnet1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::fc54:ff:fedb:4c6c prefixlen 64 scopeid 0x20<link>
ether fe:54:00:db:4c:6c txqueuelen 500 (Ethernet)
RX packets 137 bytes 13561 (13.2 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8358 bytes 441239 (430.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vnet2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::fc54:ff:fe17:1e56 prefixlen 64 scopeid 0x20<link>
ether fe:54:00:17:1e:56 txqueuelen 500 (Ethernet)
RX packets 133 bytes 12731 (12.4 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6061 bytes 320596 (313.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The ‘virbr0’ is the bridge. vnet0 through vnet2 correspond to our 3 cirros VMs’ nics. To see the bridge setup:
$ brctl show
bridge name bridge id STP enabled interfaces
virbr0 8000.fe5400171e56 yes vnet0
vnet1
vnet2
You can see vnet0-2 are attached to virbr0. Bridge virbr0 has IP 192.168.122.1. But how is DHCP running on virbr0? It is done by dnsmasq (which is capable of both DNS and DHCP).
$ ps -ef|grep libvirt
...
nobody 4618 1 0 Nov02 ? 00:00:00 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf
...
Check out the config file dnsmasq is using. It shows that dnsmasq listens on virbr0 and allocates IP addresses to VMs in the range 192.168.122.2 - 192.168.122.254.
$ less /var/lib/libvirt/dnsmasq/default.conf
strict-order
pid-file=/var/run/libvirt/network/default.pid
except-interface=lo
bind-dynamic
interface=virbr0
dhcp-range=192.168.122.2,192.168.122.254
dhcp-no-override
dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases
dhcp-lease-max=253
dhcp-hostsfile=/var/lib/libvirt/dnsmasq/default.hostsfile
addn-hosts=/var/lib/libvirt/dnsmasq/default.addnhosts
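The lease file named in the config is another way to map VM MACs to IPs; each line holds expiry time, MAC, IP and hostname, something like:
$ cat /var/lib/libvirt/dnsmasq/default.leases
1415174400 52:54:00:3c:f7:ba 192.168.122.56 cirros *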
To find out the IP address of a VM, we first use arp to get the IP-to-MAC mapping. You can find a VM’s MAC address in its libvirt xml under /etc/libvirt/qemu/, so you then know the VM’s IP.
$ arp -an
? (10.224.147.167) at fa:16:3e:65:b5:95 [ether] on eth0
? (169.254.169.254) at fa:16:3e:96:a5:41 [ether] on eth0
? (10.224.147.154) at fa:16:3e:99:41:fa [ether] on eth0
? (192.168.122.102) at 52:54:00:db:4c:6c [ether] on virbr0
? (192.168.122.182) at 52:54:00:17:1e:56 [ether] on virbr0
? (10.224.147.152) at fa:16:3e:96:a5:41 [ether] on eth0
? (10.224.147.1) at 00:00:0c:07:ac:00 [ether] on eth0
? (10.224.147.203) at fa:16:3e:c8:17:5e [ether] on eth0
? (10.224.147.204) at fa:16:3e:10:16:a0 [ether] on eth0
? (10.224.147.168) at fa:16:3e:a8:4f:41 [ether] on eth0
? (192.168.122.56) at 52:54:00:3c:f7:ba [ether] on virbr0
? (10.224.147.2) at 40:f4:ec:1d:6e:48 [ether] on eth0
? (10.224.147.205) at fa:16:3e:2f:da:d3 [ether] on eth0
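To match an arp entry back to a VM, grep the MAC out of its libvirt definition:
$ virsh dumpxml cirros | grep 'mac address'
<mac address='52:54:00:3c:f7:ba'/>
# 52:54:00:3c:f7:ba maps to 192.168.122.56 above, so that is cirros' IP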
Now we can log in to the guest VM. The ifconfig inside it looks perfectly normal.
$ ssh cirros@192.168.122.56
cirros@192.168.122.56's password:
$ ifconfig
eth0 Link encap:Ethernet HWaddr 52:54:00:3C:F7:BA
inet addr:192.168.122.56 Bcast:192.168.122.255 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:fe3c:f7ba/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:792 errors:0 dropped:0 overruns:0 frame:0
TX packets:576 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:80117 (78.2 KiB) TX bytes:66950 (65.3 KiB)
Interrupt:10 Base address:0xc000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
$ ping 192.168.122.182
PING 192.168.122.182 (192.168.122.182): 56 data bytes
64 bytes from 192.168.122.182: seq=0 ttl=64 time=4.302 ms
64 bytes from 192.168.122.182: seq=1 ttl=64 time=1.383 ms
$ ping www.baidu.com
PING www.baidu.com (180.76.3.151): 56 data bytes
64 bytes from 180.76.3.151: seq=0 ttl=37 time=43.311 ms
64 bytes from 180.76.3.151: seq=1 ttl=37 time=42.429 ms
$ ping 169.254.169.254 # the magic address of cloud-init
PING 169.254.169.254 (169.254.169.254): 56 data bytes
64 bytes from 169.254.169.254: seq=0 ttl=63 time=5.598 ms
64 bytes from 169.254.169.254: seq=1 ttl=63 time=1.454 ms
The next question is: how does a packet from a VM get outside, to www.baidu.com? Dump the four default iptables tables:
$ iptables -t raw -nL
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
$ iptables -t mangle -nL
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
CHECKSUM udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:68 CHECKSUM fill
$ iptables -t nat -nL
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE tcp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
MASQUERADE udp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535
MASQUERADE all -- 192.168.122.0/24 !192.168.122.0/24
$ iptables -t filter -nL
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:53
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:53
ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:67
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:67
Chain FORWARD (policy ACCEPT)
target prot opt source destination
ACCEPT all -- 0.0.0.0/0 192.168.122.0/24 ctstate RELATED,ESTABLISHED
ACCEPT all -- 192.168.122.0/24 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable
REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
You can see in the ‘nat’ table that all traffic from 192.168.122.0/24 and not destined to 192.168.122.0/24 (! means not) gets the ‘MASQUERADE’ action, the SNAT translation. This is how traffic from inside a VM reaches the outside.
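To convince yourself the MASQUERADE rules really do the work, a quick hedged check is to watch their packet counters grow while a VM pings outside:
# note the pkts/bytes counters of the MASQUERADE rules
$ iptables -t nat -vnL POSTROUTING
# run 'ping www.baidu.com' inside a VM, re-run the command, and the counters increase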
The Bridge Mode
In bridge mode we connect the VM to a bridge, and connect the bridge directly to the public network. VMs get IP addresses from the public DHCP service, the same DHCP the host is using. I.e., VMs get public IPs. The bridge consumes one public IP address; usually you take the IP address from eth0 and give it to the bridge. (By connecting to the bridge’s IP, a client can also log in to the host.)
Follow this guide: http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaai.kvminstall/liaaikvminstallbridge.htm
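The guide does it persistently via ifcfg files; the runtime equivalent is roughly the sketch below (not persistent across reboots, and it will drop your session if you are logged in via eth0):
brctl addbr br0         # create the bridge
brctl addif br0 eth0    # enslave the physical nic
ip addr flush dev eth0  # eth0 gives up its IP...
dhclient br0            # ...and br0 obtains one from the public DHCP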
Access by VNC
On another laptop with a graphical desktop, I’m using Chicken VNC. The host on which I run the guest VMs is 10.224.147.166. First you need to modify the VM’s libvirt xml: the default config, <graphics type='vnc' port='-1' autoport='yes'/>, can only be connected to via the host’s loopback address. Refer to here.
$ virsh edit cirros
... # change <graphics type='vnc' port='-1' autoport='yes'/> to <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'/>
$ virsh shutdown cirros
... # wait some time until fully shutdown
$ virsh start cirros
Do the same for cirros2 and cirros3. Now list the VNC displays:
$ virsh vncdisplay cirros
:0
$ virsh vncdisplay cirros2
:1
$ virsh vncdisplay cirros3
:2
By default, for a VM with VNC display N, the VNC server listens on TCP port 5900+N (VNC uses TCP, not UDP). To test connectivity, use telnet. Watch out for firewalls blocking these ports.
$ telnet 10.224.147.166 5900
Trying 10.224.147.166...
Connected to 10.224.147.166.
Escape character is '^]'.
RFB 003.008
Now on my laptop, I use Chicken VNC to connect with Host: 10.224.147.166, Display: 0. It works!
About File Injection into VM
In Openstack you can inject a file into a VM’s file system prior to boot. This feature is not shipped with libvirt itself; it is implemented by Nova. Here is a code dive.
Libguestfs is a library for accessing and modifying VM disk images. You can mount a VM’s virtual filesystem onto the host’s VFS, where you access it as a common filesystem. This is where a file can be “injected”.
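A minimal sketch of such an injection using the libguestfs tools (assuming libguestfs-tools is installed; the file and guest paths are made up, and the VM should be shut off first):
echo 'injected from host' > /tmp/hello.txt
# copy the file into the guest image's /home directory
virt-copy-in -a /root/cirros-0.3.2-x86_64-disk.img /tmp/hello.txt /home/
# or explore the image interactively; -i inspects and mounts the guest filesystems
guestfish -a /root/cirros-0.3.2-x86_64-disk.img -i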
There are several ways to bring user data into a VM. Here I copied from Liping’s blog:
1) File injection prior to VM boot
[root@compute1 ~]# nova boot --image 2401a752-fbda-482e-98f6-281656758b7f --file /home/test=./matt.test --flavor=1 test6
[root@compute2 ~]# ssh cirros@100.100.100.10
The authenticity of host '100.100.100.10 (100.100.100.10)' can't be established.
RSA key fingerprint is xxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '100.100.100.10' (RSA) to the list of known hosts.
cirros@100.100.100.10's password:
$ sudo su -
# cd /home/
# ls
cirros ftp test
2) Inject metadata.
[root@compute1 ~]# nova boot --image 2401a752-fbda-482e-98f6-281656758b7f --meta matt1=test1 --meta matt2=test2 --flavor=1 test2
[root@compute1 ~]# nova show test2
+-------------------------------------+--------------------------------------------------------------+
| Property | Value |
+-------------------------------------+--------------------------------------------------------------+
| status | ACTIVE |
| updated | 2013-09-12T02:39:43Z |
| OS-EXT-STS:task_state | None |
| OS-EXT-SRV-ATTR:host | compute2.webex.com |
| key_name | None |
| image | cirros-0.3.0-x86_64_2 (2401a752-fbda-482e-98f6-281656758b7f) |
| private-net network | 100.100.100.7 |
| hostId | 9f9ff1519a8807f08ae0798af159cbaa8c96912da9beacfd8b6ca134 |
| OS-EXT-STS:vm_state | active |
| OS-EXT-SRV-ATTR:instance_name | instance-000001b9 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute2.webex.com |
| flavor | m1.tiny (1) |
| id | 6c6cf452-0763-486d-a94c-acdf30c69304 |
| security_groups | [{u'name': u'default'}] |
| user_id | 2a167be1e83c477e9dd57033c2eaaec9 |
| name | test2 |
| created | 2013-09-12T02:38:25Z |
| tenant_id | 0b337e1ad59d43428b77c8bb2f84ce32 |
| OS-DCF:diskConfig | MANUAL |
| metadata | {u'matt2': u'test2', u'matt1': u'test1'} |
| accessIPv4 | |
| accessIPv6 | |
| progress | 0 |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-AZ:availability_zone | nova |
| config_drive | |
+-------------------------------------+--------------------------------------------------------------+
[root@compute2 ~]# ssh cirros@100.100.100.7
cirros@100.100.100.7's password:
$ cd /
$ cat meta.js
{"matt2": "test2", "matt1": "test1"}
3) The user-data way. Note 169.254.169.254; this address is used by cloud-init.
[root@compute1 ~]# cat matt.test
This is test for user-data
[root@compute1 ~]# nova boot --image 2401a752-fbda-482e-98f6-281656758b7f --user-data ./matt.test --flavor=1 test4
[root@compute1 ~]# ssh cirros@100.100.100.8
The authenticity of host '100.100.100.8 (100.100.100.8)' can't be established.
RSA key fingerprint is 31:7f:b6:5f:ea:b8:5a:b4:f5:97:35:27:7c:3c:8e:3a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '100.100.100.8' (RSA) to the list of known hosts.
cirros@100.100.100.8's password:
$ sudo su -
# telnet 169.254.169.254 80
GET /latest/user-data
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 27
Date: Thu, 12 Sep 2013 03:54:12 GMT
Connection: close
This is test for user-data
Connection closed by foreign host
Here is an example using metadata and 169.254.169.254 to fetch the public key and hostname for a new VM.
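For instance, nova serves EC2-style metadata paths, so from inside the VM something like this should work (use wget -qO- if the image lacks curl):
$ curl http://169.254.169.254/latest/meta-data/hostname
$ curl http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key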
Libvirt Remote
You can access libvirt from another host. Refer to http://libvirt.org/remote.html#Remote_certificates. By default this requires certificates on both the server and the client; by using ssh we can bypass that.
# on host 10.224.147.167
$ virsh --connect qemu:///10.224.147.166:/system
error: failed to connect to the hypervisor
error: internal error: unexpected QEMU URI path '/10.224.147.166:/system', try qemu:///system
$ virsh --connect qemu+ssh://10.224.147.166/system list
Id Name State
----------------------------------------------------
15 cirros running
16 cirros2 running
17 cirros3 running
Attach Volumes to VM
Libvirt/KVM supports all kinds of volume types: raw, iso, qcow2, vmdk, etc. See here. The official site usually guides you to use storage pools, but I found you can also attach a disk directly.
Attach A Disk Directly
To directly attach a new disk to a VM, you have to modify the libvirt VM xml (refer to here). In this way, a file-based disk can be attached.
virsh edit cirros
# in <device>...</device>, following <disk>...</disk>, add
# this points to the new disk file. pick an unused target dev name such as 'hdb'
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/root/vm_direct_disk1.qcow2'/>
<target dev='hdb' bus='ide'/>
</disk>
virsh shutdown cirros
... # wait until fully shut down
virsh start cirros
You can see that the kvm parameters now include the new disk.
$ ps -ef|grep -E 'qemu|kvm'
qemu 26915 1 90 12:18 ? 00:06:40 /usr/bin/qemu-system-x86_64 -name cirros ... -drive file=/root/vm_direct_disk1.qcow2,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 ...
Wait until the VM’s SSH service fully starts. Log in and check out the new disk.
# login VM cirros
$ ssh cirros@192.168.122.56
# originally there is only /dev/sda
$ ls /dev/sd*
/dev/sda /dev/sda1 /dev/sdb
$ sudo su -
$ mkfs.ext4 /dev/sdb
$ mkdir /mnt/disk1
$ mount /dev/sdb /mnt/disk1
$ echo "hello world this is vm_direct_disk1.qcow2" > /mnt/disk1/hello.txt
$ umount /mnt/disk1
Let’s attach the disk to another VM, then check out the files on it.
virsh edit cirros
... # remove the disk configuration segment we have added
virsh edit cirros2
... # add the same disk configuration segment
virsh shutdown cirros2
... # wait until fully shutdown, use 'virsh list' to see
virsh start cirros2
... # wait until SSH service starts
# login to VM cirros2
ssh cirros@192.168.122.102
$ ls /dev/sd*
/dev/sda /dev/sda1 /dev/sdb
$ sudo su -
$ mkdir /mnt/disk1
$ mount /dev/sdb /mnt/disk1
$ cat /mnt/disk1/hello.txt
hello world this is vm_direct_disk1.qcow2
$ umount /mnt/disk1
Files on the volume can be seen by VM cirros2.
Attach Logical Volume as Disk
Not only files can be attached as disks; logical volumes work too. I followed the config here. First, let’s create logical volumes using LVM (the disk is faked with dd):
# create the disk image
dd if=/dev/zero of=vm_disk.img bs=1 count=1 seek=1G
# mount as disk device
losetup /dev/loop3 vm_disk.img
# use lvm to create logic volumes
pvcreate /dev/loop3
vgcreate vm_vg /dev/loop3
lvcreate --name vm_lv1 --size 256M vm_vg
lvcreate --name vm_lv2 --size 256M vm_vg
lvcreate --name vm_lv3 --size 256M vm_vg
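To sanity-check what LVM created:
pvs        # /dev/loop3 shows up as a physical volume
vgs        # volume group vm_vg, ~1G
lvs vm_vg  # vm_lv1, vm_lv2, vm_lv3, 256M each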
You need to edit the libvirt VM xml.
virsh edit cirros
# in <device>...</device>, following <disk>...</disk>, add
# target dev name can be changed. bus supports scsi, ide and virtio
<disk type='block' device='disk'>
<source dev='/dev/vm_vg/vm_lv1' />
<target dev='sdb' bus='scsi' />
</disk>
<disk type='block' device='disk'>
<source dev='/dev/vm_vg/vm_lv2' />
<target dev='sdc' bus='scsi' />
</disk>
<disk type='block' device='disk'>
<source dev='/dev/vm_vg/vm_lv3' />
<target dev='sdd' bus='scsi' />
</disk>
virsh shutdown cirros
... # wait until fully shut down
virsh start cirros
After rebooting the VM, let’s see the new disks.
$ ssh cirros@192.168.122.56
$ ls
$ ls /dev/sd*
/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sdd1
$ sudo su -
$ mkfs.ext4 /dev/sdb
$ mount /dev/sdb /mnt/disk1
$ echo hello world this is logical volume /dev/vm_vg/vm_lv1 > /mnt/disk1/hello.txt
$ umount /mnt/disk1
... # reboot the VM and find the hello.txt is still here
Check out the kvm process and its parameters.
$ ps -ef|grep -E 'kvm|qemu'
qemu 1921 1 7 13:57 ? 00:00:57 /usr/bin/qemu-system-x86_64 -name cirros ... -drive file=/dev/vm_vg/vm_lv1,if=none,id=drive-scsi0-0-1,format=raw -device scsi-hd,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 -drive file=/dev/vm_vg/vm_lv2,if=none,id=drive-scsi0-0-2,format=raw -device scsi-hd,bus=scsi0.0,scsi-id=2,drive=drive-scsi0-0-2,id=scsi0-0-2 -drive file=/dev/vm_vg/vm_lv3,if=none,id=drive-scsi0-0-3,format=raw -device scsi-hd,bus=scsi0.0,scsi-id=3,drive=drive-scsi0-0-3,id=scsi0-0-3 ...
Use Storage Pool [FAILED]
Before approaching this section, let’s unload all the attached disks in VMs cirros and cirros2, by removing the added xml fragments:
virsh edit cirros
... # remove the added config for new disks
virsh edit cirros2
... # remove the added config for new disks
... # restart cirros and cirros2
Libvirt supports many types of storage pools: directory based, filesystem based, NFS based, iSCSI based, RBD based, etc. I will follow the official guide and use a filesystem pool.
mkdir -p /var/lib/virt/images
virsh edit cirros
# in <device>...</device>, following <disk>...</disk>, add
<pool type="fs">
<name>virtimages</name>
<source>
<device path="/dev/vm_vg/vm_lv1"/>
</source>
<target>
<path>/var/lib/virt/images</path>
</target>
</pool>
[FAILURE] After I :wq out of virsh edit and open it again, my edits have disappeared; the xml config is restored to the original. It seems a storage pool cannot be added inside a VM’s xml.
Use Storage Pool, the Right Way
After googling I figured out my understanding was WRONG. Storage pools are defined in /etc/libvirt/storage, not in the VM’s xml config. They are separate objects, not “disks to attach”. Here is a mailing-list question and a tutorial.
mkdir -p /var/lib/virt/images
$ virsh pool-define-as virtimages --type fs --source-dev /dev/vm_vg/vm_lv1 --target /var/lib/virt/images # --print-xml to dry-run
# here is where pools are defined
$ cat /etc/libvirt/storage/virtimages.xml
<pool type='fs'>
<name>virtimages</name>
<uuid>b8343798-f0fe-4ae5-bc40-6c94b0b15993</uuid>
<capacity unit='bytes'>0</capacity>
<allocation unit='bytes'>0</allocation>
<available unit='bytes'>0</available>
<source>
<device path='/dev/vm_vg/vm_lv1'/>
<format type='auto'/>
</source>
<target>
<path>/var/lib/virt/images</path>
<permissions>
<mode>0755</mode>
<owner>-1</owner>
<group>-1</group>
</permissions>
</target>
</pool>
# now truly create and start the pool
$ virsh pool-build virtimages
$ virsh pool-autostart virtimages
$ virsh pool-list --all
$ virsh pool-info virtimages
# without mkfs first, pool-start reports a mount error
$ mkfs.ext4 /dev/vm_vg/vm_lv1
$ virsh pool-start virtimages
# you can see, pool.source in xml is mounted to pool.target
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 40G 2.6G 38G 7% /
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 41M 1.9G 3% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/mapper/vm_vg-vm_lv1 240M 2.1M 222M 1% /var/lib/virt/images
Next, we can create volumes from the pool:
$ virsh vol-create-as virtimages disk_in_pool.qcow2 128M --format qcow2
$ virsh vol-list virtimages
$ ll -h /var/lib/virt/images/
total 205K
-rw------- 1 root root 193K Nov 5 15:19 disk_in_pool.qcow2
drwx------ 2 root root 12K Nov 5 15:14 lost+found
You can now find the corresponding disk file in /var/lib/virt/images. I’ve seen tutorials telling me to use the disk file in virt-install:
# refer to http://koumm.blog.51cto.com/703525/1304196, /data/oeltest03.qcow2 is the volume's disk file
virt-install --name=oeltest03 --os-variant=RHEL6 --ram 1024 --vcpus=1 --disk path=/data/oeltest03.qcow2,format=qcow2,size=20,bus=virtio --accelerate --cdrom /data/iso/oel63x64.iso --vnc --vncport=5910 --vnclisten=0.0.0.0 --network bridge=br0,model=virtio –noautoconsole
Or attach it to a VM by modifying the xml (what the above two sections do). But after searching ‘man virsh’, I can’t find a command that attaches a volume to a VM by pool/volume name: there is a set of commands for pools and a set for volumes, and while virsh attach-disk exists, it takes a source path rather than a volume name.
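A hedged workaround with the commands virsh does have: resolve the volume to its backing path with vol-path, then feed that path to attach-disk (a sketch against the pool created above):
# resolve the volume's path, then attach it to a running VM
path=$(virsh vol-path disk_in_pool.qcow2 --pool virtimages)
virsh attach-disk cirros2 "$path" vdb --driver qemu --subdriver qcow2 --persistent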
Boot VM From Volume
In Openstack we often talk about boot from image and boot from volume. I want to do “boot from volume” in libvirt/kvm.
There is an excellent summit talk about how Openstack nova works with kvm and libvirt. Below is a VM’s libvirt xml, booted from IMAGE (not volume), found on a compute node:
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/var/lib/nova/instances/ef77b0bc-a5ef-4012-b21b-41f53d9abfc1/disk'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
I found no article directly describing “libvirt, boot from volume” or “kvm, boot from volume”. From what I see, I conclude that “boot from volume” is no different from attaching a disk in libvirt/kvm: just use a bootable disk image and set it in the VM’s xml config.
Boot from LVM Volume [FAILED]
Next I will try to create a logical volume, make it bootable, and launch a VM from it. BTW, the more practical approach is to first create an empty volume, then install a dummy VM onto it (an OS can be installed from cdrom), by which the volume becomes naturally bootable; a sketch follows.
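A sketch of that more practical route (the ISO path is hypothetical):
# install an OS from cdrom onto the LV; afterwards the LV holds a bootable system
virt-install --name dummy --ram 512 --vcpus=1 --disk path=/dev/boot_vg/boot_lv1 --cdrom /root/some-os.iso --vnc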
Following this guide, first let’s create the logical volume:
# still, we use dd and loop device to create volume
dd if=/dev/zero of=boot_vg.img bs=1 count=1 seek=1G
losetup /dev/loop4 boot_vg.img
pvcreate /dev/loop4
vgcreate boot_vg /dev/loop4
lvcreate --name boot_lv1 --size 512M boot_vg
Use dd to copy the image onto the logical volume; dd preserves all data including the boot sector.
dd if=cirros-0.3.2-x86_64-disk.img of=/dev/boot_vg/boot_lv1
Create the VM cirros4 and boot.
cp /etc/libvirt/qemu/cirros3.xml cirros4.xml
vim cirros4.xml
... # change name, delete uuid, replace last 3 bytes of 'mac address'
# and, change the disk section as
<disk type='block' device='disk'>
<driver name='qemu' type='qcow2'/> # note that the original cirros image is qcow2
<source dev='/dev/boot_vg/boot_lv1' />
<target dev='sda' bus='scsi' />
</disk>
virsh define cirros4.xml
ll /etc/libvirt/qemu/cirros4.xml
virsh start cirros4
Oh no, it doesn’t work, and I don’t know why. VNC shows the VM failing to boot with the error message “No bootable device”.
Boot from LVM Volume, Way 2
Now I follow this guide, which converts the cirros image to raw format beforehand.
qemu-img convert -f qcow2 -O raw cirros-0.3.2-x86_64-disk.img cirros-0.3.2-x86_64-disk.raw.img
dd if=cirros-0.3.2-x86_64-disk.raw.img of=/dev/boot_vg/boot_lv1
virsh edit cirros4
# change disk section to
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/boot_vg/boot_lv1' />
# I found that if use <target dev='sda' bus='scsi' />, still error with "No bootable device"
<target dev='hda' bus='ide' />
</disk>
virsh destroy cirros4 # virsh shutdown cannot stop a VM that failed to boot
virsh start cirros4
Succeeded! I can now see the VM boot up successfully in VNC. But I found that if I use <target dev='sda' bus='scsi' /> in the disk section, it still fails to boot with “No bootable device”. So I want to try way 1 again.
Boot from LVM Volume, Back to Way 1
Will way 1 work with <target dev='hda' bus='ide' /> instead of <target dev='sda' bus='scsi' />, or will it fail again?
dd if=cirros-0.3.2-x86_64-disk.img of=/dev/boot_vg/boot_lv1
vim cirros4.xml
# change the disk section to
<disk type='block' device='disk'>
<driver name='qemu' type='qcow2'/> # note that the original cirros image is qcow2
<source dev='/dev/boot_vg/boot_lv1' />
<target dev='hda' bus='ide' /> # use 'hda' this time
</disk>
virsh destroy cirros4
virsh start cirros4
Good, it worked! So in conclusion there is no need to convert qcow2 to raw; we can use qcow2 on LVM directly, and the “No bootable device” failures came from the scsi target, not the image format. Boot from volume is no different from attaching a (bootable) disk to the VM.
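qemu-img can confirm what sits on the logical volume, since it works on block devices too:
$ qemu-img info /dev/boot_vg/boot_lv1
# reports 'file format: qcow2', matching <driver name='qemu' type='qcow2'/>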
Virtio
To enable virtio, follow the libvirt wiki. Virtio can be used for network and disk. There are 3 parts: virtio_net, virtio_blk, and virtio_balloon.
Enabling virtio requires:
- a KVM version that supports virtio (any recent version is ok)
- a guest VM with virtio drivers installed; any Linux OS with kernel >= 2.6.25 should be OK
- libvirt >= 0.4.4
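On a full Linux guest you can check for the drivers like this (assuming the distro ships its kernel config under /boot; cirros is too minimal for this):
# inside the guest: =y means built into the kernel, =m means loadable module
grep -i virtio /boot/config-$(uname -r)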
I will try virtio for the disk.
virsh edit cirros
# add a new disk as in the prior section, but choose bus='virtio'
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/root/vm_direct_disk1.qcow2'/>
<target dev='vda' bus='virtio'/>
</disk>
virsh shutdown cirros
... # wait until fully shut down
virsh start cirros
Next, let’s check out virtio on the guest VM.
$ ssh cirros@192.168.122.56
$ sudo su -
# check whether virtio started
$ lsmod | grep virtio
$ ls /sys/devices/virtio*
ls: /sys/devices/virtio*: No such file or directory
$ ls /dev/vd*
/dev/vda
$ mount /dev/vda /mnt/disk1
$ cat /mnt/disk1/hello.txt
hello world this is vm_direct_disk1.qcow2 # the message left before
lsmod shows no virtio modules. At first I thought the cirros image doesn’t support virtio, but since the attached disk shows up as /dev/vda, the virtio_blk driver must be built into the cirros kernel rather than loaded as a module. Either way, I can access the attached disk /dev/vda.
KVM Tuning
There are many articles published about KVM tuning: [1][2]. These usually involve setting a lot of kvm/libvirt config. Below I will play with vcpupin and transparent hugepages.
VCPUPin
Following this article for vcpu pinning. The main things to consider are
- How to see NUMA node info
- Check the host capabilities
- Know which cpu is free
- How to pin vcpu
- List vcpu status
Here I play with vcpupin.
$ virsh vcpuinfo cirros
VCPU: 0
CPU: 1
State: running
CPU time: 430.3s
CPU Affinity: -y
# check which cpu the VM's threads run on; 8611 is the pid of the VM
$ ps -eLo pid,psr,comm | grep 8611
8611 1 qemu-system-x86
8611 1 qemu-system-x86
8611 0 qemu-system-x86
# pin the vcpu: vcpupin <domain> <vcpu> <cpu>
$ virsh vcpupin cirros 0 0
$ virsh vcpuinfo cirros
VCPU: 0
CPU: 0
State: running
CPU time: 430.3s
CPU Affinity: y-
$ ps -eLo pid,psr,comm | grep 8611
8611 0 qemu-system-x86
8611 0 qemu-system-x86
8611 0 qemu-system-x86
# check the runtime status
$ grep pid /var/run/libvirt/qemu/cirros.xml
<domstatus state='running' reason='booted' pid='8611'>
<vcpu pid='8615'/>
In a VM’s libvirt xml you can find this config to pin cpus:
<domain>
...
<vcpu placement='static' cpuset="1-4,^3,6" current="1">2</vcpu>
...
</domain>
How to unpin a vcpu? In libvirt you have to re-pin the vcpu to all cpus; refer to here.
virsh vcpupin cirros 0 0-1 # 0-N, N is your max physical cpu id.
The vcpupin doesn’t show up in ps -ef|grep cirros, i.e. it is not set via kvm parameters. If using kvm alone, you need the taskset command to set cpu affinity; refer to this book, P60.
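A hedged taskset sketch against the qemu pid from above:
taskset -pc 8611    # query the current cpu affinity of pid 8611
taskset -pc 0 8611  # pin pid 8611 to cpu 0 (add -a to cover all its threads)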
Transparent Hugepage
First, the host needs transparent hugepages enabled. View the status with:
# usually transparent hugepage is enabled by default
$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never # in the 3 options, 'always' is selected
To check hugepage usage:
$ cat /proc/meminfo | grep -i AnonHugePages
AnonHugePages: 323584 kB
To enable or disable.
# to disable
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# to enable
echo always > /sys/kernel/mm/transparent_hugepage/enabled
Cool, it’s transparent, so that’s all: no need to configure libvirt/kvm. Let’s play with a VM.
$ grep AnonHugePages /proc/meminfo
AnonHugePages: 323584 kB
$ virsh destroy cirros
# the pages used by the VM are released
$ grep AnonHugePages /proc/meminfo
AnonHugePages: 45056 kB
Other References
- A Chinese KVM book: http://item.jd.com/11325760.html
- A good KVM guide: http://koumm.blog.51cto.com/703525/1292146
- Ceph rbd and libvirt: http://ceph.com/docs/master/rbd/libvirt/