Live Migration with Pass-through Device for Linux VM
Edwin Zhai, Gregory D. Cummings, and Yaozu Dong
Intel Corp.
{edwin.zhai, gregory.d.cummings, eddie.dong}@intel.com
Abstract
Open source Linux virtualization, such as Xen and KVM, has made great progress recently and has been a hot topic in the Linux world for years. With virtualization support, the hypervisor de-privileges operating systems as guest operating systems and shares physical resources, such as memory and the network device, among guests. For device virtualization, several mechanisms have been introduced to improve performance. Paravirtualized (PV) drivers are implemented to avoid excessive guest/hypervisor switching and thus achieve better performance, for example Xen's split virtual network interface driver (VNIF). Unlike software optimization in PV drivers, an IOMMU, such as Intel® Virtualization Technology for Directed I/O (VT-d), enables direct pass-through of physical devices to guests to take advantage of hardware DMA remapping, thus reducing hypervisor intervention and achieving high bandwidth.

Physically assigned devices impose challenges on live migration, which is one of the most important virtualization features in server consolidation. This paper shows how we solve this issue using virtual hotplug technology together with the Linux bonding driver, and is organized as follows: we start with device virtualization and live migration challenges, followed by the design and implementation of the virtual hotplug based solution. The network connectivity issue is also addressed using the bonding driver for live migration with a directly assigned NIC device. Finally, we present the current status, future work, and alternative solutions.
1 Introduction to Virtualization
Virtualization became a hot topic in the Linux world recently, as various open source virtualization solutions based on Linux were released. With virtualization, the hypervisor supports simultaneously running multiple operating systems on one physical machine by presenting a virtual platform to each guest operating system. There are two different approaches a hypervisor can take to present the virtual platform: full virtualization and paravirtualization. With full virtualization, the guest platform presented consists entirely of existing components, such as a PIIX chipset, an IDE controller/disk, a SCSI controller/disk, and even an old Pentium® II processor, which are already supported by modern OSes without any modification. Paravirtualization presents the guest OS with a synthetic platform, with components that may never have existed in the real world and that therefore cannot run a commercial OS directly. Instead, paravirtualization requires modifications to the guest OS or driver source code to match the synthetic platform, which is usually designed to avoid excessive context switches between guest and hypervisor by using knowledge of the underlying hypervisor, and thus achieves better performance.
2 Device Virtualization
Most hardware today doesn't support virtualization, so device virtualization can only rely on pure software techniques. Software-based virtualization shares physical resources between different guests by intercepting guest accesses to device resources, for example by trapping I/O commands from a native device driver running in the guest and providing emulation (an emulated device), or by servicing hypercalls from the guest's front-end paravirtualized driver in a split device model (a PV device). Both sharing solutions require hypervisor intervention, which causes additional overhead and limits performance.
To reduce this overhead, a pass-through mechanism has been introduced in Xen and KVM (work in progress) to allow assignment of a physical PCI device to a specific guest, so that the guest can directly access the physical resource without hypervisor intervention [8]. A pass-through mechanism introduces an additional requirement for DMA engines: a DMA transaction requires a host physical address, but a guest can only provide a guest physical address. A method must therefore be invoked to convert guest physical addresses to host physical addresses, both for correctness in a non-identically mapped guest and for secure isolation among guests. Hardware IOMMU technologies, such as Intel® Virtualization Technology for Directed I/O (VT-d) [7], are designed to convert guest physical addresses to host physical addresses. They do so by remapping DMA addresses provided by the guest to host physical addresses in hardware, via a VT-d table indexed by the device requestor ID, i.e., Bus/Device/Function as defined in the PCI specification. Pass-through devices achieve close to native throughput while maintaining low CPU usage.
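To make the remapping concrete, the following is a minimal, purely illustrative Python sketch of the lookup an IOMMU performs conceptually: a per-requestor-ID table maps guest page frames to host page frames. The table layout, page size, and function names are simplifications for illustration, not the actual VT-d data structures.

```python
PAGE_SHIFT = 12  # assume 4 KiB pages for this sketch

# Hypothetical remapping tables: one GPA->HPA page table per requestor ID
# (bus, device, function), as programmed by the hypervisor per assigned device.
remap_tables = {
    (0x00, 0x19, 0x0): {0x1000: 0x7f2a3, 0x1001: 0x7f2a4},
}

def translate_dma(requestor_id, guest_phys_addr):
    """Translate a guest-physical DMA address to a host-physical address."""
    table = remap_tables.get(requestor_id)
    if table is None:
        raise PermissionError("device not assigned to any guest: DMA blocked")
    gfn = guest_phys_addr >> PAGE_SHIFT            # guest frame number
    offset = guest_phys_addr & ((1 << PAGE_SHIFT) - 1)
    hfn = table.get(gfn)                           # host frame number
    if hfn is None:
        raise PermissionError("page not mapped for this device: DMA fault")
    return (hfn << PAGE_SHIFT) | offset

# The NIC at 00:19.0 issues a DMA to guest-physical address 0x1000234;
# the remapping hardware redirects it to the corresponding host address.
print(hex(translate_dma((0x00, 0x19, 0x0), 0x1000234)))
```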
Single Root I/O Virtualization (SR-IOV), part of the PCI-SIG I/O Virtualization specifications, is another emerging hardware virtualization technology, which specifies how a single device can be shared among multiple guests via a hardware mechanism. A single SR-IOV device can expose multiple virtual functions (VFs). Each VF has its own requestor ID and resources, which allows the VF to be assigned to a specific guest. The guest can then directly access the physical resource without hypervisor intervention, and the VF-specific requestor ID allows the hardware IOMMU to convert guest physical addresses to host physical addresses.
Of all the devices that are virtualized, network devices are among the most critical in data centers. With traditional LAN solutions and storage solutions such as iSCSI and FCoE converging onto the network, network device virtualization is becoming increasingly important. In this paper, we choose network devices as a case study.
3 Live Migration
Relocating a virtual machine from one physical host to another with a very small service down-time, such as 100 ms [6], is one major benefit of virtualization. Data centers can use the VM relocation feature, i.e., live migration, to dynamically balance load across hosting platforms and achieve better throughput. It can also be used to dynamically consolidate services onto fewer hosting platforms for better power savings, or to keep services running across physical platform maintenance, since each physical platform has a limited life cycle while VMs can run far longer than the life cycle of any one physical machine. Live migration, and similar features such as VM save and VM restore, is achieved by copying VM state, including memory, virtual devices, and processor state, from one place to another. The virtual platform where the migrated VM resumes must be the same as the one where it previously ran, and it must provide the capability to save and restore all internal state, which depends on how the devices are virtualized.
The guest memory subsystem, which makes up part of the guest platform, is kept identical when the VM relocates: the target VM is assigned the same amount of memory with the same layout. The live migration manager copies memory contents from the source VM to the target using an incremental approach to reduce the service outage time [5], given that the memory a guest owns may range from tens of megabytes to tens of gigabytes, and even more in the future, which implies a relatively long transmission time even in a ten-gigabit Ethernet environment.
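The incremental approach of [5] can be summarized by the simplified Python sketch below: memory is copied in rounds while the guest keeps running, re-sending only the pages dirtied since the previous round, and the guest is paused only for the final small residue. The dirty-page tracking interface and the thresholds here are hypothetical placeholders, not any particular hypervisor's API.

```python
def live_migrate_memory(vm, send_page, max_rounds=30, stop_threshold=64):
    """Iterative pre-copy: repeat until the dirty set is small, then stop-and-copy."""
    dirty = set(vm.all_pages())          # round 0: everything is "dirty"
    for _ in range(max_rounds):
        vm.start_dirty_logging()         # hypothetical: track pages written from now on
        for page in dirty:
            send_page(page, vm.read_page(page))   # guest is still running
        dirty = vm.collect_dirty_pages() # pages touched while we were copying
        if len(dirty) <= stop_threshold: # writable working set is small enough
            break
    vm.pause()                           # short service outage starts here
    for page in dirty:
        send_page(page, vm.read_page(page))       # final consistent copy
    return vm                            # CPU and device state are sent next
```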
The processor type and features the guest sees usually need to be the same across VM migration, but exceptions can be made if all the features the source VM uses also exist on the target host processor, or if the hypervisor can emulate those features that do not exist on the target side. For example, live migration can require the same CPUID on the host side, or simply hide the differences by presenting the guest with a common subset of the physical features. MSRs are more complicated unless the host platforms are identical; fortunately, the guest platform presented today is fairly simple and does not use model-specific MSRs. The whole CPU context saved at the final step of live migration is usually on the order of tens of kilobytes, which translates to just a few milliseconds of out-of-service time.
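The "common subset" idea mentioned above can be illustrated with a small Python sketch: before admitting hosts into a migration pool, a management layer masks the feature bits exposed to guests down to the intersection of all hosts' CPUID feature words. The feature-word values below are made up for illustration.

```python
def common_cpuid_features(hosts):
    """Intersect CPUID feature words so a guest only ever sees features
    that exist on every host it could be migrated to."""
    common = None
    for features in hosts:               # each entry: e.g. CPUID leaf 1 EDX/ECX words
        common = features if common is None else (common[0] & features[0],
                                                  common[1] & features[1])
    return common

# Hypothetical feature words for two hosts in a migration pool.
source_host = (0xbfebfbff, 0x0000e3bd)
target_host = (0xbfebfbff, 0x0000e33d)   # lacks one ECX feature bit

guest_visible = common_cpuid_features([source_host, target_host])
print(hex(guest_visible[0]), hex(guest_visible[1]))  # what the guest is shown
```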
On the device side, cloning source device instances to the target VM during live migration is much more complicated. If the source VM includes only software-emulated devices or paravirtualized devices, an identical platform device can be maintained by generating exactly the same configuration for the target VM at startup, and the device state can easily be maintained since the hypervisor knows all of its internal state. We call these migration-friendly devices. But for guests that have pass-through devices or SR-IOV virtual functions on the source VM side, things are totally different.
3.1 Issues of Live Migration with Pass-through Device
Although a guest with a pass-through device can achieve almost native performance, maintaining an identical platform device after migration may be impossible: the target host may not have the same hardware. Furthermore, even if the target host has a device identical to the one on the source side, cloning the device instance to the target VM is still almost impossible, because some device-internal state may not be readable, and some may still be in flight at migration time, which the hypervisor cannot know without device-specific knowledge. Even without those unknown states, knowing how to write the internal state back into the relocated VM's device is another big problem without device-specific knowledge in the hypervisor. Finally, some devices may have unique information that cannot be migrated, such as a MAC address. These devices are migration-unfriendly.
To address pass-through device migration, either the hypervisor needs device-specific knowledge to help with migration, or the guest needs to perform the device-specific operations itself. In this paper, we ask for guest support by proposing a guest hotplug based solution: the guest is requested to unplug all migration-unfriendly devices before relocation happens, so that the platform devices and device states are identical after migration. But hot unplugging an Ethernet card may lead to a network service outage, usually on the order of several seconds. The Linux bonding driver, originally developed for aggregating multiple network interfaces, is used here to maintain connectivity.
4 Solution
This section describes a simple and generic solution to the problem of live migration with a pass-through device. It also illustrates how to address the following key issues: saving and restoring device state, and keeping network connectivity for a NIC device.
4.1 Stop Pass-through Device
As described in the previous section, unlike emulated devices, most physical devices can't be paused to save and restore their hardware state, so a consistent device state across live migration is impossible. The only choice is to stop the guest from using the physical device before live migration.
How can this be done? One easy way is to let the end user stop everything that uses the pass-through device, including applications, services, and drivers, and then restore them on the target machine after the hypervisor allocates a new device. This method works, but it's not generic, as different Linux distributions require different operations, and a lot of user intervention is needed inside the Linux guest.
Another generic option is ACPI [1] S3 (suspend-to-RAM), in which the operating system freezes all processes, suspends all I/O devices, and then goes into a sleep state in which all context except system memory is lost. But this is overkill, because the whole platform is affected rather than just the target device, and the service outage time is intolerable. PCI hotplug is a much better fit here, because:

• Unlike ACPI S3, it is a device-level, fine-grained mechanism.

• It's generic: the 2.6 kernel supports various PCI hotplug mechanisms.

• It requires no heavy user intervention, because PCI hotplug can be triggered by hardware.
The solution using PCI hotplug looks like the following (a minimal orchestration sketch follows the list):

1. Before live migration, on the source host, the control panel triggers a virtual PCI hot-removal event for the pass-through device in the guest.

2. The Linux guest responds to the hot-removal event and stops using the pass-through device after unloading the driver.

3. Without any pass-through device, the Linux guest can safely be live migrated to the target platform.

4. After live migration, on the target host, a virtual PCI hot-add event for a new pass-through device is triggered.

5. The Linux guest loads the proper driver and starts using the new pass-through device. Because the guest initializes a new device that has nothing to do with the old one, the limitation described in Section 3.1 doesn't apply.
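As referenced above, the host-side steps can be scripted. The sketch below assumes a Xen host whose toolstack provides xm pci-detach, xm migrate --live, and xm pci-attach; command names and arguments may differ by Xen version, and the domain name and PCI BDFs are placeholders.

```python
import subprocess

def run(*cmd):
    """Run a host command and fail loudly if it returns non-zero."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def migrate_with_passthrough(domain, src_bdf, dst_host, dst_bdf):
    # Steps 1-2: trigger virtual hot removal; the guest unloads the driver
    # and the bonding driver fails over to the virtual NIC.
    run("xm", "pci-detach", domain, src_bdf)

    # Step 3: live migrate the now hardware-independent guest.
    run("xm", "migrate", "--live", domain, dst_host)

    # Steps 4-5: on the target host, hot add a new pass-through device; the
    # guest loads its driver and the bond switches back to the physical NIC.
    run("ssh", dst_host, "xm", "pci-attach", domain, dst_bdf)

if __name__ == "__main__":
    migrate_with_passthrough("linux-guest", "0000:01:00.0",
                             "target-host", "0000:02:00.0")
```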
4.2 Keeping Network Connectivity
The most popular usage model for a pass-through device is assigning a NIC to a VM for high network throughput. Unfortunately, using PCI NIC hotplug during live migration breaks network connectivity, which leads to an unpleasant user experience. To address this issue, it is desirable that the Linux guest automatically switch to a virtual NIC after hot removal of the physical NIC, and then migrate with the virtual NIC. Thanks to the powerful and versatile Linux network stack, the Ethernet bonding driver [3] already supports this behavior.
The Linux bonding driver provides a mechanism for enslaving multiple network interfaces into a single logical "bonded" interface with one MAC address. The behavior of the bonded interface depends on its mode. For instance, in active-backup mode the bonding driver detects link failure and reroutes network traffic around the failed link in a manner transparent to the application. The driver can also aggregate network traffic across all working links to achieve higher throughput, which is referred to as trunking [4].

The active-backup mode can be used for the automatic switch. In this mode, only one slave in the bond is active while the other acts as a backup; the backup slave becomes active if, and only if, the active slave fails. Additionally, one slave can be designated as primary, and it will always be the active slave whenever it is available; only when the primary is off-line are secondary devices used. This is very useful when bonding a pass-through device, as the physical NIC is preferred over virtual devices for performance reasons.
It’s very simple to enable bonding driver in Linux. The
end user just needs to reconfigure the network before us-
ing a pass-through device. The whole configuration in
the Linux guest is shown in Figure 1, where a new bond
is created to aggregate two slaves: the physical NIC as
primary, and a virtual NIC as secondary. In normal con-
ditions, the bond would rely on the physical NIC, and
take the following actions in response to hotplug events
in live migration:
When hot removal happens, the virtual NIC be-
comes active and takes over the in/out traffic, with-
out breaking the network inside of the Linux guest.
With this virtual NIC, the Linux guest is migrated
to target machine.
When hot add is complete on the target machine,
the new physical NIC recovers as the active slave
with high throughput.
In this process, no user intervention is required to switch
because the powerful bonding driver handles everything
well.
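For reference, the guest-side bond described above can be configured through the bonding driver's sysfs interface. The following is a minimal sketch, assuming the pass-through NIC appears as eth0 and the virtual NIC as eth1 inside the guest; the interface names and the addressing step are placeholders, and distributions usually express the same configuration through their own network scripts instead.

```python
import subprocess

def sh(cmd):
    """Run a shell command in the guest; echo it first for clarity."""
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

# Create bond0 in active-backup mode with the pass-through NIC as primary
# slave and the virtual NIC as backup, via the bonding driver's sysfs files.
sh("modprobe bonding")
sh("echo +bond0 > /sys/class/net/bonding_masters")
sh("echo active-backup > /sys/class/net/bond0/bonding/mode")
sh("echo 100 > /sys/class/net/bond0/bonding/miimon")    # link monitoring, ms
sh("ip link set bond0 up")
sh("echo +eth0 > /sys/class/net/bond0/bonding/slaves")  # pass-through NIC
sh("echo +eth1 > /sys/class/net/bond0/bonding/slaves")  # virtual NIC
sh("echo eth0 > /sys/class/net/bond0/bonding/primary")  # prefer the physical NIC
sh("dhclient bond0")                                    # or configure a static address
```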
4.3 PCI Hotplug Implementation
PCI hotplug plays an important role in live migration with a pass-through device. It should be implemented in the device model according to a hardware PCI hotplug specification. Currently, the device models of the most popular Linux virtualization solutions, such as Xen and KVM, are derived from QEMU. Unfortunately, QEMU did not support virtual PCI hotplug when this solution was developed, so we implemented a virtual PCI hotplug device model from scratch.
4.3.1 Choosing Hotplug Spec
The PCI specification doesn't define a standard hotplug mechanism. There are three existing categories of PCI hotplug mechanisms:

• ACPI hotplug: A mechanism similar to ACPI dock hot insertion/ejection, where ACPI control methods work together with an ACPI GPE to service the hotplug event.

• SHPC [2] (Standard Hot-Plug Controller): The PCI-SIG specification that defines a fairly complicated controller to handle PCI hotplug.

• Vendor-specific: Other vendor-specific mechanisms, such as those from Compaq and IBM, which rely on their own server hardware for PCI hotplug.
Linux 2.6 supports all of the above hotplug standards, which lets us select a simple, open, and efficient one. SHPC is a genuinely complicated device, so it is hard to emulate. Vendor-specific controllers are not well supported by other OSes. ACPI hotplug is best suited to being emulated in the device model, because the interface exposed to OSPM is very simple and well defined.
Figure 1: Live Migration with Pass-through Device
4.3.2 Virtual ACPI hotplug
Building an ACPI hotplug controller into the device model is akin to designing a hardware platform that supports ACPI hotplug, but using software emulation. Virtual ACPI hotplug needs several parts of the device model to coordinate in a sequence similar to the native one. For system event notification, ACPI introduces the GPE (General Purpose Event) register, a bitmap in which each bit can be wired to different value-added event hardware depending on the design.
The virtual ACPI hotplug sequence is shown in Figure 2. When the end user issues the hot-removal command for the pass-through device, analogous to pushing the eject button, the hotplug controller updates its status, then asserts the GPE bit and raises an SCI (System Control Interrupt). Upon receiving the SCI, the ACPI driver in the Linux guest clears the GPE bit, queries the hotplug controller to learn which specific device it needs to eject, and then notifies the rest of the guest. In turn, the Linux guest shuts down the device and unloads the driver. Finally, the ACPI driver executes the related control method _EJ0 to power off the PCI device, and _STA to verify the success of the ejection. Hot add follows a similar sequence, except that it doesn't call _EJ0.

Figure 2: ACPI Hotplug Sequence
From the process shown above, it is obvious that the following components are needed:

• GPE: A GPE device model, with one bit wired to the hotplug controller, described in the guest FADT (Fixed ACPI Description Table).

• PCI hotplug controller: A controller that responds to the user's hotplug action and maintains the status of the PCI slots. ACPI abstracts a well-defined interface, so we can implement the internal logic in a simplified style, for example by stealing some reserved I/O ports for the status registers.

• Hotplug control methods: ACPI control methods for hotplug, such as _EJ0 and _STA, added to the guest ACPI tables. These methods interact with the hotplug controller for device ejection and status checks.
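To make the coordination concrete, here is a heavily simplified Python sketch of the device-model side of the hot-removal path described above: a controller that latches per-slot eject requests in register-like state, sets the GPE bit, and raises the SCI. The register layout, I/O semantics, and method names are illustrative only, not the actual Xen/QEMU implementation.

```python
class VirtualHotplugController:
    """Toy model of the emulated ACPI PCI hotplug controller."""

    def __init__(self, raise_sci):
        self.eject_pending = 0      # bitmap of slots with a pending eject request
        self.slot_present = 0       # bitmap of populated slots
        self.gpe_status = 0         # the GPE bit wired to this controller
        self.raise_sci = raise_sci  # callback that injects the SCI into the guest

    def plug(self, slot):
        """Hot add: populate a slot and notify the guest."""
        self.slot_present |= (1 << slot)
        self.gpe_status = 1
        self.raise_sci()            # guest's ACPI driver scans for the new device

    def request_eject(self, slot):
        """End user asked to hot remove the device in `slot`."""
        self.eject_pending |= (1 << slot)
        self.gpe_status = 1
        self.raise_sci()            # guest queries the controller, unloads the driver

    # "I/O port" read used by the guest's control methods (_STA-style query).
    def read_status(self):
        return self.eject_pending

    # Write issued by the _EJ0-style control method once the guest is done.
    def write_eject(self, slot):
        self.eject_pending &= ~(1 << slot)
        self.slot_present &= ~(1 << slot)   # power off / remove the virtual device


# Example: the user requests removal of the pass-through NIC in slot 6.
ctrl = VirtualHotplugController(raise_sci=lambda: print("SCI injected"))
ctrl.plug(6)
ctrl.request_eject(6)
print(bin(ctrl.read_status()))   # guest sees slot 6 pending ejection
ctrl.write_eject(6)              # guest's _EJ0 completes the removal
```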
5 Status and Future Work
Right now, hotplug with a pass-through device works well on Xen. With this and the bonding driver, Linux guests can successfully be live migrated. Besides live migration, pass-through device hotplug has other useful usage models, such as dynamically switching physical devices between different VMs.

Some work and investigation remains to be done in the future:
• High-level management tools: Currently, hotplug of a pass-through device is kept separate from the generic live-migration logic for a clean design, so the end user is required to issue the hotplug commands manually before and after live migration. In the future, these actions should be pushed into high-level management tools, such as a friendly GUI or scripts, so that they work without user intervention.
• Virtual S3: The Linux bonding driver works perfectly for a NIC, but bonding other directly assigned devices, such as graphics cards, is not as useful. Since Linux has good support for ACPI S3, we could instead use virtual S3 to suspend all devices before live migration and wake them up afterwards. Some drawbacks of virtual S3 need more consideration:

  - All other devices, besides the pass-through devices, go through this cycle too, which takes more time than virtual hotplug.

  - With S3, the OS is in a sleep state, so a long down time for the running service is unavoidable.

  - S3 assumes that the OS will wake up on the same platform, so the same type of pass-through device must exist on the target machine.

  - S3 support in the guest may not be complete and robust.

Although virtual S3 for pass-through device live migration has its own limitations, it is still useful in environments where virtual hotplug doesn't work, for instance for hot removal of pass-through display cards, which is likely to cause a guest crash.
• Other guests: Linux supports ACPI hotplug and has a powerful bonding driver, but other guest OSes may not be lucky enough to have such a framework. We are in the process of extending support to other guests.
6 Conclusion
Direct VM access to a physical device achieves close to native performance, but breaks VM live migration. Our virtual ACPI hotplug device model allows a VM to hot remove the pass-through device before relocation and hot add another one after relocation, thus allowing pass-through devices to coexist with VM relocation. By integrating the Linux bonding driver into the relocation process, we provide continuous network connectivity for directly assigned NIC devices, which is the most popular pass-through usage model.
References

[1] "Advanced Configuration and Power Interface Specification," Revision 3.0b, 2006, Hewlett-Packard, Intel, Microsoft, Phoenix, Toshiba. http://www.acpi.info

[2] "PCI Standard Hot-Plug Controller and Subsystem Specification," Revision 1.0, June 2001, http://www.pcisig.info

[3] "Linux Ethernet Bonding Driver," T. Davis, W. Tarreau, C. Gavrilov, C.N. Tindel, Linux HOWTO Documentation, April 2006.

[4] "High Available Networking," M. John, Linux Journal, January 2006.

[5] "Live Migration of Virtual Machines," C. Clark, K. Fraser, S. Hand, J.G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, in Proceedings of the 2nd Symposium on Networked Systems Design and Implementation, 2005.

[6] "Xen 3.0 and the Art of Virtualization," I. Pratt, K. Fraser, S. Hand, C. Limpach, A. Warfield, D. Magenheimer, J. Nakajima, and A. Mallick, in Proceedings of the Linux Symposium (OLS), Ottawa, Ontario, Canada, 2005.

[7] "Intel Virtualization Technology for Directed I/O Architecture Specification," 2006, ftp://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf

[8] "Utilizing IOMMUs for Virtualization in Linux and Xen," M. Ben-Yehuda, J. Mason, O. Krieger, J. Xenidis, L. van Doorn, A. Mallick, and J. Nakajima, in Proceedings of the Linux Symposium (OLS), Ottawa, Ontario, Canada, 2006.
Intel may make changes to specifications, product descriptions, and plans at any time, without notice.

Intel and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries (regions).

*Other names and brands may be claimed as the property of others.

Copyright (c) 2008, Intel Corporation. Redistribution rights are granted per submission guidelines; all other rights reserved.