Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve...

34
Introduction to bhyve Takuya ASADA / @syuu1228

Transcript of Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve...

Page 1: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Introduction to bhyveTakuya ASADA / @syuu1228

Page 2: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

What is bhyve?

Page 3: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

What is bhyve?

• bhyve is a hypervisor introduced in FreeBSD

• Similar to Linux KVM, runs on host OS

• BSD License

• Developed by Peter Grehan and Neel Natu

Page 4: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

bhyve features• Required Intel VT-x and EPT (Nehalem or later)

AMD support in progress

• Does not support BIOS/UEFI for now UEFI support in progress

• Minimal device emulation support: virtio-blk, virtio-net, COM port + α

• Supported guest OS: FreeBSD/amd64, i386, Linux/x86_64, OpenBSD/amd64

Page 5: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

How to use it?

kldload vmm.ko

/usr/sbin/bhyveload -m ${mem} -d ${disk} ${name}

/usr/sbin/bhyve -c ${cpus} -m ${mem} \-s 0,hostbridge -s 2,virtio-blk,${disk} \ -s 3,virtio-net,${tap} -s 31,lpc -l com1,stdio vm0

Page 6: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

How to run Linux?

• bhyve OS Loader(/usr/sbin/bhyveload) only supports FreeBSD You need another OS Loader to support other OSs

• grub2-bhyve is the solution

• It’s modified version grub2, runs on host OS (FreeBSD)

• Can load Linux and OpenBSD

• Available in ports & pkg!

Page 7: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Virtualization in general

Page 8: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Difference between container and hypervisor

• Jail is container

• It’s virtualize OS environment on kernel level

• bhyve is hypervisor

• It virtualizes whole machine

• Totally different approach

Page 9: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Container• Process in jail is just a normal

process for the kernel

• The kernel do some tricks to isolate environments between jails

• Lightweight, less-overhead

• Share one kernel with all jails → If the kernel panics, all jails will die

• You cannot install another OS (No Windows, No Linux!)

jail2jail1

Kernel

DiskNIC

process

process

process

Page 10: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Hypervisor• Hypervisor virtualizes a machine

• From guest OS, it looks like real hardware

• Virtual machine is a normal process for host OS

• Does not share kernel, it is completely isolated

• You can run Full OS inside of the VM → Windows! Linux!

Kernel

DiskNIC

Hypervisor

process

vm1

Kernel

Disk NIC

process

vm2

Kernel

Disk NIC

process

Page 11: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

How hypervisor virtualize machine?

• To make complete virtual machine, you need to virtualize following things:

• CPU

• Memory (Address Space)

• I/O

Page 12: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

CPU Virtualization: Emulate entire CPU?

• Like QEMU

• You can emulate the entire CPU operation on a normal process

• Very slow, not a really useful choice for virtualization

QEMU mov dx,3FBhmov al,128out dx,al

CPUemulator

run

virtual device

OS

physical device

physical CPU

IO

Page 13: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

CPU Virtualization: Direct execution?

• You want run guest instructions directly on a real CPU since you are virtualizing x86 on x86

• You need to avoid executing some instructions which modify system global state, or perform I/O (called sensitive instructions)

• If you execute these instructions on a real CPU, it may break host OS state such as directly accessing a HW device

Page 14: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Perform I/O on VM

• You need to avoid access to real HW from VM

• Need to prevent execution of the instruction

GuestOS

Virtual CPUReal Display

outb

Page 15: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Perform IO on VM

• You can trap them by executing in lower privileged mode

• However, on x86, there are some instructions which are impossible to trap because these are nonprivileged instructions

GuestOS

Virtual CPU

outb

Virtual Display

trap!

Page 16: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Software techniques to virtualize x86

• Binary translation (old VMware): interpret & modify guest OS’s instructions on-the-fly → Runs fast, but implementation is very complex

• Paravirtualization (old Xen): Modify guest OS for the hypervisor → Runs fast, but is impossible to run unmodified OS’s

• We want an easier & better solution → HW assisted virtualization!

Page 17: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Hardware assisted virtualization(Intel VT-x)

• New CPU mode: VMX root mode (hypervisor) / VMX non-root mode (guest)

• If some event needs to emulate in the hypervisor, CPU stops guest, exit to hypervisor → VMExit

• You don’t need complex software techniquesYou don’t have to modify the guest OS

User(Ring 3)

Kernel(Ring 0)

User(Ring 3)

Kernel(Ring 0)

VMXroot mode

VMXnon-root

mode

VMEntry

VMExit

Page 18: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Memory Virtualization

• If you run guest OS natively, memory address translation become problematic

• If GuestB loads Page table A, virtual page 1 translate to Host physical page 1but you meant Host physical page 5

Process A

1

Process B12

Guest physical memory

21

34

1 12

1 32 4

Page table A

Page table B

Guest A

1

12

21

34

1 12

1 32 4

Host physical memory

21

7

3456

8

Process A

Process B

Guest physical memoryPage table A

Page table B

Guest B

Page 19: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Shadow Paging

• Trap page table loading/modifying, create “Shadow Page Table”, tell physical page number to the MMU

• A software trick that works well, but is slow

1

12

21

34

1 22

1 32 4

21

7

3456

8

1 52

1 72 8

Process A

Process B

Guest physical memory

Page table A

Page table B

Guest A

Host physical memoryPage table A'

Page table B'

Page 20: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Nested Paging (Intel EPT)

• HW assisted memory virtualization!

• You will have Guest physical : Host physical translation table

• MMU translates address by two step (Nested)

1

12

21

34

1 22

1 32 4

21

7

3456

8

1 52 6

EPT A

3 74 8

Process A

Process B

Guest physical memory

Page table A

Page table B

Guest AHost physical memory

Page 21: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

I/O Virtualization

• To run unmodified OSs, you’ll need to emulate all devices what you have on the real hardware

• SATA, NIC(e1000), USB(ehci), VGA(Cirrus), Interrupt controller(LAPIC, IO-APIC), Clock(HPET), COM port…

• Emulating real devices is not very fast because it causes lot of VMExits, not ideal for for virtualization

Page 22: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Paravirtual I/O

• Virtual I/O device is designed for VM use

• Much faster than emulating real devices

• Required device driver on guest OS

• De-facto standard: virtio-blk, virtio-net

Page 23: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

PCI Device passthrough

• If you attach a real HW device on a VM, you will have a problem with DMA

• Because the device requires physical address for DMA but the guest OS doesn’t know the Host physical address

• Address translator for the devices: IOMMU(Intel VT-d)

• Translates guest physical to host physical using a translation table

Physical memory

21

7

3456

8

PCIDevices

DMA!

5 16 27 38 4

IOMMU translation table

Process A

1

Process B12 Guest physical

memory

21

34

1 22

1 32 4

Pagetable A

Pagetable B

Guest A

1 52 6

EPT A

3 74 8

Page 24: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

bhyve internals

Page 25: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

How bhyve virtualize machine?

• CPU: HW-assisted virtualization (Intel VT-x)

• Memory: HW-assisted memory virtualization (Intel EPT)

• IO: virtio, PCI passthrough, +α

• Uses HW assisted features

Page 26: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

bhyve overview• bhyveload: loads

guest OS

• bhyve: userland part of Hypervisor, emulates devices

• bhyvectl: a management tool

• libvmmapi: userland API

• vmm.ko: kernel part of Hypervisor

FreeBSD kernel

bhyveload bhyve

/dev/vmm/${vm_name} (vmm.ko)

Guest kernel

1. Create VM instance,load guest kernel

2. Run VM instace

HD

NIC

Console

Disk imagetap device

stdin/stdout

bhyvectl

libvmmapi

3. Destroy VMinstance

mmap/ioctl

Page 27: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

vmm.ko

• All VT-x features only accessible in kernel mode, vmm.ko handles it

• Most important work of vmm.ko is CPU mode switching between hypervisor/guest

• Provides interface for userland via /dev/vmm/${vmname}

• Each vmm device file contains each VM instance state

Page 28: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

/dev/vmm/${vmname} interfaces

• create/destroyCan create/destroy device file via sysctl hw.vmm.create, hw.vmm.destroy

• read/write/mmapCan access guest memory area by standard syscall (Which means you even can dump guest memory by dd command)

• ioctl Provides various operations to VM

Page 29: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

/dev/vmm/${vmname} ioctls

• VM_MAP_MEMORY: Maps guest memory area at requested size

• VM_SET/GET_REGISTER: Access registers

• VM_RUN: Run guest machine, until virtual devices accessed (or some other trap happened)

Page 30: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

libvmmapi

• wrapper library of /dev/vmm operations

• vm_create(name)→ sysctl(“hw.vmm.create”, name)

• vm_set_register(reg, val) → ioctl(VM_SET_REGISTER, reg, val)

Page 31: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

bhyveload• bhyve uses OS loader instead of BIOS/UEFI, to load guest OS

• FreeBSD bootloader ported to userland: userboot

• bhyveload runs host OS, to initialize guest OS

• Once it called, it does following things:

• Parse UFS on diskimage, find kernel

• Load kernel to guest memory area

• Initialize Page Table

• Create GDT, IDT, LDT

• Initialize special registers to get ready for 64bit mode

• Guest machine can starts from kernel entry point, with 64bit mode

Page 32: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

bhyve

• bhyve command is the userland part of the hypervisor

• It invokes ioctl(VM_RUN) to run GuestOS

• Emulates virtual devices

• Provides user interface(no GUI for now)

Page 33: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

main loop in bhyvewhile (1) {

ioctl(VM_RUN, &vmexit);

switch (vmexit.exit_code) {

case IOPORT_ACCESS:

emulate_device(vmexit.ioport); … }

}

Page 34: Introduction to bhyve - bhyvecon - The BSD Hypervisor ...bhyvecon.org/introduction_to_bhyve.pdfbhyve features • Required Intel VT-x and EPT (Nehalem or later) AMD support in progress!

Q&A?