
The Performance of Micro-Kernel-Based Systems

H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter

Presentation by: Seungweon Park

Introduction

μ-kernels have a reputation for being too slow and inflexible

Can a 2nd-generation μ-kernel (L4) overcome these limitations?

Experiment:
– Port Linux to run on L4 (L4Linux)
– Compare against native Linux and MkLinux (Linux on a 1st-generation, Mach-3.0-derived μ-kernel)

Introduction (cont.)

Test the speed of a standard OS personality on top of a fast μ-kernel: Linux implemented on L4

Test the extensibility of the system:
– pipe-based communication implemented directly on the μ-kernel
– mapping-related OS extensions implemented as user tasks
– real-time memory management implemented at user level

Test whether the L4 abstractions are independent of the platform

L4 Essentials

Based on threads and address spaces

Recursive construction of address spaces by user-level servers
– Initial address space σ0 represents physical memory
– Basic operations: granting, mapping, and unmapping

The owner of an address space can grant or map a page to another address space

All address spaces are maintained by user-level servers (pagers)

L4Linux – Design & Implementation

Fully binary compatible with Linux/x86

Modifications restricted to the architecture-dependent part of Linux

No Linux-specific modifications to the L4 kernel

L4Linux – Design & Implementation

Address Spaces
– Initial address space σ0 represents physical memory
– Basic operations: granting, mapping, and unmapping
– L4 uses “flexpages”: logical memory ranging from one physical page up to a complete address space
– An invoker can only map and unmap pages that have been mapped into its own address space

L4Linux – Design & Implementation

Address Spaces (cont.)
– I/O ports are parts of address spaces
– Hardware interrupts are handled by user-level processes; the L4 kernel delivers each interrupt as an IPC message

L4Linux – Design & Implementation

The Linux server
– L4Linux uses a single-server approach: one Linux server runs on top of L4, multiplexing a single thread for system calls and page faults
– The Linux server maps physical memory into its address space and acts as the pager for the user processes it creates
– The server cannot directly access the hardware page tables, so it must maintain logical page tables in its own address space

L4Linux – Design & Implementation

Interrupt Handling
– All interrupt handlers are mapped to messages
– The Linux server contains threads that do nothing but wait for interrupt messages
– Interrupt threads have a higher priority than the main thread

L4Linux – Design & Implementation

User Processes
– Each user process is implemented as a separate L4 task: it has its own address space and threads
– The Linux server is the pager for these processes; any fault by a user-level process is sent by RPC from the L4 kernel to the server

L4Linux – Design & Implementation

System Calls
– Three system call interfaces:
  - a modified version of libc.so that uses L4 primitives
  - a modified version of libc.a
  - a user-level exception handler (“trampoline”) that calls the corresponding routine in the modified shared library
– The first two options are the fastest; the third is maintained for compatibility with unmodified binaries

L4Linux – Design & Implementation

Signalling
– Each user-level process has an additional thread for signal handling
– The main server thread sends a message to the signal-handling thread, telling the user thread to save its state and enter Linux

L4Linux – Design & Implementation

Scheduling
– All thread scheduling is done by the L4 kernel
– The Linux server’s schedule() routine is only used for multiplexing its single thread
– After each system call, if no other system call is pending, the server simply resumes the user process thread and sleeps

L4Linux – Design & Implementation

Tagged TLB & Small Spaces
– To reduce TLB conflicts, L4Linux provides a special library that customizes code and data for communicating with the Linux server
– The emulation library and signal thread are mapped close to the application, instead of into the default high-memory area

Performance

What is the penalty of using L4Linux?

Compare L4Linux to native Linux

Does the performance of the underlying micro-kernel matter?

Compare L4Linux to MkLinux

Does co-location improve performance?

Compare L4Linux to an in-kernel (co-located) version of MkLinux

Microbenchmarks

Measured system-call overhead using the shortest system call, getpid()

Microbenchmarks (cont.)

Measures specific system calls to determine basic performance.

Macrobenchmarks

Measured the time to recompile the Linux server

Macrobenchmarks (cont.)

Next, a commercial test suite (the AIM multiuser benchmark) is used to simulate a system under full load.

Performance Analysis

L4Linux is on average 8.3% slower than native Linux, and only 6.8% slower at maximum load

MkLinux: 49% slower on average, 60% at maximum load

Co-located MkLinux: 29% slower on average, 37% at maximum load

Extensibility Performance

A micro-kernel must provide more than just the features of the OS running on top of it.

Specialization – improved implementation of OS functionality

Extensibility – permits implementation of new services that cannot be easily added to a conventional OS.

Pipes and RPC

(1) uses the standard pipe mechanism of the Linux kernel.

(2) is asynchronous and uses only L4 IPC primitives. It emulates POSIX standard pipes, without signalling; a thread is added for buffering and cross-address-space communication.

(3) is synchronous and uses blocking IPC without buffering data.

(4) maps pages into the receiver’s address space.

Virtual Memory Operations

The “Fault” operation is an example of extensibility: it measures the time to resolve a page fault by a user-defined pager in a separate address space.

“Trap” – latency between a write operation to a protected page and the invocation of the related exception handler.

“Appel1” – Time to access a random protected page. The fault handler unprotects the page, protects some other page, and resumes.

“Appel2” – Time to access a random protected page where the fault handler only unprotects the page and resumes.

Conclusion

Using the L4 micro-kernel imposes only a 5–10% slowdown relative to native Linux, much faster than previous micro-kernel systems.

Further optimizations, such as co-locating the Linux server or exploiting extensibility, could improve L4Linux even further.

Q&A