The Performance of Micro-Kernel-Based Systems H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J....


The Performance of Micro-Kernel-Based Systems

H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter

Presentation by: Seungweon Park


Introduction

μ-kernels have a reputation for being too slow and inflexible

Can a 2nd-generation μ-kernel (L4) overcome these limitations?

Experiment:

– Port Linux to run on L4

– Compare with native Linux and with MkLinux (Linux on a 1st-generation, Mach 3.0-derived μ-kernel)


Introduction (cont.)

Test speed of a standard OS personality on top of a fast μ-kernel: Linux implemented on L4

Test extensibility of the system:

– pipe-based communication implemented directly on the μ-kernel

– mapping-related OS extensions implemented as user tasks

– user-level real-time memory management implemented

Test whether L4 abstractions are independent of the platform


L4 Essentials

Based on threads and address spaces

Recursive construction of address spaces by user-level servers

– Initial address space σ0 represents physical memory

– Basic operations: granting, mapping, and unmapping.

Owner of address space can grant or map page to another address space

All address spaces maintained by user-level servers (pagers)


L4Linux – Design & Implementation

Fully binary compliant with Linux/x86

Modifications restricted to the architecture-dependent part of Linux

No Linux-specific modifications to the L4 kernel


L4Linux – Design & Implementation

Address Spaces

– Initial address space σ0 represents physical memory

– Basic operations: granting, mapping, and unmapping.

– L4 uses “flexpages”: logical memory ranging from one physical page up to a complete address space.

– An invoker can only map and unmap pages that have been mapped into its own address space
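The grant, map, and unmap operations above can be sketched as a small mapping tree. This is a toy model, not L4's API: the class and method names (`Mapping`, `AddressSpace`, `revoke`) are made up for illustration, and it works on single pages rather than flexpages.

```python
class Mapping:
    """One node of the mapping tree: a page as seen in one address space."""

    def __init__(self, frame, space, vpage):
        self.frame = frame      # physical frame number
        self.space = space      # address space currently holding the page
        self.vpage = vpage      # virtual page number inside that space
        self.children = []      # mappings derived from this one

    def revoke(self):
        """Remove this mapping and everything derived from it."""
        for child in self.children:
            child.revoke()
        self.children.clear()
        del self.space.pages[self.vpage]


class AddressSpace:
    """Hypothetical address space holding vpage -> Mapping entries."""

    def __init__(self, name, frames=()):
        self.name = name
        # A sigma0-style space starts out holding physical frames one-to-one.
        self.pages = {f: Mapping(f, self, f) for f in frames}

    def _held(self, vpage):
        # An invoker may only operate on pages mapped into its own space.
        if vpage not in self.pages:
            raise PermissionError(f"{self.name} does not hold page {vpage}")
        return self.pages[vpage]

    def _check_free(self, vpage):
        if vpage in self.pages:
            raise ValueError(f"page {vpage} already in use in {self.name}")

    def map(self, vpage, dest, dest_vpage):
        """Share a page: the mapper keeps it and may unmap it later."""
        node = self._held(vpage)
        dest._check_free(dest_vpage)
        child = Mapping(node.frame, dest, dest_vpage)
        node.children.append(child)
        dest.pages[dest_vpage] = child

    def grant(self, vpage, dest, dest_vpage):
        """Hand a page over: the granter loses it; its tree position is kept."""
        node = self._held(vpage)
        dest._check_free(dest_vpage)
        del self.pages[vpage]
        node.space, node.vpage = dest, dest_vpage
        dest.pages[dest_vpage] = node

    def unmap(self, vpage):
        """Revoke all mappings derived from vpage; the caller keeps its own."""
        node = self._held(vpage)
        for child in node.children:
            child.revoke()
        node.children.clear()
```

In this sketch, unmapping a page in the σ0-like root space removes it from every space that received it, directly or indirectly, which is what makes the construction recursive.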


L4Linux – Design & Implementation

Address Spaces (cont.)

– I/O ports are parts of address spaces.

– Hardware interrupts are handled by user-level processes. The L4 kernel will send a message via IPC.


L4Linux – Design & Implementation

The Linux server

– L4Linux uses a single-server approach.

– A single Linux server runs on top of L4, multiplexing a single thread for system calls and page faults.

– The Linux server maps physical memory into its address space, and acts as the pager for any user processes it creates.

– The server cannot directly access the hardware page tables, and must maintain logical page tables in its own address space.


L4Linux – Design & Implementation

Interrupt Handling

– All interrupt handlers are mapped to messages.

– The Linux server contains threads that do nothing but wait for interrupt messages.

– Interrupt threads have a higher priority than the main thread.


L4Linux – Design & Implementation

User Processes

– Each user process is implemented as a separate L4 task, with its own address space and threads.

– The Linux Server is the pager for these processes. Any fault by the user-level processes is sent by RPC from the L4 kernel to the Server.
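The fault path described above can be illustrated with a minimal sketch. The names `LinuxServer`, `UserTask`, and `handle_fault` are hypothetical, and the kernel's job of turning a fault into an RPC is simulated by a plain method call.

```python
class LinuxServer:
    """Toy pager: owns the 'physical' frames and hands them out on faults."""

    def __init__(self, n_frames):
        self.free_frames = list(range(n_frames))
        self.faults_handled = 0

    def handle_fault(self, task, vpage):
        # Pick a free frame and reply with it; the reply carries the mapping.
        frame = self.free_frames.pop(0)
        self.faults_handled += 1
        return frame


class UserTask:
    """Toy L4 task: its own page table, with the Linux server as its pager."""

    def __init__(self, pager):
        self.pager = pager
        self.page_table = {}    # virtual page -> frame

    def access(self, vpage):
        if vpage not in self.page_table:
            # In the real system the kernel converts the fault into an RPC
            # to the pager; here that step is a direct call.
            self.page_table[vpage] = self.pager.handle_fault(self, vpage)
        return self.page_table[vpage]
```

Only the first access to a page reaches the server in this model; later accesses are resolved from the task's own table.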


L4Linux – Design & Implementation

System Calls

– Three system call interfaces:

(1) A modified version of libc.so that uses L4 primitives.

(2) A modified version of libc.a.

(3) A user-level exception handler (trampoline) that calls the corresponding routine in the modified shared library.

– The first two options are the fastest. The third is maintained for compatibility.


L4Linux – Design & Implementation

Signalling

– Each user-level process has an additional thread for signal handling.

– The main server thread sends a message to the signal-handling thread, which tells the user thread to save its state and enter Linux.


L4Linux – Design & Implementation

Scheduling

– All thread scheduling is done by the L4 kernel.

– The Linux server’s schedule() routine is only used for multiplexing its single thread.

– After each system call, if no other system call is pending, it simply resumes the user process thread and sleeps.


L4Linux – Design & Implementation

Tagged TLB & Small Space

– In order to reduce TLB conflicts, L4Linux has a special library to customize code and data for communicating with the Linux Server

– The emulation library and signal thread are mapped close to the application, instead of the default high-memory area.


Performance

What is the penalty of using L4Linux?

Compare L4Linux to native Linux

Does the performance of the underlying micro-kernel matter?

Compare L4Linux to MkLinux

Does co-location improve performance?

Compare L4Linux to an in-kernel version of MkLinux


Microbenchmarks

Measured system call overhead using the shortest system call, “getpid()”
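A rough sketch of this kind of microbenchmark, assuming Python's `os.getpid()` as a stand-in for the raw system call. The function name `time_getpid` is made up here, and the result includes interpreter loop overhead, so it is only useful for comparing systems, not as an absolute figure.

```python
import os
import time


def time_getpid(iterations=100_000):
    """Return the mean wall-clock cost of one os.getpid() call, in ns."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        os.getpid()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    print(f"getpid: {time_getpid():.1f} ns per call (including loop overhead)")
```

Running the same script on two systems and comparing the per-call figures mirrors the approach of the slide, at much coarser resolution.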


Microbenchmarks (cont.)

Measures specific system calls to determine basic performance.


Macrobenchmarks

Measured the time to recompile the Linux server


Macrobenchmarks (cont.)

Next use a commercial test suite to simulate a system under full load.


Performance Analysis

L4Linux is, on average, 8.3% slower than native Linux, and only 6.8% slower at maximum load.

MkLinux: 49% slower on average, 60% at maximum load.

Co-located MkLinux: 29% slower on average, 37% at maximum load.
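To compare the systems with each other rather than with native Linux, the percentages can be converted to relative run times. The sketch below assumes that "x% slower" means the same work takes 1 + x/100 times as long; that reading, and the helper names, are assumptions and not taken from the slides.

```python
# Average slowdown figures quoted above, in percent relative to native Linux.
AVERAGE_SLOWDOWN = {
    "L4Linux": 8.3,
    "MkLinux": 49.0,
    "MkLinux (co-located)": 29.0,
}


def relative_time(slowdown_percent):
    """Run time relative to native Linux, ASSUMING 'x% slower' = 1 + x/100."""
    return 1.0 + slowdown_percent / 100.0


def time_ratio(system, baseline, table=AVERAGE_SLOWDOWN):
    """How many times longer `system` takes than `baseline` for equal work."""
    return relative_time(table[system]) / relative_time(table[baseline])


if __name__ == "__main__":
    for name in ("MkLinux", "MkLinux (co-located)"):
        print(f"{name} vs L4Linux: {time_ratio(name, 'L4Linux'):.2f}x")
```

Under that assumption, MkLinux takes about 1.38 times as long as L4Linux on average, and co-located MkLinux about 1.19 times as long.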


Extensibility Performance

A micro-kernel must provide more than just the features of the OS running on top of it.

Specialization – improved implementation of OS functionality

Extensibility – permits implementation of new services that cannot be easily added to a conventional OS.


Pipes and RPC

The first five measurements (1) use the standard pipe mechanism of the Linux kernel.

(2) Is asynchronous and uses only L4 IPC primitives. Emulates POSIX standard pipes, without signalling. Added thread for buffering and cross-address-space communication.

(3) Is synchronous and uses blocking IPC without buffering data.

(4) Maps pages into the receiver’s address space.
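The difference between the buffered, asynchronous variant (2) and the unbuffered, synchronous variant (3) can be sketched with Python threads. This is an analogy only: `BufferedPipe` and `SyncChannel` are hypothetical names, Python's `queue.Queue` stands in for L4 IPC, and the synchronous channel assumes a single sender.

```python
import queue
import threading


class BufferedPipe:
    """Asynchronous: send() returns once the data is queued (up to capacity)."""

    def __init__(self, capacity=16):
        self._q = queue.Queue(maxsize=capacity)

    def send(self, data):
        self._q.put(data)       # blocks only when the buffer is full

    def recv(self):
        return self._q.get()    # blocks only when the buffer is empty


class SyncChannel:
    """Synchronous: send() returns only after a receiver has taken the data."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def send(self, data):
        self._q.put(data)
        self._q.join()          # wait until recv() has called task_done()

    def recv(self):
        data = self._q.get()
        self._q.task_done()     # releases the blocked sender
        return data


if __name__ == "__main__":
    channel = SyncChannel()
    received = []
    receiver = threading.Thread(target=lambda: received.append(channel.recv()))
    receiver.start()
    channel.send(b"ping")       # returns only after the receiver took it
    receiver.join()
    print(received)
```

The buffered version lets the sender run ahead of the receiver at the cost of an extra copy into the buffer; the synchronous version avoids the buffer but couples the two sides on every transfer.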


Virtual Memory Operations

The “Fault” operation is an example of extensibility – measures the time to resolve a page fault by a user-defined pager in a separate address space.

“Trap” – Latency between a write operation to a protected page, and the invocation of related exception handler.

“Appel1” – Time to access a random protected page. The fault handler unprotects the page, protects some other page, and resumes.

“Appel2” – Time to access a random protected page where the fault handler only unprotects the page and resumes.
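The two Appel handlers differ only in what they do after unprotecting the faulting page. The sketch below models that behaviour with a set of protected page numbers; it does not measure time, and `ProtectedMemory` and the handler names are made up for illustration.

```python
import random


class ProtectedMemory:
    """Toy memory: a set of protected pages plus a fault counter."""

    def __init__(self, n_pages):
        self.n_pages = n_pages
        self.protected = set(range(n_pages))
        self.faults = 0

    def access(self, page, handler):
        """Touch a page; a protected page triggers the user-level handler."""
        if page in self.protected:
            self.faults += 1
            handler(self, page)
        # After the handler returns, the access proceeds normally.


def appel2_handler(mem, page):
    """Appel2: only unprotect the faulting page, then resume."""
    mem.protected.discard(page)


def appel1_handler(mem, page):
    """Appel1: unprotect the faulting page, protect some other page, resume."""
    mem.protected.discard(page)
    candidates = [p for p in range(mem.n_pages)
                  if p != page and p not in mem.protected]
    if candidates:
        mem.protected.add(random.choice(candidates))


def run(handler, n_pages, n_accesses):
    """Access randomly chosen protected pages until none or the budget is left."""
    mem = ProtectedMemory(n_pages)
    for _ in range(n_accesses):
        if not mem.protected:
            break
        mem.access(random.choice(sorted(mem.protected)), handler)
    return mem
```

With the Appel1 handler the number of protected pages stays roughly constant, so every access in the loop faults; with Appel2 the protected set shrinks by one page per fault.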


Conclusion

Using the L4 micro-kernel imposes a 5-10% slowdown relative to native Linux. This is much faster than previous micro-kernels.

Further optimizations, such as co-locating the Linux Server and providing extensibility, could improve L4Linux even further.


Q&A