Performance Overview
********************

IPC, Context-Switch and Syscall Performance
===========================================

Because L4Re is a microkernel-based system and hypervisor, the performance of
IPC, system calls and context switches is of particular interest.

IPC is a base-level communication mechanism that allows two threads to
exchange a limited amount of payload data. A context switch is the switch from
one executing thread to another, which is exactly what sending a message does.
The fastest IPC is between two threads running in the same address space
(task) on the same CPU core (`Intra`). `Inter` is IPC between two address
spaces. A syscall is also an IPC, but one that communicates only with the
kernel.

The following table provides performance numbers for a single IPC on various
popular platforms (averaged over several tens of thousands of calls). For the
measurement, the L4Re microkernel was configured in its performance
configuration ``CONFIG_PERFORMANCE=y``, i.e., without assertions. The source
code of the benchmark program can be found `here `_. The images used for the
measurements are linked in the table below.

All numbers are measured with the hardware performance counters. On Arm, the
cycle counter is used. On x86, the fixed-function counters are used.

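The averaging scheme mentioned above (read a counter, run the operation many
times, read the counter again, divide by the number of calls) can be sketched
as follows. This is only an illustration and not the L4Re benchmark: the
function name ``average_cost`` is hypothetical, and ``time.perf_counter_ns``
stands in for the hardware cycle counter, so the result is in nanoseconds
rather than CPU cycles and includes the loop overhead.

```python
import time


def average_cost(operation, iterations=50_000):
    """Average cost of one call to `operation`, in counter ticks.

    Assumption: time.perf_counter_ns() replaces the hardware cycle
    counter, so one tick is one nanosecond here.
    """
    start = time.perf_counter_ns()
    for _ in range(iterations):
        operation()
    end = time.perf_counter_ns()
    # Total elapsed ticks divided by the number of calls.
    return (end - start) / iterations


if __name__ == "__main__":
    # Measure a no-op to see the overhead of the loop itself.
    print(average_cost(lambda: None))
```

Averaging over many calls hides the cost of reading the counter and smooths
out one-off effects such as cold caches.
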
+-----------------+----------------+------------------------------------------+--------------------+-----------+
| Platform        | Processor      | IPC (in CPU cycles)                      | Syscall            | Image     |
|                 |                +--------------------+---------------------+                    |           |
|                 |                | Intra              | Inter               |                    |           |
+=================+================+====================+=====================+====================+===========+
| Raspberry Pi 5  | Arm Cortex-A76 | 247                | 384                 | 138                | `▶️ `__   |
| 64bit - EL1     |                |                    |                     |                    |           |
+-----------------+----------------+--------------------+---------------------+--------------------+-----------+
| Raspberry Pi 5  | Arm Cortex-A76 | 300                | 401                 | 202                | `▶️ `__   |
| 64bit - EL2     |                |                    |                     |                    |           |
+-----------------+----------------+--------------------+---------------------+--------------------+-----------+
| NXP S32G2 64bit | Arm Cortex-A53 | 562                | 691                 | 230                | `▶️ `__   |
| - EL1           |                |                    |                     |                    |           |
+-----------------+----------------+--------------------+---------------------+--------------------+-----------+
| NXP S32G2 64bit | Arm Cortex-A53 | 661                | 770                 | 228                | `▶️ `__   |
| - EL2           |                |                    |                     |                    |           |
+-----------------+----------------+--------------------+---------------------+--------------------+-----------+
| Ampere Altra    | Arm Neoverse-N1| 257                | 399                 | 142                | `▶️ `__   |
| (32/80 Cores)   |                |                    |                     |                    |           |
| 64bit - EL1     |                |                    |                     |                    |           |
+-----------------+----------------+--------------------+---------------------+--------------------+-----------+
| Ampere Altra    | Arm Neoverse-N1| 299                | 440                 | 148                | `▶️ `__   |
| (32/80 Cores)   |                |                    |                     |                    |           |
| 64bit - EL2     |                |                    |                     |                    |           |
+-----------------+----------------+--------------------+---------------------+--------------------+-----------+
| amd64 / x86_64  | Intel N100     | 173/622/557 [#1]_  | 390/1388/613 [#1]_  | 52/188/147 [#1]_   | `▶️ `__   |
+-----------------+----------------+--------------------+---------------------+--------------------+-----------+
| amd64 / x86_64  | Intel Xeon     | 511/649/543 [#1]_  | 934/1128/587 [#1]_  | 222/160/148 [#1]_  | `▶️ `__   |
|                 | Platinum 8352S |                    |                     |                    |           |
+-----------------+----------------+--------------------+---------------------+--------------------+-----------+
| amd64 / x86_64  | Intel Xeon     | 664/663/557 [#1]_  | 1172/1172/613 [#1]_ | 146/146/147 [#1]_  | `▶️ `__   |
|                 | Gold 6248R     |                    |                     |                    |           |
+-----------------+----------------+--------------------+---------------------+--------------------+-----------+

.. [#1] Values reflect the PMC's fixed-function counters
   2 (TSC without halt) / 1 (clocks unhalted) / 0 (instructions retired).

On x86, you can boot the image directly in GRUB2, e.g.
``multiboot2 (http,l4re.org)/download/ipcbench/amd64/l4re_ipcbench-20250602.elf``

For the Raspberry Pi, the uimage can be converted to a raw image for firmware
boot like this:
``dd if=l4re_ipcbench_rpi5-elX.uimage of=l4re.raw bs=64 skip=1``.

Plots of Parallel Execution of the Benchmark
============================================

Intra-space core-local IPC and system calls, executed in parallel on the cores.

.. raw:: html

.. raw:: html
   :file: ../_build/perf/arm_altra80_el1_ipc.html

.. raw:: html
   :file: ../_build/perf/arm_altra80_el2_ipc.html

.. raw:: html
   :file: ../_build/perf/x86_xeon_gold_6248R_2socket_noht.html

.. raw:: html
   :file: ../_build/perf/x86_xeon_gold_6248R_2socket_all.html

.. raw:: html
   :file: ../_build/perf/x86_n100.html
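
The ``dd`` invocation given above for the Raspberry Pi images
(``bs=64 skip=1``) drops the first 64-byte block of the file and copies the
rest. A minimal Python sketch of the same step is shown below for systems
without ``dd``. It assumes the uimage is a legacy U-Boot image, whose header
is 64 bytes long and starts with the magic number ``0x27051956``. The magic
check is an extra safeguard that ``dd`` does not perform, and the function
names are hypothetical.

```python
import struct

UIMAGE_HEADER_SIZE = 64    # same amount that `dd bs=64 skip=1` skips
UIMAGE_MAGIC = 0x27051956  # assumption: legacy U-Boot image format


def strip_uimage_header(data: bytes) -> bytes:
    """Return everything after the 64-byte header of a legacy uimage."""
    if len(data) < UIMAGE_HEADER_SIZE:
        raise ValueError("input is shorter than a uimage header")
    # The magic number is stored big-endian in the first four bytes.
    (magic,) = struct.unpack_from(">I", data, 0)
    if magic != UIMAGE_MAGIC:
        raise ValueError(f"unexpected magic 0x{magic:08x}")
    return data[UIMAGE_HEADER_SIZE:]


def convert(src_path: str, dst_path: str) -> None:
    """Read a uimage file and write its payload as a raw image."""
    with open(src_path, "rb") as src:
        payload = strip_uimage_header(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(payload)
```

For example, ``convert("l4re_ipcbench_rpi5-elX.uimage", "l4re.raw")`` would
produce the same output file as the ``dd`` command, with ``elX`` replaced by
the variant actually downloaded.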