How Slow Are C++ Exceptions?

In “Systematic Error Handling in C++”, Andrei Alexandrescu claims that C++ exceptions are very slow. This is understandable: the code that handles an exception behind the scenes is rather complicated. It has to dispatch the exception to the proper `catch` block and unwind the stack so that every object that needs to be destroyed is destroyed.

It is clear that exceptions should be used for exceptional situations, and not as a general-purpose way to transfer control far up the stack.

Nevertheless, it is interesting to know how fast or slow exceptions actually are.

I used Google’s benchmark library to benchmark exception handling.

The scenarios I benchmarked were:

1. No exceptions: the benchmarked function calls an external function and modifies its result. The function is external because the optimizer can optimize away an empty `try…catch` block. The result is modified because otherwise the optimizer can turn the `CALL` instruction into a `JMP`.
2. The same as scenario 1, but with `try…catch` around the external call. This measures the real cost of the “zero cost” exception model when nothing is thrown.
3. The same as scenario 2, but the external function throws an exception. The `catch` block swallows the exception and does nothing else. This measures the cost of throwing and catching an exception.

extern int do_something();
extern int throw_something();

int func_empty()
{
    return do_something() + 1;
}

int func_trycatch()
{
    try {
        return do_something() + 1;
    }
    catch (...) {
        return -1;
    }
}

int func_throw()
{
    try {
        return throw_something() + 1;
    }
    catch (...) {
        return -1;
    }
}

#ifndef FUNCS_H
#define FUNCS_H

// Return types must match the definitions, which return int.
int func_empty();
int func_trycatch();
int func_throw();

#endif

#include <exception>
#include <benchmark/benchmark.h>
#include "funcs.h"

int do_something()
{
    return 22;
}

int throw_something()
{
    throw std::exception();
}

static void BM_func_empty(benchmark::State& state)
{
    for (auto _ : state) {
        func_empty();
    }
}

static void BM_func_trycatch(benchmark::State& state)
{
    for (auto _ : state) {
        func_trycatch();
    }
}

static void BM_func_throw(benchmark::State& state)
{
    for (auto _ : state) {
        func_throw();
    }
}

BENCHMARK(BM_func_empty);
BENCHMARK(BM_func_trycatch);
BENCHMARK(BM_func_throw);

BENCHMARK_MAIN();

System specifications: 4 X 3800 MHz CPUs, CPU Caches: L1 Data 32K (x4), L1 Instruction 32K (x4), L2 Unified 256K (x4), L3 Unified 6144K (x1)

Compilers tested: g++ 7.2.0, g++ 6.4.0, g++ 5.4.1 (all x86_64-linux-gnu); clang++ 5.0.0-3, clang++ 4.0.1-6, clang++ 3.9.1-17ubuntu1, clang++ 3.8.1-24ubuntu7 (all x86_64-pc-linux-gnu).

g++ 4.8.0 did not work; the application crashed with: ``bm: malloc.c:2427: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.``

According to the Compiler Explorer, all g++ versions generated the same code; all clang++ versions also generated the same code, albeit slightly different from g++. The g++ listing is shown first, followed by the clang++ listing.

func_empty():
        sub     rsp, 8
        call    do_something()
        add     rsp, 8
        add     eax, 1
        ret
func_trycatch():
        sub     rsp, 8
        call    do_something()
        add     eax, 1
.L4:
        add     rsp, 8
        ret
        mov     rdi, rax
        call    __cxa_begin_catch
        call    __cxa_end_catch
        or      eax, -1
        jmp     .L4

func_throw():
        sub     rsp, 8
        call    throw_something()
        add     eax, 1
.L9:
        add     rsp, 8
        ret
        mov     rdi, rax
        call    __cxa_begin_catch
        call    __cxa_end_catch
        or      eax, -1
        jmp     .L9

func_empty():                        # @func_empty()
        push    rax
        call    do_something()
        inc     eax
        pop     rcx
        ret
func_trycatch():                     # @func_trycatch()
        push    rax
        call    do_something()
        mov     ecx, eax
        inc     ecx
.LBB1_3:
        mov     eax, ecx
        pop     rcx
        ret
        mov     rdi, rax
        call    __cxa_begin_catch
        call    __cxa_end_catch
        mov     ecx, -1
        jmp     .LBB1_3
func_throw():                        # @func_throw()
        push    rax
        call    throw_something()
        mov     ecx, eax
        inc     ecx
.LBB2_3:
        mov     eax, ecx
        pop     rcx
        ret
        mov     rdi, rax
        call    __cxa_begin_catch
        call    __cxa_end_catch
        mov     ecx, -1
        jmp     .LBB2_3

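| Compiler | `func_empty` time, ns | `func_trycatch` time, ns | `func_throw` time, ns |
|---|---|---|---|
| g++ 7.2 | 2 | 2 | 1320 |
| g++ 6.4 | 3 | 2 | 1288 |
| g++ 5.4.1 | 3 | 2 | 1305 |
| clang++ 5 | 2 | 2 | 1323 |
| clang++ 4.0.1 | 2 | 2 | 1329 |
| clang++ 3.9.1 | 2 | 2 | 1321 |
| clang++ 3.8.1 | 2 | 2 | 1304 |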

Some g++ runs report that `func_empty()` is slower than `func_trycatch()`. I find that hard to believe, as the code generated for the normal path of both functions is the same (only the `catch` part differs). This is probably measurement error, and a difference of a nanosecond or two is insignificant.

The results show that:

- In terms of execution time, the zero-cost exception model really is zero cost, as long as no exception is thrown.
- When an exception is thrown and caught, the overhead is significant: microseconds instead of nanoseconds (roughly 1300 ns versus 2 to 3 ns in the table above). In my opinion, this confirms that exceptions should be reserved for exceptional situations, at least when performance matters.
