In "Systematic Error Handling in C++", Andrei Alexandrescu claims that C++ exceptions are very slow. This is understandable: the code that handles an exception behind the scenes is rather complicated. It needs to dispatch the exception to the proper catch
block and unwind the stack so that every object that needs to be destroyed is destroyed.
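To make the unwinding step concrete, here is a minimal sketch of what has to happen between a throw and its catch. The `Tracer` type, the `events` log, and the function names are hypothetical helpers made up for this illustration; they are not part of the benchmark.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event log, used to record the order in which things happen.
static std::vector<std::string> events;

// Hypothetical helper type: records its own destruction.
struct Tracer {
    std::string name;
    explicit Tracer(std::string n) : name(std::move(n)) {}
    ~Tracer() { events.push_back("destroy " + name); }
};

void inner() {
    Tracer t("inner");
    throw std::runtime_error("boom");  // unwinding starts here
}

void outer() {
    Tracer t("outer");
    inner();  // no handler in this frame, so the exception passes through
    assert(false && "unreachable: inner() always throws");
}

// Runs the demo and returns the recorded events in order.
std::vector<std::string> run_unwind_demo() {
    events.clear();
    try {
        outer();
    } catch (const std::runtime_error& e) {
        events.push_back(std::string("caught ") + e.what());
    }
    return events;
}
```

Calling `run_unwind_demo()` yields "destroy inner", "destroy outer", "caught boom", in that order: both destructors run during unwinding, before the handler body executes. Locating the handler and running these destructors is the work that the throwing path has to pay for.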
It is clear that exceptions should be used for exceptional situations, and not just as a way to transfer control way up the stack.
Nevertheless, it is interesting to know how fast or slow exceptions actually are.
I used Google’s benchmark library to benchmark exception handling.
The scenarios I benchmarked were:
- No exceptions: the benchmarked function calls another, external function and modifies its result. Why call an external function? Because the optimizer can optimize away an empty try…catch block. Why modify the result? Because otherwise the optimizer can turn the CALL instruction into a JMP (a tail call).
- Same as above, but with try…catch around the external call. This allows for measuring the real cost of the “zero cost” exception model.
- Same as above, but the external function throws an exception. The catch block swallows the exception and does nothing else. This allows for measuring the cost of exception handling.
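<pre class="lang:c++" title="functions under test">
// Defined in the benchmark's translation unit, so the optimizer cannot see through them.
extern int do_something();
extern int throw_something();

// Scenario 1: no try...catch at all.
int func_empty()
{
    return do_something() + 1;
}

// Scenario 2: try...catch around a call that never throws.
int func_trycatch()
{
    try {
        return do_something() + 1;
    } catch (...) {
        return -1;
    }
}

// Scenario 3: try...catch around a call that always throws.
int func_throw()
{
    try {
        return throw_something() + 1;
    } catch (...) {
        return -1;
    }
}
</pre>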
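<pre class="lang:c++" title="funcs.h">
#ifndef FUNCS_H
#define FUNCS_H

// Return types match the definitions, which all return int.
int func_empty();
int func_trycatch();
int func_throw();

#endif
</pre>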
<pre class="lang:c++" title="benchmark">
#include <stdexcept>
#include <benchmark/benchmark.h>
#include "funcs.h"

// External functions called by the functions under test.
int do_something()
{
    return 22;
}

int throw_something()
{
    throw std::exception();
}

static void BM_func_empty(benchmark::State& state)
{
    for (auto _ : state) {
        func_empty();
    }
}

static void BM_func_trycatch(benchmark::State& state)
{
    for (auto _ : state) {
        func_trycatch();
    }
}

static void BM_func_throw(benchmark::State& state)
{
    for (auto _ : state) {
        func_throw();
    }
}

BENCHMARK(BM_func_empty);
BENCHMARK(BM_func_trycatch);
BENCHMARK(BM_func_throw);

BENCHMARK_MAIN();
</pre>
System specifications: 4 X 3800 MHz CPUs, CPU Caches: L1 Data 32K (x4), L1 Instruction 32K (x4), L2 Unified 256K (x4), L3 Unified 6144K (x1)
Compilers tested: g++ 7.2.0, g++ 6.4.0, g++ 5.4.1 (all x86_64-linux-gnu); clang++ 5.0.0-3, clang++ 4.0.1-6, clang++ 3.9.1-17ubuntu1, clang++ 3.8.1-24ubuntu7 (all x86_64-pc-linux-gnu).
g++ 4.8.0 didn’t work; the application crashed with: bm: malloc.c:2427: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
According to the <a href="https://gcc.godbolt.org/">Compiler Explorer</a>, all tested g++ versions generated identical code; all tested clang++ versions also generated identical code, which differs slightly from the g++ output.
<pre class="lang:asm" title="g++">
func_empty():
sub rsp, 8
call do_something()
add rsp, 8
add eax, 1
ret
func_trycatch():
sub rsp, 8
call do_something()
add eax, 1
.L4:
add rsp, 8
ret
mov rdi, rax
call __cxa_begin_catch
call __cxa_end_catch
or eax, -1
jmp .L4
func_throw():
sub rsp, 8
call throw_something()
add eax, 1
.L9:
add rsp, 8
ret
mov rdi, rax
call __cxa_begin_catch
call __cxa_end_catch
or eax, -1
jmp .L9
</pre>
<pre class="lang:asm" title="clang++">
func_empty(): # @func_empty()
push rax
call do_something()
inc eax
pop rcx
ret
func_trycatch(): # @func_trycatch()
push rax
call do_something()
mov ecx, eax
inc ecx
.LBB1_3:
mov eax, ecx
pop rcx
ret
mov rdi, rax
call __cxa_begin_catch
call __cxa_end_catch
mov ecx, -1
jmp .LBB1_3
func_throw(): # @func_throw()
push rax
call throw_something()
mov ecx, eax
inc ecx
.LBB2_3:
mov eax, ecx
pop rcx
ret
mov rdi, rax
call __cxa_begin_catch
call __cxa_end_catch
mov ecx, -1
jmp .LBB2_3
</pre>
<table class="benchmark">
<tr>
<th></th>
<th>func_empty<br/>time, ns</th>
<th>func_trycatch<br/>time, ns</th>
<th>func_throw<br/>time, ns</th>
</tr>
<tr>
<th>g++ 7.2</th>
<td>2</td>
<td>2</td>
<td>1320</td>
</tr>
<tr>
<th>g++ 6.4</th>
<td>3</td>
<td>2</td>
<td>1288</td>
</tr>
<tr>
<th>g++ 5.4.1</th>
<td>3</td>
<td>2</td>
<td>1305</td>
</tr>
<tr>
<th>clang++ 5</th>
<td>2</td>
<td>2</td>
<td>1323</td>
</tr>
<tr>
<th>clang++ 4.0.1</th>
<td>2</td>
<td>2</td>
<td>1329</td>
</tr>
<tr>
<th>clang++ 3.9.1</th>
<td>2</td>
<td>2</td>
<td>1321</td>
</tr>
<tr>
<th>clang++ 3.8.1</th>
<td>2</td>
<td>2</td>
<td>1304</td>
</tr>
</table>
Some g++ runs report that func_empty() is slightly slower than func_trycatch(). I find this hard to believe, as the code generated for the normal path of both functions is essentially the same (obviously, only the catch part differs). This is probably measurement error, and a difference of a nanosecond or two is insignificant.
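One way to judge whether a one-nanosecond gap is noise is to repeat the measurement several times and compare the spread. The following is a rough sketch using only `std::chrono`, not the Google Benchmark setup used for the table above. The `NOINLINE` macro (a GCC/Clang-specific attribute), the `measure` helper, and the `Timing` struct are assumptions introduced for this illustration, and the volatile read adds a small cost that the original benchmark does not have.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <exception>
#include <vector>

// Assumption: GCC/Clang attribute, so the callees are not inlined
// even though everything lives in a single file here.
#define NOINLINE __attribute__((noinline))

volatile int seed = 22;  // volatile read keeps the callee from being folded
volatile int sink = 0;   // volatile write keeps the result from being dropped

NOINLINE int returns_value() { return seed; }
NOINLINE int throws_value() { throw std::exception(); }

int call_plain() { return returns_value() + 1; }

int call_trycatch() {
    try { return returns_value() + 1; } catch (...) { return -1; }
}

int call_throw() {
    try { return throws_value() + 1; } catch (...) { return -1; }
}

// Per-call time in nanoseconds across all repetitions.
struct Timing {
    double min_ns;
    double median_ns;
    double max_ns;
};

// Calls fn `iterations` times per repetition and records the average
// per-call time of each repetition; returns min, median and max.
template <typename Fn>
Timing measure(Fn fn, int iterations, int repetitions) {
    assert(iterations > 0 && repetitions > 0);
    std::vector<double> samples;
    samples.reserve(repetitions);
    for (int r = 0; r < repetitions; ++r) {
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iterations; ++i) fn();
        auto stop = std::chrono::steady_clock::now();
        std::chrono::duration<double, std::nano> elapsed = stop - start;
        samples.push_back(elapsed.count() / iterations);
    }
    std::sort(samples.begin(), samples.end());
    return Timing{samples.front(), samples[samples.size() / 2], samples.back()};
}
```

For example, `measure([] { sink = call_plain(); }, 1000000, 10)` and the same call with `call_trycatch()` give two min-to-max ranges. If those ranges overlap, a difference of a nanosecond between the two medians says nothing about the functions themselves.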
The results show that:
- in terms of execution time, the zero-cost exception model really is zero cost, as long as no exception is thrown;
- when an exception is thrown and caught, the overhead is significant: about 1.3 microseconds versus 2 to 3 nanoseconds on the normal path. In my opinion, this confirms that exceptions should be reserved for exceptional situations, at least when you are concerned with performance.
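As an illustration of that last point, here is a small sketch of the same task written two ways: once with a status return for a failure that is expected to happen often, and once with an exception per failure. All function names are hypothetical and chosen only for this example. Both variants produce the same answer, but the second one pays the throw-and-catch cost measured above for every non-digit character.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Throwing variant: reasonable when a non-digit is a genuine surprise.
int digit_or_throw(char c) {
    if (c < '0' || c > '9') throw std::invalid_argument("not a digit");
    return c - '0';
}

// Status-returning variant: failure is reported through the return value.
bool try_digit(char c, int& out) {
    if (c < '0' || c > '9') return false;
    out = c - '0';
    return true;
}

// Counts digits using the status return; no exception is ever thrown.
int count_digits_with_status(const std::string& text) {
    int count = 0;
    for (char c : text) {
        int value = 0;
        if (try_digit(c, value)) {
            assert(value >= 0 && value <= 9);
            ++count;
        }
    }
    return count;
}

// Counts digits using exceptions; every non-digit costs a throw and a catch.
int count_digits_with_exceptions(const std::string& text) {
    int count = 0;
    for (char c : text) {
        try {
            digit_or_throw(c);
            ++count;
        } catch (const std::invalid_argument&) {
            // Expected for non-digit characters: using the exception
            // as ordinary control flow, which is the pattern to avoid.
        }
    }
    return count;
}
```

For the input "a1b2c3" both functions return 3; the exception-based version throws three times to get there.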
Recommended reading: Top 15 C++ Exception handling mistakes and how to avoid them.