From time to time you have to deal with someone else’s code. And the code you have to deal with sometimes surprises you.
Take Pimple, for example: a small Dependency Injection Container for PHP whose recent versions, according to its README, are “more focused on performance”.
Performance… So much in this word…
Learning something new is always interesting and sometimes fun, so I started reading the code.
The deeper I went, the more confused I got: if this is code focused on performance, what kind of code do you usually write?
```php
class Container implements \ArrayAccess
```
ArrayAccess is slow. Yes, it is convenient and offers more readability, but it is very slow: it is even faster to call offsetGet/offsetSet/offsetExists directly than to go through ArrayAccess.
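A rough way to check this claim on your own PHP build is a loop like the one below. It is only a sketch: the Bag class is hypothetical (a bare ArrayAccess implementation, not Pimple), microtime() loops are much cruder than a real benchmark harness, and the numbers depend on the PHP version, OpCache settings and hardware.

```php
<?php
// Rough sketch: compare $bag[$id] (ArrayAccess syntax) with an explicit
// $bag->offsetGet($id) call on the same hypothetical Bag object.

class Bag implements ArrayAccess
{
    private $values = [];

    public function offsetExists($id): bool { return isset($this->values[$id]); }

    #[\ReturnTypeWillChange]
    public function offsetGet($id) { return $this->values[$id]; }

    public function offsetSet($id, $value): void { $this->values[$id] = $value; }

    public function offsetUnset($id): void { unset($this->values[$id]); }
}

$bag = new Bag();
$bag['shared'] = 42;      // calls Bag::offsetSet()
$revs = 1000000;

$start = microtime(true);
for ($i = 0; $i < $revs; ++$i) {
    $v = $bag['shared'];  // dispatched through the ArrayAccess handler
}
$viaArrayAccess = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $revs; ++$i) {
    $v = $bag->offsetGet('shared'); // ordinary method call
}
$viaMethodCall = microtime(true) - $start;

printf("ArrayAccess: %.0f ops/s\n", $revs / $viaArrayAccess);
printf("offsetGet(): %.0f ops/s\n", $revs / $viaMethodCall);
```

Both loops read the same value; only the dispatch path differs, so any gap between the two lines of output is the cost of going through ArrayAccess.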
```php
$this->factories = new \SplObjectStorage();
$this->protected = new \SplObjectStorage();
```
It is a pretty interesting approach to use objects as keys (are you really sure you need this?), but it artificially introduces a limitation: SplObjectStorage accepts only objects, so factories can be either closures or invokable objects (instances of classes with an __invoke() method), but not arrays (say, [$object, 'method'], which is a valid callable in PHP).
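The limitation is easy to see in isolation. In the sketch below (the Mailer class is hypothetical), the array is a perfectly valid callable but not an object. Wrapping it with Closure::fromCallable() (PHP 7.1+) yields a Closure object that SplObjectStorage does accept; this is one possible workaround on the caller's side, not something the container does for you.

```php
<?php
// Sketch: an array callable is valid PHP but cannot be an SplObjectStorage key.
// Mailer is a hypothetical class used only for illustration.

class Mailer
{
    public function create(): string
    {
        return 'mailer';
    }
}

$callable = [new Mailer(), 'create'];

var_dump(is_callable($callable)); // bool(true): a valid callable...
var_dump(is_object($callable));   // bool(false): ...but not an object

$factories = new SplObjectStorage();

// One possible workaround (PHP 7.1+): wrap the callable in a Closure object,
// which SplObjectStorage accepts like any other object.
$closure = Closure::fromCallable($callable);
$factories->attach($closure);

var_dump(isset($factories[$closure])); // bool(true)
var_dump($closure());                  // string(6) "mailer"
```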
Oh well, oh well… In PHP, if you really want performance, you need to change your coding habits. The reason is that the PHP interpreter is in many ways as dumb as a rock: it does what you tell it to do, and it won’t try to be smarter than you.
So whenever you see $this->values[$id], you can be sure that PHP will really fetch the property called ‘values’ from $this and then look up the value by the key $id, every single time. Common subexpression elimination? No way. To be fair, it is probably not an easy task anyway: if the property holds an object implementing the ArrayAccess interface, in theory every call to offsetGet can return a different value; to make sure this does not happen, the optimizer needs to know the name of the class and have that class available (which, thanks to autoloading, is not always the case).
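Here is a contrived sketch of that ArrayAccess case (Ticker and Holder are hypothetical classes): two textually identical expressions, $h->values['x'], legitimately produce different results, so an optimizer that reused the first result would change what the program does.

```php
<?php
// Contrived sketch: Ticker and Holder are hypothetical classes. Every read of
// $h->values['x'] calls Ticker::offsetGet(), which returns a new number each time,
// so the two identical expressions below must not be merged into one.

class Ticker implements ArrayAccess
{
    private $reads = 0;

    public function offsetExists($offset): bool { return true; }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset) { return ++$this->reads; }

    public function offsetSet($offset, $value): void {}

    public function offsetUnset($offset): void {}
}

class Holder
{
    public $values;
}

$h = new Holder();
$h->values = new Ticker();

$first  = $h->values['x'];
$second = $h->values['x'];
var_dump($first, $second); // int(1), then int(2)
```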
Below is what VLD shows for Pimple’s offsetGet() implementation (you can reproduce the result with php -d extension=vld.so -d vld.active=1 -d opcache.enable_cli=1 Container.php; I intentionally used opcache.enable_cli=1 to mimic a real production environment):
```
function name:  offsetGet
number of ops:  60
compiled vars:  !0 = $id, !1 = $raw, !2 = $val
line   #*  E I O  op                      fetch  ext       return  operands
-------------------------------------------------------------------------------------
  98    0  E >    RECV                                     !0
 100    1         FETCH_OBJ_IS                             $4      'keys'
        2         ISSET_ISEMPTY_DIM_OBJ          33554432  ~3      $4, !0
        3      >  JMPNZ                                            ~3, ->8
 101    4    >    NEW                                      $3      :-4
        5         SEND_VAR_EX                                      !0
        6         DO_FCALL                       0
        7      >  THROW                          0                 $3
 105    8    >    FETCH_OBJ_IS                             $4      'raw'
        9         ISSET_ISEMPTY_DIM_OBJ          33554432  ~3      $4, !0
       10      >  JMPNZ                                            ~3, ->27
 106   11    >    FETCH_OBJ_R                              $4      'values'
       12         FETCH_DIM_R                              $5      $4, !0
       13         TYPE_CHECK                     8         ~4      $5
       14      >  JMPZ                                             ~4, ->27
 107   15    >    FETCH_OBJ_R                              $4      'values'
       16         FETCH_DIM_R                              $5      $4, !0
       17         FETCH_OBJ_IS                             $4      'protected'
       18         ISSET_ISEMPTY_DIM_OBJ          33554432  ~3      $4, $5
       19      >  JMPNZ                                            ~3, ->27
 108   20    >    INIT_FCALL                                       'method_exists'
       21         FETCH_OBJ_R                              $4      'values'
       22         FETCH_DIM_R                              $3      $4, !0
       23         SEND_VAR                                         $3
       24         SEND_VAL                                         '__invoke'
       25         DO_ICALL                                 $3
       26      >  JMPNZ                                            $3, ->30
 110   27    >    FETCH_OBJ_R                              $4      'values'
       28         FETCH_DIM_R                              $3      $4, !0
       29      >  RETURN                                           $3
 113   30    >    FETCH_OBJ_R                              $3      'values'
       31         FETCH_DIM_R                              $5      $3, !0
       32         FETCH_OBJ_IS                             $4      'factories'
       33         ISSET_ISEMPTY_DIM_OBJ          33554432  ~3      $4, $5
       34      >  JMPZ                                             ~3, ->42
 114   35    >    FETCH_OBJ_R                              $4      'values'
       36         FETCH_DIM_R                              $3      $4, !0
       37         INIT_DYNAMIC_CALL                                $3
       38         FETCH_THIS                               $3
       39         SEND_VAR_EX                                      $3
       40         DO_FCALL                       0         $3
       41      >  RETURN                                           $3
 117   42    >    FETCH_OBJ_R                              $4      'values'
       43         FETCH_DIM_R                              $3      $4, !0
       44         QM_ASSIGN                                !1      $3
 118   45         INIT_DYNAMIC_CALL                                !1
       46         FETCH_THIS                               $3
       47         SEND_VAR_EX                                      $3
       48         DO_FCALL                       0         $4
       49         FETCH_OBJ_W                              $5      'values'
       50         ASSIGN_DIM                               $3      $5, !0
       51         OP_DATA                                          $4
       52         QM_ASSIGN                                !2      $3
 119   53         FETCH_OBJ_W                              $3      'raw'
       54         ASSIGN_DIM                                       $3, !0
       55         OP_DATA                                          !1
 121   56         FETCH_OBJ_W                              $3      'frozen'
       57         ASSIGN_DIM                                       $3, !0
       58         OP_DATA                                          <true>
 123   59      >  RETURN                                           !2
```
For the record, the “unoptimized” version (with OpCache disabled) had 67 operations.
Oplines 8 to 29 are responsible for the second if statement; let me decode them to explain what happens:

- FETCH_OBJ_IS $4 'raw': silently (that is, without complaining if the property does not exist) fetches the property $this->raw into the variable $4.
- ISSET_ISEMPTY_DIM_OBJ ~3 $4, !0: checks whether $4[!0] (that is, $this->raw[$id]) is set and stores the result in the temporary variable ~3 (the 33554432 in the ext column, 0x02000000, is the flag that selects the isset() check rather than empty()).
- JMPNZ ~3, ->27: transfers control to opline 27 if ~3 is not zero, that is, if $this->raw[$id] is set.
- FETCH_OBJ_R $4 'values': fetches $this->values into $4.
- FETCH_DIM_R $5 $4, !0: reads $4[!0] (that is, $this->values[$id]) into $5.
- TYPE_CHECK 8 ~4 $5: checks whether $5 is of type 8 (IS_OBJECT) and stores the result in ~4.
- JMPZ ~4, ->27: transfers control to opline 27 if ~4 is zero, that is, if $this->values[$id] is not an object.
- Oplines 15 and 16 are the same as 11 and 12, because Zend OpCache does not eliminate common subexpressions.
- FETCH_OBJ_IS $4 'protected': silently fetches $this->protected into $4.
- ISSET_ISEMPTY_DIM_OBJ ~3 $4, $5: checks whether $4[$5] is set (isset($this->protected[$this->values[$id]])) and stores the result in ~3.
- JMPNZ ~3, ->27: transfers control to opline 27 if ~3 is not zero.
- INIT_FCALL 'method_exists': prepares the function call info and the function call info cache for method_exists() (INIT_FCALL is roughly equivalent to the zend_fcall_info_init() Zend API).
- Oplines 21 and 22… they look so familiar, I bet we have already seen them somewhere!
- Oplines 23 and 24 send the parameters to method_exists(), and DO_ICALL invokes method_exists() and stores the return value in $3.
- JMPNZ $3, ->30: transfers control to opline 30 if $3 is not zero.

And I cannot get rid of the annoying feeling that I have already seen oplines 27 and 28.
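To connect the listing back to the source, here is the original condition (as it appears in the removed “-” lines of the first diff below), with each sub-expression annotated with the oplines it compiles to. The MiniContainer wrapper is hypothetical and only exists so that the fragment runs on its own; the factory and shared-service branches are left out.

```php
<?php
// Hypothetical MiniContainer: it only wraps the original condition so that it runs
// standalone. Opline numbers in the comments refer to the VLD listing of Pimple's
// offsetGet(), not to this class.

class MiniContainer
{
    public $raw = [];
    public $values = [];
    public $protected;

    public function __construct()
    {
        $this->protected = new SplObjectStorage();
    }

    public function get($id)
    {
        if (
            isset($this->raw[$id])                              // 8-10
            || !\is_object($this->values[$id])                  // 11-14: first read of values[$id]
            || isset($this->protected[$this->values[$id]])      // 15-19: second read
            || !\method_exists($this->values[$id], '__invoke')  // 20-26: third read
        ) {
            return $this->values[$id];                          // 27-29: read once more
        }

        return null; // factory and shared-service branches omitted in this sketch
    }
}
```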
I have written a small PhpBench benchmark to see how fast Pimple is:
```php
<?php

/**
 * @Revs(1000000)
 * @Iterations(5)
 * @OutputMode("throughput")
 * @OutputTimeUnit("seconds", precision=3)
 * @Groups({"Container"})
 */
class PimpleContainerBench
{
    private $x;

    public function __construct()
    {
        $this->x = new Pimple\Container();
        // Factory: the closure is called on every access
        $this->x['factory'] = $this->x->factory(function () {
            return 1;
        });
        // Shared service: the closure is called once, then the result is cached
        $this->x['shared'] = function () {
            return 2;
        };
        $x = $this->x['shared']; // Resolve once, so getShared() measures the cached path
    }

    /**
     * @Subject
     */
    public function getShared()
    {
        $this->x['shared'];
    }

    /**
     * @Subject
     */
    public function getFactory()
    {
        $this->x['factory'];
    }
}
```
On average, getShared performed 7,916,916 operations per second; getFactory was slower, at 2,067,365 operations per second.
Now let us try to optimize the container.
The most obvious optimization is to eliminate the common subexpressions:
```diff
@@ -101,20 +101,21 @@
             throw new UnknownIdentifierException($id);
         }
 
+        $raw = $this->values[$id];
+
         if (
             isset($this->raw[$id])
-            || !\is_object($this->values[$id])
-            || isset($this->protected[$this->values[$id]])
-            || !\method_exists($this->values[$id], '__invoke')
+            || !\is_object($raw)
+            || isset($this->protected[$raw])
+            || !\method_exists($raw, '__invoke')
         ) {
-            return $this->values[$id];
+            return $raw;
         }
 
-        if (isset($this->factories[$this->values[$id]])) {
-            return $this->values[$id]($this);
+        if (isset($this->factories[$raw])) {
+            return $raw($this);
         }
 
-        $raw = $this->values[$id];
         $val = $this->values[$id] = $raw($this);
         $this->raw[$id] = $raw;
```
However, this is not the best solution, and in fact, it will be slower for shared services than the original code.
```
line   #*  E I O  op                      fetch  ext       return  operands
-------------------------------------------------------------------------------------
…
 104    8    >    FETCH_OBJ_R                              $4      'values'
        9         FETCH_DIM_R                              $3      $4, !0
       10         QM_ASSIGN                                !1      $3
 107   11         FETCH_OBJ_IS                             $4      'raw'
       12         ISSET_ISEMPTY_DIM_OBJ          33554432  ~3      $4, !0
       13      >  JMPNZ                                            ~3, ->24
 108   14    >    TYPE_CHECK                     8         ~4      !1
       15      >  JMPZ                                             ~4, ->24
 109   16    >    FETCH_OBJ_IS                             $4      'protected'
       17         ISSET_ISEMPTY_DIM_OBJ          33554432  ~3      $4, !1
       18      >  JMPNZ                                            ~3, ->24
 110   19    >    INIT_FCALL                                       'method_exists'
       20         SEND_VAR                                         !1
       21         SEND_VAL                                         '__invoke'
       22         DO_ICALL                                 $3
       23      >  JMPNZ                                            $3, ->25
 112   24    > >  RETURN                                           !1
```
For a shared service, the condition isset($this->raw[$id]) will be true, so the path will be:
```
FETCH_OBJ_R            $4 'values'
FETCH_DIM_R            $3 $4, !0
QM_ASSIGN              !1 $3
FETCH_OBJ_IS           $4 'raw'
ISSET_ISEMPTY_DIM_OBJ  ~3 $4, !0
JMPNZ                  ~3, ->24
RETURN                 !1
```
In the original code the path was:
```
FETCH_OBJ_IS           $4 'raw'
ISSET_ISEMPTY_DIM_OBJ  ~3 $4, !0
JMPNZ                  ~3, ->27
FETCH_OBJ_R            $4 'values'
FETCH_DIM_R            $3 $4, !0
RETURN                 $3
```
That is, 7 instructions instead of 6. Does one instruction make a difference? You bet!
The getShared benchmark now shows 7,761,829 operations per second, which is roughly 150,000 ops/sec fewer 🙂 That’s the power of one instruction.
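The proper optimization is to handle an already resolved shared service first, and only then read $this->values[$id] into a local variable: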
```diff
@@ -101,20 +101,24 @@
             throw new UnknownIdentifierException($id);
         }
 
+        if (isset($this->raw[$id])) {
+            return $this->values[$id];
+        }
+
+        $raw = $this->values[$id];
+
         if (
-            isset($this->raw[$id])
-            || !\is_object($this->values[$id])
-            || isset($this->protected[$this->values[$id]])
-            || !\method_exists($this->values[$id], '__invoke')
+            !\is_object($raw)
+            || isset($this->protected[$raw])
+            || !\method_exists($raw, '__invoke')
         ) {
-            return $this->values[$id];
+            return $raw;
         }
 
-        if (isset($this->factories[$this->values[$id]])) {
-            return $this->values[$id]($this);
+        if (isset($this->factories[$raw])) {
+            return $raw($this);
         }
 
-        $raw = $this->values[$id];
         $val = $this->values[$id] = $raw($this);
         $this->raw[$id] = $raw;
```
This change gives us 7,993,924 (vs 7,916,916) ops/s for getShared and 2,310,604 (vs 2,067,365) ops/s for getFactory.
The getShared path is now:
```
FETCH_OBJ_IS           $4 'raw'
ISSET_ISEMPTY_DIM_OBJ  ~3 $4, !0
JMPZ                   ~3, ->14    /* branch NOT taken */
FETCH_OBJ_R            $4 'values'
FETCH_DIM_R            $3 $4, !0
RETURN                 $3
```
This is the same 6 opcodes as in the original code, so the speed of getShared should not really differ (the 77 kops/sec gain is probably measurement error), but the difference between the two getFactory benchmarks is about 243 kops/sec, or roughly 12%.
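To convince yourself that reordering the checks does not change behavior, here is a stripped-down, hypothetical container (TinyContainer, not Pimple: there is no $keys or $frozen bookkeeping, and unknown ids are detected with array_key_exists()) whose get() follows the same structure as the optimized offsetGet(): the shared-service fast path first, then a single read of $this->values[$id] into a local variable.

```php
<?php
// Hypothetical TinyContainer (not Pimple): get() mirrors the structure of the
// optimized offsetGet(). There is no $keys/$frozen bookkeeping here.

class TinyContainer
{
    private $values = [];
    private $raw = [];
    private $factories;
    private $protected;

    public function __construct()
    {
        $this->factories = new SplObjectStorage();
        $this->protected = new SplObjectStorage();
    }

    public function set(string $id, $value): void
    {
        $this->values[$id] = $value;
        unset($this->raw[$id]); // a redefined service must be resolved again
    }

    public function factory(Closure $callable): Closure
    {
        $this->factories->attach($callable);
        return $callable;
    }

    public function protect(Closure $callable): Closure
    {
        $this->protected->attach($callable);
        return $callable;
    }

    public function get(string $id)
    {
        if (!array_key_exists($id, $this->values)) {
            throw new InvalidArgumentException("Unknown identifier: $id");
        }

        // Fast path: a shared service that has already been resolved.
        if (isset($this->raw[$id])) {
            return $this->values[$id];
        }

        // Read the definition once; the checks below reuse the local variable.
        $raw = $this->values[$id];

        if (
            !\is_object($raw)
            || isset($this->protected[$raw])
            || !\method_exists($raw, '__invoke')
        ) {
            return $raw; // plain value, protected closure or non-invokable object
        }

        if (isset($this->factories[$raw])) {
            return $raw($this); // factory: a new value on every call
        }

        // Shared service: resolve it once and remember the original definition.
        $val = $this->values[$id] = $raw($this);
        $this->raw[$id] = $raw;

        return $val;
    }
}

$c = new TinyContainer();
$c->set('shared', function () { return new stdClass(); });
$c->set('factory', $c->factory(function () { return new stdClass(); }));

var_dump($c->get('shared') === $c->get('shared'));   // bool(true)
var_dump($c->get('factory') === $c->get('factory')); // bool(false)
```

The shared service returns the same instance on every call, and the factory a fresh one, exactly as with the original ordering of the checks.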
Frankly speaking, offsetGet is not the only place that can be optimized. For example, I would get rid of the Container::$keys property: this would save some memory and code. I would implement __construct() differently (yes, I do realize it is called only once, but the code smells). Finally, I probably would not use SplObjectStorage. However, offsetGet is the most used method, so it makes sense to optimize it first.
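As for SplObjectStorage, one possible replacement (an assumption for illustration, not Pimple’s code) is a plain array keyed by spl_object_id() (PHP 7.2+), so that the factory check becomes an ordinary isset() on an int-keyed array instead of an ArrayAccess call. Whether it is actually faster would need to be measured. Note also that object ids are reused once an object is destroyed, so a marker has to be removed whenever the container drops the closure. FactoryRegistry is a hypothetical name.

```php
<?php
// Sketch (an assumption, not Pimple's code): factories are marked in a plain
// int-keyed array instead of an SplObjectStorage. FactoryRegistry is hypothetical.
// spl_object_id() requires PHP 7.2+, and an id may be reused after the object
// is destroyed, so unmark() must be called when a definition is dropped.

final class FactoryRegistry
{
    /** @var array<int, bool> */
    private $factories = [];

    public function mark(object $callable): void
    {
        $this->factories[spl_object_id($callable)] = true;
    }

    public function unmark(object $callable): void
    {
        unset($this->factories[spl_object_id($callable)]);
    }

    public function isFactory(object $callable): bool
    {
        // Ordinary isset() on an array: no ArrayAccess method call involved.
        return isset($this->factories[spl_object_id($callable)]);
    }
}

$f = function () { return new stdClass(); };
$g = function () { return new stdClass(); };

$registry = new FactoryRegistry();
$registry->mark($f);

var_dump($registry->isFactory($f)); // bool(true)
var_dump($registry->isFactory($g)); // bool(false)
```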
The moral of this story: if you really care about performance, don’t delegate it to the optimizer; do it yourself.
PS: if this version of Pimple is “more focused on performance”, I don’t think I want to see the slower ones.