PHP extensions are usually written in C. C is a great and powerful language, but the programmer needs to be very attentive because memory management is manual. If you forget to destroy a resource after you are through with it, you have a memory leak; if you make a copy of a zval and forget to increment its reference counter, chances are you will get a nice crash.

This is where C++ can help thanks to the RAII technique. But just like any tool, C++ must be properly used.

Memory allocation

Zend Engine uses its own memory allocation functions. They allow Zend Engine to recover from memory leaks (which happen either because of genuine bugs, or when the code did not have a chance to recover from a fatal error). Consider the following situation:

char* buffer = new char[1048576];
call_userland_function_that_dies();
delete[] buffer;

We allocate a 1 MB buffer, then call a userland function that we cannot control and that may suddenly terminate the script, and then we dispose of the allocated buffer. But if that userland function never returns, we have a memory leak. Zend Engine will be unable to fix it automatically because we did not use Zend memory allocation functions.

If you are familiar with C++, you may wonder why we didn’t use something like std::unique_ptr:

std::unique_ptr<char[]> buffer{new char[1048576]};
call_userland_function_that_dies();
// buffer goes out of scope and gets automatically disposed of

The answer is that smart pointers do not help here. When I said that the userland function may not return, I meant it literally: stack unwinding does not happen, the destructor of the smart pointer never runs, and we still have a memory leak. I will return to this issue a bit later.

In C, we would use something like this:

char* buffer = emalloc(1048576);
call_userland_function_that_dies();
efree(buffer);

All memory allocated with emalloc() or ecalloc() is automatically freed by Zend Engine when the request finishes, and therefore there will be no memory leak (debug PHP builds do report memory leaks but that happens only when the script terminates normally, that is, without calls to die()/exit() or uncaught exceptions or fatal errors).
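In C++ code the two approaches can be combined: a smart pointer whose deleter calls efree() releases the buffer on the normal path, and if the destructor never runs, the block still belongs to the Zend allocator. The following is only a minimal sketch. fake_emalloc() and fake_efree() are hypothetical stand-ins (plain malloc()/free() with a counter) so that it compiles without PHP headers; in an extension they would be emalloc() and efree(). EfreeDeleter, ebuffer and make_buffer are illustrative names as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Hypothetical stand-ins for emalloc()/efree(); they only count live blocks.
static int g_outstanding = 0;

static void* fake_emalloc(std::size_t bytes)
{
    void* p = std::malloc(bytes);
    if (p == nullptr) {
        std::abort(); // mirrors Zend raising a fatal error when out of memory
    }
    ++g_outstanding;
    return p;
}

static void fake_efree(void* p)
{
    assert(p != nullptr);
    --g_outstanding;
    std::free(p);
}

// Deleter that hands the memory back to the allocator it came from
struct EfreeDeleter {
    void operator()(void* p) const noexcept { fake_efree(p); }
};

using ebuffer = std::unique_ptr<char, EfreeDeleter>;

static ebuffer make_buffer(std::size_t bytes)
{
    return ebuffer(static_cast<char*>(fake_emalloc(bytes)));
}

// Returns the number of live blocks while the buffer is still in scope
static int outstanding_inside_scope()
{
    ebuffer buffer = make_buffer(1048576);
    buffer.get()[0] = 'x';
    return g_outstanding;
} // buffer goes out of scope here and the deleter frees it
```

With the real functions, the deleter covers the normal return path, and Zend Engine's end-of-request cleanup covers the case where the destructor is skipped.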

OK, for memory allocations we can use emalloc() and ecalloc(), but what about std::string, std::vector, etc.?

The good news is that the STL supports custom allocators, and therefore we can easily write an emalloc()-based allocator that plays nicely with Zend Engine:

extern "C" {
#include <Zend/zend.h>
}

#include <cstddef>
#include <memory>
#include <new>
#include <utility>  // std::forward, used by construct()

/**
 * @brief <code>emalloc()</code>-based allocator
 */
template<typename T>
class EMallocAllocator {
public:
    typedef T value_type;               ///< Value type

    /**@deprecated Definition for old compilers */
    typedef size_t size_type;           ///< Size type
    /**@deprecated Definition for old compilers */
    typedef ptrdiff_t difference_type;  ///< Difference type
    /**@deprecated Definition for old compilers */
    typedef T* pointer;                 ///< Pointer type
    /**@deprecated Definition for old compilers */
    typedef const T* const_pointer;     ///< Constant pointer type
    /**@deprecated Definition for old compilers */
    typedef T& reference;               ///< Reference type
    /**@deprecated Definition for old compilers */
    typedef const T& const_reference;   ///< Constant reference type

    /**
     * @brief Provides a way to obtain an allocator for a different type <code>U</code>
     * @deprecated
     */
    template<class U>
    struct rebind { typedef EMallocAllocator<U> other; };

    /**
     * @brief Default constructor
     */
    EMallocAllocator(void) noexcept = default;

    /**
     * @brief Copy constructor
     */
    template<typename U>
    EMallocAllocator(const EMallocAllocator<U>&) noexcept
    {}

    /**
     * @brief Allocates <code>cnt * sizeof(T)</code> bytes of uninitialized storage
     * @param cnt Number of objects to allocate
     * @return Pointer to the allocated storage
     *
     * Uses <code>safe_emalloc()</code> for memory allocation. Never returns <code>nullptr</code>
     * because a fatal error is thrown upon memory allocation failure
     */
    [[gnu::malloc, gnu::returns_nonnull]] T* allocate(std::size_t cnt)
    {
        return static_cast<T*>(safe_emalloc(cnt, sizeof(T), 0)); // Zend throws a fatal error on OOM
    }

    /**
     * @brief Deallocates the storage referenced by the pointer <code>p</code>
     * @param p Storage to deallocate
     * @param cnt Number of the allocated objects
     * @warning <code>p</code> must be a pointer obtained by an earlier call to <code>allocate()</code>
     */
    void deallocate(T* p, std::size_t cnt)
    {
        static_cast<void>(cnt);
        efree(p);
    }

    /**
     * @deprecated Included for compatibility with old compilers
     * @brief Constructs an object of type <code>U</code> in allocated uninitialized storage pointed to by <code>p</code>, using placement-new
     * @param p Allocated storage
     * @param args Constructor arguments
     */
    template<typename U, typename... Args>
    void construct(U* p, Args&&... args)
    {
        ::new(reinterpret_cast<void*>(p)) U(std::forward<Args>(args)...);
    }

    /**
     * @deprecated Included for compatibility with old compilers
     * @brief Calls the destructor of the object pointed to by p
     * @param p Object to be destroyed
     */
    template<class U>
    void destroy(U* p)
    {
        p->~U();
    }
};

/**
 * @brief Compares two allocators for equality
 * @return Always <code>true</code> (allocators are stateless, thus two <code>EMallocAllocator</code>'s are equal)
 */
template<typename T, typename U>
static inline bool operator==(const EMallocAllocator<T>&, const EMallocAllocator<U>&) noexcept
{
    return true;
}

/**
 * @brief Compares two allocators for inequality
 * @return Always <code>false</code> (allocators are stateless, thus two <code>EMallocAllocator</code>'s are equal)
 */
template<typename T, typename U>
static inline bool operator!=(const EMallocAllocator<T>& lhs, const EMallocAllocator<U>& rhs) noexcept
{
    return !operator==(lhs, rhs);
}

Now, if we need a string that uses our allocator, we can use something like this:

using estring = std::basic_string<char, std::char_traits<char>, EMallocAllocator<char> >;

estring some_string("This is a string");
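The same pattern extends to other containers. The sketch below is self-contained and only illustrative: DemoAllocator is a stripped-down allocator with just the members that C++11 and later require, and demo_emalloc()/demo_efree() are hypothetical counting stand-ins for emalloc()/efree(), so the code builds without PHP headers. In an extension you would plug in EMallocAllocator instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Hypothetical stand-ins for emalloc()/efree(); they count live blocks.
static std::size_t g_live_blocks = 0;

static void* demo_emalloc(std::size_t bytes)
{
    void* p = std::malloc(bytes != 0 ? bytes : 1);
    if (p == nullptr) {
        throw std::bad_alloc();
    }
    ++g_live_blocks;
    return p;
}

static void demo_efree(void* p)
{
    assert(g_live_blocks > 0);
    --g_live_blocks;
    std::free(p);
}

// Minimal allocator: value_type, a converting constructor, allocate, deallocate.
// Unlike safe_emalloc(), this sketch does not check cnt * sizeof(T) for overflow.
template<typename T>
struct DemoAllocator {
    typedef T value_type;

    DemoAllocator() noexcept = default;

    template<typename U>
    DemoAllocator(const DemoAllocator<U>&) noexcept {}

    T* allocate(std::size_t cnt) { return static_cast<T*>(demo_emalloc(cnt * sizeof(T))); }
    void deallocate(T* p, std::size_t) noexcept { demo_efree(p); }
};

template<typename T, typename U>
bool operator==(const DemoAllocator<T>&, const DemoAllocator<U>&) noexcept { return true; }

template<typename T, typename U>
bool operator!=(const DemoAllocator<T>&, const DemoAllocator<U>&) noexcept { return false; }

using demo_string = std::basic_string<char, std::char_traits<char>, DemoAllocator<char> >;

template<typename T>
using demo_vector = std::vector<T, DemoAllocator<T> >;

// Returns the number of blocks that are live while the containers exist
static std::size_t blocks_while_in_use()
{
    demo_vector<demo_string> names;
    names.push_back(demo_string("a string long enough to defeat the small-string optimization"));
    names.push_back(demo_string("another one"));
    return g_live_blocks;
} // names is destroyed here and every block goes back through demo_efree()
```

Short strings may live entirely inside the string object thanks to the small-string optimization, so the exact number of blocks depends on the standard library; the vector's own storage always goes through the allocator.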

Control Flow

Earlier I mentioned that stack unwinding does not always happen when the script terminates abnormally. Now I will try to explain why this happens and what we can do about it.

Consider the following PHP function:

function fatal(string $s)
{
    echo '<strong>Something bad has happened: </strong>', $s, PHP_EOL;
    die();
}

What will happen when the interpreter executes this function?

die() translates to the EXIT opcode, which is handled this way:

ZEND_VM_HANDLER(79, ZEND_EXIT, CONST|TMPVAR|UNUSED|CV, ANY)
{
    USE_OPLINE

    SAVE_OPLINE();
    if (OP1_TYPE != IS_UNUSED) {
        /* handle die()'s argument: set the exit code or print the error message */
        /* skipped for brevity */
    }
    zend_bailout();
    ZEND_VM_NEXT_OPCODE(); /* Never reached */
}

That is, in the end, the handler calls zend_bailout().

zend_bailout() is a macro defined as _zend_bailout(__FILE__, __LINE__), and _zend_bailout() is implemented as follows:

ZEND_API ZEND_COLD void _zend_bailout(const char *filename, uint32_t lineno) /* {{{ */
{

    if (!EG(bailout)) {
        zend_output_debug_string(1, "%s(%d) : Bailed out without a bailout address!", filename, lineno);
        exit(-1);
    }
    CG(unclean_shutdown) = 1;
    CG(active_class_entry) = NULL;
    CG(in_compilation) = 0;
    EG(current_execute_data) = NULL;
    LONGJMP(*EG(bailout), FAILURE);
}

LONGJMP is another macro that expands to either longjmp() or something compatible.

What does this mean for C++?

No destructors for automatic objects are called. Worse, the C++ standard says that if replacing std::longjmp with throw and setjmp with catch would execute a non-trivial destructor for any automatic object, the behavior of such a std::longjmp is undefined.

What is the proper way to handle such situations?

{
    // Save the original bailout address
    JMP_BUF* orig_bailout = EG(bailout);

    // Install our own
    JMP_BUF bailout;
    EG(bailout) = &bailout;

    /* This is the place to allocate all resources we may need */
    // ...
    /**/

    if (0 == SETJMP(bailout)) {
        // Execute our function

        // Finally, restore the original bailout address
        EG(bailout) = orig_bailout;

        // We return normally, all resources we have allocated above
        // will be automatically disposed of
        return;
    }

    // We land here if an error happened
    // Restore the original bailout address
    EG(bailout) = orig_bailout;
} // All allocated resources go out of scope here and get destroyed normally.

// Pass the error above
_zend_bailout(const_cast<char*>(__FILE__), __LINE__);

The trick here is to intercept the error, clean up all resources we may have allocated, and rethrow the error.

Resources must be allocated before we do SETJMP and freed before we call _zend_bailout().

Restoration of EG(bailout) can be automated with a simple class, but this is left as an exercise to the reader.
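For readers who would rather see one possible shape of such a class, here is a minimal sketch. It is self-contained, so g_bailout is a hypothetical stand-in for EG(bailout) and std::jmp_buf replaces JMP_BUF; BailoutGuard, run_guarded, returns_normally and bails_out are illustrative names, not part of the Zend API.

```cpp
#include <cassert>
#include <csetjmp>

// Hypothetical stand-in for EG(bailout); a real extension would use the Zend global.
static std::jmp_buf* g_bailout = nullptr;

// Installs a new bailout address and restores the previous one on scope exit
class BailoutGuard {
public:
    explicit BailoutGuard(std::jmp_buf* replacement) noexcept
        : saved_(g_bailout)
    {
        g_bailout = replacement;
    }

    ~BailoutGuard() { g_bailout = saved_; }

    BailoutGuard(const BailoutGuard&) = delete;
    BailoutGuard& operator=(const BailoutGuard&) = delete;

private:
    std::jmp_buf* saved_;
};

static void returns_normally() {}

// Simulates zend_bailout(); this frame holds no objects with destructors
[[noreturn]] static void bails_out()
{
    assert(g_bailout != nullptr);
    std::longjmp(*g_bailout, 1);
}

// Returns true if fn bailed out, false if it returned normally
static bool run_guarded(void (*fn)())
{
    std::jmp_buf local;
    BailoutGuard guard(&local); // constructed before setjmp, like the resources above

    if (setjmp(local) == 0) {
        fn();
        return false;
    }

    // We land here after a bailout; guard restores the old address when we return
    return true;
}
```

In a real extension, the caller would call zend_bailout() after run_guarded() reports a bailout, once all of its own resources have gone out of scope.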

Mix C and C++ headers

PHP/Zend uses inline functions extensively, and those functions are not always enclosed in extern "C". Because of name mangling, the symbol name of the same function may differ between C and C++. This is usually not an issue, until the compiler refuses to inline the function and insists on calling it by name. I have faced this several times with PHP 7.0 (maybe things got better with 7.1 or 7.2, I don’t know), and therefore I strongly recommend including Zend/PHP headers this way:

extern "C" {
#include <Zend/zend.h>
#include <Zend/zend_API.h>
// Include other C headers here
}

// Include C++ headers here

C++ exceptions

C++ exceptions should not leak into PHP, as this will crash the interpreter. This is especially critical for multi-threaded SAPIs.

If an exception has to be passed to the userland, it should be converted into a PHP exception:

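// Requires <exception> and, inside extern "C", <Zend/zend_exceptions.h>
try {
    // The code that might throw an exception
}
catch (const std::exception& e) {
    // Convert the C++ exception into a PHP exception visible to the userland
    zend_throw_exception(zend_ce_exception, e.what(), 0);
}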
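If many functions need this treatment, the try/catch can be factored into a small wrapper. The sketch below is only an illustration under stated assumptions: throw_php_exception() is a hypothetical stand-in that records the message (in an extension it would call zend_throw_exception() as shown above), and call_without_leaking_exceptions is an illustrative name. It also adds a catch (...) clause, which the snippet above does not have, so that exceptions not derived from std::exception cannot escape either.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in: records the message instead of raising a PHP exception.
static std::string g_last_php_error;

static void throw_php_exception(const char* message)
{
    assert(message != nullptr);
    g_last_php_error = message;
}

// Runs body; returns true on success, false if a C++ exception was converted
template<typename F>
bool call_without_leaking_exceptions(F&& body)
{
    try {
        std::forward<F>(body)();
        return true;
    }
    catch (const std::exception& e) {
        throw_php_exception(e.what());
    }
    catch (...) {
        // Not derived from std::exception: there is no message to forward
        throw_php_exception("Unknown C++ exception");
    }

    return false;
}

static bool demo_standard_exception()
{
    return call_without_leaking_exceptions([] { throw std::runtime_error("boom"); });
}

static bool demo_foreign_exception()
{
    return call_without_leaking_exceptions([] { throw 42; });
}

static bool demo_no_exception()
{
    return call_without_leaking_exceptions([] {});
}
```

The boolean result lets the calling PHP function return early after a failure instead of continuing with half-initialized state.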

Link In the Standard C++ Library

By default, PHP’s build system uses a C compiler and does not link in the standard C++ library.

To tell it you need C++, you need to tweak your extension’s config.m4 file.

For example, if it looks this way for C:

PHP_ARG_ENABLE(my-extension, whether to enable my extension, [ --enable-my-extension  Enable my extension])

if test "$PHP_MY_EXTENSION" = "yes"; then
    AC_DEFINE([HAVE_MY_EXTENSION], [1], [Whether my extension is enabled])
    PHP_NEW_EXTENSION([myextension], [file.c], $ext_shared,, [-Wall])
fi

You will need to add PHP_REQUIRE_CXX() and pass true as the cxx argument of PHP_NEW_EXTENSION (the sixth one, which is the last argument in this example):

PHP_ARG_ENABLE(my-extension, whether to enable my extension, [ --enable-my-extension  Enable my extension])

if test "$PHP_MY_EXTENSION" = "yes"; then
    PHP_REQUIRE_CXX()
    AC_DEFINE([HAVE_MY_EXTENSION], [1], [Whether my extension is enabled])
    PHP_NEW_EXTENSION([myextension], [file.cpp], $ext_shared,, [-Wall], true)
fi