I wanted to play with Docker swarm on a local machine to test a couple of scenarios. The goal was to run three manager nodes and three worker nodes. Counting the host itself as one of the managers, that meant five more nodes. I did not want to run five virtual machines on my computer, so I decided to use LXD. When using LXC or LXD containers, I usually try to use Alpine Linux for its small size, unless there are specific requirements.

First, I initialized the swarm on my local machine:

$ docker swarm init --advertise-addr 192.168.88.98
Swarm initialized: current node (bgzm63dfx8clvnm1tfudvrqpp) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

$ docker swarm join-token manager

To add a manager to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-3ia42cf6wfemjf5y6c05jf47w 192.168.88.98:2377

Then I created two manager nodes and three worker nodes. A container that will run Docker needs to be created with security.nesting=true.

OK, here it goes:

lxc launch images:alpine/3.11/amd64 manager-1 -c security.nesting=true
lxc launch images:alpine/3.11/amd64 manager-2 -c security.nesting=true

lxc exec manager-1 apk add docker
lxc exec manager-2 apk add docker

lxc exec manager-1 -T -- /etc/init.d/docker restart
lxc exec manager-2 -T -- /etc/init.d/docker restart

lxc exec manager-1 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-3ia42cf6wfemjf5y6c05jf47w 192.168.88.98:2377
lxc exec manager-2 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-3ia42cf6wfemjf5y6c05jf47w 192.168.88.98:2377

lxc launch images:alpine/3.11/amd64 worker-1 -c security.nesting=true
lxc launch images:alpine/3.11/amd64 worker-2 -c security.nesting=true
lxc launch images:alpine/3.11/amd64 worker-3 -c security.nesting=true

lxc exec worker-1 apk add docker
lxc exec worker-2 apk add docker
lxc exec worker-3 apk add docker

lxc exec worker-1 -T -- /etc/init.d/docker restart
lxc exec worker-2 -T -- /etc/init.d/docker restart
lxc exec worker-3 -T -- /etc/init.d/docker restart

lxc exec worker-1 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
lxc exec worker-2 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
lxc exec worker-3 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
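
The same launch, install, restart, and join steps repeat for every node, so they could be folded into one function. This is only a sketch: provision_node and the LXC override variable are names I made up for illustration, while the image, address, and lxc commands are the ones used above. Setting LXC="echo lxc" prints the commands instead of running them. If the container's network is not up yet right after launch, apk add may need a short wait first.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD or Docker): launch an Alpine container
# with nesting enabled, install Docker in it, and join it to the swarm.
# LXC can be overridden, e.g. LXC="echo lxc", to preview the commands.
LXC="${LXC:-lxc}"
IMAGE="images:alpine/3.11/amd64"
SWARM_ADDR="192.168.88.98:2377"

provision_node() {
    name="$1"    # container name, e.g. worker-1
    token="$2"   # manager or worker token from 'docker swarm join-token'

    # Stop at the first failing step so a broken node is not joined.
    $LXC launch "$IMAGE" "$name" -c security.nesting=true &&
    $LXC exec "$name" -- apk add docker &&
    $LXC exec "$name" -T -- /etc/init.d/docker restart &&
    $LXC exec "$name" -- docker swarm join --token "$token" "$SWARM_ADDR"
}

# Example (WORKER_TOKEN is a placeholder for the real worker token):
#   for n in worker-1 worker-2 worker-3; do
#       provision_node "$n" "$WORKER_TOKEN"
#   done
```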

Now, docker node ls shows something like this:

ID                            HOSTNAME                 STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
kipabebjta1lujxz28jeiacag     manager-1                Ready               Active              Reachable           19.03.5
xae8nsn3yd29wuxukvf6ef1og     manager-2                Ready               Active              Reachable           19.03.5
bgzm63dfx8clvnm1tfudvrqpp *   nostalgia-for-infinity   Ready               Active              Leader              19.03.6
9c0p941inuizp1lbyhgrh8k1o     worker-1                 Ready               Active                                  19.03.5
tiqp4tszcai5wljy2kst0p8w0     worker-2                 Ready               Active                                  19.03.5
u8b2vjpld2lx3jn6i0fe53w4l     worker-3                 Ready               Active                                  19.03.5

However, when I tried to deploy my stack into the swarm, I ran into a problem: Docker was unable to deploy any services to the LXC nodes because of the following error:

Error response from daemon: cgroups: cannot find cgroup mount destination: unknown.

To make sure that this was not related to the way I had configured my containers (e.g., an issue with AppArmor), I set up another container, this time an Ubuntu-based one:

lxc launch images:ubuntu/bionic/amd64 worker-4 -c security.nesting=true
lxc exec worker-4 apt update
lxc exec worker-4 apt install docker.io
lxc exec worker-4 docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377

That worked: some services got deployed to the Ubuntu worker. This means that the problem was somewhere in Alpine 🙁

I started to dig deeper.

When starting docker (rc-service docker start), I noticed mount: permission denied errors:

~ # rc-service docker start
 * Caching service dependencies ... [ ok ]
 * Mounting cgroup filesystem ... [ ok ]
mount: permission denied (are you root?)
mount: permission denied (are you root?)
mount: permission denied (are you root?)
mount: permission denied (are you root?)
mount: permission denied (are you root?)
 * /var/log/docker.log: creating file
 * /var/log/docker.log: correcting owner
 * Starting docker ... [ ok ]

OK, let us see what mount | grep cgroup shows:

cgroup_root on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755,uid=300001,gid=300001)
none on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cpuset on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
blkio on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
memory on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
perf_event on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
hugetlb on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
pids on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
rdma on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)

…and what are the subdirectories in /sys/fs/cgroup/:

blkio
cpu
cpuacct
cpuset
devices
freezer
hugetlb
memory
net_cls
net_prio
openrc
perf_event
pids
rdma
unified

We see that cpu, cpuacct, net_cls, net_prio are not mounted. And indeed, if you try to mount any of them, you will get an error:

~ # mount -t cgroup cgroup /sys/fs/cgroup/cpu -o rw,nosuid,nodev,noexec,relatime,cpu
mount: permission denied (are you root?)
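
Comparing the directory listing with the mount list by eye works, but the check can also be scripted. The sketch below prints every subdirectory of the cgroup root that is not itself a mount point. The function name unmounted_cgroup_dirs is my own, and the two optional arguments exist only so that the check can be pointed at other paths.

```shell
#!/bin/sh
# List directories under a cgroup root that are not themselves mount points.
# $1: cgroup root (default: /sys/fs/cgroup)
# $2: mounts table to read (default: /proc/mounts)
unmounted_cgroup_dirs() {
    root="${1:-/sys/fs/cgroup}"
    mounts="${2:-/proc/mounts}"

    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        dir="${dir%/}"              # strip the trailing slash

        # Field 2 of each line in the mounts table is the mount point.
        if ! awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"; then
            echo "${dir##*/}"
        fi
    done
}

# Example: run inside the container with the defaults.
#   unmounted_cgroup_dirs
```

Given the two listings above, I would expect it to report cpu, cpuacct, net_cls, net_prio, and openrc in my Alpine container.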

OK, now let us see how Ubuntu handles that:

$ lxc exec worker-4 bash
root@bionic:~# mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755,uid=300001,gid=300001)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)

We see that it combines net_cls and net_prio into a single mount, and does the same for cpu and cpuacct.

No problem, let us go back to Alpine and add these mounts:

mkdir /sys/fs/cgroup/cpu,cpuacct
mkdir /sys/fs/cgroup/net_cls,net_prio
mount -t cgroup cgroup /sys/fs/cgroup/cpu,cpuacct -o rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
mount -t cgroup cgroup /sys/fs/cgroup/net_cls,net_prio -o rw,nosuid,nodev,noexec,relatime,net_cls,net_prio

mount gave no “permission denied” errors; however, docker was still unable to launch any containers:

~ # docker run -it --rm alpine ash
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
cbdbe7a5bc2a: Pull complete
Digest: sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
Status: Downloaded newer image for alpine:latest
docker: Error response from daemon: cgroups: cannot find cgroup mount destination: unknown.

When looking at /etc/init.d/cgroups, I saw the following piece of code:

        if ! mountinfo -q /sys/fs/cgroup/openrc; then
                local agent="${RC_LIBEXECDIR}/sh/cgroup-release-agent.sh"
                mkdir /sys/fs/cgroup/openrc
                mount -n -t cgroup \
                        -o none,${cgroup_opts},name=openrc,release_agent="$agent" \
                        openrc /sys/fs/cgroup/openrc
                printf 1 > /sys/fs/cgroup/openrc/notify_on_release
        fi

However, I did not see /sys/fs/cgroup/openrc in the mount list. And indeed, when I tried to mount it manually, it failed with the infamous “permission denied” error.

I found one unanswered question about this, and then another one that gave me a clue:

~ # cat /proc/1/cgroup
12:pids:/
11:rdma:/
10:hugetlb:/
9:devices:/
8:cpuset:/
7:cpu,cpuacct:/
6:freezer:/
5:net_cls,net_prio:/
4:memory:/
3:perf_event:/
2:blkio:/
1:name=systemd:/
0::/

So, we have neither name=openrc there, nor separate cpu, cpuacct, net_cls, and net_prio hierarchies (and that made it clear to me why Ubuntu used cpu,cpuacct and net_cls,net_prio).

OK, instead of

mount -n -t cgroup \
    -o 'none,nodev,noexec,nosuid,name=openrc,release_agent=/lib/rc/sh/cgroup-release-agent.sh' \
    openrc /sys/fs/cgroup/openrc

I tried

mount -n -t cgroup \
    -o 'none,nodev,noexec,nosuid,name=systemd,release_agent=/lib/rc/sh/cgroup-release-agent.sh' \
    openrc /sys/fs/cgroup/openrc

…and it worked!

I intentionally did not change paths under /sys/fs/cgroup in order not to break OpenRC’s cgroup-release-agent.sh.

Success!

So, what are the changes? After the cgroups service starts, we need to run the following commands:

mkdir /sys/fs/cgroup/cpu,cpuacct
mkdir /sys/fs/cgroup/net_cls,net_prio
mount -t cgroup cgroup /sys/fs/cgroup/cpu,cpuacct -o rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
mount -t cgroup cgroup /sys/fs/cgroup/net_cls,net_prio -o rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
mount -n -t cgroup -o 'none,nodev,noexec,nosuid,name=systemd,release_agent=/lib/rc/sh/cgroup-release-agent.sh' openrc /sys/fs/cgroup/openrc

For the sake of simplicity, I decided not to parse /proc/1/cgroup.
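
For anyone who would rather derive the groupings than hard-code them, here is a rough sketch of what that parsing could look like. Each line of /proc/1/cgroup has the form ID:controllers:path, so the second field is all that is needed. The three function names are hypothetical, and print_cgroup_mounts only prints the commands rather than running them. I have not wired this into the service below.

```shell
#!/bin/sh
# Print the cgroup v1 controller groupings seen by PID 1, one per line
# (e.g. "cpu,cpuacct"). Named hierarchies (name=...) and the cgroup v2
# entry (empty controller field) are skipped.
# $1: file to parse (default: /proc/1/cgroup)
v1_controller_groups() {
    awk -F: '$2 != "" && $2 !~ /^name=/ { print $2 }' "${1:-/proc/1/cgroup}"
}

# Print the names of named hierarchies, e.g. "systemd" for "name=systemd".
named_hierarchies() {
    awk -F: '$2 ~ /^name=/ { sub(/^name=/, "", $2); print $2 }' "${1:-/proc/1/cgroup}"
}

# Print (not run) the mkdir and mount commands for every grouping.
print_cgroup_mounts() {
    v1_controller_groups "$1" | while read -r group; do
        echo "mkdir -p /sys/fs/cgroup/$group"
        echo "mount -n -t cgroup cgroup /sys/fs/cgroup/$group -o rw,nosuid,nodev,noexec,relatime,$group"
    done
}

# Example: review the generated commands before running any of them.
#   print_cgroup_mounts
```

Note that this prints a command for every grouping, including the ones that are already mounted, so the output would still need filtering before use.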

OK, now let us create a service that runs these commands:

#!/sbin/openrc-run

description="Mount the control groups for Docker"

depend()
{
    keyword -docker
    need sysfs cgroups
}

start()
{
    if [ -d /sys/fs/cgroup ]; then
        mkdir -p /sys/fs/cgroup/cpu,cpuacct
        mkdir -p /sys/fs/cgroup/net_cls,net_prio

        mount -n -t cgroup cgroup /sys/fs/cgroup/cpu,cpuacct -o rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
        mount -n -t cgroup cgroup /sys/fs/cgroup/net_cls,net_prio -o rw,nosuid,nodev,noexec,relatime,net_cls,net_prio

        if ! mountinfo -q /sys/fs/cgroup/openrc; then
            local agent="${RC_LIBEXECDIR}/sh/cgroup-release-agent.sh"
            mkdir -p /sys/fs/cgroup/openrc
            mount -n -t cgroup -o none,nodev,noexec,nosuid,name=systemd,release_agent="$agent" openrc /sys/fs/cgroup/openrc
        fi
    fi

    return 0
}

Save this as /etc/init.d/cgroups-patch, then

chmod +x /etc/init.d/cgroups-patch
rc-update add cgroups-patch boot

and then reboot.

Once the container is up, docker run -it --rm alpine ash works.

How to Run Docker in Alpine Container in LXC/LXD