I wanted to play with Docker Swarm on a local machine to test a couple of scenarios. The goal was to run three manager nodes and three worker nodes. With my own machine acting as one of the managers, that left five more nodes, and I did not want to run them as virtual machines, so I decided to use LXD. When using LXC or LXD containers, I usually pick Alpine Linux for its small size, unless there are specific requirements.
First, I initialized the swarm on my local machine:
$ docker swarm init --advertise-addr 192.168.88.98
Swarm initialized: current node (bgzm63dfx8clvnm1tfudvrqpp) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
$ docker swarm join-token manager
To add a manager to this swarm, run the following command:
docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-3ia42cf6wfemjf5y6c05jf47w 192.168.88.98:2377
Then I created two manager nodes and three worker nodes. To create a container that runs Docker, set security.nesting=true.
OK, here it goes:
lxc launch images:alpine/3.11/amd64 manager-1 -c security.nesting=true
lxc launch images:alpine/3.11/amd64 manager-2 -c security.nesting=true
lxc exec manager-1 apk add docker
lxc exec manager-2 apk add docker
lxc exec manager-1 -T -- /etc/init.d/docker restart
lxc exec manager-2 -T -- /etc/init.d/docker restart
lxc exec manager-1 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-3ia42cf6wfemjf5y6c05jf47w 192.168.88.98:2377
lxc exec manager-2 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-3ia42cf6wfemjf5y6c05jf47w 192.168.88.98:2377
lxc launch images:alpine/3.11/amd64 worker-1 -c security.nesting=true
lxc launch images:alpine/3.11/amd64 worker-2 -c security.nesting=true
lxc launch images:alpine/3.11/amd64 worker-3 -c security.nesting=true
lxc exec worker-1 apk add docker
lxc exec worker-2 apk add docker
lxc exec worker-3 apk add docker
lxc exec worker-1 -T -- /etc/init.d/docker restart
lxc exec worker-2 -T -- /etc/init.d/docker restart
lxc exec worker-3 -T -- /etc/init.d/docker restart
lxc exec worker-1 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
lxc exec worker-2 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
lxc exec worker-3 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
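The per-node steps are identical except for the container name and the join token, so they could be folded into a small shell function. This is only a sketch, not the exact commands used above: provision_node and the LXC variable are hypothetical names introduced for illustration, and it assumes the container's network is already up when apk runs (a short wait after lxc launch may be needed).

```shell
#!/bin/sh
# Hypothetical helper: launch an Alpine container, install Docker in it,
# and join it to the swarm. Nothing runs until provision_node is called.

# LXC defaults to the real client; set LXC=echo for a dry run that only
# prints the lxc arguments.
LXC="${LXC:-lxc}"
IMAGE="images:alpine/3.11/amd64"
SWARM_ADDR="192.168.88.98:2377"

# provision_node NAME TOKEN
provision_node() {
    name="$1"
    token="$2"
    # Stop at the first failing step so a broken node never tries to join.
    $LXC launch "$IMAGE" "$name" -c security.nesting=true &&
    $LXC exec "$name" -- apk add docker &&
    $LXC exec "$name" -T -- /etc/init.d/docker restart &&
    $LXC exec "$name" -- docker swarm join --token "$token" "$SWARM_ADDR"
}

# Example (assumes WORKER_TOKEN holds the worker join token):
#   for n in worker-1 worker-2 worker-3; do
#       provision_node "$n" "$WORKER_TOKEN"
#   done
```

Running it with LXC=echo first is a cheap way to check the generated commands before touching any containers.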
Now, docker node ls shows something like this:
ID                            HOSTNAME                 STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
kipabebjta1lujxz28jeiacag     manager-1                Ready    Active         Reachable        19.03.5
xae8nsn3yd29wuxukvf6ef1og     manager-2                Ready    Active         Reachable        19.03.5
bgzm63dfx8clvnm1tfudvrqpp *   nostalgia-for-infinity   Ready    Active         Leader           19.03.6
9c0p941inuizp1lbyhgrh8k1o     worker-1                 Ready    Active                          19.03.5
tiqp4tszcai5wljy2kst0p8w0     worker-2                 Ready    Active                          19.03.5
u8b2vjpld2lx3jn6i0fe53w4l     worker-3                 Ready    Active                          19.03.5
However, when I tried to deploy my stack into the swarm, I hit a problem: Docker was unable to deploy any services to the LXC nodes because of the following error:
Error response from daemon: cgroups: cannot find cgroup mount destination: unknown.
To make sure that this was not caused by the way I had configured my containers (for example, an AppArmor issue), I set up another container, this time Ubuntu-based:
lxc launch images:ubuntu/bionic/amd64 worker-4 -c security.nesting=true
lxc exec worker-4 apt update
lxc exec worker-4 apt install docker.io
lxc exec worker-4 docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
That worked: some services got deployed to the Ubuntu worker. This meant that the problem was somewhere in Alpine 🙁
I started to dig deeper.
When starting Docker (rc-service docker start), I noticed mount: permission denied errors:
~ # rc-service docker start
* Caching service dependencies ... [ ok ]
* Mounting cgroup filesystem ... [ ok ]
mount: permission denied (are you root?)
mount: permission denied (are you root?)
mount: permission denied (are you root?)
mount: permission denied (are you root?)
mount: permission denied (are you root?)
* /var/log/docker.log: creating file
* /var/log/docker.log: correcting owner
* Starting docker ... [ ok ]
OK, let us see what mount | grep cgroup shows:
cgroup_root on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755,uid=300001,gid=300001)
none on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cpuset on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
blkio on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
memory on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
perf_event on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
hugetlb on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
pids on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
rdma on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
…and which subdirectories exist in /sys/fs/cgroup/:
blkio
cpu
cpuacct
cpuset
devices
freezer
hugetlb
memory
net_cls
net_prio
openrc
perf_event
pids
rdma
unified
We see that cpu, cpuacct, net_cls, and net_prio are not mounted. And indeed, if you try to mount any of them, you will get an error:
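The comparison between the directory listing and the mount list can be done mechanically. The sketch below is a hypothetical helper (the name unmounted_cgroup_dirs is made up here): it takes a file containing the saved output of mount, plus a list of directory names, and prints the names that have no cgroup or cgroup2 mount under /sys/fs/cgroup. It assumes the usual "SOURCE on TARGET type FSTYPE (OPTIONS)" layout of mount output, as seen above.

```shell
#!/bin/sh
# unmounted_cgroup_dirs MOUNTS_FILE NAME...
# MOUNTS_FILE holds the saved output of `mount`; each NAME is a
# subdirectory of /sys/fs/cgroup. Prints every NAME that is not the
# target of a cgroup (v1) or cgroup2 mount.
unmounted_cgroup_dirs() {
    mounts_file="$1"
    shift
    for name in "$@"; do
        # awk exits 0 only if a matching mount line was seen.
        if ! awk -v target="/sys/fs/cgroup/$name" '
                $2 == "on" && $3 == target && $4 == "type" &&
                ($5 == "cgroup" || $5 == "cgroup2") { found = 1 }
                END { exit !found }
            ' "$mounts_file"; then
            echo "$name"
        fi
    done
}

# Example, run inside the container:
#   mount > /tmp/mounts.txt
#   unmounted_cgroup_dirs /tmp/mounts.txt $(ls /sys/fs/cgroup)
```

Given the listings above, such a check would report cpu, cpuacct, net_cls, and net_prio, and also openrc, which does not appear in the mount list either.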
~ # mount -t cgroup cgroup /sys/fs/cgroup/cpu -o rw,nosuid,nodev,noexec,relatime,cpu
mount: permission denied (are you root?)
OK, now let us see how Ubuntu handles that:
$ lxc exec worker-4 bash
root@bionic:~# mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755,uid=300001,gid=300001)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
We can see that it mounts net_cls and net_prio together as a single hierarchy, and does the same with cpu and cpuacct.
No problem, let us go back to Alpine and add these mounts:
mkdir /sys/fs/cgroup/cpu,cpuacct
mkdir /sys/fs/cgroup/net_cls,net_prio
mount -t cgroup cgroup /sys/fs/cgroup/cpu,cpuacct -o rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
mount -t cgroup cgroup /sys/fs/cgroup/net_cls,net_prio -o rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
This time mount gave no “permission denied” errors; however, Docker was still unable to launch any containers:
~ # docker run -it --rm alpine ash
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
cbdbe7a5bc2a: Pull complete
Digest: sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
Status: Downloaded newer image for alpine:latest
docker: Error response from daemon: cgroups: cannot find cgroup mount destination: unknown.
When looking at /etc/init.d/cgroups, I saw the following piece of code:
if ! mountinfo -q /sys/fs/cgroup/openrc; then
    local agent="${RC_LIBEXECDIR}/sh/cgroup-release-agent.sh"
    mkdir /sys/fs/cgroup/openrc
    mount -n -t cgroup \
        -o none,${cgroup_opts},name=openrc,release_agent="$agent" \
        openrc /sys/fs/cgroup/openrc
    printf 1 > /sys/fs/cgroup/openrc/notify_on_release
fi
However, I did not see /sys/fs/cgroup/openrc in the mount list. And indeed, if I try to mount it manually, it fails with the infamous “permission denied” error.
There was one unanswered question, and then another one that gave me the clue:
~ # cat /proc/1/cgroup
12:pids:/
11:rdma:/
10:hugetlb:/
9:devices:/
8:cpuset:/
7:cpu,cpuacct:/
6:freezer:/
5:net_cls,net_prio:/
4:memory:/
3:perf_event:/
2:blkio:/
1:name=systemd:/
0::/
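So, we do not have name=openrc there, nor do we have separate cpu, cpuacct, net_cls, and net_prio hierarchies. Inside the container, a controller can only be mounted in the same combination in which it appears here, which makes it clear to me why Ubuntu used cpu,cpuacct and net_cls,net_prio.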
OK, instead of
mount -n -t cgroup \
-o 'none,nodev,noexec,nosuid,name=openrc,release_agent=/lib/rc/sh/cgroup-release-agent.sh' \
openrc /sys/fs/cgroup/openrc
I tried
mount -n -t cgroup \
-o 'none,nodev,noexec,nosuid,name=systemd,release_agent=/lib/rc/sh/cgroup-release-agent.sh' \
openrc /sys/fs/cgroup/openrc
…and it worked!
I intentionally did not change paths under /sys/fs/cgroup to avoid breaking OpenRC’s cgroup-release-agent.sh.
Success!
So, what are the changes? After the cgroups service starts, we need to run the following commands:
mkdir /sys/fs/cgroup/cpu,cpuacct
mkdir /sys/fs/cgroup/net_cls,net_prio
mount -t cgroup cgroup /sys/fs/cgroup/cpu,cpuacct -o rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
mount -t cgroup cgroup /sys/fs/cgroup/net_cls,net_prio -o rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
mount -n -t cgroup -o 'none,nodev,noexec,nosuid,name=systemd,release_agent=/lib/rc/sh/cgroup-release-agent.sh' openrc /sys/fs/cgroup/openrc
For the sake of simplicity, I decided not to parse /proc/1/cgroup and hardcoded the two combined hierarchies instead.
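For reference, here is roughly what the parsing route could look like. This is an untested-in-production sketch rather than part of the fix: cgroup_mount_commands is a hypothetical name, it only prints commands instead of running them, and it assumes a mountpoint utility is available in the container to skip hierarchies that are already mounted. Each line of /proc/1/cgroup has the form ID:controllers:path, so the second field gives the exact controller combination to mount.

```shell
#!/bin/sh
# cgroup_mount_commands FILE
# FILE has the format of /proc/1/cgroup ("ID:controllers:path").
# Prints, without executing, the commands that would mount each cgroup v1
# controller set under /sys/fs/cgroup using the same grouping as FILE.
cgroup_mount_commands() {
    awk -F: '
        # The cgroup v2 entry ("0::/") has an empty controller list.
        $2 == "" { next }
        # Named hierarchies such as name=systemd need different options.
        $2 ~ /^name=/ { next }
        {
            printf "mkdir -p /sys/fs/cgroup/%s\n", $2
            printf "mountpoint -q /sys/fs/cgroup/%s || mount -n -t cgroup cgroup /sys/fs/cgroup/%s -o rw,nosuid,nodev,noexec,relatime,%s\n", $2, $2, $2
        }
    ' "$1"
}

# Example, run inside the container, to review the commands first:
#   cgroup_mount_commands /proc/1/cgroup
```

Printing the commands keeps the sketch safe to try; piping the reviewed output to sh would be the step that actually changes anything.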
OK, now let us create a service that runs these commands:
#!/sbin/openrc-run

description="Mount the control groups for Docker"

depend()
{
    keyword -docker
    need sysfs cgroups
}

start()
{
    if [ -d /sys/fs/cgroup ]; then
        # Mount the controllers in the same combinations as listed in /proc/1/cgroup
        mkdir -p /sys/fs/cgroup/cpu,cpuacct
        mkdir -p /sys/fs/cgroup/net_cls,net_prio
        mount -n -t cgroup cgroup /sys/fs/cgroup/cpu,cpuacct -o rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
        mount -n -t cgroup cgroup /sys/fs/cgroup/net_cls,net_prio -o rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
        # Mount the name=systemd hierarchy at the path OpenRC expects
        if ! mountinfo -q /sys/fs/cgroup/openrc; then
            local agent="${RC_LIBEXECDIR}/sh/cgroup-release-agent.sh"
            mkdir -p /sys/fs/cgroup/openrc
            mount -n -t cgroup -o none,nodev,noexec,nosuid,name=systemd,release_agent="$agent" openrc /sys/fs/cgroup/openrc
        fi
    fi

    return 0
}
Save this as /etc/init.d/cgroups-patch, then
chmod +x /etc/init.d/cgroups-patch
rc-update add cgroups-patch boot
and then reboot.
Once the container is up, docker run -it --rm alpine ash works.