I wanted to play with Docker Swarm on a local machine to test a couple of scenarios. The goal was to run three manager nodes and three worker nodes. With the host itself acting as the first manager, that leaves five more nodes; I did not want to run five virtual machines on my computer, so I decided to use LXD instead. When using LXC or LXD containers, I usually pick Alpine Linux for its small size, unless there are specific requirements.
First, I initialized the swarm on my local machine:
$ docker swarm init --advertise-addr 192.168.88.98
Swarm initialized: current node (bgzm63dfx8clvnm1tfudvrqpp) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

$ docker swarm join-token manager
To add a manager to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-3ia42cf6wfemjf5y6c05jf47w 192.168.88.98:2377
Then I created two manager nodes and three worker nodes. For a container to be able to run Docker, you need to set security.nesting=true on it.
OK, here it goes:
lxc launch images:alpine/3.11/amd64 manager-1 -c security.nesting=true
lxc launch images:alpine/3.11/amd64 manager-2 -c security.nesting=true
lxc exec manager-1 apk add docker
lxc exec manager-2 apk add docker
lxc exec manager-1 -T -- /etc/init.d/docker restart
lxc exec manager-2 -T -- /etc/init.d/docker restart
lxc exec manager-1 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-3ia42cf6wfemjf5y6c05jf47w 192.168.88.98:2377
lxc exec manager-2 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-3ia42cf6wfemjf5y6c05jf47w 192.168.88.98:2377

lxc launch images:alpine/3.11/amd64 worker-1 -c security.nesting=true
lxc launch images:alpine/3.11/amd64 worker-2 -c security.nesting=true
lxc launch images:alpine/3.11/amd64 worker-3 -c security.nesting=true
lxc exec worker-1 apk add docker
lxc exec worker-2 apk add docker
lxc exec worker-3 apk add docker
lxc exec worker-1 -T -- /etc/init.d/docker restart
lxc exec worker-2 -T -- /etc/init.d/docker restart
lxc exec worker-3 -T -- /etc/init.d/docker restart
lxc exec worker-1 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
lxc exec worker-2 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
lxc exec worker-3 -- docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
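The launch, install, restart and join steps repeat for every node, so they could be wrapped in a small helper. This is only a sketch: the function names provision_node and provision_nodes and the LXC and IMAGE variables are made up for illustration, and the commands inside are the same ones used above.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original setup.
# LXC can be overridden, e.g. LXC="echo lxc", to print the commands
# instead of running them (dry run).
LXC="${LXC:-lxc}"
IMAGE="${IMAGE:-images:alpine/3.11/amd64}"

# provision_node NAME TOKEN MANAGER_ADDR
# Launch one nested-capable container, install Docker, join the swarm.
provision_node() {
    name=$1
    token=$2
    addr=$3
    $LXC launch "$IMAGE" "$name" -c security.nesting=true &&
    $LXC exec "$name" -- apk add docker &&
    $LXC exec "$name" -T -- /etc/init.d/docker restart &&
    $LXC exec "$name" -- docker swarm join --token "$token" "$addr"
}

# provision_nodes TOKEN MANAGER_ADDR NAME...
# Provision several nodes with the same join token; stop at the first failure.
provision_nodes() {
    token=$1
    addr=$2
    shift 2
    for node in "$@"; do
        provision_node "$node" "$token" "$addr" || return 1
    done
}
```

With the worker token stored in a variable, the three workers would then be created with something like `provision_nodes "$WORKER_TOKEN" 192.168.88.98:2377 worker-1 worker-2 worker-3`, and the managers with the manager token in the same way.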
Now, docker node ls shows something like this:
ID                            HOSTNAME                 STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
kipabebjta1lujxz28jeiacag     manager-1                Ready    Active         Reachable        19.03.5
xae8nsn3yd29wuxukvf6ef1og     manager-2                Ready    Active         Reachable        19.03.5
bgzm63dfx8clvnm1tfudvrqpp *   nostalgia-for-infinity   Ready    Active         Leader           19.03.6
9c0p941inuizp1lbyhgrh8k1o     worker-1                 Ready    Active                          19.03.5
tiqp4tszcai5wljy2kst0p8w0     worker-2                 Ready    Active                          19.03.5
u8b2vjpld2lx3jn6i0fe53w4l     worker-3                 Ready    Active                          19.03.5
However, when I tried to deploy my stack into the swarm, I ran into a problem: Docker was unable to deploy any services to the LXC nodes because of the following error:
Error response from daemon: cgroups: cannot find cgroup mount destination: unknown.
To make sure that this was not related to the way I had configured my containers (e.g., an issue with AppArmor), I set up another container, this time an Ubuntu-based one:
lxc launch images:ubuntu/bionic/amd64 worker-4 -c security.nesting=true
lxc exec worker-4 apt update
lxc exec worker-4 apt install docker.io
lxc exec worker-4 docker swarm join --token SWMTKN-1-08noco12oi85n0v8mcbk9pphflmpnuap6w7jicah0zsbjqwc75-cnwlgyertaslaphko0ki079xc 192.168.88.98:2377
That worked: some services got deployed to the Ubuntu worker. This means that the problem was somewhere in Alpine 🙁
I started to dig deeper.
When starting Docker (rc-service docker start), I noticed mount: permission denied errors:
~ # rc-service docker start
 * Caching service dependencies ... [ ok ]
 * Mounting cgroup filesystem ... [ ok ]
mount: permission denied (are you root?)
mount: permission denied (are you root?)
mount: permission denied (are you root?)
mount: permission denied (are you root?)
mount: permission denied (are you root?)
 * /var/log/docker.log: creating file
 * /var/log/docker.log: correcting owner
 * Starting docker ... [ ok ]
OK, let us see what mount | grep cgroup shows:
cgroup_root on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755,uid=300001,gid=300001)
none on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cpuset on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
blkio on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
memory on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
perf_event on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
hugetlb on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
pids on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
rdma on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
…and which subdirectories exist in /sys/fs/cgroup/:
blkio cpu cpuacct cpuset devices freezer hugetlb memory net_cls net_prio openrc perf_event pids rdma unified
We see that cpu, cpuacct, net_cls, and net_prio are not mounted. And indeed, if you try to mount any of them, you will get an error:
~ # mount -t cgroup cgroup /sys/fs/cgroup/cpu -o rw,nosuid,nodev,noexec,relatime,cpu
mount: permission denied (are you root?)
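Comparing the directory listing with the mount list by eye works, but the same check can be scripted. The following is a minimal sketch, assuming that a directory counts as "not mounted" when it does not appear as a mount point in /proc/mounts; the function name unmounted_cgroup_dirs and its two optional parameters are made up here so the check can also be run against test data.

```shell
#!/bin/sh
# unmounted_cgroup_dirs [ROOT] [MOUNTS_FILE]
# Hypothetical helper: print every subdirectory of ROOT (default
# /sys/fs/cgroup) that is not listed as a mount point in MOUNTS_FILE
# (default /proc/mounts).
unmounted_cgroup_dirs() {
    root=${1:-/sys/fs/cgroup}
    mounts=${2:-/proc/mounts}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        dir=${dir%/}
        # The second field of each /proc/mounts line is the mount point.
        if ! awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"; then
            printf '%s\n' "${dir##*/}"
        fi
    done
}
```

Run without arguments inside the Alpine container, it should list the directories that exist under /sys/fs/cgroup but have nothing mounted on them.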
OK, now let us see how Ubuntu handles that:
$ lxc exec worker-4 bash
root@bionic:~# mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755,uid=300001,gid=300001)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
We see that it combines net_cls and net_prio into a single hierarchy, and does the same with cpu and cpuacct.
No problem, let us go back to Alpine and add these mounts:
mkdir /sys/fs/cgroup/cpu,cpuacct
mkdir /sys/fs/cgroup/net_cls,net_prio
mount -t cgroup cgroup /sys/fs/cgroup/cpu,cpuacct -o rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
mount -t cgroup cgroup /sys/fs/cgroup/net_cls,net_prio -o rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
This time mount gave no “permission denied” errors; however, Docker was still unable to launch any containers:
~ # docker run -it --rm alpine ash
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
cbdbe7a5bc2a: Pull complete
Digest: sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
Status: Downloaded newer image for alpine:latest
docker: Error response from daemon: cgroups: cannot find cgroup mount destination: unknown.
When looking at /etc/init.d/cgroups, I saw the following piece of code:
if ! mountinfo -q /sys/fs/cgroup/openrc; then
    local agent="${RC_LIBEXECDIR}/sh/cgroup-release-agent.sh"
    mkdir /sys/fs/cgroup/openrc
    mount -n -t cgroup \
        -o none,${cgroup_opts},name=openrc,release_agent="$agent" \
        openrc /sys/fs/cgroup/openrc
    printf 1 > /sys/fs/cgroup/openrc/notify_on_release
fi
However, I did not see /sys/fs/cgroup/openrc in the mount list. And indeed, when I tried to mount it manually, it failed with the infamous “permission denied” error.
There was one unanswered question, and then another one that gave me a clue:
~ # cat /proc/1/cgroup
12:pids:/
11:rdma:/
10:hugetlb:/
9:devices:/
8:cpuset:/
7:cpu,cpuacct:/
6:freezer:/
5:net_cls,net_prio:/
4:memory:/
3:perf_event:/
2:blkio:/
1:name=systemd:/
0::/
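So, we do not have name=openrc there, nor do we have separate cpu, cpuacct, net_cls, and net_prio hierarchies (which also made it clear to me why Ubuntu used cpu,cpuacct and net_cls,net_prio). Apparently, the container is only allowed to mount hierarchies that already exist with exactly these names and controller combinations.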
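Each line of /proc/1/cgroup has the form hierarchy-ID:controller-list:path, so the interesting part is the second field. As a rough sketch (the function name cgroup_hierarchies and its output format are made up for illustration), the available hierarchies can be listed like this:

```shell
#!/bin/sh
# cgroup_hierarchies [FILE]
# Hypothetical helper: for each cgroup v1 line of FILE (default
# /proc/1/cgroup) print either "named <name>" for a named hierarchy or
# "controllers <list>" for a controller hierarchy.
cgroup_hierarchies() {
    awk -F: '
        $2 == ""      { next }                               # cgroup v2 entry, e.g. "0::/"
        $2 ~ /^name=/ { print "named", substr($2, 6); next } # strip the "name=" prefix
                      { print "controllers", $2 }
    ' "${1:-/proc/1/cgroup}"
}
```

For the output shown above, this would report systemd as the only named hierarchy, and cpu,cpuacct and net_cls,net_prio as combined controller hierarchies.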
OK, instead of
mount -n -t cgroup \
    -o 'none,nodev,noexec,nosuid,name=openrc,release_agent=/lib/rc/sh/cgroup-release-agent.sh' \
    openrc /sys/fs/cgroup/openrc
I tried
mount -n -t cgroup \
    -o 'none,nodev,noexec,nosuid,name=systemd,release_agent=/lib/rc/sh/cgroup-release-agent.sh' \
    openrc /sys/fs/cgroup/openrc
…and it worked!
I intentionally did not change the paths under /sys/fs/cgroup in order not to break OpenRC’s cgroup-release-agent.sh.
Success!
So, what are the changes? After the cgroups service starts, we need to run the following piece of code:
mkdir /sys/fs/cgroup/cpu,cpuacct
mkdir /sys/fs/cgroup/net_cls,net_prio
mount -t cgroup cgroup /sys/fs/cgroup/cpu,cpuacct -o rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
mount -t cgroup cgroup /sys/fs/cgroup/net_cls,net_prio -o rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
mount -n -t cgroup -o 'none,nodev,noexec,nosuid,name=systemd,release_agent=/lib/rc/sh/cgroup-release-agent.sh' openrc /sys/fs/cgroup/openrc
For the sake of simplicity, I decided not to parse /proc/1/cgroup and hard-coded the hierarchy names instead.
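For comparison, a parsing variant could look roughly like the sketch below. It is untested on a real container and makes several assumptions: the function name cgroup_mount_commands and its parameters are invented, it only prints the commands instead of running them, and it deliberately skips named hierarchies such as name=systemd, because their mount point (/sys/fs/cgroup/openrc here) cannot be derived from the name.

```shell
#!/bin/sh
# cgroup_mount_commands [CGROUP_FILE] [MOUNTS_FILE] [ROOT]
# Hypothetical helper: print (not run) mkdir/mount commands for every
# cgroup v1 controller group in CGROUP_FILE whose directory under ROOT
# is not yet a mount point according to MOUNTS_FILE.
cgroup_mount_commands() {
    cgfile=${1:-/proc/1/cgroup}
    mounts=${2:-/proc/mounts}
    root=${3:-/sys/fs/cgroup}
    # Keep only controller lists: drop the v2 entry and named hierarchies.
    awk -F: '$2 != "" && $2 !~ /^name=/ { print $2 }' "$cgfile" |
    while IFS= read -r group; do
        target=$root/$group
        # Skip groups that are already mounted at the expected path.
        if awk -v d="$target" '$2 == d { found = 1 } END { exit !found }' "$mounts"; then
            continue
        fi
        printf 'mkdir -p %s\n' "$target"
        printf 'mount -n -t cgroup cgroup %s -o rw,nosuid,nodev,noexec,relatime,%s\n' "$target" "$group"
    done
}
```

Printing the commands first makes it possible to review them before piping the output to sh. The hard-coded commands above remain the ones used in the rest of this post.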
OK, now let us create a service that runs these commands:
#!/sbin/openrc-run

description="Mount the control groups for Docker"

depend()
{
    keyword -docker
    need sysfs cgroups
}

start()
{
    if [ -d /sys/fs/cgroup ]; then
        mkdir -p /sys/fs/cgroup/cpu,cpuacct
        mkdir -p /sys/fs/cgroup/net_cls,net_prio
        mount -n -t cgroup cgroup /sys/fs/cgroup/cpu,cpuacct -o rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
        mount -n -t cgroup cgroup /sys/fs/cgroup/net_cls,net_prio -o rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
        if ! mountinfo -q /sys/fs/cgroup/openrc; then
            local agent="${RC_LIBEXECDIR}/sh/cgroup-release-agent.sh"
            mkdir -p /sys/fs/cgroup/openrc
            mount -n -t cgroup -o none,nodev,noexec,nosuid,name=systemd,release_agent="$agent" openrc /sys/fs/cgroup/openrc
        fi
    fi

    return 0
}
Save this as /etc/init.d/cgroups-patch, then run
chmod +x /etc/init.d/cgroups-patch
rc-update add cgroups-patch boot
and then reboot.
Once the container is up, docker run -it --rm alpine ash works.