Multi-stage Docker Builds and Extended File Attributes

Multi-stage builds are a great way to keep the size of the resulting image down. They are extremely useful if you want to use scratch-based images for your application.

The common usage pattern for multi-stage builds is as follows:

```
FROM alpine:3.13 AS build
WORKDIR /build
# build-base pulls in gcc, make and the libc headers; add any other build dependencies here
RUN apk --no-cache add build-base
COPY source.c source.c
RUN gcc -static source.c -o binary

FROM scratch
COPY --from=build /build/binary /binary
CMD ["/binary"]
```

This creates two images: the intermediate one (“build”) and the “real” one. The first one contains the build system, all necessary compile-time dependencies, the source code, etc. The destination image contains only the resulting binary.

The great thing about multi-stage builds is that you can trade the size of the intermediate image for Dockerfile readability and better use of the build cache: you don’t need to cram everything into a single RUN instruction, and you can create as many layers as necessary. For example, the first layer can contain the build system, the second the build dependencies, the third the sources, the fourth the intermediate build files, and the fifth the linked binary. If something fails during compilation, Docker still reuses the cached layers and does not download the build system and compile-time dependencies again.

However, I recently ran into a limitation of copying files between stages: COPY ignores all extended file attributes.

First of all, why are extended attributes useful, and why would you want to use them? Most of the time you probably don’t need them; when you do, it is most likely because of security requirements.

If your application needs, for some reason, to use host networking and bind to a privileged port (one below 1024), you usually need to run it as root. Such applications usually drop their privileges and switch to an unprivileged user account once initialization is complete. However, quite often you can do without root privileges altogether. Linux has the concept of capabilities: root’s privileges are split into distinct units that can be enabled or disabled independently. If the only privileged thing your application needs to do is bind to a privileged port, the CAP_NET_BIND_SERVICE capability may be enough, and the application does not have to run as root at all. But how is this possible?
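
Capabilities are tracked per process as bit masks, and Linux exposes the current ones in /proc/self/status. As an illustration (a minimal Python sketch, Linux only; the helper names are illustrative and not part of any library), the following checks whether the running process has CAP_NET_BIND_SERVICE, capability number 10, in its effective set:

```python
import os
from typing import Set

# Capability number from <linux/capability.h>; only the one this post is about.
CAP_NET_BIND_SERVICE = 10


def caps_from_mask(mask_hex: str) -> Set[int]:
    """Turn a hexadecimal capability mask, as printed in /proc/<pid>/status, into bit numbers."""
    mask = int(mask_hex, 16)
    return {bit for bit in range(mask.bit_length()) if (mask >> bit) & 1}


def effective_caps(status_path: str = "/proc/self/status") -> Set[int]:
    """Return the effective capability set of the current process (Linux only)."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return caps_from_mask(line.split()[1])
    raise RuntimeError("no CapEff line in " + status_path)


if __name__ == "__main__":
    print("uid:", os.getuid())
    print("CAP_NET_BIND_SERVICE effective:", CAP_NET_BIND_SERVICE in effective_caps())
```

Run as root in a container with Docker’s default capability set, it should typically report True; run as an ordinary user without any granted capabilities, it should report False.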

There is a setcap program that can set file capabilities. For example, to set the CAP_NET_BIND_SERVICE capability, you can run it like this:

```
setcap cap_net_bind_service=ep your-binary
```

Under the hood, setcap uses cap_set_file() to set the capabilities; cap_set_file() stores them in the extended file attributes under the security.capability key.
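
To see what getcap actually reads, here is a minimal Python sketch (Linux only) that decodes the raw security.capability value. The layout follows struct vfs_cap_data in <linux/capability.h>: a little-endian magic_etc word holding the revision and the effective flag, followed by permitted and inheritable masks (revision 3 adds a root UID for user namespaces). The function names are illustrative, and revision 1 is not handled:

```python
import errno
import os
import struct
from typing import Optional

# Constants from struct vfs_cap_data in <linux/capability.h>.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_REVISION_3 = 0x03000000


def bits(mask: int) -> set:
    """Return the numbers of the bits set in mask."""
    return {b for b in range(mask.bit_length()) if (mask >> b) & 1}


def decode_file_caps(raw: bytes) -> dict:
    """Decode a revision 2 or 3 security.capability value (all fields are little-endian u32)."""
    (magic,) = struct.unpack_from("<I", raw)
    revision = magic & VFS_CAP_REVISION_MASK
    if revision not in (VFS_CAP_REVISION_2, VFS_CAP_REVISION_3):
        raise ValueError("unsupported revision 0x%08x" % revision)
    # {permitted, inheritable} for capabilities 0-31, then for capabilities 32-63.
    perm_lo, inh_lo, perm_hi, inh_hi = struct.unpack_from("<4I", raw, 4)
    caps = {
        "revision": revision >> 24,
        "effective": bool(magic & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": bits(perm_lo | (perm_hi << 32)),
        "inheritable": bits(inh_lo | (inh_hi << 32)),
    }
    if revision == VFS_CAP_REVISION_3:
        # Revision 3 appends the root UID of the user namespace the capabilities apply to.
        caps["rootid"] = struct.unpack_from("<I", raw, 20)[0]
    return caps


def read_file_caps(path: str) -> Optional[dict]:
    """Return the decoded file capabilities, or None if the file has none."""
    try:
        raw = os.getxattr(path, "security.capability")
    except OSError as exc:
        if exc.errno == errno.ENODATA:
            return None
        raise
    return decode_file_caps(raw)
```

For a file prepared with the setcap command above, read_file_caps() should report permitted = {10} with the effective flag on; for a file without capabilities it returns None.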

Consider the following Dockerfile:

```
FROM alpine:3.13 AS build
RUN apk add --no-cache libcap
RUN touch /test.txt
RUN setcap cap_net_bind_service=ep /test.txt
RUN getcap /test.txt

FROM alpine:3.13 as target
RUN apk add --no-cache libcap
COPY --from=build /test.txt /test.txt
RUN getcap /test.txt
```

The build stage creates test.txt, sets the CAP_NET_BIND_SERVICE capability, and verifies that the capability has been set.

The target stage copies the file from the build stage and verifies that the capability is still there.

If you build the image with the classic builder (that is, without BuildKit), you will see something like this:

```
Step 1/9 : FROM alpine:3.13 AS build
 ---> 6dbb9cc54074
Step 2/9 : RUN apk add --no-cache libcap
 ---> Running in ba84c379d829
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/x86_64/APKINDEX.tar.gz
(1/1) Installing libcap (2.46-r0)
Executing busybox-1.32.1-r6.trigger
OK: 6 MiB in 15 packages
Removing intermediate container ba84c379d829
 ---> 63981d74e1c7
Step 3/9 : RUN touch /test.txt
 ---> Running in 88f7a0a68ffd
Removing intermediate container 88f7a0a68ffd
 ---> 3141f87d2515
Step 4/9 : RUN setcap cap_net_bind_service=ep /test.txt
 ---> Running in ac5ef21dd5ab
Removing intermediate container ac5ef21dd5ab
 ---> b4b644cb8895
Step 5/9 : RUN getcap /test.txt
 ---> Running in ca6e66b96b62
/test.txt cap_net_bind_service=ep
Removing intermediate container ca6e66b96b62
 ---> b6f1e8d41018
Step 6/9 : FROM alpine:3.13 as target
 ---> 6dbb9cc54074
Step 7/9 : RUN apk add --no-cache libcap
 ---> Running in 66416797eca3
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/x86_64/APKINDEX.tar.gz
(1/1) Installing libcap (2.46-r0)
Executing busybox-1.32.1-r6.trigger
OK: 6 MiB in 15 packages
Removing intermediate container 66416797eca3
 ---> a830ccb62d9a
Step 8/9 : COPY --from=build /test.txt /test.txt
 ---> 81e226b3ed96
Step 9/9 : RUN getcap /test.txt
 ---> Running in b9f4e8c817a8
Removing intermediate container b9f4e8c817a8
 ---> c8d490215c87
Successfully built c8d490215c87
```

Step 9 prints nothing: getcap finds no capabilities on the copied file. In other words, capabilities (and extended attributes in general) are not preserved when you COPY files between stages.
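
For illustration, here is what a copy that does preserve extended attributes has to do beyond copying the contents: list the source file’s attributes and set each one on the destination. This is a minimal Python sketch (Linux only), not how Docker or BuildKit implement COPY; copy_with_xattrs is a hypothetical helper:

```python
import os
import shutil
from typing import List


def copy_with_xattrs(src: str, dst: str) -> List[str]:
    """Copy a file's contents and mode, then every extended attribute; return their names."""
    shutil.copyfile(src, dst)  # contents only
    shutil.copymode(src, dst)  # permission bits
    copied = []
    # Attributes go last: on Linux, writing to a file clears its security.capability.
    for name in os.listxattr(src):
        # Setting security.capability requires CAP_SETFCAP, so this can raise PermissionError.
        os.setxattr(dst, name, os.getxattr(src, name))
        copied.append(name)
    return copied
```

Python’s own shutil.copy2 also tries to copy extended attributes on Linux, but silently skips the ones it is not allowed to set.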

So, what are the choices?

Use BuildKit, either through Buildx or by setting DOCKER_BUILDKIT=1 for a regular docker build. With BuildKit enabled, COPY does preserve extended file attributes. For example, if you run docker buildx build --progress=plain . for the above Dockerfile, the output will look like this:

```
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 298B done
#1 DONE 0.1s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/alpine:3.13
#3 DONE 2.0s

#4 [build 1/5] FROM docker.io/library/alpine:3.13@sha256:69e70a79f2d41ab5d6...
#4 resolve docker.io/library/alpine:3.13@sha256:69e70a79f2d41ab5d637de98c1e0b055206ba40a8145e7bddb55ccc04e13cf8f done
#4 sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba 0B / 2.81MB 0.2s
#4 sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba 1.05MB / 2.81MB 0.3s
#4 sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba 2.81MB / 2.81MB 0.5s
#4 sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba 2.81MB / 2.81MB 0.6s done
#4 extracting sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba
#4 extracting sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba 0.3s done
#4 DONE 1.0s

#5 [build 2/5] RUN apk add --no-cache libcap
#5 0.538 fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/x86_64/APKINDEX.tar.gz
#5 1.159 fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/x86_64/APKINDEX.tar.gz
#5 1.798 (1/1) Installing libcap (2.46-r0)
#5 1.838 Executing busybox-1.32.1-r6.trigger
#5 1.842 OK: 6 MiB in 15 packages
#5 DONE 2.3s

#6 [build 3/5] RUN touch /test.txt
#6 DONE 0.2s

#7 [build 4/5] RUN setcap cap_net_bind_service=ep /test.txt
#7 DONE 0.2s

#8 [build 5/5] RUN getcap /test.txt
#8 0.070 /test.txt cap_net_bind_service=ep
#8 DONE 0.2s

#9 [target 3/4] COPY --from=build /test.txt /test.txt
#9 DONE 0.1s

#10 [target 4/4] RUN getcap /test.txt
#10 0.076 /test.txt cap_net_bind_service=ep
#10 DONE 0.2s
```

If you cannot use BuildKit or Buildx (this is the case with our CI/CD system: developers cannot update Docker, and ops won’t), you will have to set the extended attributes in the target image itself. This probably means that you can’t use the scratch image anymore, since it contains no tools to set them with. You can still use the busybox base image (about 1 MB of overhead) and the setfattr command. Afterwards, to remove all unneeded binaries from the image, you can run something like this:
```
cd /bin && busybox --list | busybox xargs busybox rm && busybox rm getconf && busybox rm busybox
```
This will not reduce the size of the final image, because the deleted files remain in the busybox base layer. Still, if an attacker finds and exploits an unknown vulnerability in your application, they will have no tools available for downloading and running further exploits to compromise the system.
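
Setting the capability with setfattr means supplying the raw security.capability value yourself. As a minimal Python sketch (the function names are illustrative), that value can be computed on the build machine; =ep means the capability is in the permitted set and the effective flag is on. The output uses the 0x-prefixed hexadecimal form that setfattr from the attr package accepts; busybox applets are often simplified, so check that the setfattr in your base image decodes hex values before relying on it:

```python
import struct
from typing import Set

# Constants from <linux/capability.h>.
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
CAP_NET_BIND_SERVICE = 10


def file_caps_v2(permitted: Set[int], effective: bool = True) -> bytes:
    """Build a revision 2 security.capability value with an empty inheritable set."""
    mask = 0
    for cap in permitted:
        mask |= 1 << cap
    magic = VFS_CAP_REVISION_2 | (VFS_CAP_FLAGS_EFFECTIVE if effective else 0)
    # magic_etc, then {permitted, inheritable} for capabilities 0-31 and 32-63.
    return struct.pack("<5I", magic, mask & 0xFFFFFFFF, 0, mask >> 32, 0)


def setfattr_command(path: str, permitted: Set[int]) -> str:
    """Format the value in the 0x-prefixed hex form accepted by setfattr -v."""
    return "setfattr -n security.capability -v 0x%s %s" % (file_caps_v2(permitted).hex(), path)


if __name__ == "__main__":
    print(setfattr_command("/your-binary", {CAP_NET_BIND_SERVICE}))
```

Running it prints setfattr -n security.capability -v 0x0100000200040000000000000000000000000000 /your-binary, a command you would then run as root in a RUN instruction of the target stage.
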

If you must use the scratch base image and all you need is capabilities, you can use setcap-static, a small static tool I have written (available as a Docker image). It is used like this:
```
# ...

FROM scratch
COPY --from=wildwildangel/setcap-static /setcap-static /!setcap-static
COPY --from=build /build/your-binary /your-binary
RUN ["/!setcap-static", "cap_net_bind_service=ep", "/your-binary"]
```
If you copy setcap-static to !setcap-static and run it from the root directory (as /!setcap-static in the example above), it automatically removes itself after setting the capabilities.

This approach also adds two layers of overhead: one for the setcap-static binary (about 50 KiB, depending on the image architecture) and one for the extended attributes. Unfortunately, the second layer is as big as your binary, because a layer stores a changed file in full even when only its attributes changed; the same is true whichever technique you use to modify file attributes.

Wild Wild Wolf