Multi-stage builds are a great way to keep the size of the resulting image down. They are extremely useful if you want to use scratch-based images for your application.
The common usage pattern for multi-stage builds is as follows:

    FROM alpine:3.13 AS build
    WORKDIR /build
    RUN apk --no-cache add build-base build-dependencies-go-here
    COPY source.c source.c
    RUN gcc -static source.c -o binary

    FROM scratch
    COPY --from=build /build/binary /binary
    CMD ["/binary"]

This creates two images: the intermediate one (“build”) and the “real” one. The first one contains the build system, all necessary compile-time dependencies, the source code, etc. The destination image contains only the resulting binary.
The great thing about multi-stage builds is that you can sacrifice the size of the intermediate image for Dockerfile readability and better usage of the build cache: you don’t need to run everything in a single RUN instruction. You can create as many layers as necessary. For example, one layer can contain the build system, the second the build dependencies, the third the sources, the fourth the intermediate build files, and the fifth the linked binary. If something fails during the compilation stage, Docker will still use the cache to avoid downloading the build system and compile-time dependencies again.
However, I have recently found a limitation of copying files between the stages: COPY ignores all extended file attributes.
First of all, why are extended attributes useful, and why might you want to use them? In most cases, you probably don’t need them, but when you do, it is probably because of security requirements.
If your application needs, for some reason, to use host networking and to bind to a privileged port, you usually need to run it as root. After initialization is complete, such applications usually drop their privileges and switch to a non-privileged user account. However, quite often, you can do without root privileges. Linux has a concept of capabilities, which can be independently enabled or disabled. If the only privileged thing your application needs is to bind to a privileged port, it could be enough to grant it just the CAP_NET_BIND_SERVICE capability and not run it as root. But how is this possible?
There is a setcap program that can set file capabilities. For example, to set the CAP_NET_BIND_SERVICE capability, you can run it like this:

    setcap cap_net_bind_service=ep your-binary

Under the hood, setcap uses cap_set_file() to set the capabilities; cap_set_file() stores them in the extended file attributes under the security.capability key.
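To make the storage format a bit more tangible, here is a minimal sketch that decodes a security.capability value as it would be printed by something like `getfattr -n security.capability -e hex your-binary` (from the attr package). It assumes the revision 2 layout of the kernel's `struct vfs_cap_data` (five little-endian 32-bit words: a magic/flags word, then permitted and inheritable masks for the low and high capability bits) and only looks at the low 32 bits. The helper names `le32` and `describe_caps` are made up for this illustration.

```shell
#!/bin/sh
# Sketch: decode a security.capability xattr given as a hex string.
# Assumed layout (struct vfs_cap_data, revision 2), little-endian 32-bit words:
#   magic_etc | permitted[0] | inheritable[0] | permitted[1] | inheritable[1]

# le32 HEX INDEX: print the INDEX-th (0-based) 32-bit word of HEX as decimal.
le32() {
    start=$(( $2 * 8 + 1 ))
    word=$(printf '%s' "$1" | cut -c "${start}-$(( start + 7 ))")
    b0=$(printf '%s' "$word" | cut -c 1-2)
    b1=$(printf '%s' "$word" | cut -c 3-4)
    b2=$(printf '%s' "$word" | cut -c 5-6)
    b3=$(printf '%s' "$word" | cut -c 7-8)
    # Bytes are stored least significant first, so reverse them.
    echo $(( 0x$b3$b2$b1$b0 ))
}

# describe_caps HEX: summarise the header and the CAP_NET_BIND_SERVICE bit.
describe_caps() {
    hex=${1#0x}
    magic=$(le32 "$hex" 0)
    permitted=$(le32 "$hex" 1)
    revision=$(( (magic >> 24) & 255 ))    # 2 means VFS_CAP_REVISION_2
    effective=$(( magic & 1 ))             # VFS_CAP_FLAGS_EFFECTIVE (the "e" in "=ep")
    net_bind=$(( (permitted >> 10) & 1 ))  # CAP_NET_BIND_SERVICE is capability number 10
    echo "revision=$revision effective=$effective cap_net_bind_service=$net_bind"
}

# The value I would expect for cap_net_bind_service=ep under these assumptions.
describe_caps 0x0100000200040000000000000000000000000000
# prints: revision=2 effective=1 cap_net_bind_service=1
```

The point is only that the whole capability set lives in about 20 bytes of file metadata: anything that copies file contents without copying extended attributes silently drops it.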
Consider the following Dockerfile:

    FROM alpine:3.13 AS build
    RUN apk add --no-cache libcap
    RUN touch /test.txt
    RUN setcap cap_net_bind_service=ep /test.txt
    RUN getcap /test.txt

    FROM alpine:3.13 as target
    RUN apk add --no-cache libcap
    COPY --from=build /test.txt /test.txt
    RUN getcap /test.txt

The build stage creates test.txt, sets the CAP_NET_BIND_SERVICE capability, and verifies that the capability has been set.
The target stage copies the file from the build stage and verifies that the capability is still there.
If you build the image, you will see something like this:

    Step 1/9 : FROM alpine:3.13 AS build
     ---> 6dbb9cc54074
    Step 2/9 : RUN apk add --no-cache libcap
     ---> Running in ba84c379d829
    fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/x86_64/APKINDEX.tar.gz
    fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/x86_64/APKINDEX.tar.gz
    (1/1) Installing libcap (2.46-r0)
    Executing busybox-1.32.1-r6.trigger
    OK: 6 MiB in 15 packages
    Removing intermediate container ba84c379d829
     ---> 63981d74e1c7
    Step 3/9 : RUN touch /test.txt
     ---> Running in 88f7a0a68ffd
    Removing intermediate container 88f7a0a68ffd
     ---> 3141f87d2515
    Step 4/9 : RUN setcap cap_net_bind_service=ep /test.txt
     ---> Running in ac5ef21dd5ab
    Removing intermediate container ac5ef21dd5ab
     ---> b4b644cb8895
    Step 5/9 : RUN getcap /test.txt
     ---> Running in ca6e66b96b62
    /test.txt cap_net_bind_service=ep
    Removing intermediate container ca6e66b96b62
     ---> b6f1e8d41018
    Step 6/9 : FROM alpine:3.13 as target
     ---> 6dbb9cc54074
    Step 7/9 : RUN apk add --no-cache libcap
     ---> Running in 66416797eca3
    fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/x86_64/APKINDEX.tar.gz
    fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/x86_64/APKINDEX.tar.gz
    (1/1) Installing libcap (2.46-r0)
    Executing busybox-1.32.1-r6.trigger
    OK: 6 MiB in 15 packages
    Removing intermediate container 66416797eca3
     ---> a830ccb62d9a
    Step 8/9 : COPY --from=build /test.txt /test.txt
     ---> 81e226b3ed96
    Step 9/9 : RUN getcap /test.txt
     ---> Running in b9f4e8c817a8
    Removing intermediate container b9f4e8c817a8
     ---> c8d490215c87
    Successfully built c8d490215c87

You can see that the second getcap (step 9) prints nothing, unlike the one in step 5: the capabilities (and extended attributes generally) are not preserved when you COPY files across the stages.
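Because the loss is silent (the build still succeeds), it may be worth turning the getcap check into something that can fail a pipeline. The following is a rough sketch, not a tested CI recipe: `has_file_cap` is a hypothetical helper that inspects one line of getcap output, and it tries to accept both the newer `path cap=ep` style seen in the logs above and the older `path = cap+ep` style that, as far as I know, earlier libcap versions print. The image name and path in the usage comment are placeholders.

```shell
#!/bin/sh
# has_file_cap LINE CAP: succeed if LINE (one line of `getcap` output) lists CAP.
# Accepted styles (an assumption about libcap's output formats):
#   /test.txt cap_net_bind_service=ep       newer libcap, as in the logs above
#   /test.txt = cap_net_bind_service+ep     older libcap
# An empty LINE, which is what getcap prints for a file without capabilities, fails.
has_file_cap() {
    case "$1" in
        *"$2="* | *"$2+"* | *"$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical usage against a built image (names are placeholders):
#   line=$(docker run --rm --entrypoint getcap my-image /test.txt)
#   has_file_cap "$line" cap_net_bind_service || { echo "capability lost" >&2; exit 1; }

# Demonstration with the two outcomes from the build log: step 5 and step 9.
for line in "/test.txt cap_net_bind_service=ep" ""; do
    if has_file_cap "$line" cap_net_bind_service; then
        echo "kept: $line"
    else
        echo "missing: '$line'"
    fi
done
```

A plain substring match like this is deliberately crude; it is enough to distinguish "capability present" from "empty output", which is the failure mode shown above.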
So, what are the choices?
- Use Buildx or BuildKit. With BuildKit enabled, COPY does transfer all file attributes. For example, if you run the build command as docker buildx build --progress=plain . for the above Dockerfile, the output will look like this:

      #1 [internal] load build definition from Dockerfile
      #1 transferring dockerfile: 298B done
      #1 DONE 0.1s
      #2 [internal] load .dockerignore
      #2 transferring context: 2B done
      #2 DONE 0.0s
      #3 [internal] load metadata for docker.io/library/alpine:3.13
      #3 DONE 2.0s
      #4 [build 1/5] FROM docker.io/library/alpine:3.13@sha256:69e70a79f2d41ab5d6...
      #4 resolve docker.io/library/alpine:3.13@sha256:69e70a79f2d41ab5d637de98c1e0b055206ba40a8145e7bddb55ccc04e13cf8f done
      #4 sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba 0B / 2.81MB 0.2s
      #4 sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba 1.05MB / 2.81MB 0.3s
      #4 sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba 2.81MB / 2.81MB 0.5s
      #4 sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba 2.81MB / 2.81MB 0.6s done
      #4 extracting sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba
      #4 extracting sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba 0.3s done
      #4 DONE 1.0s
      #5 [build 2/5] RUN apk add --no-cache libcap
      #5 0.538 fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/x86_64/APKINDEX.tar.gz
      #5 1.159 fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/x86_64/APKINDEX.tar.gz
      #5 1.798 (1/1) Installing libcap (2.46-r0)
      #5 1.838 Executing busybox-1.32.1-r6.trigger
      #5 1.842 OK: 6 MiB in 15 packages
      #5 DONE 2.3s
      #6 [build 3/5] RUN touch /test.txt
      #6 DONE 0.2s
      #7 [build 4/5] RUN setcap cap_net_bind_service=ep /test.txt
      #7 DONE 0.2s
      #8 [build 5/5] RUN getcap /test.txt
      #8 0.070 /test.txt cap_net_bind_service=ep
      #8 DONE 0.2s
      #9 [target 3/4] COPY --from=build /test.txt /test.txt
      #9 DONE 0.1s
      #10 [target 4/4] RUN getcap /test.txt
      #10 0.076 /test.txt cap_net_bind_service=ep
      #10 DONE 0.2s

- If you cannot use BuildKit or Buildx (this is the case with our CI/CD system: devs cannot update Docker, ops won’t), you will have to set the extended attributes in the target image. This probably means that you can’t use the scratch image anymore. But you can still use the busybox base image (this will add about 1 MB of overhead) and the setfattr command. After that, to remove all unwanted binaries from your image, you can run something like

      cd /bin && busybox --list | busybox xargs busybox rm && busybox rm getconf && busybox rm busybox

  This will not reduce the size of the final image, because the removed files still exist in the base image’s layers. Still, if an attacker finds and exploits an unknown vulnerability in your application, they will not have any tools available to download and run exploits to compromise the system further.
- If you must use the scratch base image, and all you need is capabilities, I have written a small static tool, setcap-static (available as a Docker image). The use case is as follows:

      # ...
      FROM scratch
      COPY --from=wildwildangel/setcap-static /setcap-static /!setcap-static
      COPY --from=build /build/your-binary /your-binary
      RUN ["/!setcap-static", "cap_net_bind_service=ep", "/your-binary"]

  If you copy setcap-static to !setcap-static and run it from the root directory, it will automatically remove itself after setting the capabilities. This will also add two layers of overhead: one for the setcap-static binary (circa 50 KiB, depending on the image architecture) and the other for the extended attributes. Unfortunately, the second layer will be as big as your binary, but the same is true for whichever technique you use to modify file attributes.