Opened on Aug 26, 2016
This proposal was originally from #24711 (comment).
With several changes to the image format, the scope of build caching has been limited to local nodes. This is problematic for architectures that dispatch builds to arbitrary nodes, since pulling new images and data will not populate the build cache.
The main concern here is cache poisoning. The worst part about it is that it is not at all obvious whether you are affected or protected. It can only be mitigated by limiting the horizon of data that one trusts.
Anything that circumvents that protection, even `docker save`/`docker load`, is going to open your infrastructure up to injection of malicious content. The proposal in #20316 and previous proposals have not addressed this problem. While we all want fast builds (and I really do), introducing cache poisoning to the build step of the infrastructure must be avoided. Could you imagine the impact if someone could just inject a malicious layer into `library/ubuntu` or `library/alpine`?
The other aspect to this is the misapplied assumption about the idempotence of shell commands. `apt-get update` run twice is never guaranteed to have the same result. Ever. That is just not how it works. If you have a build cache that is never purged, you will never update your upstream software. That may or may not be the intent. Even worse, if this build cache gets filled in with remote data, you probably have no visibility into when that command was run.
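To make that concrete, here is a hypothetical Dockerfile fragment: if the `RUN` line is satisfied from a cache that is never purged, the package index (and anything installed from it) is frozen at whatever time the cached layer was originally built, not at build time.

```
FROM debian
# If this line hits the build cache, these package lists are as old as
# the cached layer, not as old as this build.
RUN apt-get update && apt-get install -y curl
```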
The underlying problem here is that with the 1.10 changes to the image format, we no longer restore the parent image chain when pulling from a registry. As such, a proper solution to this problem involves something that can control the level of trust extended to content used in a distributed build cache.
Let's look at how we build an image, with `FROM alpine` at the top:

```
docker pull mysuperapp:v0
docker build -t mysuperapp:v1 .
```
In this simple case, we cannot assume that a remote `mysuperapp:v0` and the ongoing build are related, since that assumption would introduce the cache poisoning scenario that we need to avoid. However, one may have local registry infrastructure that they know they can trust. While we can infer parentage (despite other assertions, this is still possible), we may not be able to trust that parentage from a build-caching perspective for all registries. But this build environment is special.
What better way than to tell the build process that it can trust a related image?
```
docker build --cache-from mysuperapp:v0 -t mysuperapp:v1 .
```
The above would allow `Dockerfile` commands to be satisfied from the entries of `mysuperapp:v0` in the build of `mysuperapp:v1`. Job done!
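To sketch how this would fit into a build pipeline (hypothetical, since the flag is only being proposed here; image names are from the running example), a CI job on a fresh node might do:

```
# Pull the previous release to seed the cache (tolerate a missing
# image on the very first build).
docker pull mysuperapp:v0 || true

# Build, explicitly trusting the pulled image as a cache source.
docker build --cache-from mysuperapp:v0 -t mysuperapp:v1 .

# Publish the result so the next node can use it as a cache source.
docker push mysuperapp:v1
```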
No! We still have a problem. Now, my build system has to know tag lineage (`mysuperapp:v0` < `mysuperapp:v1`). Let's modify the meaning of the tagless reference to mean something slightly different:
```
docker pull -a mysuperapp
docker build --cache-from mysuperapp -t mysuperapp:v1 .
```
In the above, we pull all the tags from `mysuperapp`, any layer of which can satisfy the build cache. In practice, this is probably a little wide for most scenarios, so we can allow multiple `--cache-from` directives on the command line:
```
docker build --cache-from mysuperapp:v0 --cache-from mysuperapp:v1 -t mysuperapp:v2 .
```
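If the set of tags is not known ahead of time, a build script could enumerate whatever is available locally and pass each tag through. This is a hypothetical sketch; the repository name and the `v2` target are assumptions carried over from the running example:

```
# Trust every locally available tag of mysuperapp as a cache source
# for the next build.
cache_args=""
for tag in $(docker image ls --format '{{.Repository}}:{{.Tag}}' mysuperapp); do
  cache_args="$cache_args --cache-from $tag"
done
docker build $cache_args -t mysuperapp:v2 .
```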
There are many possibilities here to make this more flexible, such as running a registry service specially for the purpose of build caching, e.g. `mybuildcache.internal/mysuperapp`. Did you know that you can just run a registry and rsync the filesystem around without locking? You can also rsync from multiple sources and merge the result safely (kind of). Such a registry can be purged periodically (or someone could submit a PR to purge old data).
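A minimal sketch of that rsync idea, assuming the registry's default filesystem storage path (`/var/lib/registry`) and hypothetical host names: registry blobs are content-addressed, so merging stores from multiple sources is additive, while the tag data is the mutable piece (hence the "kind of" above).

```
# Merge registry storage from two build hosts into a local cache
# registry. Content-addressed blobs from multiple sources do not
# conflict with each other.
rsync -a buildhost-a:/var/lib/registry/ /var/lib/registry/
rsync -a buildhost-b:/var/lib/registry/ /var/lib/registry/
```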
We can take this even further, but I hope the point is brought home. This is probably less convenient than the original behavior, but it is a good trade-off. It leverages the existing infrastructure and has the possibility of being extended as use cases change.
Closes #18924.