Updated Docker Images for R
In an earlier post I wrote about Docker images for R.
While these images work as expected, they are (IMO) quite large: the base R 3.5.0 image, for example, is about 530 MB.
It is of course possible to install R in a smaller base image like Alpine Linux, but several popular packages have system requirements on Linux that are easy to install on the larger distributions and difficult to install on the smaller ones.
To list the installed packages sorted by size, run the following command:
docker run --rm r-base:3.5.0 dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -n
The seven largest packages in this output are as follows (dpkg reports Installed-Size in KiB):
24375 g++-7
25997 gcc-7
31070 libicu60
40421 libopenblas-base
52155 libopenblas-dev
134299 libgl1-mesa-dri
179065 openjdk-11-jre-headless
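Because the sizes are in KiB, the raw numbers are easy to misread. The function below is a minimal sketch (the name top_packages is hypothetical) that keeps the n largest entries of the dpkg-query output and converts them to MiB. It assumes only a POSIX shell with sort, tail and awk on the host.

```shell
#!/bin/sh
# Print the n largest packages (default 7) from "size<TAB>package" lines on stdin.
# dpkg reports Installed-Size in KiB, so dividing by 1024 gives MiB.
top_packages() {
    n="${1:-7}"
    sort -n | tail -n "$n" | awk -F '\t' '{ printf "%.1f MiB\t%s\n", $1 / 1024, $2 }'
}

# Usage against the image from this post:
# docker run --rm r-base:3.5.0 dpkg-query -Wf '${Installed-Size}\t${Package}\n' | top_packages 7
```

With the listing above, openjdk-11-jre-headless comes out at roughly 175 MiB.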
Quite peculiarly, openjdk-11-jre-headless takes up a lot of space despite being uninstalled after compiling R.
But this figure also turns out to be misleading: removing openjdk-11-jre-headless later frees approximately no space.
The package libgl1-mesa-dri is for graphics.
It is not necessary if we only intend to use the image for computations, but it is needed if we later wish to use the image as the foundation for, e.g., an image with Shiny Server.
The libopenblas packages provide the Basic Linear Algebra Subprograms (BLAS) and are the foundation for linear algebra in R, so they are unavoidable.
The package libicu60 is for handling Unicode and is therefore also needed.
Then we reach the packages related to the GNU C compilers.
Here the numbers understate the footprint: these packages depend on other packages, and together they take up about 170 MB.
The compilers are not needed at runtime, but since many modern R packages contain C++ code, they are needed for installing those packages.
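One way to check a total like this is to sum the Installed-Size of every package that belongs to the toolchain. The awk sketch below does this; the function name sum_matching and the package-name pattern in the usage comment are assumptions for illustration, and the exact set of packages pulled in by gcc-7 and g++-7 may not match that pattern.

```shell
#!/bin/sh
# Sum the Installed-Size (KiB) of packages whose name matches an extended regex,
# reading "size<TAB>package" lines on stdin; print the total in MiB.
# The pattern is passed through the environment so awk does not process escapes in it.
sum_matching() {
    PAT="$1" awk -F '\t' '$2 ~ ENVIRON["PAT"] { kib += $1 }
        END { printf "%.1f MiB\n", kib / 1024 }'
}

# Hypothetical pattern for the compiler toolchain (adjust to the actual dependencies):
# docker run --rm r-base:3.5.0 dpkg-query -Wf '${Installed-Size}\t${Package}\n' \
#     | sum_matching '^(gcc|g[+][+]|cpp|libgcc|libstdc[+][+]|binutils|libc6-dev)'
```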
The R images
My new sequence of Docker images is illustrated in the image below.
The r-minimal image contains only R and the very small remotes package for installing other packages; it has no compilers.
r-deps can be an actual image with runtime dependencies, or it can be an intermediate image in a multi-stage build.
The r-base image builds on r-minimal and adds C, C++ and Fortran compilers.
Finally, the r-test image builds on r-base and adds the covr, devtools, roxygen2 and testthat packages for testing purposes.
These four packages and their dependencies take up quite a lot of space and are time consuming to install, which is why I have a dedicated image.
Since I now use remotes instead of devtools for installing packages inside the images, the Dockerfile for r-test installs devtools explicitly, because it is needed for testing.
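# Build on r-base for the R version passed with --build-arg R_VERSION=...
ARG R_VERSION
FROM r-base:${R_VERSION}
# System libraries needed to compile the R packages below and their dependencies
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        libxml2-dev \
        libssl-dev \
        libssh2-1-dev \
        zlib1g-dev \
    && rm -rf /var/lib/apt/lists/*
# Install the testing packages as the unprivileged shiny user
USER shiny
RUN Rscript -e 'install.packages(c("covr", "devtools", "roxygen2", "testthat"))' \
    && rm -rf /tmp/*
# Put the test script at the path used by CMD; tests run from ~/package
COPY --chown=shiny:shiny run_tests.R /home/shiny/run_tests.R
WORKDIR /home/shiny/package
CMD ["Rscript", "/home/shiny/run_tests.R"]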
For R 3.5.0 the image summary is as follows.
$ docker image ls
REPOSITORY   TAG     SIZE
r-test       3.5.0   726MB
r-base       3.5.0   530MB
r-minimal    3.5.0   354MB
r-deps       3.5.0   293MB
ubuntu       18.04   87.5MB
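The differences between consecutive images show what each step adds. The awk sketch below computes them from the sizes printed by docker image ls. It assumes the chain ubuntu:18.04, r-deps, r-minimal, r-base, r-test, with each image built on the previous one, and the function name layer_deltas is hypothetical.

```shell
#!/bin/sh
# Print how much each image adds on top of the image before it.
# Input: "repository size" lines (e.g. "r-base 530MB"), ordered from the base image up.
# awk converts "530MB" to 530 by reading the leading number and ignoring the suffix.
layer_deltas() {
    awk '{
        size = $2 + 0
        if (NR > 1) printf "%s adds %.1f MB over %s\n", $1, size - prev, prev_name
        prev = size
        prev_name = $1
    }'
}

# Usage with the sizes above:
# printf 'ubuntu 87.5MB\nr-deps 293MB\nr-minimal 354MB\nr-base 530MB\nr-test 726MB\n' | layer_deltas
```

With these sizes, r-base adds 176 MB over r-minimal, which is in line with the roughly 170 MB attributed to the compilers.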
I have updated my GitHub repository with the Dockerfiles.
Using the images
Subsequent images are based on either r-minimal or r-base, depending on how easy it should be to install new packages that need compiled code.
Say that I want an image that has the jsonlite package installed, but is based on r-minimal.
jsonlite has compiled code and no system requirements, so it can be installed in r-base, after which the entire folder with R packages is copied into r-minimal. This is a multi-stage build.
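# An ARG before the first FROM is global and can be used in every FROM line
ARG R_VERSION
# Build stage: r-base has the compilers needed for jsonlite's compiled code
FROM r-base:${R_VERSION} AS deps
RUN Rscript -e "install.packages('jsonlite')"
# ----------------------------------------------------------
# Final stage: copy only the installed packages into the compiler-free image
FROM r-minimal:${R_VERSION}
COPY --from=deps --chown=shiny:shiny /usr/local/lib/R/site-library/ /usr/local/lib/R/site-library/
CMD ["R"]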
Instead of tying the Dockerfile to a specific image, we specify the R version as a build argument.
An image is therefore built like this:
docker build --build-arg R_VERSION=3.5.0 --tag myimage:mytag .
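The same build argument can be passed to every Dockerfile to build the whole chain for one R version. This is a sketch under assumptions: the function build_chain and the layout with one directory per image (r-deps/, r-minimal/, r-base/, r-test/) are hypothetical, and the order assumes each image is built FROM the one before it.

```shell
#!/bin/sh
# Build the image chain for one R version.
# Hypothetical layout: one directory per image, each containing its Dockerfile.
# Pass "dry-run" as the second argument to print the commands instead of running them.
set -eu

build_chain() {
    r_version="$1"
    mode="${2:-run}"
    # Assumed order: each image uses the previously built one as its base.
    for image in r-deps r-minimal r-base r-test; do
        if [ "$mode" = "dry-run" ]; then
            echo "docker build --build-arg R_VERSION=$r_version --tag $image:$r_version $image"
        else
            docker build --build-arg "R_VERSION=$r_version" --tag "$image:$r_version" "$image"
        fi
    done
}
```

For example, build_chain 3.5.0 builds all four images, and build_chain 3.5.0 dry-run only prints the commands.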
LaTeX
None of these images include LaTeX/TeXLive, even though it is one of R's build requirements.
As noted in this RStudio support article, LaTeX is needed to produce PDF files from R Markdown, and it is also used for some of the documentation during installation.
As I use neither in these images, I do not include LaTeX.
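If tests in an image such as r-test are run via R CMD check, note that check tries to build the PDF reference manual and complains when LaTeX is missing. A minimal sketch, assuming the package source is in a local directory and R is on the PATH (the helper name check_without_latex is hypothetical), skips the LaTeX-dependent steps with the --no-manual and --no-build-vignettes options of R CMD build and the --no-manual and --ignore-vignettes options of R CMD check:

```shell
#!/bin/sh
# Hypothetical helper: build and check an R package without LaTeX available.
# --no-manual skips the PDF reference manual; vignettes are skipped because
# PDF vignettes may also need LaTeX.
set -eu

check_without_latex() {
    pkg_dir="$1"
    R CMD build --no-manual --no-build-vignettes "$pkg_dir"
    # R CMD build writes <name>_<version>.tar.gz to the current directory; take the newest
    tarball=$(ls -t ./*.tar.gz | head -n 1)
    R CMD check --no-manual --ignore-vignettes "$tarball"
}
```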
Comparisons with Python
Some Pythonistas have snidely pointed out to me that minimal Python Docker images are much smaller than the ones I have made here.
That is certainly true, but R comes with a lot more batteries included in the core language and standard library.
Consider for example the Docker image built from this Dockerfile that installs numpy in a small Python distribution.
FROM python:3.7-slim
# Install numpy for the current user without keeping pip's download cache
RUN python -m pip install --no-cache-dir --compile --user numpy
The resulting images have the following sizes:
$ docker image ls
REPOSITORY   TAG        SIZE
numpy        3.7        219MB
python       3.7-slim   143MB
I am honestly not sure whether this discards the 17 MB that are downloaded to install numpy, but the added space would still be comparable to that of the installed libopenblas packages.
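To put numbers on this comparison: the numpy layer is the difference between the two image sizes, and the libopenblas packages listed earlier total 40421 + 52155 KiB. A small sketch of the arithmetic follows; note that docker image ls reports decimal MB, while dividing dpkg's KiB by 1024 gives MiB.

```shell
#!/bin/sh
# Space added by installing numpy on top of python:3.7-slim (docker image ls, MB)
numpy_mb=$(awk 'BEGIN { printf "%.1f", 219 - 143 }')
# libopenblas-base + libopenblas-dev from the dpkg-query listing (KiB -> MiB)
openblas_mib=$(awk 'BEGIN { printf "%.1f", (40421 + 52155) / 1024 }')
echo "numpy layer:          $numpy_mb MB"
echo "libopenblas packages: $openblas_mib MiB"
```

This gives 76 MB for numpy against about 90 MiB for the two libopenblas packages, i.e. figures of the same order.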
If we were to add matplotlib for plotting and other Python packages that provide counterparts to R's built-in functionality, we would probably end up with an image of the same size as r-minimal.