Docker Images for R
I use Docker for various purposes, two of which are working with R and Shiny.
In this post I will go through the Docker images I use for R.
I will share more details on the usage in a later post.
The starting point is the Rocker Project, which provides a lot of R-related Docker images.
Docker images are layered and in the context of R images I apply this in the following manner:
- I have one image with only base R based on an operating system image. No user installed packages. The image is called r-base.
- I have one image based on the base R image with the devtools package installed. The image is called r-devtools.
The rest of my images build on r-base and r-devtools.
The Dockerfiles are hosted on my GitHub profile.
The base R image
I care deeply about reproducibility.
One downside of R that you can run into in daily use shows up when you return to an old project on a new computer.
Simply installing the latest version of R and the necessary packages will quite likely give you different versions of R and the packages than the ones the project was written with.
If you are lucky your code will still run and return the same results.
Otherwise, you have to update your code or try to find and install the required packages in the version you had when you wrote the code.
Packages like packrat and checkpoint aim to help you manage the packages you use in a project.
However, you still need access to old versions of R.
Actively maintained Linux distributions often only have the bleeding edge in the repositories.
An alternative is to compile R yourself – which is what happens in the Version-stable Rocker images.
As noted in the source repository, this does not guarantee very long-term reproducibility, as we still need a base image with the dependencies.
Besides the compiled R, the Version-stable Rocker images use Microsoft’s daily snapshot of the official CRAN package repository, with the date set to the last day on which a particular version of R was the most recent release.
This means that any official package installed within such an r-base image (or any image that builds on it) works with that version of R.
Those are the broad strokes.
So why don’t I just use the Version-stable Rocker images?
Because there are a few things I want to do differently.
The base image
The Rocker images are all based on Debian and I prefer Debian’s offspring Ubuntu.
This also gives more consistency with the Docker images created with Azure Machine Learning Studio that we have used at work.
The downside is that not all dependencies are the same in Ubuntu and Debian.
In the Version-stable Rocker images’ GitHub repo there are commands for finding dependencies.
The dependencies needed at runtime can be found with the command
apt-cache show r-base-core | grep '^Depends'
The dependencies needed for the compilation can be found with the command
apt-cache showsrc r-base-core | grep '^Build-Depends'
I do not use all of the dependencies, as some are only needed for graphics (they are marked with X11 in the Rocker document mentioned above).
It is important to remove the build dependencies afterwards: the final r-base image takes up around 440 megabytes when they are removed and a whopping 2.2 gigabytes when they are not.
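The size difference only materialises if the build dependencies are installed and removed within the same RUN statement, since a package purged in a later statement still lives in the earlier layer. The following is a minimal sketch of that pattern, not the actual Dockerfile: the Ubuntu tag, the shortened package list in BUILD_DEPS and the placeholder compile step are all assumptions made for illustration.

```dockerfile
# Assumed base image tag; the post only says the image is Ubuntu-based.
FROM ubuntu:16.04

# Hypothetical, shortened list. The real one comes from
# `apt-cache showsrc r-base-core`.
ARG BUILD_DEPS="build-essential gfortran libreadline-dev zlib1g-dev"

# Install, compile and purge in ONE statement so that the build
# dependencies never end up in a committed layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends $BUILD_DEPS \
 && echo "placeholder: download and compile R here" \
 && apt-get purge -y --auto-remove $BUILD_DEPS \
 && rm -rf /var/lib/apt/lists/*
```

Note that `--auto-remove` also removes libraries that were pulled in only by the build packages, so the runtime dependencies have to be installed explicitly in a real image.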
The user
By default, the user in a Docker container is root, but this is discouraged in the best practices for writing Dockerfiles.
Docker imposes no restrictions on the non-root user in a container, so my choices aim to make life easier in a later image that does have requirements: Shiny Server.
I create a user called shiny in a group called shiny that owns a global package directory as well as all future subfolders (packages), using a sticky permission.
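A setup along these lines could look roughly like the sketch below. It is an illustration under assumptions rather than the actual Dockerfile: the base image tag and the directory path in R_LIBS_DIR are hypothetical, and the sketch uses the setgid bit on the directory, which is one common way to make new subfolders inherit the group and may differ from the permission the images really set.

```dockerfile
# Assumed base image tag.
FROM ubuntu:16.04

# Hypothetical location of the global package directory.
ARG R_LIBS_DIR=/usr/local/lib/R/site-library

# Create the shiny group and user, then hand the package directory to them.
# Mode 2775 = setgid + rwxrwxr-x, so new subfolders get the shiny group.
RUN groupadd shiny \
 && useradd --gid shiny --create-home shiny \
 && mkdir -p "$R_LIBS_DIR" \
 && chown shiny:shiny "$R_LIBS_DIR" \
 && chmod 2775 "$R_LIBS_DIR"

# All later statements, and the container itself, run as shiny, not root.
USER shiny
```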
The purpose of the r-devtools image is to make it easier to build images with custom packages.
Custom packages
The devtools package makes it easy to install custom packages in an image.
Consider the folder of a package called MyPackage:
MyPackage
├── DESCRIPTION
├── Dockerfile
├── man
├── MyPackage.Rproj
├── NAMESPACE
├── R
└── tests
Here the Dockerfile looks as follows:
FROM r-devtools:3.4.4
COPY --chown=shiny:shiny . /tmp/MyPackage
RUN Rscript -e 'devtools::install("/tmp/MyPackage")' \
&& rm -rf /tmp/*
CMD ["R"]
When we build this image from the MyPackage folder, we copy all content in the folder into /tmp/MyPackage in the image.
We can then use devtools to install the package and remove the source.
Every statement in a Dockerfile results in an intermediate image.
When building the same image repeatedly, this means that a successful step is cached and does not have to be rebuilt, but if a step changes, all the remaining steps have to be rebuilt as well.
If MyPackage has many dependencies, the command devtools::install("/tmp/MyPackage") can take a long time.
During experimentation, where the files in MyPackage change, this results in long build times for the final image.
To work around this I often install the dependencies in a separate RUN statement before the COPY statement, i.e., I include a line like
RUN Rscript -e 'install.packages(c("foo", "bar"))'
before
COPY --chown=shiny:shiny . /tmp/MyPackage
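Put together, the reordered Dockerfile could look like the sketch below. It only combines the statements already shown, and "foo" and "bar" remain placeholder package names.

```dockerfile
FROM r-devtools:3.4.4

# Changes rarely: this layer is reused from the build cache as long as
# the statement itself is unchanged. "foo" and "bar" are placeholders.
RUN Rscript -e 'install.packages(c("foo", "bar"))'

# Changes often: editing any file in MyPackage invalidates the cache
# from this statement onwards, but not the dependency layer above.
COPY --chown=shiny:shiny . /tmp/MyPackage
RUN Rscript -e 'devtools::install("/tmp/MyPackage")' \
 && rm -rf /tmp/*

CMD ["R"]
```

With this ordering, a rebuild after editing a source file only repeats the COPY and the final install, while the slow dependency installation comes from the cache.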