The inst Folder in Other R Packages
When making an R package the inst folder is for files/folders that should be copied unmodified into the installed R package folder.
One of my use cases is to include test data in inst/testdata.
When the package’s test suite is executed, data is loaded and can be used as part of the test.
However, the devil is in the detail: things may behave differently when executing the statements of the test interactively compared to running devtools::test.
The two packages
To make things more concrete, I create two packages, testload and testconceal, both located in the folder /home/robert/Documents/R.
The testload package
The testload package has the following content:
testload
├── DESCRIPTION
├── inst
│   └── testdata
│       └── testload_data.csv
├── testload.Rproj
├── NAMESPACE
└── R
    └── lookup.R
The installed testload package is located at /home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testload and the contents of the folder are as follows (with the unimportant parts left out):
testload
├── DESCRIPTION
├── help
│   └── ...
├── html
│   └── ...
├── Meta
│   └── ...
├── NAMESPACE
├── R
│   └── ...
└── testdata
    └── testload_data.csv
The location of the file testload_data.csv can be found with the command
system.file("testdata/testload_data.csv", package = "testload")
that returns
"/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testload/testdata/testload_data.csv"
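To see the two possible outcomes of system.file without needing the example packages, here is a minimal sketch using the stats package that ships with R. The file name no_such_file.csv is made up for illustration; the point is only that a missing file yields an empty string rather than an error.

```r
# A file that exists in every R installation: the DESCRIPTION of stats.
present <- system.file("DESCRIPTION", package = "stats")

# A file that does not exist (hypothetical name, chosen for illustration).
absent <- system.file("testdata/no_such_file.csv", package = "stats")

nzchar(present)  # TRUE: a non-empty path was returned
absent           # "": no error, just an empty string
```

That empty string is easy to overlook, which matters later in this post.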
In my actual use case I have a number of interdependent packages, with test data in a specific format that needs to be parsed.
It therefore makes sense to have a single function handling this in one of the packages.
The tricky part is locating the test data, so this is all my example function does here in lookup.R:
get_testdata <- function(filename, pkg) {
  system.file(filename, package = pkg)
}
I noticed an odd behavior with get_testdata when I ran tests.
A nice thing about running tests with the testthat package and devtools::test is that it makes all functions of the package available, not just the exported ones.
Looking at the code of devtools::test, this is accomplished by calling devtools::load_all.
(As a side note, I highly recommend load_all when developing packages.)
Hence we can emulate the behavior of devtools::test with load_all.
To illustrate what happens, consider these commands when the testload project is open:
> testload::get_testdata("testdata/testload_data.csv", "testload")
[1] "/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testload/testdata/testload_data.csv"
> devtools::load_all()
Loading testload
> testload::get_testdata("testdata/testload_data.csv", "testload")
[1] "/home/robert/Documents/R/testload/inst/testdata/testload_data.csv"
That is, load_all makes get_testdata find testload_data.csv in the source code folder instead of the installed folder.
The testconceal package
The testconceal package has the following content:
testconceal
├── testconceal.Rproj
├── DESCRIPTION
├── inst
│   └── testdata
│       └── testconceal_data.csv
├── NAMESPACE
└── R
Looking up the test data in the testconceal package from within the testconceal project does not behave the way it did for the testload package:
> testload::get_testdata("testdata/testconceal_data.csv", "testconceal")
[1] "/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testconceal/testdata/testconceal_data.csv"
> devtools::load_all()
Loading testconceal
> testload::get_testdata("testdata/testconceal_data.csv", "testconceal")
[1] ""
That’s odd!
Not finding any test data will (hopefully!) break the tests.
However, just running the body of get_testdata still works:
> system.file("testdata/testconceal_data.csv", package = "testconceal")
[1] "/home/robert/Documents/R/testconceal/inst/testdata/testconceal_data.csv"
Silent failure
The peculiar thing is that devtools silently replaces system.file when attached.
Looking up the documentation for system.file before devtools is attached yields only one result.
But after attaching devtools we have two choices:
> library(devtools)
> ?system.file
Help on topic ‘system.file’ was found in the following packages:
Package Library
base /opt/R/3.5.1/lib/R/library
pkgload /home/robert/R/x86_64-pc-linux-gnu-library/3.5.1
Choose one
1: Find Names of R System Files {base}
2: Replacement version of system.file {pkgload}
Selection:
From the docs:
[system.file] is meant to intercept calls to base::system.file() … It is made available when a package is loaded with load_all().
The pkgload::system.file does its work by using find.package, whose documentation states that
If lib.loc is NULL, then loaded namespaces are searched before the libraries.
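The distinction between "where a namespace was loaded from" and "where a package sits in the libraries" can be inspected directly. The following is only a sketch using the stats package, for which both locations should point into the R installation; for a package loaded with load_all I would expect the first path to be the source folder instead, but that is an assumption based on the output shown above, not something this snippet demonstrates.

```r
pkg <- "stats"  # stand-in package, used only because it ships with R
loadNamespace(pkg)

# Folder the loaded namespace was loaded from.
ns_path <- getNamespaceInfo(pkg, "path")

# Folder found when only the library trees are searched.
lib_path <- find.package(pkg, lib.loc = .libPaths())

print(ns_path)
print(lib_path)
```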
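So a solution is to specify where system.file should look for the test data.
When lib.loc is given explicitly, only those library folders are searched, so the namespace that load_all loaded from the source folder is presumably bypassed.
The package folders are available with the .libPaths function.
In my case: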
> .libPaths()
[1] "/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1"
[2] "/opt/R/3.5.1/lib/R/library"
The get_testdata function is updated as follows:
get_testdata <- function(filename, pkg) {
  system.file(filename, package = pkg, lib.loc = .libPaths()[1])
}
Now we get consistent results:
> testload::get_testdata("testdata/testconceal_data.csv", "testconceal")
[1] "/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testconceal/testdata/testconceal_data.csv"
> devtools::load_all()
Loading testconceal
> testload::get_testdata("testdata/testconceal_data.csv", "testconceal")
[1] "/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testconceal/testdata/testconceal_data.csv"
The downside of this approach is that when the test data is updated, the package must be reinstalled before the new data can be used.
Furthermore, to avoid these silent empty strings, it may be a good idea to request an error when no file is found by using the mustWork argument to system.file.
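As a sketch of that last suggestion, here is a stricter variant of the lookup function. The name get_testdata_strict and the choice of searching every entry of .libPaths() (rather than only the first) are my own assumptions for illustration; they are not part of the testload package as shown above. The demonstration uses the stats package so that it runs without the example packages installed.

```r
# Hypothetical stricter variant of get_testdata (illustrative name).
get_testdata_strict <- function(filename, pkg, lib.loc = .libPaths()) {
  # An explicit lib.loc keeps the search inside the installed libraries;
  # mustWork = TRUE raises an error instead of returning "".
  system.file(filename, package = pkg, lib.loc = lib.loc, mustWork = TRUE)
}

# A file that exists in every R installation is found as usual.
found <- get_testdata_strict("DESCRIPTION", "stats")

# A missing file (made-up name) now fails loudly; the error is caught here
# only so that the example can run to completion.
result <- tryCatch(
  get_testdata_strict("testdata/no_such_file.csv", "stats"),
  error = function(e) "error raised"
)
```

Whether to search all libraries or only the first one depends on where the packages under test are installed, so treat the default above as one option rather than a recommendation.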