Robert's Data Science Blog

A C library in an R Package

There are good online tutorials for how to get started with the Rcpp package – in particular the documentation for Rcpp. But what if you want to use the functionality of a C(++) library in an R package?

This simple demonstration package implements a mymean and a mysum function for vectors using the Rcpp package. The mysum function is what I ultimately want in its own library, while mymean is a function in the R package that uses mysum. The body of mysum stays the same throughout this post; what changes is where it lives, as we move from one big cpp file to a mysum library in a separate folder.

If you want an example of including a large C library in an R package, check out the GitHub repo for the haven package.

Create the basic package

Create a minimal package with Rcpp, either with RStudio (File > New Project... > R Package using Rcpp) or with these commands:

devtools::create("mypkg")
usethis::use_rcpp()

I prefer to use the roxygen2 package for documentation. (If the project is created the point-and-click way, I first delete the NAMESPACE file so that roxygen2 can regenerate it.) I therefore add the following to (e.g.) R/utils.R so that devtools::document() updates the NAMESPACE file correctly:

#' @useDynLib mypkg, .registration = TRUE
#' @importFrom Rcpp sourceCpp
NULL

With the mymean.cpp that is introduced shortly, the folder hierarchy in the package directory is now:

mypkg
├── DESCRIPTION
├── man
├── mypkg.Rproj
├── NAMESPACE
├── R
│   └── utils.R
└── src
    └── mymean.cpp

Hadley Wickham recommends installing the package with the Build & Reload button in RStudio’s Build pane. During the first build, Rcpp automatically generates two extra files: R/RcppExports.R and src/RcppExports.cpp.

The build part related to the C++ code is (sans the special compiler flags):

g++ ... -c RcppExports.cpp -o RcppExports.o
g++ ... -c mymean.cpp -o mymean.o
g++ ... -o mypkg.so RcppExports.o mymean.o

This reads as follows: mymean.cpp and RcppExports.cpp are each compiled to an object file. The object files are then linked into a shared object file, mypkg.so, that R can load. We will see how these compiler commands change over the course of the post.

Only one C++ file

The initial content of src/mymean.cpp is two functions: one that sums the elements of a vector and one that computes the average of the elements of a vector:

#include <stddef.h>

#include <Rcpp.h>
using namespace Rcpp;

double mysum(size_t n, double *X) {
	double s = 0.0;
	for (size_t i = 0; i < n; ++i) {
	    s += X[i];
	}
	return s;
}

//' @export
// [[Rcpp::export]]
double mymean(NumericVector x) {
	size_t n = x.size();
	double total = mysum(n, x.begin());
	
	return total / n;
}

There is one small trick here: mysum's second argument X is a pointer to an array of doubles. This is the same as a pointer to the first element of x in mymean, which is available as x.begin().
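
To see the pointer trick outside of R, here is a minimal sketch in plain C++. It assumes std::vector<double> as a stand-in for Rcpp's NumericVector (so that it compiles without R), and the name mymean_plain is hypothetical; data() plays the role that x.begin() plays in the post.

```cpp
#include <cstddef>
#include <vector>

// Same summation routine as in the post: n elements starting at X.
double mysum(std::size_t n, double *X) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}

// Hypothetical stand-in for mymean: std::vector replaces NumericVector.
double mymean_plain(std::vector<double> &x) {
    std::size_t n = x.size();
    // x.data() points at the first element of the contiguous storage,
    // so mysum can walk the vector as if it were a plain C array.
    double total = mysum(n, x.data());
    return total / n;
}
```

With a vector holding 1, 2, 3 and 6, mysum returns 12 and mymean_plain returns 3.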

Using size_t (and therefore also the stddef header) for the size of X is probably overkill for this demo, but it feels more “C like”.

Include library in separate file

By default, every cpp file in the src folder is compiled when running devtools::install. As in any C(++) project, you need a header file to make the functions available between files, but that is all.

Include the following as src/mysum.cpp:

#include <stddef.h>

double mysum(size_t n, double *X) {
	double s = 0.0;
	for (size_t i = 0; i < n; ++i) {
	    s += X[i];
	}
	return s;
}

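The header file src/mysum.h declares the mysum function inside an include guard of the same name. It includes stddef.h itself so that size_t is defined wherever the header is used:

#ifndef MYSUM
#define MYSUM

#include <stddef.h>

double mysum(size_t n, double *X);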

#endif
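
The role of the header can be sketched in a single file: the declaration is all the calling code needs to see, while the definition can come later (or, in the package, from another source file). The function name mean_of and the guard name MYSUM_SKETCH are hypothetical, and the three parts are only marked by comments here, not split into real files.

```cpp
#include <cstddef>

// --- what the header provides: a declaration inside an include guard ---
#ifndef MYSUM_SKETCH
#define MYSUM_SKETCH
double mysum(std::size_t n, double *X);
#endif

// --- what the caller does: it compiles knowing only the declaration ---
double mean_of(double *values, std::size_t n) {
    return mysum(n, values) / n;
}

// --- what the library source provides: the definition, resolved at link time ---
double mysum(std::size_t n, double *X) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}
```

The caller never sees the loop inside mysum; it only needs the signature, which is exactly what the header carries between files.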

In src/mymean.cpp we replace the mysum function with an include of the header file:

#include <Rcpp.h>
using namespace Rcpp;

#include "mysum.h"

//' @export
// [[Rcpp::export]]
double mymean(NumericVector x) {
  size_t n = x.size();
  double total = mysum(n, x.begin());

  return total / n;
}

Now mysum.cpp is compiled separately and the object file mysum.o is linked into the shared object file:

g++ ... -c RcppExports.cpp -o RcppExports.o
g++ ... -c mymean.cpp -o mymean.o
g++ ... -c mysum.cpp -o mysum.o
g++ ... -o mypkg.so RcppExports.o mymean.o mysum.o

Include library in separate folder

We move on to have mysum in a subfolder of src.

Include library as C++

Now mysum.cpp is moved to the folder src/sum. The header can also be moved to src/sum, but that is not required. R now needs a Makefile fragment that tells it which files to compile, what the object files are called, and which paths to search for headers. The file is called Makevars on *nix and Makevars.win on Windows, and it lives in the src folder:

CPPFILES = $(wildcard *.cpp sum/*.cpp)

SOURCES = $(CPPFILES)

OBJECTS = $(CPPFILES:.cpp=.o)

PKG_CXXFLAGS = -Isum

CPPFILES lists all the cpp files in src and src/sum. The OBJECTS have the same base names as the CPPFILES, but with the extension .o instead of .cpp. Finally, if the header file mysum.h is moved to src/sum, -Isum adds that directory to the compiler’s include search path. Note that $(wildcard ...) is a GNU make extension, so a package using it should declare SystemRequirements: GNU make in DESCRIPTION.
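
The substitution reference $(CPPFILES:.cpp=.o) can be hard to read at first. As a rough illustration only (make does this itself, and nothing like it is needed in the package), the following C++ sketch mimics the substitution for a list of file names. The function names object_name and object_names are hypothetical.

```cpp
#include <string>
#include <vector>

// Replace a trailing ".cpp" with ".o"; names without that suffix are
// returned unchanged, which mirrors how the substitution treats each word.
std::string object_name(const std::string &source) {
    const std::string from = ".cpp";
    const std::string to = ".o";
    if (source.size() >= from.size() &&
        source.compare(source.size() - from.size(), from.size(), from) == 0) {
        return source.substr(0, source.size() - from.size()) + to;
    }
    return source;
}

// Apply the substitution to every file name in the list.
std::vector<std::string> object_names(const std::vector<std::string> &sources) {
    std::vector<std::string> objects;
    objects.reserve(sources.size());
    for (const std::string &s : sources) {
        objects.push_back(object_name(s));
    }
    return objects;
}
```

Applied to mymean.cpp, RcppExports.cpp and sum/mysum.cpp, this gives mymean.o, RcppExports.o and sum/mysum.o, the same object names that appear in the compiler commands.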

The only difference in the compiler commands is the change of location for the mysum files:

g++ ... -c RcppExports.cpp -o RcppExports.o
g++ ... -c mymean.cpp -o mymean.o
g++ ... -c sum/mysum.cpp -o sum/mysum.o
g++ ... -o mypkg.so mymean.o RcppExports.o sum/mysum.o

Include library as C

The library file is now renamed from sum/mysum.cpp to sum/mysum.c. In src/Makevars we therefore have a list of C++ files and a list of C files, and their union is SOURCES.

CFILES = $(wildcard sum/*.c)
CPPFILES = $(wildcard *.cpp)

SOURCES = $(CFILES) $(CPPFILES)

OBJECTS = $(CFILES:.c=.o) $(CPPFILES:.cpp=.o)

PKG_CXXFLAGS = -Isum

Calling a C library from C++ code requires a few special lines in the header file, src/sum/mysum.h, so that the C++ compiler looks up mysum with C linkage:

#ifndef MYSUM
#define MYSUM

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

double mysum(size_t n, double *X);

#ifdef __cplusplus
}
#endif

#endif
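
What the extern "C" block does can be sketched in one C++ file. Here the definition is also marked extern "C" to imitate a function built by the C compiler; in the package that definition lives in sum/mysum.c instead. The name call_through_c_linkage is hypothetical.

```cpp
#include <stddef.h>

// Header part: __cplusplus is defined only by C++ compilers, so a C
// compiler skips the extern "C" lines and sees just the declaration.
#ifdef __cplusplus
extern "C" {
#endif

double mysum(size_t n, double *X);

#ifdef __cplusplus
}
#endif

// Stand-in for sum/mysum.c: the symbol is emitted without C++ name
// mangling, as it would be when compiled as C.
extern "C" double mysum(size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}

// C++ caller: thanks to the extern "C" declaration, the linker is asked
// for the unmangled symbol "mysum".
double call_through_c_linkage() {
    double values[] = {1.5, 2.5, 4.0};
    return mysum(3, values);
}
```

Without the extern "C" lines in the header, the C++ side would ask the linker for a mangled name that the C object file does not contain, and linking would typically fail with an undefined symbol error.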

In the compiler commands, the C compiler gcc is now used for mysum.c, while the C++ compiler g++ still compiles the cpp files and does the linking:

gcc ... -c sum/mysum.c -o sum/mysum.o
g++ ... -c mymean.cpp -o mymean.o
g++ ... -c RcppExports.cpp -o RcppExports.o
g++ ... -o mypkg.so sum/mysum.o mymean.o RcppExports.o

The final file structure in mypkg:

mypkg
├── DESCRIPTION
├── man
├── mypkg.Rproj
├── NAMESPACE
├── R
│   ├── RcppExports.R
│   └── utils.R
└── src
    ├── Makevars
    ├── mymean.cpp
    ├── RcppExports.cpp
    └── sum
        ├── mysum.c
        └── mysum.h