However, one trick was necessary to speed up the process.
In my actual use case with tens of thousands of images, it cut the image generation time from hours to minutes.
Generate images
Let me first generate some data in the following format:
Each row in a tibble has the title of the image and the x and y data to be plotted.
I am deliberately using base R plotting – I find this much more convenient than transforming my data into a format that {ggplot2} can handle.
tbl <- tibble::tibble(
  x = list(runif(10), runif(8)),
  y = list(runif(10), runif(8)),
  title = c("a", "b")
)
I then define a function for plotting a row of data:
my_plot <- function(x, y, title) {
  plot(x = x, y = y, main = title)
}
I also need a function for saving the plots.
This function is where there are potential speed-ups.
My first version opens a new png device for each plot and closes it again with dev.off:
save_plots <- function(x, y, title, save_dir) {
  png(filename = fs::path(save_dir, title, ext = "png"))
  my_plot(x, y, title)
  dev.off()
}
The plots can now be generated by sweeping through the rows of tbl:
save_dir <- fs::path_temp("my_images")
fs::dir_create(save_dir)
purrr::pwalk(tbl, save_plots, save_dir = save_dir)
When tbl is large, this is very time-consuming.
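To get a feeling for how the per-file approach scales, it can be timed with system.time. The following is only a rough sketch in base R (so it runs without extra packages); the helper time_per_file, the directory timing_dir and the choice of 20 plots are my own assumptions for illustration and not part of the workflow above.

```r
# Hypothetical helper: times n plots, each written through its own png device.
time_per_file <- function(n, save_dir) {
  system.time(
    for (i in seq_len(n)) {
      # One device is opened and closed for every single plot.
      png(filename = file.path(save_dir, paste0("plot_", i, ".png")))
      plot(x = runif(10), y = runif(10), main = paste("plot", i))
      dev.off()
    }
  )
}

# Hypothetical scratch directory inside the session's temp dir.
timing_dir <- file.path(tempdir(), "timing_sketch")
dir.create(timing_dir, showWarnings = FALSE)

elapsed <- time_per_file(n = 20, save_dir = timing_dir)
elapsed[["elapsed"]]  # wall-clock seconds; the value depends on the machine
```

Running the same measurement for a few values of n should give an idea of how long a full run would take.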
Faster image generation
The png function has a trick: we can specify a “family” of filenames (just like the default filename argument of png) and only call dev.off once after all images are saved.
png(filename = fs::path(save_dir, "image%03d.png"))
purrr::pwalk(tbl, my_plot)
dev.off()
The sprintf format “%03d” means that each image is numbered sequentially with 3 digits that are left-padded with zeros if needed (this ensures that the lexicographical ordering aligns with the index ordering).
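The effect of the padding can be checked directly with sprintf. This is a small sketch with made-up indices; sort is called with method = "radix" so that the comparison does not depend on the locale.

```r
# Zero-padded to three digits: lexicographic order matches index order.
padded <- sprintf("image%03d.png", c(1L, 2L, 10L, 100L))
padded
## [1] "image001.png" "image002.png" "image010.png" "image100.png"

# Without padding, "image10.png" sorts before "image2.png".
unpadded <- sprintf("image%d.png", c(1L, 2L, 10L))
sort(unpadded, method = "radix")
## [1] "image1.png"  "image10.png" "image2.png"

# The width is a minimum, not a limit: larger indices simply use more digits.
sprintf("image%03d.png", 1000L)
## [1] "image1000.png"
```

With more than 999 images, a wider format such as “%05d” would presumably be needed to keep the ordering aligned.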
The sequential numbering is visible in the generated images:
list.files(save_dir)
## [1] "a.png" "b.png" "image001.png" "image002.png"
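One consequence of the single-device approach is that the titles no longer appear in the filenames. If they are needed, the sequential files can be renamed afterwards. Below is a minimal sketch in base R, assuming that the titles are unique, are valid filenames, and that the plots were drawn in the same order as the titles; rename_dir and titles are hypothetical stand-ins for save_dir and tbl$title.

```r
# Hypothetical directory and titles, standing in for save_dir and tbl$title.
rename_dir <- file.path(tempdir(), "rename_sketch")
dir.create(rename_dir, showWarnings = FALSE)
titles <- c("first", "second")

# Write all plots through a single device, as in the faster approach.
png(filename = file.path(rename_dir, "image%03d.png"))
for (title in titles) {
  plot(x = runif(10), y = runif(10), main = title)
}
dev.off()

# Rebuild the sequential names with the same format and map them to the titles.
old_paths <- file.path(rename_dir, sprintf("image%03d.png", seq_along(titles)))
new_paths <- file.path(rename_dir, paste0(titles, ".png"))
renamed <- file.rename(from = old_paths, to = new_paths)
```

file.rename returns one logical value per file, so all(renamed) is a quick check that every file was moved.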