Arranging Images on the Command Line
I have been a Mac user for more than a decade, but lately I have started a journey towards replacing my Mac with a Linux laptop.
This leads to a lot of small tasks that have to be carried out differently.
One of them is getting images and videos from an iPad/iPhone to my laptop – without using iCloud.
This may not sound very “data science” like, but I had to go through some steps to get my files into the right folders:
I sort my images by year and month, such that all images from e.g. March 2019 are in Pictures/2019/03.
I also love using the command line and this was a nice combo.
Get images from iPad
I ended up using the Air Transfer app that exposes a web interface to content on the iPad.
From this interface I can download batches of images in zip files.
However, the files in the zip archive all get the time of download as their creation timestamp.
This was never a problem when accessing the images from a Mac through a cable:
there the creation timestamp is retained when importing with e.g. Preview.
Get image creation date
Luckily, the Exif data in the image is intact.
When installing ImageMagick on Ubuntu, we also get the program identify that can access Exif data.
Run the following command to get all the Exif data:
identify -verbose <image file>
The entries related to creation time look like this:
exif:DateTime: 2019:01:12 16:11:02
exif:DateTimeDigitized: 2019:01:12 16:11:02
exif:DateTimeOriginal: 2019:01:12 16:11:02
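The verbose output is long, so it can help to filter it down to the date entries. The following is a minimal sketch, assuming the verbose text arrives on standard input; exif_dates is a helper name I made up for illustration and is not part of ImageMagick.

```shell
# exif_dates is a hypothetical helper: it keeps only the Exif date/time
# entries from `identify -verbose` output read on stdin.
exif_dates() {
    # Match the entries case-insensitively, then strip the leading indentation
    grep -i 'exif:DateTime' | sed 's/^[[:space:]]*//'
}

# Usage (assumes ImageMagick is installed and the file exists):
#   identify -verbose IMG_0763.JPG | exif_dates
```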
To limit the output to specific fields (like the time of creation that I am looking for) we can write:
identify -format '%[EXIF:DateTime]' <image file>
The result is then of the form 2019:01:12 16:11:02.
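Since the goal is a year/month folder, it is worth seeing how little is needed to get from this timestamp format to a folder name. This is only a sketch in plain shell; exif_to_folder is a hypothetical helper name, and it assumes the timestamp always has the colon-separated form shown above.

```shell
# exif_to_folder is a hypothetical helper:
# it turns "2019:01:12 16:11:02" into "2019/01".
exif_to_folder() {
    stamp=$1
    date_part=${stamp%% *}    # drop the time of day: "2019:01:12"
    year=${date_part%%:*}     # text before the first colon: "2019"
    rest=${date_part#*:}      # text after the first colon: "01:12"
    month=${rest%%:*}         # text before the next colon: "01"
    printf '%s/%s\n' "$year" "$month"
}

exif_to_folder "2019:01:12 16:11:02"    # prints 2019/01
```

The month keeps its leading zero because it is treated as text and never converted to a number.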
To get this information for all files I use the following shell script:
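#!/usr/bin/env sh
# Write the header, then one "filename;creationtime" row per JPG file.
echo "filename;creationtime" > datetime.csv
ls *.JPG | while read -r f; do
    # Read the Exif timestamp, e.g. 2019:01:12 16:11:02
    creationdate=$(identify -format '%[EXIF:DateTime]' "$f")
    # Use ";" as the separator so the rows match the header
    echo "$f;$creationdate" >> datetime.csv
done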
This reads as follows:
The file datetime.csv is created/reset with the header line “filename;creationtime”.
All the JPG files are listed and read individually by the while loop. The benefit of using ls | while read f over for f in `ls` is that the former handles filenames with spaces (though that is not a problem here).
In the while loop the creation date is extracted and, together with the filename, appended to the end of the file datetime.csv.
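Another common way to handle filenames with spaces is to loop over the glob directly instead of parsing the output of ls. The sketch below shows that variant. The names write_datetime_csv and extract_date are my own, and the Exif lookup is wrapped in a function so that it can be swapped out when ImageMagick is not available.

```shell
# extract_date wraps the ImageMagick call used in this post.
extract_date() {
    identify -format '%[EXIF:DateTime]' "$1"
}

# write_datetime_csv is a hypothetical helper: it writes one
# "filename;creationtime" row per JPG file in the current folder.
write_datetime_csv() {
    out=$1
    echo "filename;creationtime" > "$out"
    for f in *.JPG; do
        [ -e "$f" ] || continue    # the glob matched nothing
        printf '%s;%s\n' "$f" "$(extract_date "$f")" >> "$out"
    done
}

# Usage: write_datetime_csv datetime.csv
```

The shell expands *.JPG into one word per file, so a name like "my photo.JPG" stays in one piece.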
There are also PNG files on my iPad, but these do not have the creation date in the metadata.
Arrange images in folders
A bit of data munging is needed to achieve my goal.
In particular, I have to handle those PNG files with unknown creation date.
To this end, I use R.
First I load in datetime.csv.
library(readr)
pictures_folder <- file.path(Sys.getenv("HOME"), "Pictures", "unsorted")
creation_date <- read_delim(
    file.path(pictures_folder, "datetime.csv"),
    delim = ";", escape_double = FALSE,
    col_types = cols(
        filename = col_character(),
        creationtime = col_datetime(format = "%Y:%m:%d %H:%M:%S")
    ),
    trim_ws = TRUE
)
The loaded tibble looks something like this:
> creation_date
# A tibble: 3 x 2
  filename     creationtime
  <chr>        <dttm>
1 IMG_0763.JPG 2018-11-02 17:58:26
2 IMG_0962.JPG 2018-12-24 16:57:31
3 IMG_0963.JPG 2018-12-24 16:57:33
Then all unsorted files are read and the JPG files are enriched with the creationtime by joining with the creation_date tibble.
From creationtime we can extract year and month:
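library(magrittr)  # provides the %>% pipe used below

# List the image files; pattern is a regular expression, not a glob
all_image_files <- dir(pictures_folder, pattern = "\\.(JPG|PNG)$", full.names = TRUE) %>%
    tibble::as_tibble() %>%
    dplyr::rename(frompath = value) %>%
    dplyr::mutate(
        filename = basename(frompath)
    ) %>%
    # PNG files have no row in creation_date and get NA for creationtime
    dplyr::left_join(creation_date, by = "filename") %>%
    dplyr::mutate(
        year = lubridate::year(creationtime),
        month = lubridate::month(creationtime)
    )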
A section of the data that illustrates the left_join:
The PNG files have no creationtime and therefore no year or month:
# A tibble: 6 x 5
  filename     creationtime         year month
  <chr>        <dttm>              <dbl> <dbl>
1 IMG_1197.JPG 2019-01-26 16:28:34  2019     1
2 IMG_1198.JPG 2019-01-26 16:28:37  2019     1
3 IMG_1199.PNG NA                     NA    NA
4 IMG_1200.PNG NA                     NA    NA
5 IMG_1201.JPG 2019-02-01 17:35:08  2019     2
6 IMG_1202.JPG 2019-02-01 17:35:10  2019     2
I choose to fill the NAs by “Last Observation Carried Forward”, that is, NAs are replaced by the last non-NA value above.
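To make the idea concrete outside R, here is a small sketch of “Last Observation Carried Forward” on semicolon-separated lines using awk. The helper name locf and the two-column "name;value" layout are assumptions made for this illustration only; the actual filling in this post is done in R.

```shell
# locf is a hypothetical helper: it reads "name;value" lines on stdin and
# replaces an empty or NA value with the last non-missing value seen above it.
locf() {
    awk -F';' -v OFS=';' '
        # Remember the most recent value that is actually observed
        $2 != "" && $2 != "NA" { last = $2 }
        {
            # Fill a missing value only if something was observed before it
            if (($2 == "" || $2 == "NA") && last != "") $2 = last
            print
        }
    '
}

# Usage: printf 'a.JPG;2019-01\nb.PNG;NA\n' | locf
```

Rows that come before the first observed value are left as they are, since there is nothing to carry forward yet.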
A final trick is that I want months in my folders to have two digits, such that the lexicographical ordering aligns with the time ordering.
This is solved very neatly by str_pad.
Now we can construct the new path and remove any rows that have a non-valid path with NAs.
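image_location <- all_image_files %>%
    # Last Observation Carried Forward for the PNG rows
    tidyr::fill(year, month, .direction = "down") %>%
    dplyr::mutate(
        month = stringr::str_pad(month, 2, pad = "0"),
        # Sorted folders live next to "unsorted", i.e. in Pictures/<year>/<month>
        topath = file.path(dirname(pictures_folder), year, month, filename)
    ) %>%
    # Rows before the first JPG cannot be filled and are dropped
    tidyr::drop_na(year, month) %>%
    dplyr::select(frompath, topath)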
The files shown above now have the following paths (where frompath is stripped of the folder to fit better on the screen).
> image_location
# A tibble: 6 x 2
  frompath     topath
  <chr>        <chr>
1 IMG_1197.JPG /home/robert/Pictures/2019/01/IMG_1197.JPG
2 IMG_1198.JPG /home/robert/Pictures/2019/01/IMG_1198.JPG
3 IMG_1199.PNG /home/robert/Pictures/2019/01/IMG_1199.PNG
4 IMG_1200.PNG /home/robert/Pictures/2019/01/IMG_1200.PNG
5 IMG_1201.JPG /home/robert/Pictures/2019/02/IMG_1201.JPG
6 IMG_1202.JPG /home/robert/Pictures/2019/02/IMG_1202.JPG
This tibble can then be exported for the next step.
readr::write_delim(image_location, file.path(pictures_folder, "image_location.csv"), col_names = FALSE)
Run as a script
The commands can now be collected in a single script to run.
But instead of starting R to run this script, we can turn the R script into a command line script by including an appropriate shebang in the first line:
#!/usr/bin/env Rscript
By saving the script as image_location.R and making it executable with
chmod u+x image_location.R
the script can be run with the command ./image_location.R.
Moving files
Finally the files can be moved based on image_location.csv.
I use another shell script that loops through the lines of image_location.csv, extracting the fromfile and tofile using AWK and then moving them.
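#!/usr/bin/env sh
# Each line of image_location.csv holds "frompath topath", separated by a space.
while read -r l; do
    fromfile=$(echo "$l" | awk '{print $1}')
    tofile=$(echo "$l" | awk '{print $2}')
    # Create the year/month folder if needed; mv does not create it
    mkdir -p "$(dirname "$tofile")"
    mv "$fromfile" "$tofile"
done < image_location.csv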
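The same loop can also be written without AWK, since read can split a line into two variables on its own. The following is a sketch under the assumption that the file holds two space-separated paths per line and that no path contains spaces; move_images is a hypothetical function name.

```shell
# move_images is a hypothetical helper: it reads "frompath topath" pairs
# from the file given as first argument and moves each file.
move_images() {
    list=$1
    while read -r fromfile tofile; do
        [ -n "$fromfile" ] || continue     # skip blank lines
        mkdir -p "$(dirname "$tofile")"    # make sure the target folder exists
        mv -- "$fromfile" "$tofile"
    done < "$list"
}

# Usage: move_images image_location.csv
```

Wrapping the loop in a function makes it easy to try it on a small test file before running it on the real list.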