Updating R libraries after R version update

In my previous post about R I showed you how to install a local R version and set up your R_LIBS environment variable so that it points to the new R version.

If you already had a working R install, you may be wondering how you can get access to all of the R libraries that you previously installed. You have several options.

First, decide whether you ever want to fall back on the previous R version if something breaks in the current one. If you are worried about this scenario, it is useful to keep a working copy of both the old R executable and the libraries that are compatible with that version of R. Maintaining the old version is the ‘safer’ option: it requires additional storage space, but you get the peace of mind that if anything happens with the new version, you can fall back to the old one since it’s still intact. Let’s cover this scenario first.

Let’s assume your directory structure is like this:

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R [11:25:39] 
$ pwd; tree -L 3
/nfs3/CGRB/home/davised/opt/R
.
├── 3.6.0
│   ├── bin
│   │   ├── R
│   │   └── Rscript
│   ├── lib64
│   │   └── R
│   └── share
│       ├── info
│       └── man
└── 3.6.1
    ├── bin
    │   ├── R
    │   └── Rscript
    ├── lib64
    │   └── R
    └── share
        └── man

13 directories, 4 files

So you’ve just updated from 3.6.0 to 3.6.1 and you want to get access to all of the same packages you had previously. The R packages are stored in <version>/lib64/R/library.

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R [17:10:51] 
$ ls 3.6.0/lib64/R/library -w 80
ade4          fastmap            matrixStats   Rsamtools
ape           fastmatch          methods       S4Vectors
argparser     foreach            mgcv          scales
assertthat    foreign            mime          segmented
backports     formatR            multtest      seqinr
base          futile.logger      munsell       ShortRead
BH            futile.options     mzR           snow
Biobase       gdata              ncdf4         sourcetools
BiocGenerics  GenomeInfoDb       nlme          sp
BiocManager   GenomeInfoDbData   nnet          spatial
BiocParallel  GenomicAlignments  parallel      spData
...

Let’s copy them over to the new 3.6.1 library directory.

$ cp -nr /nfs3/CGRB/home/davised/opt/R/3.6.0/lib64/R/library/* /nfs3/CGRB/home/davised/opt/R/3.6.1/lib64/R/library
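To get a feel for what this copy does before pointing it at a real library, the sketch below replays it in a throwaway directory. The directory names (old_lib, new_lib), the two package names, and the file contents are made up for the demonstration; only the cp flags match the command above.

```shell
# Throwaway stand-ins for the old and new library directories.
sandbox=$(mktemp -d)
old_lib="$sandbox/old_lib"
new_lib="$sandbox/new_lib"
mkdir -p "$old_lib/stats" "$old_lib/ape" "$new_lib/stats"
echo "built under 3.6.0" > "$old_lib/stats/DESCRIPTION"
echo "built under 3.6.0" > "$old_lib/ape/DESCRIPTION"
echo "built under 3.6.1" > "$new_lib/stats/DESCRIPTION"

# Same flags as the real copy. Some cp versions warn or return a
# non-zero status when -n skips a file, hence the guard at the end.
cp -nr "$old_lib"/* "$new_lib" 2>/dev/null || true

cat "$new_lib/stats/DESCRIPTION"   # the file that was already there is untouched
ls "$new_lib"                      # ape has been added next to stats
```

The package that already existed in the new library keeps its own files, while the package that was missing is copied in.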

We use the -n flag to disallow overwrites, so the packages that shipped with 3.6.1 are left alone, and the -r flag to recurse into the package directories. Now we need to update any packages that have newer versions available.

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R [12:12:49] 
$ echo $R_LIBS
/nfs3/CGRB/home/davised/opt/R/3.6.1/lib64/R/library

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R [17:16:12] 
$ R --version
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.

$ Rscript -e 'update.packages(repos="https://ftp.osuosl.org/pub/cran", checkBuilt=TRUE, ask=FALSE)'

The update.packages command finds all of the packages that are out of date and updates them. With checkBuilt=TRUE, a package also counts as out of date if it was built under an earlier major.minor version of R. A difference only in the bugfix part of major.minor.bugfix is fine.
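Installed packages record the R version they were built under in the Built field of their DESCRIPTION file. The sketch below is a rough shell approximation of that check, not what update.packages itself runs: it lists packages built under a different major.minor series than the one you name, whereas R only flags earlier ones. The function name built_under_other_series is made up for this example.

```shell
# Hypothetical helper: list packages in a library directory whose
# DESCRIPTION says they were built under another major.minor series.
# Usage: built_under_other_series <library-dir> <major.minor>
built_under_other_series() {
    local lib_dir=$1 want=$2 desc built series
    for desc in "$lib_dir"/*/DESCRIPTION; do
        [ -f "$desc" ] || continue
        # Field looks like: "Built: R 3.6.0; ; 2019-06-01 10:00:00 UTC; unix"
        built=$(sed -n 's/^Built: R \([0-9.]*\);.*/\1/p' "$desc")
        [ -n "$built" ] || continue
        series=${built%.*}    # strip the bugfix part: 3.6.0 -> 3.6
        if [ "$series" != "$want" ]; then
            # Print "<package><TAB><R version it was built under>"
            printf '%s\t%s\n' "$(basename "$(dirname "$desc")")" "$built"
        fi
    done
}
```

Calling it as `built_under_other_series "$R_LIBS" 3.6` after a 3.6.0 to 3.6.1 upgrade should print nothing, which matches the point above that a bugfix-only difference is fine.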

NOTE: But what if you have Bioconductor packages installed? Run this after the update above:

# davised:Linux @ chrom1 in /nfs3/CGRB/home/davised/opt/R [13:24:08]
$ Rscript -e 'BiocManager::install(ask=FALSE)' 
Bioconductor version 3.9 (BiocManager 1.30.10), R 3.6.1 (2019-07-05)
Old packages: 'BiocParallel', 'GenomicAlignments', 'GenomicRanges', 'IRanges',
  'mzR', 'rhdf5', 'Rhdf5lib', 'Rhtslib', 'Rsamtools', 'S4Vectors',
  'SummarizedExperiment'
...

And there you have it. You are on 3.6.1 with the most up-to-date packages. If you ever need to, you can point your $PATH and $R_LIBS variables back at 3.6.0 and recover exactly where you were before the upgrade.
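Switching back and forth can be wrapped in a small shell function. This is only a sketch under the directory layout shown earlier: the function name use_r and the R_ROOT variable are made up for illustration, and the default of $HOME/opt/R is an assumption you should replace with your own install location.

```shell
# Hypothetical helper: point the current shell at one R version.
# R_ROOT (assumed name) is the directory holding the per-version
# installs, e.g. $R_ROOT/3.6.0 and $R_ROOT/3.6.1.
use_r() {
    local version=$1
    local root=${R_ROOT:-$HOME/opt/R}
    local prefix="$root/$version"
    if [ ! -d "$prefix/bin" ]; then
        echo "no R install found at $prefix" >&2
        return 1
    fi
    # Put this version's binaries first and use its own library.
    # Repeated calls keep prepending to PATH; the newest entry wins.
    export PATH="$prefix/bin:$PATH"
    export R_LIBS="$prefix/lib64/R/library"
}
```

With that defined, `use_r 3.6.0` falls back to the old version and `use_r 3.6.1` returns to the new one, for the current shell session only.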

But what if you don’t want to go through the hassle of copying the packages over each time? You can set up a library directory that is agnostic to the version of R that you are using.

For example, let’s set up a new directory for our packages here:

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R [17:24:47] 
$ echo $R_LIBS
/nfs3/CGRB/home/davised/opt/R/library

I set my $R_LIBS in my config file (either ~/.bashrc or ~/.tcshrc for you), and now I need to get some packages in there that I can keep updating through all future R upgrades. Doing it this way means you won’t be able to fall back to the old R version in the future, but you will be able to upgrade to new R versions faster, and you will save hard drive space by not having a duplicate R library directory for each version of R that you have installed.
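For reference, the config entry might look like the sketch below for bash. The path $HOME/opt/R/library is an assumption modeled on the layout in this post, so substitute your own. The tcsh form is shown as a comment.

```shell
# ~/.bashrc (bash): one library directory shared by every R version.
# The path is an assumption; use wherever you keep your own packages.
export R_LIBS="$HOME/opt/R/library"

# R ignores library paths that do not exist, so create it once.
mkdir -p "$R_LIBS"

# ~/.tcshrc (tcsh) equivalent:
#   setenv R_LIBS "$HOME/opt/R/library"
```

Open a new shell, or source the config file, before checking the value with echo $R_LIBS.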

So, we need to copy all of the packages that aren’t bundled with R by default. Let’s get that list of packages.

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R/library [17:32:15] 
$ Rscript -e 'ip <- as.data.frame(installed.packages()); write(rownames(ip[ip$Priority %in% c("base", "recommended"),]), "base_packages.txt")'

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R/library [17:32:23] 
$ head base_packages.txt
base
boot
class
cluster
codetools
compiler
datasets
foreign
graphics
grDevices

We need to compare this list in the base_packages.txt file with the list of installed packages, and then copy over just those that aren’t in the base list.

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R/library [17:32:31] 
$ ls -1 /nfs3/CGRB/home/davised/opt/R/3.6.0/lib64/R/library > all_installed.txt

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R/library [17:35:58] 
$ wc -l all_installed.txt 
143 all_installed.txt

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R/library [17:36:04] 
$ wc -l base_packages.txt 
29 base_packages.txt

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R/library [17:36:07] 
$ echo 143 - 29 | bc
114

So we have 114 packages we need to copy over. Let’s do it.

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R/library [17:38:46] 
$ cat base_packages.txt| sed 's/^/^/' | sed 's/$/$/' | grep -vf - all_installed.txt | wc -l
114

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R/library [17:38:58] 
$ cat base_packages.txt| sed 's/^/^/' | sed 's/$/$/' | grep -vf - all_installed.txt | xargs -I'{}' cp -r /nfs3/CGRB/home/davised/opt/R/3.6.0/lib64/R/library/{} .

This command will copy only those that aren’t in the base list over to the new directory. Now we just need to update them again. Same command as before!
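As an aside, the two sed calls exist only to anchor each name so that it matches a whole line. grep can do that by itself with -x (whole-line match) and -F (fixed strings, so the dot in a name like futile.logger is not treated as a wildcard). The sketch below shows the shorter filter on made-up package lists in a scratch directory; it is an alternative spelling, not a correction.

```shell
# Scratch directory with small, made-up package lists.
scratch=$(mktemp -d)
printf '%s\n' base boot stats > "$scratch/base_packages.txt"
printf '%s\n' ape base boot bootstrap stats > "$scratch/all_installed.txt"

# -f: read patterns from a file, -F: fixed strings,
# -x: match whole lines only, -v: print the lines that do NOT match
grep -vxFf "$scratch/base_packages.txt" "$scratch/all_installed.txt"
# prints:
# ape
# bootstrap
```

Because of -x, the base package boot does not knock out bootstrap, which is the same protection the ^ and $ anchors provide in the longer pipeline.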

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt/R/library [17:40:15] 
$ Rscript -e 'update.packages(repos="https://ftp.osuosl.org/pub/cran", checkBuilt=TRUE, ask=FALSE)'

And if you have Bioconductor packages…

$ Rscript -e 'BiocManager::install(ask=FALSE)' 

If you take this approach, you can leave your $R_LIBS as-is each time you upgrade R and only run the update.packages() function instead of copying the entire library folder over to the new version. You’ll just have to update your $PATH variable to include the new <R-version>/bin directory.

Happy upgrading!