Conda Tutorial

Getting started with conda

Here are some reasons conda might be a useful tool for your research:

  1. Installing a second version of a system executable for testing.
  2. Meeting the version requirements of a tool when system installs cannot satisfy them (for example, when a newer version is needed).
  3. Using software that is difficult to compile from source.
  4. Using software packages that have been optimized for conda environments.
  5. Maintaining separate environments for individual research/development projects.

Installation instructions

First, identify a location in your lab’s filespace that makes sense for a conda install. This would usually be something like /nfs1/<Your Dept>/<Your Lab>/<Your username>/opt/conda.

**NOTE: You'll want to install conda in a location _outside_ of your $HOME directory (/raid1), because the packages that you install using conda could easily fill up your home directory space, especially if you generate multiple conda environments under the same conda install.**

If you already installed miniconda in your $HOME on /raid1, you can still point envs_dirs and pkgs_dirs at a /nfs or /dfs target. The large files will then be installed to the networked directory instead of taking up space in your $HOME directory.

You’ll download the latest miniconda3 from the conda website: https://docs.conda.io/en/latest/miniconda.html

Miniconda is preferable to Anaconda for routine package installs because it minimizes install space and skips the extraneous packages that come with a full Anaconda install. Python 2 reached end of life in January 2020, so we recommend that users install miniconda3 (miniconda2 is Python 2 based, while miniconda3 is Python 3 based).

Let’s do some housekeeping first.

This is how my opt folder is structured:

# davised:Linux @ waterman in /nfs3/CGRB/home/davised/opt [10:18:12] 
$ tree -L 1
.
├── augustus_config
├── bin
├── blat
├── conda
├── cpanm
├── downloads
├── edirect
├── edirect-dl.pl
├── include
├── lib
├── lib64
├── libexec
├── miniconda2
├── miniconda3
├── ncbi
├── R
├── rust
└── share

If you set up your folder structure the same way, then you can always specify a prefix=/path/to/your/opt when you are configuring new software. You can then add /path/to/your/opt/bin to your $PATH variable and have access to the new software as well. I would encourage you to use something like this for your setup.
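As a rough illustration of the prefix idea, here is a minimal sketch. The helper name `path_prepend` is hypothetical (it is not part of any tool mentioned here), and `/path/to/opt` is the same placeholder used elsewhere on this page. The helper only adds a directory to `$PATH` if it is not already there, so it is safe to keep in a startup file that gets sourced more than once.

```shell
# Placeholder: replace with your real opt directory on /nfs or /dfs
prefix=/path/to/opt

# Configure-style builds usually accept the install prefix like this:
#   ./configure --prefix="$prefix" && make && make install

# path_prepend DIR: put DIR at the front of PATH unless it is already listed
# (hypothetical helper name, for illustration only)
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

path_prepend "$prefix/bin"
```

Putting the function and the `path_prepend` call in your `~/.bashrc` would make software installed under that prefix available in every new bash shell.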

You’ll also note that this folder is on /nfs3, NOT in my /raid1 home directory. This is essential so that you do not accidentally fill up your home directory, which is limited to around 25GB.
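If you are unsure whether an earlier conda install is already using space in your home directory, a quick check along these lines may help. This is only a sketch: `dir_usage_kb` is a hypothetical helper, and the two directories checked (`$HOME/.conda` and `$HOME/miniconda3`) are assumptions about where a default install commonly puts files, so adjust them to your situation.

```shell
# dir_usage_kb DIR: print the size of DIR in kilobytes (0 if DIR is missing)
# (hypothetical helper name, for illustration only)
dir_usage_kb() {
    if [ -d "$1" ]; then
        du -sk "$1" 2>/dev/null | awk '{ print $1 }'
    else
        echo 0
    fi
}

# Assumed locations of a default install; edit as needed
for d in "$HOME/.conda" "$HOME/miniconda3"; do
    echo "$(dir_usage_kb "$d") KB  $d"
done
```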

Now, let’s set up your miniconda install.

Here are some example commands to help you get set up. We’ll start a bash shell first, because conda does not work well with csh/tcsh.

**NOTE:** You'll need to edit these commands to point to your *ACTUAL* folders on `/nfs[0123]` or `/dfs`!!

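Also, the purpose of the | grep is to check the md5sum of the file and confirm that it was downloaded properly. The hash shown here is from the installer I downloaded; since the -latest file changes with each release, a newer installer will have a different hash, and you will need the expected value for the version you download.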

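bash
export prefix=/path/to/opt
# Create the conda directory if it does not exist yet, then enter it
mkdir -p "$prefix/conda"
cd "$prefix/conda"
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# grep prints the line only when the checksum matches the expected value
md5sum Miniconda3-latest-Linux-x86_64.sh | grep 81c773ff87af5cfac79ab862942ab6b3
bash ./Miniconda3-latest-Linux-x86_64.sh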

You should see:

81c773ff87af5cfac79ab862942ab6b3  Miniconda3-latest-Linux-x86_64.sh

This indicates that the file was downloaded correctly. If grep prints nothing, the checksum did not match.
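Because a failed grep simply prints nothing, it can be easy to miss a mismatch. The sketch below wraps the same check in a function that reports the result explicitly and returns a non-zero status on failure. The name `verify_md5` is hypothetical, and it assumes the `md5sum` and `awk` commands are available, as they are on most Linux systems.

```shell
# verify_md5 FILE EXPECTED_HASH: succeed only if FILE has the expected md5 hash
# (hypothetical helper name, for illustration only)
verify_md5() {
    if [ ! -f "$1" ]; then
        echo "MISSING: $1" >&2
        return 1
    fi
    # md5sum prints "<hash>  <filename>"; keep only the hash
    actual=$(md5sum "$1" | awk '{ print $1 }')
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual, expected $2)" >&2
        return 1
    fi
}

# Possible use, running the installer only when the hash matches:
#   verify_md5 Miniconda3-latest-Linux-x86_64.sh 81c773ff87af5cfac79ab862942ab6b3 \
#       && bash ./Miniconda3-latest-Linux-x86_64.sh
```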

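Next, follow the prompts in the interactive installer. When it asks for an install location, give a path under your lab filespace (for example, somewhere under $prefix) rather than accepting the default in your home directory.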

Choose yes to have the installer init conda. This will add some commands in your ~/.bashrc file that will make using conda easier in the bash shell.

Next, we need to configure our conda install. We need to add some additional channels (i.e. installation repositories) as well as disable auto-loading of conda on login.

conda update -y conda
conda config --set auto_activate_base false
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

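Each --add places the channel at the top of the list, so conda-forge, which is added last, ends up with the highest priority, followed by bioconda and then defaults. Adding the channels in this order helps ensure that the most up-to-date versions of software are installed preferentially.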

The last step is to explicitly tell conda which envs and pkgs directories to use. This will keep any temporary or installed conda packages out of your home directory. Here is part of the help message printed by conda config --describe:

# ######################################################
# ##            Basic Conda Configuration             ##
# ######################################################

# # envs_dirs (sequence: str)
# #   aliases: envs_path
# #   env var string delimiter: ':'
# #   The list of directories to search for named environments. When
# #   creating a new named environment, the environment will be placed in
# #   the first writable location.
# # 
# envs_dirs: []

# # pkgs_dirs (sequence: str)
# #   env var string delimiter: ','
# #   The list of directories where locally-available packages are linked
# #   from at install time. Packages not locally available are downloaded
# #   and extracted into the first writable directory.
# # 
# pkgs_dirs: []

So, let’s add some directories so that conda installs to the correct place:

conda config --add envs_dirs $prefix/conda/envs
conda config --add pkgs_dirs $prefix/conda/pkgs
exit
bash

Your ~/.condarc file should look something like this:

channels:
  - conda-forge
  - bioconda
  - defaults
envs_dirs:
  - /nfs3/CGRB/home/davised/opt/conda/envs
pkgs_dirs:
  - /nfs3/CGRB/home/davised/opt/conda/pkgs
auto_activate_base: false
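To double-check the result without reading the file by eye, something like the following sketch could be used. The function name `condarc_list` is hypothetical, and the parsing is deliberately simple: it assumes the file is laid out like the example above, with each top-level key on its own line and list items indented as `  - value`. It is not a general YAML parser.

```shell
# condarc_list KEY [FILE]: print the list items stored under KEY, in file order
# (hypothetical helper; assumes the simple layout shown above)
condarc_list() {
    awk -v key="$1" '
        $0 == key ":"      { inside = 1; next }   # start of the requested list
        inside && /^[^ ]/  { inside = 0 }         # next top-level key ends it
        inside && /^ *- /  { sub(/^ *- */, ""); print }
    ' "${2:-$HOME/.condarc}"
}

# Only run the checks when a .condarc actually exists
if [ -f "$HOME/.condarc" ]; then
    echo "Channels, highest priority first:"
    condarc_list channels
    echo "Environment directories:"
    condarc_list envs_dirs
    echo "Package cache directories:"
    condarc_list pkgs_dirs
fi
```

If the directories printed for envs_dirs or pkgs_dirs start with your home directory, the conda config --add commands above were probably run with the wrong $prefix.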

To generate a conda environment to work in, you’ll run the commands:

conda create -n test_env plotly=4.4.1 notebook=6.0.1 ipywidgets=7.5.1
conda activate test_env

Best practice is to include a version number with each package that you want to include in your conda environment. This will increase future reproducibility of your research, especially when you want to share your conda environment with your colleagues. You can find what versions of your package are available using conda search <pkg_name>. Furthermore, the best way to eliminate conflicts in your packages is to install all of the packages that you require at the creation of your environment.
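One way to share a pinned environment is to record it in an environment file. The sketch below writes such a file from the shell using the same packages and versions as the example above. The file location is an arbitrary choice for illustration, and the channel list simply mirrors the .condarc shown earlier.

```shell
# Arbitrary location for the example file; change as needed
env_file="${TMPDIR:-/tmp}/test_env.yml"

# Write an environment file with explicitly pinned versions
cat > "$env_file" <<'EOF'
name: test_env
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - plotly=4.4.1
  - notebook=6.0.1
  - ipywidgets=7.5.1
EOF

echo "Wrote $env_file"

# A colleague could then recreate the environment with:
#   conda env create -f "$env_file"
# An existing environment can be written out in the same format with:
#   conda env export -n test_env > test_env.yml
```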

You’ll want to make sure that any environment variables referencing non-conda-managed packages/libraries are cleared when you activate your conda environment. You can clear them yourself using unset VARIABLE or export VARIABLE= (the latter leaves the variable set but empty). You can automate this by adding these commands to a shell script that you source each time you activate your environment. Alternatively, we can create files that set our environment variables properly whenever the environment is activated or deactivated.
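The two ways of clearing a variable are not quite identical, and some programs treat a set-but-empty variable differently from one that is not set at all. The following sketch shows the difference. `var_state` is a hypothetical helper and `DEMO_LIB` is a made-up variable name used only for the demonstration.

```shell
# var_state NAME: print "unset", "empty", or "set" for the variable called NAME
# (hypothetical helper; eval provides the indirect lookup in plain sh)
var_state() {
    if eval "[ -z \"\${$1+x}\" ]"; then
        echo unset
    elif eval "[ -z \"\${$1}\" ]"; then
        echo empty
    else
        echo set
    fi
}

DEMO_LIB=/some/path
export DEMO_LIB
var_state DEMO_LIB      # prints: set

export DEMO_LIB=
var_state DEMO_LIB      # prints: empty

unset DEMO_LIB
var_state DEMO_LIB      # prints: unset
```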

Let’s say we want to use a new install of perl in our conda environment. We would want to clear any custom PERL5LIB and cpanm installation environment variables first. Let’s do so.

cd /path/to/conda/envs/test_env
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Then, we’ll paste these values into ./etc/conda/activate.d/env_vars.sh:

#!/bin/sh
# Save the current values (quoted, so values containing spaces survive intact)
export OLD_PERL5LIB="${PERL5LIB}"
export OLD_PERL_CPANM_HOME="${PERL_CPANM_HOME}"
export OLD_PERL_LOCAL_LIB_ROOT="${PERL_LOCAL_LIB_ROOT}"
export OLD_PERL_MB_OPT="${PERL_MB_OPT}"
export OLD_PERL_MM_OPT="${PERL_MM_OPT}"
# Clear them for the lifetime of the activated environment
unset PERL5LIB
unset PERL_CPANM_HOME
unset PERL_LOCAL_LIB_ROOT
unset PERL_MB_OPT
unset PERL_MM_OPT

Then, we’ll paste these values into ./etc/conda/deactivate.d/env_vars.sh:

#!/bin/sh
# Restore the values saved at activation (quoted to preserve spaces)
export PERL5LIB="${OLD_PERL5LIB}"
export PERL_CPANM_HOME="${OLD_PERL_CPANM_HOME}"
export PERL_LOCAL_LIB_ROOT="${OLD_PERL_LOCAL_LIB_ROOT}"
export PERL_MB_OPT="${OLD_PERL_MB_OPT}"
export PERL_MM_OPT="${OLD_PERL_MM_OPT}"
# Remove the temporary copies
unset OLD_PERL5LIB
unset OLD_PERL_CPANM_HOME
unset OLD_PERL_LOCAL_LIB_ROOT
unset OLD_PERL_MB_OPT
unset OLD_PERL_MM_OPT

This syntax will work for any environment variable that we might want to unset inside the environment, and then reset upon exiting the environment.
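If you find yourself repeating this pattern for many variables, it could be generalized along the following lines. This is only a sketch: the function names `save_and_unset` and `restore_saved` are hypothetical and not part of conda, and `DEMO_TOOL_HOME` is a made-up variable. One difference from the scripts above is that a variable that was not set before activation is left unset after deactivation, rather than being restored as an empty string.

```shell
# save_and_unset VAR...: stash each set variable in OLD_<VAR>, then unset it
# (hypothetical helper; eval provides the indirect lookup in plain sh)
save_and_unset() {
    for _cv_name in "$@"; do
        if eval "[ -n \"\${$_cv_name+x}\" ]"; then
            eval "export OLD_$_cv_name=\"\$$_cv_name\""
        fi
        unset "$_cv_name"
    done
    unset _cv_name
}

# restore_saved VAR...: put back each stashed value and drop the OLD_ copy
restore_saved() {
    for _cv_name in "$@"; do
        if eval "[ -n \"\${OLD_$_cv_name+x}\" ]"; then
            eval "export $_cv_name=\"\$OLD_$_cv_name\""
            unset "OLD_$_cv_name"
        else
            unset "$_cv_name"      # was not set before, so leave it unset
        fi
    done
    unset _cv_name
}

# Demonstration with a made-up variable
export DEMO_TOOL_HOME="/some/dir with spaces"
save_and_unset DEMO_TOOL_HOME      # what an activate.d script would do
restore_saved DEMO_TOOL_HOME       # what a deactivate.d script would do
echo "$DEMO_TOOL_HOME"
```

In practice, the activate.d script would call `save_and_unset` with the list of variable names and the deactivate.d script would call `restore_saved` with the same list, with the function definitions made available to both.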

Additional thoughts

You’ll want to set some environment variables on activation of conda environments, especially for environments that depend on R (setting R_LIBS properly). Additionally, when using perl in a conda environment, it’s advisable to also install perl-app-cpanminus if you need perl modules that aren’t available through conda. Lastly, if your work requires python packages, first install any that are available through conda itself, and install the remaining packages using pip last. DO NOT use --user as you usually would when using pip outside of a conda environment, because it installs into your home directory instead of the environment!
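Before installing anything with pip, it can be worth confirming that the pip on your $PATH belongs to the active environment. The sketch below relies on the CONDA_PREFIX variable that conda sets when an environment is activated. The helper name `in_active_env` is hypothetical.

```shell
# in_active_env TOOL_PATH: succeed only if TOOL_PATH lies under $CONDA_PREFIX
# (hypothetical helper name, for illustration only)
in_active_env() {
    # No active environment means nothing can belong to it
    [ -n "${CONDA_PREFIX:-}" ] || return 1
    case "$1" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use after `conda activate test_env`:
if in_active_env "$(command -v pip 2>/dev/null)"; then
    echo "pip comes from the active environment"
else
    echo "pip is NOT from an active conda environment"
fi
```

When the check passes, a plain `pip install <pkg_name>` (without --user) places the package inside the environment.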

Feel free to email me if you have any additional questions about getting conda set up.