Conda Tutorial
Getting started with conda
Here are some reasons conda might be a useful tool for your research:
- Installing a second version of a system executable for testing.
- Software version requirements of a tool that cannot be satisfied using system installs (newer version needed).
- Using software that is difficult to compile from source.
- Using software packages that have been optimized for conda environments.
- Maintaining separate environments for individual research/development projects.
References:
- https://docs.conda.io/projects/conda/en/latest/user-guide/index.html
- https://kaust-vislab.github.io/introduction-to-conda-for-data-scientists/index.html
Installation instructions
First, identify a location in your lab’s filespace that makes sense for a conda install. This would usually be something like `/nfs1/<Your Dept>/<Your Lab>/<Your username>/opt/conda`.
If you already installed miniconda in your `$HOME` on `/raid1`, you can still point your `envs_dirs` and `pkgs_dirs` at a `/nfs` or `/dfs` target. The large files will then be installed to the networked directory, which saves space in your `$HOME` directory.
You’ll download the latest miniconda3 from the conda website: https://docs.conda.io/en/latest/miniconda.html
miniconda is preferable to anaconda for routine package installs because it minimizes install space and avoids the extraneous packages that come with a full anaconda install. Because Python 2 reached end of life on January 1, 2020, we recommend that users install miniconda3 (miniconda2 is based on Python 2, while miniconda3 is based on Python 3).
Let’s do some housekeeping first.
This is how my `opt` folder is structured:
If you set up your folder structure the same way, then you can always specify `prefix=/path/to/your/opt` when you are configuring new software. You can then add `/path/to/your/opt/bin` to your `$PATH` variable and have access to the new software as well. I would encourage you to use something like this for your setup.
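The prefix-and-PATH idea described above can be sketched as follows. This is a minimal sketch, not the author's actual folder listing: the helper name `setup_opt_prefix` is hypothetical, and the subdirectories other than `bin` are assumed, conventional choices.

```shell
# Hypothetical helper: create a personal software prefix and put its bin
# directory on PATH. Only bin is named in the text; lib, share, and src
# are assumptions for illustration.
setup_opt_prefix() {
    prefix="$1"
    mkdir -p "$prefix/bin" "$prefix/lib" "$prefix/share" "$prefix/src" || return 1
    # Prepend so that software installed under the prefix is found first.
    export PATH="$prefix/bin:$PATH"
}
```

For example, after `setup_opt_prefix /path/to/your/opt`, software built with `./configure --prefix=/path/to/your/opt` installs its executables into a directory that is already on your `$PATH`.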
You’ll also note that this folder is on `/nfs3`, NOT in my `/raid1` home directory. This is essential so that you do not accidentally fill up your home directory, which has a limit of only around 25GB.
Now, let’s set up your miniconda install.
Here are some example commands to help you get set up. We’ll start in a bash shell to keep things easy, because conda does not work well with csh/tcsh.
**NOTE:** You'll need to edit these commands to point to your *ACTUAL* folders on `/nfs[0123]` or `/dfs`! Also, the purpose of the `| grep` is to check the md5sum of the file to confirm that it was downloaded properly.
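A minimal sketch of the download-and-verify step follows. The helper name `verify_md5` is hypothetical, the expected checksum must be copied from the miniconda download page, and the installer URL shown assumes a Linux x86_64 machine.

```shell
# Hypothetical helper: print the md5sum line only when the file's checksum
# matches the expected value; return non-zero otherwise.
verify_md5() {
    file="$1"
    expected="$2"
    # Refuse an empty checksum, which grep would otherwise match trivially.
    [ -n "$expected" ] || return 1
    md5sum "$file" | grep -F -- "$expected"
}

# Usage sketch (not run here; edit the paths to match your lab's filespace):
#   cd /nfs1/<Your Dept>/<Your Lab>/<Your username>/opt
#   wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
#   verify_md5 Miniconda3-latest-Linux-x86_64.sh <md5 from the download page>
#   bash Miniconda3-latest-Linux-x86_64.sh -p /nfs1/<Your Dept>/<Your Lab>/<Your username>/opt/conda
```

If the download page lists a SHA-256 hash rather than an md5, the same pattern works with `sha256sum` in place of `md5sum`.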
You should see a single line of output containing the matching md5 checksum followed by the installer filename, which indicates that the file was downloaded correctly.
Next, follow the instructions in the interactive shell.
Choose yes to have the installer init conda. This will add some commands to your `~/.bashrc` file that will make using conda easier in the bash shell.
Next, we need to configure our conda install. We need to add some additional channels (i.e. installation repositories) as well as disable auto-loading of conda on login.
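The configuration step can be sketched as below. The channel names (`defaults`, `bioconda`, `conda-forge`) are assumptions for illustration, since the text does not name them; substitute the channels your work needs. The wrapper function `configure_conda_channels` is likewise hypothetical. Note that `conda config --add` prepends to the list, so the channel added last receives the highest priority.

```shell
# Hypothetical wrapper around the conda configuration commands.
# The channels are assumed examples; each --add places its channel at the
# top of the list, so conda-forge ends up with the highest priority.
# The final command stops conda from activating the base environment on login.
configure_conda_channels() {
    conda config --add channels defaults &&
    conda config --add channels bioconda &&
    conda config --add channels conda-forge &&
    conda config --set auto_activate_base false
}
```

Running `configure_conda_channels` once after installation is enough; with these assumed channels, packages are resolved from conda-forge first, then bioconda, then defaults.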
Adding the channels in this order helps ensure that the most up-to-date versions of software are installed preferentially.
The last step is to explicitly provide the `envs` and `pkgs` directories to conda for installation. This will keep any temporary or installed conda packages out of your home directory. The relevant settings, `envs_dirs` and `pkgs_dirs`, are documented in the help message provided by `conda config --describe`.
So, let’s add some directories so that conda installs to the correct place:
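A minimal sketch of this step, assuming the install location suggested earlier. The helper name `configure_conda_dirs` and the `envs` and `pkgs` subdirectory names are illustrative assumptions.

```shell
# Hypothetical helper: point conda's environment and package-cache
# directories at a networked location instead of $HOME.
configure_conda_dirs() {
    # e.g. /nfs1/<Your Dept>/<Your Lab>/<Your username>/opt/conda
    conda_root="$1"
    mkdir -p "$conda_root/envs" "$conda_root/pkgs" || return 1
    conda config --add envs_dirs "$conda_root/envs" &&
    conda config --add pkgs_dirs "$conda_root/pkgs"
}
```

Call it with your own path, for example `configure_conda_dirs /path/to/your/opt/conda`, and then confirm the result with `conda config --show envs_dirs pkgs_dirs`.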
Your `~/.condarc` file should look something like this:
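As an illustration, the sketch below prints what such a file could contain. The channel names and directory paths are assumptions, not values from the original file; your own `~/.condarc` will show your actual channels and paths. The function name `print_example_condarc` is hypothetical.

```shell
# Print an example of the expected ~/.condarc contents (YAML).
# Channels and paths are illustrative assumptions.
print_example_condarc() {
    cat <<'EOF'
channels:
  - conda-forge
  - bioconda
  - defaults
auto_activate_base: false
envs_dirs:
  - /path/to/your/opt/conda/envs
pkgs_dirs:
  - /path/to/your/opt/conda/pkgs
EOF
}

print_example_condarc
```

You can compare this against your real file with `cat ~/.condarc`.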
To generate a conda environment to work in, you’ll run the commands:
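A minimal sketch of those commands follows. The environment name `myproject` and the pinned package versions are hypothetical examples, and `create_project_env` is an illustrative wrapper rather than a conda command.

```shell
# Hypothetical wrapper: create an environment with all packages requested
# together, so that conda resolves them in a single pass.
create_project_env() {
    env_name="$1"
    shift
    conda create --yes --name "$env_name" "$@"
}

# Usage sketch (example name and versions; check `conda search <pkg_name>`):
#   create_project_env myproject python=3.10 numpy=1.26 perl=5.32
#   conda activate myproject
```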
Best practice is to include a version number with each package that you want to include in your conda environment. This will increase the future reproducibility of your research, especially when you want to share your conda environment with your colleagues. You can find which versions of a package are available using `conda search <pkg_name>`. Furthermore, the best way to avoid package conflicts is to install all of the packages that you require when you create your environment.
You’ll want to make sure that any environment variables that reference non-conda-managed packages or libraries are cleared when you activate your conda environment. You can clear them yourself using `unset VARIABLE` or `export VARIABLE=`. You can automate this by adding these commands to a shell script that you source each time you activate your environment. Alternatively, we can generate some files that will set our environment variables properly on activation and deactivation of our environment.
Let’s say we want to use a new install of perl in our conda environment. We would want to clear any custom PERL5LIB and cpanm installation environment variables first. Let’s do so.
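One way to set this up, following the pattern in the conda documentation for saving environment variables, is sketched below. It assumes the environment is already activated so that `$CONDA_PREFIX` is set; the fallback to a scratch directory named `example-env` is an assumption added only so the sketch can be tried without conda.

```shell
# Root of the active environment; falls back to a scratch directory
# (an assumption for illustration) when no environment is active.
ENV_PREFIX="${CONDA_PREFIX:-${TMPDIR:-/tmp}/example-env}"

# Directories that conda scans for scripts on activate and deactivate.
mkdir -p "$ENV_PREFIX/etc/conda/activate.d" "$ENV_PREFIX/etc/conda/deactivate.d"

# Empty hook scripts, ready to receive the variable handling.
touch "$ENV_PREFIX/etc/conda/activate.d/env_vars.sh"
touch "$ENV_PREFIX/etc/conda/deactivate.d/env_vars.sh"
```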
Then, we’ll paste these values into `./etc/conda/activate.d/env_vars.sh`:
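A minimal sketch of that file's contents, written here with a heredoc. Which variables to clear is an assumption: `PERL5LIB` comes from the text, while `PERL_LOCAL_LIB_ROOT`, `PERL_MB_OPT`, and `PERL_MM_OPT` are the variables that a local::lib or cpanm setup typically exports. The `OLD_` prefix and the scratch-directory fallback are also assumptions, the latter only so the sketch can be tried without conda.

```shell
ENV_PREFIX="${CONDA_PREFIX:-${TMPDIR:-/tmp}/example-env}"
mkdir -p "$ENV_PREFIX/etc/conda/activate.d"

# The script saves each variable under an OLD_ name, then clears it so the
# environment's own perl does not pick up outside libraries.
cat > "$ENV_PREFIX/etc/conda/activate.d/env_vars.sh" <<'EOF'
export OLD_PERL5LIB="$PERL5LIB"
export OLD_PERL_LOCAL_LIB_ROOT="$PERL_LOCAL_LIB_ROOT"
export OLD_PERL_MB_OPT="$PERL_MB_OPT"
export OLD_PERL_MM_OPT="$PERL_MM_OPT"
unset PERL5LIB PERL_LOCAL_LIB_ROOT PERL_MB_OPT PERL_MM_OPT
EOF
```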
Then, we’ll paste these values into `./etc/conda/deactivate.d/env_vars.sh`:
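A minimal sketch of the matching deactivate script, assuming the activate script saved each original value under an `OLD_`-prefixed name (a naming convention assumed here, not taken from the text). The variable list and the scratch-directory fallback are likewise assumptions.

```shell
ENV_PREFIX="${CONDA_PREFIX:-${TMPDIR:-/tmp}/example-env}"
mkdir -p "$ENV_PREFIX/etc/conda/deactivate.d"

# The script restores each variable from its OLD_ copy, then drops the copy.
cat > "$ENV_PREFIX/etc/conda/deactivate.d/env_vars.sh" <<'EOF'
export PERL5LIB="$OLD_PERL5LIB"
export PERL_LOCAL_LIB_ROOT="$OLD_PERL_LOCAL_LIB_ROOT"
export PERL_MB_OPT="$OLD_PERL_MB_OPT"
export PERL_MM_OPT="$OLD_PERL_MM_OPT"
unset OLD_PERL5LIB OLD_PERL_LOCAL_LIB_ROOT OLD_PERL_MB_OPT OLD_PERL_MM_OPT
EOF
```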
This syntax will work for any environment variable that we might want to unset inside the environment, and then reset upon exiting the environment.
Additional thoughts
You’ll want to make sure to set some environment variables on activation of conda environments, especially environments that depend on R (setting `R_LIBS` properly). Additionally, when using perl in a conda environment, it’s advisable to also install `perl-app-cpanminus` if you need any perl modules that aren’t available through conda. Lastly, if your work requires python packages, first install any that are available through conda itself, and install the remaining packages with pip last. DO NOT use `--user` as you usually would when using pip outside of a conda environment!
Feel free to email me if you have any additional questions about getting conda set up.