# Frequently Asked Questions

# New User 'onboarding' / FAQs

## Assumptions

You have experience using the command line; if not, see Training below.

## Accounts

### How do I login? (SSH)

On macOS or Linux-based operating systems, you should have access to a terminal and the `ssh` program, which lets you connect to the infrastructure. On Windows, you will need to use the Windows Subsystem for Linux (WSL) or PuTTY.

### What server do I login to?

You should log in to shell.cgrb.oregonstate.edu using the information provided upon signing up. To eliminate the need to use Duo, you can set up ssh keys that help confirm your identity.

### How do I change my password?

The `passwd` program allows you to change your password. You’ll have to enter your current (even temporary) password before entering your new password twice. If you have ssh keys set up, you do not need to enter your password just to log in. Your password still is useful for signing in to

### What if I forget my password?

You have to submit a support ticket at

### What am I allowed to do on the shell server?

On `shell.cqls.oregonstate.edu`, a machine named vaughan, you can edit text files, submit jobs to our queuing system (currently SGE), and do basic text processing. Jobs requiring lots of processors and/or memory will be killed.

### What am I NOT allowed to do on the shell server? Why?

Most processing jobs will be killed. This ensures that everyone has equal access for logging in to shell.cqls and that processing jobs do not slow down the shell machine. If all processors on the shell.cqls machine were in use, users would have a difficult time logging in, and users who are already logged in would have difficulty submitting jobs to SGE. If your command on shell.cqls gets killed, please submit the job using `SGE_Batch` or `SGE_Array`.

### What is the default shell?

The default shell is tcsh. Users can request a change of default shell to bash by submitting a ticket.

### What is a shell?
The shell is a command-line interface between you (the user) and the computer or server. The shell reads what you type as commands and translates them so that the computer or server understands what you want to do.

### Do I have a quota for my $HOME directory?

Users have a 25GB quota for their home directories.

### How do I check my quota?

Use the `quota -s` command to see your current usage and quota.

### What do I do if I go over my quota?

If you exceed your quota, you will need to remove files until you are back under the 25GB limit.

### What should I store in my $HOME directory?

Only minimal configuration and similar small files should be stored in your `$HOME` directory. All processing should be done on networked filesystem drives and the local `/data` drives of the processing machines.

### How do I edit my $PATH variable and save it across log-ins?

The exact changes you need to make depend on your `$SHELL` (either `bash` or `tcsh`). I suggest making the change temporarily first, and then changing your configuration file (`~/.bashrc` for `bash`, and `~/.cshrc` for `tcsh`).

You'll need the full path to the directory that contains the program(s) you want to add to your `$PATH`.

To temporarily add the programs to your `$PATH`, run the appropriate command below (`export` for `bash`, `setenv` for `tcsh`):

```console
export PATH=/path/to/new/directory:${PATH}
setenv PATH /path/to/new/directory:${PATH}
```

After you test out the new command, you can add the matching line to your config file:

```console
echo 'export PATH=/path/to/new/directory:${PATH}' >> ~/.bashrc
echo 'setenv PATH /path/to/new/directory:${PATH}' >> ~/.cshrc
```

Keys to doing this properly are:

1. Make sure to use the `>>` append redirect so you don't overwrite your config (`>` will overwrite the file).
2. Make sure to use single quotes `'` for your echo command, otherwise the `${PATH}` variable will get expanded unnecessarily.
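Running the `echo ... >>` command more than once leaves duplicate lines in your config file. The following is a minimal sketch, for `bash` users only, of one way to append a line only when it is not already present. The helper name `add_line_once` is hypothetical and is not a command provided on the infrastructure.

```shell
# Append a line to a config file only if that exact line is not already there.
# add_line_once is a hypothetical helper name used for illustration.
add_line_once() {
    line=$1
    file=$2
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (single quotes keep ${PATH} unexpanded in the file):
# add_line_once 'export PATH=/path/to/new/directory:${PATH}' ~/.bashrc
```

Because the function uses `>>` internally and takes the line as a single-quoted argument, it follows both of the rules listed above.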

You can also just edit the appropriate file with a text editor (`vim`, `emacs`, `nano`) if you feel comfortable doing so.

After you add the changes to your config files, the updated `$PATH` will get loaded on every new log-in shell.

## File Transfers

### What server should I use to transfer files via SFTP/SCP?

You must use `files.cqls.oregonstate.edu` for file transfers. File transfers are disabled on `shell.cqls.oregonstate.edu`.

### What is SFTP?

SFTP stands for Secure File Transfer Protocol. SFTP allows for file transfers, both to and from the infrastructure, using the same security provided by ssh.

### What is SCP?

SCP stands for Secure Copy Protocol. The `scp` program allows secure file transfers between the infrastructure and your own computer.

### Can I use FTP?

FTP (File Transfer Protocol) is insecure and CQLS servers will not host files over FTP. However, you can use the `ftp` program from `files.cqls.oregonstate.edu` to transfer files externally, e.g. to NCBI, if necessary.

### Why can’t I transfer files via SFTP to shell.cgrb?

The `shell.cqls.oregonstate.edu` machine should be used for editing files and submitting jobs only, not file transfers.

### Can I transfer files using a Windows drive share to the infrastructure?

TBD

### I want to publish data via the web, how can I do this?

You can be provided a lab directory on the `files.cqls.oregonstate.edu` machine to publish data externally. Please submit a support request to find out more information. [See here for examples](https://files.cqls.oregonstate.edu/)

### Can I access my files via the web?

[See above.](#i-want-to-publish-data-via-the-web-how-can-i-do-this)

## Storage

### What is ZFS / NFS?

[ZFS](https://itsfoss.com/what-is-zfs/) is a file system and volume management technology that is designed to scale to very large storage pools and emphasizes data integrity. We use ZFS on our networked file system (NFS) drives.

### What is DFS / Quobyte?

DFS is a distributed file system. Its use has been discontinued at the CQLS.
### What is stored on NFS, what should I be using it for?

All data and outputs should be stored on the NFS. We recommend using the `/data` drives, which are specific to each compute node, for doing the analysis and then copying the results back onto an NFS location.

### Does my lab have NFS space?

You may have access to NFS; ask the post-doc or professor in your lab, who will know.

### I accidentally deleted a file; are my files backed up?

The `$HOME` directory is backed up. Each NFS location may or may not be backed up, depending on whether your lab pays for storage backup. Contact [Support](mailto:support@cqls.oregonstate.edu) if you need to start backups on your space. Please contact [Matthew Peterson](mailto:matthew@cqls.oregonstate.edu) if you need to recover previously backed up data.

### What is tape backup?

Tape backup is a long-term storage backup for recovery after a disaster. All sequencing runs are copied to tape prior to deletion. Each lab is still responsible for copying and maintaining raw sequencing data; tape backup is used for emergencies only, and **is not guaranteed**.

### How do I get more space added to our NFS space?

Please contact [Support](mailto:support@cqls.oregonstate.edu) to purchase more NFS disk space.

## Batch Processing

### What is batch processing?

A batch process is one that can run without human interaction. When we submit processes in a non-interactive mode on the infrastructure, we are submitting batch processes.

### What is SGE?

SGE is Son of Grid Engine, which is a queuing system. SGE allows us to submit batch jobs to different compute nodes across the infrastructure, such that each job runs when resources permit.

### What is an SGE queue?

We have different queues available on the infrastructure so that each lab may have different resources available at any given time.

### What queues are available to me?

Not all labs/colleges have access to the same resources.
You can see which resources are available to you by running `SGE_Avail`.

### How do I submit jobs?

There are multiple ways to submit jobs. A single job can be run with the `SGE_Batch` command. More information about queuing systems will be released in a [separate post](../queuing-system).

### How do I check out a compute node for interactive use?

Use `SGE_Avail` to make sure you have a queue with the `I` (interactive) attribute. Then, check out the node using the `qrsh` command. You can request multiple processors using the `qrsh -pe thread N` option, where N is the number of processors you'd like to check out.

### What is SGE Array?

`SGE_Array` allows easy submission of array jobs, which are most commonly used when a user has a command that they want to run on many individual inputs (10s-1000s). Instead of submitting hundreds or thousands of jobs, you can submit a single array job and control how many tasks are running at once.

### How do I check the status of jobs?

The `qstat` command allows you to see which jobs are running. Run `qstat -j $JOBID` to see more detail about a specific job.

### How do I kill jobs?

You kill jobs with the `qdel $JOBID` command.

### What happens if I see an ‘E’ state?

If you see an E state, you should run `qstat -j $JOBID` to determine what the error message is. Sometimes, the error state can be cleared with the `clear_Eqw_job.sh` command. Other times, the job should be killed with `qdel` so that you can change settings before re-submitting the job.

### How do I know how long my bioinformatics job will run?

We suggest running a small test dataset through your pipeline(s) to determine an expected amount of processing time/resource utilization. The user is responsible for ensuring that their jobs are only using the amount of resources requested in the queuing system. Please monitor your jobs (you can `qrsh` to the same node that your job is running on to check the health of the machine using e.g.
`htop`) to ensure everything is going as requested.

### How does my lab obtain more processing resources?

Most colleges have access to shared computing resources; if you are a member of a college where you think you should have access to machines and they are not available, please submit a [support request](https://shell.cqls.oregonstate.edu/support/). If you or your college does not have resources available, we have machines available to rent for up to 6 months. If your needs go beyond that, you can email [Support](mailto:support@cqls.oregonstate.edu) to discuss other options, including current costs of purchasing machines.

### Another lab has asked me to collaborate with them, but I cannot access their files or compute resources, what do I do?

Email [Support](mailto:support@cqls.oregonstate.edu) and cc the appropriate collaborators to get access to their files.

## Support Tickets

### How do I follow up to obtain further support?

To request general support, use the [support form](https://shell.cqls.oregonstate.edu/support/). Please use the 'cgrb-support' option for general questions and the 'cgrb-software' option for software requests.

### How do I accurately describe my issue?

Please provide information regarding:

* What machine you are having an issue with (use `qstat` to see the node)
* What software you are trying to run
* What your expected output is, and what the observed output is
* A copy/paste of any error messages you may have
* How to reproduce the issue
* What you may have tried to resolve the issue
* Any links to the software or software help pages that might help

If you are submitting a software install request, please provide a link to the GitHub page or other source material.

### How do I check on the status of my support ticket?

Please follow up by emailing [Support](mailto:support@cqls.oregonstate.edu) for support requests, and [Ed Davis](mailto:ed@cqls.oregonstate.edu) for software requests.
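When describing an issue as outlined in the list above, it can help to paste some basic details about your session along with the error message. The following is a minimal sketch, assuming a `bash` session; the function name `ticket_context` is hypothetical and not a command provided on the infrastructure.

```shell
# Print basic session details that can be pasted into a support ticket.
# ticket_context is a hypothetical helper name used for illustration.
ticket_context() {
    printf 'date:  %s\n' "$(date)"
    printf 'host:  %s\n' "$(uname -n)"      # machine you are logged in to
    printf 'user:  %s\n' "$(id -un)"
    printf 'shell: %s\n' "${SHELL:-unknown}"
    printf 'cwd:   %s\n' "$PWD"
    printf 'PATH:  %s\n' "$PATH"
}

ticket_context
```

This only covers the "what machine" part of the list; the software name, the exact command, and the error output still need to be added by hand.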
## Training

### How can I learn more about using the command line?

We offer 'Intro to Unix/Linux' and 'Command-line data analysis' courses; see the [Workshops](https://cqls.oregonstate.edu/training/workshops) page for more information. We also offer one-on-one training for an hourly fee; please email [the bioinformatics team](mailto:bioinformatics@cqls.oregonstate.edu).

## Software

### Conda

#### How do I get conda set up?

Follow the [instructions here.](../using-the-system-miniconda-3-install)

#### How do I fix my broken login/configs?

The raw configuration files can be found here:

```console
/local/cluster/etc/inits/.bashrc
/local/cluster/etc/inits/.cshrc
```

You can make a backup of your current file and then copy the raw configuration files into your home directory. You can also remove a `~/.tcshrc` file if it's present.

```console
mv ~/.bashrc ~/.bashrc.bak
mv ~/.cshrc ~/.cshrc.bak
rm -f ~/.tcshrc
cp /local/cluster/etc/inits/.bashrc ~
cp /local/cluster/etc/inits/.cshrc ~
```

Then log out and log back in. If your configuration seems fixed, you can add some of the modifications from your backups to the newly refreshed config files.

#### If I already have conda set up, how do I access the system envs?

If you set up conda prior to February 2023, you likely do not have access to the latest conda envs when you run `conda env list`; they are installed in `/local/cluster/conda-envs/envs`. If that's the case for you, please run these commands to get access to them:

```console
bash
conda config --append envs_dirs /local/cluster/conda-envs/envs
conda config --append pkgs_dirs /local/cluster/conda-envs/pkgs
```

#### Where can I learn more about conda env activation?

[See this link from the conda documentation](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#macos-and-linux)

For most conda environments on our infrastructure, I run the scripts in `/local/cluster/conda/conda_*_setup.sh` to resolve version mismatches.
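The conda documentation linked above describes hook scripts: any `*.sh` file in `$CONDA_PREFIX/etc/conda/activate.d/` is sourced on `conda activate`, and files in `deactivate.d/` are sourced on `conda deactivate`. The following is a minimal sketch of that mechanism, using `R_LIBS` as an example of a variable to hide while an env is active. It is an illustration only and is not necessarily what the `conda_*_setup.sh` scripts do; the function name `write_r_hooks` and the file name `r_libs.sh` are hypothetical.

```shell
# Create activate/deactivate hooks that hide R_LIBS while a conda env is active.
# write_r_hooks and r_libs.sh are hypothetical names used for illustration.
write_r_hooks() {
    prefix=$1
    mkdir -p "$prefix/etc/conda/activate.d" "$prefix/etc/conda/deactivate.d"

    # Sourced by `conda activate`: remember the old value, then unset it.
    cat > "$prefix/etc/conda/activate.d/r_libs.sh" <<'EOF'
export _OLD_R_LIBS="${R_LIBS:-}"
unset R_LIBS
EOF

    # Sourced by `conda deactivate`: restore the old value if there was one.
    cat > "$prefix/etc/conda/deactivate.d/r_libs.sh" <<'EOF'
if [ -n "${_OLD_R_LIBS:-}" ]; then
    export R_LIBS="$_OLD_R_LIBS"
fi
unset _OLD_R_LIBS
EOF
}

# Usage, with the target env active:
# write_r_hooks "$CONDA_PREFIX"
```

The hooks take effect the next time the env is activated, so deactivate and re-activate the env after writing them.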
#### R isn't working in my conda env, why?

You likely have `$R_LIBS` or `$R_LIBS_USER` set, and R is pulling libraries from your home directory or another location that are incompatible with the R environment. You can manually unset those env vars or go to the base env of your conda directory and run `bash /local/cluster/conda/conda_R_setup.sh` to automatically set up the appropriate env vars on `conda activate` and `conda deactivate`.

For a copy/paste option if the conda env is active:

```console
cd $CONDA_PREFIX
conda deactivate
bash /local/cluster/conda/conda_R_setup.sh
conda activate .
```

[See above for more information](#where-can-i-learn-more-about-conda-env-activation)

#### Python is not working or has version mismatches

You may need to remove the `python` alias by running `unalias python`. You can add that command to your `~/.bashrc` or `~/.cshrc` files as well. After that, either type out the full path (`/local/cluster/bin/python2` or `/local/cluster/bin/python3`) or add `/local/cluster/bin` to your `$PATH` ahead of `/usr/bin`, e.g. `export PATH=/local/cluster/bin:${PATH}`, so you don't have to type it out fully.

Your python is probably pulling from your `.local` install. You can manually unset the relevant env vars or go to the base env of your conda directory and run `bash /local/cluster/conda/conda_python_setup.sh` to automatically set up the appropriate env vars on `conda activate` and `conda deactivate`.

For a copy/paste option if the conda env is active:

```console
cd $CONDA_PREFIX
conda deactivate
bash /local/cluster/conda/conda_python_setup.sh
conda activate .
```

[See above for more information](#where-can-i-learn-more-about-conda-env-activation)

#### Perl is not working or has version mismatches

Your perl is probably pulling from your `PERL5LIB` env var. You can manually unset that env var or go to the base env of your conda directory and run `bash /local/cluster/conda/conda_perl_setup.sh` to automatically set up the appropriate env vars on `conda activate` and `conda deactivate`.
For a copy/paste option if the conda env is active:

```console
cd $CONDA_PREFIX
conda deactivate
bash /local/cluster/conda/conda_perl_setup.sh
conda activate .
```

[See above for more information](#where-can-i-learn-more-about-conda-env-activation)

#### Software is not working due to a mismatch in linked libraries (lib.so missing)

The compiler on our infrastructure is old (gcc 4.8.5), and does not provide the most up-to-date linked libraries. In this case, the conda `LD_LIBRARY_PATH` is not getting set properly. You can manually set your `LD_LIBRARY_PATH` to include `$CONDA_PREFIX/lib`, or you can go to the base env of your conda directory and run `bash /local/cluster/conda/conda_LD_setup.sh` to automatically set up the appropriate env vars on `conda activate` and `conda deactivate`.

For a copy/paste option if the conda env is active:

```console
cd $CONDA_PREFIX
conda deactivate
bash /local/cluster/conda/conda_LD_setup.sh
conda activate .
```

[See above for more information](#where-can-i-learn-more-about-conda-env-activation)

### What software do I use for…

#### Adapter trimming

For automated adapter trimming, we currently recommend fastp. fastp is a good option for situations where you have adapters on the 3’ end of reads due to read-through of short inserts into the sequencing adapter. For trimming of primer sequences or other custom sequences, we suggest using bbduk.sh or cutadapt.
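As an illustration of the fastp recommendation above, here is a minimal sketch that builds a paired-end fastp command and prints it without running it. The function name `fastp_cmd`, the `_R1`/`_R2` file naming scheme, and the default of 4 threads are assumptions for the example; adjust them to your own data. The fastp options used (`-i`, `-I`, `-o`, `-O`, `--detect_adapter_for_pe`, `-w`, `-j`, `-h`) are standard fastp flags.

```shell
# Build (and only print) a paired-end fastp command for one sample.
# fastp_cmd and the sample file naming scheme are hypothetical.
fastp_cmd() {
    sample=$1
    threads=${2:-4}
    # Input read pairs
    printf 'fastp -i %s_R1.fastq.gz -I %s_R2.fastq.gz ' "$sample" "$sample"
    # Trimmed output read pairs
    printf -- '-o %s_R1.trim.fastq.gz -O %s_R2.trim.fastq.gz ' "$sample" "$sample"
    # Adapter auto-detection for paired-end data, and worker threads
    printf -- '--detect_adapter_for_pe -w %s ' "$threads"
    # JSON and HTML reports
    printf -- '-j %s.fastp.json -h %s.fastp.html\n' "$sample" "$sample"
}

fastp_cmd sampleA 8
```

Printing the command first makes it easy to check before submitting it through the queuing system rather than running it on the shell machine.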
#### Short read alignment

bwa mem

#### Spliced alignment

STAR or hisat2

#### Long read alignment

minimap2

#### Genome assembly (Illumina)

SPAdes

#### Genome assembly (long read)

flye or nextdenovo

#### Genome annotation (prokaryote)

Bakta

#### RNA-Seq quantification

salmon

#### Differential gene expression analysis

DESeq2

#### Pairwise sequence alignment

blast or diamond

#### Multiple sequence alignment

`mafft --auto`

#### Phylogenetic tree construction

IQ-TREE; fasttree can be useful for preliminary analysis

#### Orthologous group calculation (Prokaryote)

PIRATE for cultured organisms; PPanGGOLiN for MAGs/SAGs

anvio is useful for pangenome analysis as well

#### Principal component analysis or other ordination/dimensional reduction

Using R: prcomp for principal component analysis (PCA); vegan for nonmetric multidimensional scaling (NMDS) or constrained ordination, e.g. redundancy analysis (RDA); ape for principal coordinate analysis (PCoA)

### How do I…

#### Figure out how to run a program

* Use -h e.g. `$program -h`
* Use --help e.g. `$program --help`
* Use help e.g. `$program help`
* Use man (this works for system installed things like cat, mkdir, ls) e.g. `man $program`
* Use tldr (works for common programs, awk, sed, tar, wget) e.g. `tldr $program`
* Examine the script with less e.g. `which $program; less -S /local/cluster/bin/$program` (Note: does not work for compiled software)
* Search the program name on google
* Search the program name on the updates website

You can also try `helpme` for some curated help data.

#### Display a formatted markdown file on the command line

Use `glow`

#### Download reads from NCBI

Use `prefetch` and `fasterq-dump`. See [here](../using-sra-toolkit) for more info.

#### Download genomes from NCBI

You can use the [data-hub](https://www.ncbi.nlm.nih.gov/data-hub/genome/) to get genome data. Use `files.cqls.oregonstate.edu` for downloads.
You can also use the [get_assemblies](https://github.com/davised/get_assemblies) program (use `python3 -m pip install --user get-assemblies` to install).

#### Generate a BLASTDB

Use `makeblastdb -in INPUT.fasta -dbtype [nucl|prot]` to generate your BLASTDB. Please submit using `SGE_Batch -c ...`.

#### Do a BLASTN or BLASTP search

Use `blastp` or `blastn`. Use the `-help` flag for options. Do not use `blastall`, as it is old and no longer supported.

## Miscellaneous

### My terminal output is garbled

Run `reset`. This should reset the output on your screen and you should be able to continue as normal.

### Do my sequences have adapters?

Index sequences are not included in the raw reads. Adapters could be on the 3' end of reads, depending on the library prep type. Please check the FASTQC reports sent by Matthew with each sequencing run to determine if your reads have adapters. Use the `fastp` program to remove them.

### What is going on with my 16S sequencing results?

See [this page](../16s-sequencing) for some information regarding the 16S preps provided at the CQLS.