Submitting data to the NCBI Sequence Read Archive (SRA)

Ed Davis included in CQLS

2023-06-09

Pre-processing steps

Collect fastq.gz files for each sample
Rename fastq.gz files from long name to short name per sample
Get biosample data early (especially if it’s from collaborators)
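
The renaming step can be scripted. The following is a minimal sketch, assuming the long names follow an Illumina-style pattern such as `mouse01_S12_L001_R1_001.fastq.gz`; that pattern, and the helper names `short_name`, `plan_renames`, and `apply_renames`, are illustrative assumptions and not part of the original workflow. Adjust the regular expression to whatever your sequencing core delivers.

```python
import re
from pathlib import Path

# Assumed long-name pattern: <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz
LONG_NAME = re.compile(
    r"^(?P<sample>.+?)_S\d+_L\d{3}_(?P<read>R[12])_001\.fastq\.gz$"
)


def short_name(long_name: str) -> str:
    """Map a long fastq.gz file name to <sample>_<read>.fastq.gz."""
    match = LONG_NAME.match(long_name)
    if match is None:
        raise ValueError(f"unrecognized file name: {long_name}")
    return f"{match['sample']}_{match['read']}.fastq.gz"


def plan_renames(directory: Path) -> dict[Path, Path]:
    """Return {old_path: new_path} without touching any file."""
    plan: dict[Path, Path] = {}
    for old in sorted(directory.glob("*.fastq.gz")):
        new = old.with_name(short_name(old.name))
        if new in plan.values():
            raise ValueError(f"name collision: {new.name}")
        plan[old] = new
    return plan


def apply_renames(plan: dict[Path, Path]) -> None:
    """Rename each file, refusing to overwrite an existing target."""
    for old, new in plan.items():
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)
```

For example, `short_name("mouse01_S12_L001_R1_001.fastq.gz")` returns `mouse01_R1.fastq.gz`. Printing the result of `plan_renames` before calling `apply_renames` gives a chance to review the mapping first.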

Basic Checklist

Project Title
Public Description for Project
Grant Funding

Start a New Submission

The example below is for a host-associated 16S sequencing study.

Go to the link above and start a submission. Fill out the project title, description, grant funding, and departmental information as requested.

If you are following along with host-associated 16S data, choose Packages for metagenome submitters, then select MIMS Environmental/Metagenome from the GSC MIxS section on the right. Otherwise, fill out the form for your organism and follow along.

Choose "Upload a file using Excel or text format (tab-delimited) that includes the attributes for each of your BioSamples", then download the template.

Minimum Biosample Checklist

Note: completely delete any column marked "delete" in the list below.

Columns marked with * are required

env_broad_scale and env_local_scale take a value for samples from environmental sources, but may be not applicable for samples from host sources.

*sample_name
sample_title - delete
bioproject_accession - delete
*organism - mouse metagenome (or other host metagenome)
*collection_date - YYYY-mm-dd as one acceptable format
*env_broad_scale - not applicable
*env_local_scale - animal-associated environment [ENVO:01001002] (or other host-environment)
*env_medium - fecal material [ENVO:00002003] (or other host tissue)
*geo_loc_name - USA: Oregon
*host - Mus musculus (or other species binomial)
*lat_lon - 44.566 N 123.283 W
genetic_mod
host_common_name - mouse (or other species common name)
host_diet
host_genotype
host_sex - male or female
host_subject_id
host_taxid - 10090 or other host taxid
misc_param
perturbation - experimental or control group
neg_cont_type - kit or water
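
The checklist above can be turned into a tab-delimited attributes file with a short script. This is a sketch under stated assumptions: the shared values are the mouse example values from the list, the column names are written without the `*` marker, and the helpers `biosample_row` and `to_tsv` are hypothetical names. The template downloaded from the submission portal remains the authority on the exact header row.

```python
import csv
import io
from datetime import datetime

# Required columns (marked with * in the checklist above).
REQUIRED = (
    "sample_name", "organism", "collection_date", "env_broad_scale",
    "env_local_scale", "env_medium", "geo_loc_name", "host", "lat_lon",
)

# Example values shared by every sample; replace them for your own study.
SHARED = {
    "organism": "mouse metagenome",
    "env_broad_scale": "not applicable",
    "env_local_scale": "animal-associated environment [ENVO:01001002]",
    "env_medium": "fecal material [ENVO:00002003]",
    "geo_loc_name": "USA: Oregon",
    "host": "Mus musculus",
    "lat_lon": "44.566 N 123.283 W",
    "host_taxid": "10090",
}


def biosample_row(sample_name, collection_date, **extra):
    """Build one BioSample row and check the required fields."""
    datetime.strptime(collection_date, "%Y-%m-%d")  # ValueError on a bad date
    row = {"sample_name": sample_name, "collection_date": collection_date,
           **SHARED, **extra}
    missing = [name for name in REQUIRED if not row.get(name)]
    if missing:
        raise ValueError(f"{sample_name}: missing {', '.join(missing)}")
    return row


def to_tsv(rows):
    """Serialize rows as tab-delimited text, required columns first."""
    optional = sorted({key for row in rows for key in row} - set(REQUIRED))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(REQUIRED) + optional,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # columns absent from a row are left empty
    return buffer.getvalue()
```

Optional attributes such as host_sex or perturbation can be passed as keyword arguments, for example `biosample_row("mouse01", "2023-06-09", host_sex="female", perturbation="control")`.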

Minimum SRA metadata

sample_name - must match biosample name submitted above
library_ID - can match sample_name
title - 16S metabarcoding of Mus musculus: feces
library_strategy - AMPLICON
library_source - METAGENOMIC
library_selection - PCR
library_layout - paired
platform - ILLUMINA
instrument_model - Illumina MiSeq
design_description - choose from below
- Earth Microbiome Project 16S PCR protocol
- Illumina 16S PCR protocol
filetype - fastq
filename - sample_name_R1.fastq.gz
filename2 - sample_name_R2.fastq.gz
filename3 - delete
filename4 - delete
assembly - delete
fasta_file - delete
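
Because most SRA metadata values are identical across samples, the rows can be generated from the list of sample names. The sketch below assumes the example values from the list above (paired-end MiSeq 16S amplicons from mouse feces) and the `sample_name_R1.fastq.gz` file naming; `sra_row`, `unmatched_samples`, and `sra_tsv` are hypothetical helper names.

```python
import csv
import io

SRA_COLUMNS = [
    "sample_name", "library_ID", "title", "library_strategy",
    "library_source", "library_selection", "library_layout", "platform",
    "instrument_model", "design_description", "filetype",
    "filename", "filename2",
]


def sra_row(sample_name, design_description):
    """Build one SRA metadata row for a paired-end 16S amplicon library."""
    return {
        "sample_name": sample_name,
        "library_ID": sample_name,  # library_ID can match sample_name
        "title": "16S metabarcoding of Mus musculus: feces",
        "library_strategy": "AMPLICON",
        "library_source": "METAGENOMIC",
        "library_selection": "PCR",
        "library_layout": "paired",
        "platform": "ILLUMINA",
        "instrument_model": "Illumina MiSeq",
        "design_description": design_description,
        "filetype": "fastq",
        "filename": f"{sample_name}_R1.fastq.gz",
        "filename2": f"{sample_name}_R2.fastq.gz",
    }


def unmatched_samples(rows, biosample_names):
    """Return SRA sample names that have no matching BioSample name."""
    known = set(biosample_names)
    return [row["sample_name"] for row in rows
            if row["sample_name"] not in known]


def sra_tsv(rows):
    """Serialize SRA metadata rows as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SRA_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Running `unmatched_samples` against the sample names from the BioSample file before uploading catches the mismatch that the sample_name rule above warns about.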

Uploading data using ftp

Expand the FTP instructions
Log in to files.cqls.oregonstate.edu using ssh
Navigate to the directory containing the reads
Connect to NCBI over ftp using given credentials:
- ftp -i ftp-private.ncbi.nlm.nih.gov
- -i flag allows multiple transfers without confirming
Change to your given directory on the website
mkdir a new directory for this submission
Use mput to upload multiple files at once
- mput *.fastq.gz
Wait until the files are available to select in the web interface
- May take 10+ minutes
Choose the directory in the Select preload folder dialog and click Continue
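
As an alternative to the interactive ftp session, the same steps (change to the account directory, create a submission directory, upload every fastq.gz file) can be sketched with Python's standard `ftplib`. The function names and the `account_dir` and `submission_dir` parameters are hypothetical; the username, password, and account directory come from the FTP instructions shown in the submission portal.

```python
from ftplib import FTP
from pathlib import Path


def upload_reads(ftp, account_dir, submission_dir, read_dir):
    """Upload every *.fastq.gz in read_dir; return the uploaded file names.

    `ftp` is a connected, logged-in ftplib.FTP object (or anything that
    provides the same cwd, mkd, and storbinary methods).
    """
    ftp.cwd(account_dir)      # the directory given on the website
    ftp.mkd(submission_dir)   # new directory for this submission
    ftp.cwd(submission_dir)
    uploaded = []
    for path in sorted(Path(read_dir).glob("*.fastq.gz")):
        with path.open("rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


def upload_to_ncbi(user, password, account_dir, submission_dir, read_dir="."):
    """Connect to the NCBI FTP host named above and upload the reads."""
    with FTP("ftp-private.ncbi.nlm.nih.gov") as ftp:
        ftp.login(user, password)
        return upload_reads(ftp, account_dir, submission_dir, read_dir)
```

Keeping the connection separate from the upload logic lets `upload_reads` be exercised against a stand-in object before any real transfer is attempted. As with the manual route, the files may take 10+ minutes to appear in the web interface.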
