# Submitting data to the NCBI Sequence Read Archive (SRA)

## Pre-processing steps

- [ ] Collect fastq.gz files for each sample
- [ ] Rename fastq.gz files from long name to short name per sample
- [ ] Get biosample data early (especially if it's from collaborators)

## Links

- [Sample attributes page](https://submit.ncbi.nlm.nih.gov/biosample/template/?package-0=MIMS.me.host-associated.6.0&action=definition)
- [Biosample batch page](https://www.ncbi.nlm.nih.gov/biosample/docs/submission/batch/)

## Basic Checklist

- [ ] Project Title
- [ ] Public Description for Project
- [ ] Grant Funding

## Start a New Submission

The example that I will provide is for a host-associated 16S sequencing study.

Go to the link above and start a submission. Fill out the project title, description, grant funding, and departmental information as requested.

Choose `Packages for metagenome submitters` if you are following along with host-associated 16S data, then `MIMS Environmental/Metagenome` from the `GSC MIxS` section on the right. Otherwise, fill out the form with your organism and follow along.

Choose `Upload a file using Excel or text format (tab-delimited) that includes the attributes for each of your BioSamples` and download the template.

## Minimum Biosample Checklist

**Note:** Columns marked with `delete` should be completely deleted. Columns marked with `*` are required.

`env_broad_scale` and `env_local_scale` will have a value if from environmental sources, but `not applicable` from host sources.

- [ ] *sample_name
- [X] sample_title - `delete`
- [X] bioproject_accession - `delete`
- [X] *organism - `mouse metagenome` (or other [host metagenome](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Tree&id=410656&lvl=3&keep=1&srchmode=1&unlock))
- [ ] *collection_date - `YYYY-mm-dd` as one acceptable format
- [X] *env_broad_scale - `not applicable`
- [X] *env_local_scale - `animal-associated environment [ENVO:01001002]` (or other [host-environment](http://purl.obolibrary.org/obo/ENVO_01001002))
- [X] *env_medium - `fecal material [ENVO:00002003]` (or other [host tissue](http://purl.obolibrary.org/obo/ENVO_01000254))
- [X] *geo_loc_name - `USA: Oregon`
- [X] *host - `Mus musculus` (or other species binomial)
- [ ] *lat_lon - `44.566 N 123.283 W`
- [ ] genetic_mod
- [X] host_common_name - `mouse` (or other species common name)
- [ ] host_diet
- [ ] host_genotype
- [ ] host_sex - `male` or `female`
- [ ] host_subject_id
- [ ] host_taxid - `10090` or other host taxid
- [ ] misc_param
- [ ] perturbation - experimental or control group
- [ ] neg_cont_type - `kit` or `water`

## Minimum SRA metadata

- [ ] sample_name - must match biosample name submitted above
- [ ] library_ID - can match sample_name
- [ ] title - `16S metabarcoding of Mus musculus: feces`
- [X] library_strategy - `AMPLICON`
- [X] library_source - `METAGENOMIC`
- [X] library_selection - `PCR`
- [X] library_layout - `paired`
- [X] platform - `ILLUMINA`
- [X] instrument_model - `Illumina MiSeq`
- [ ] design_description - choose from below
  - `Earth Microbiome Project 16S PCR protocol`
  - `Illumina 16S PCR protocol`
- [X] filetype - `fastq`
- [ ] filename - `sample_name_R1.fastq.gz`
- [ ] filename2 - `sample_name_R2.fastq.gz`
- [ ] filename3 - `delete`
- [ ] filename4 - `delete`
- [ ] assembly - `delete`
- [ ] fasta_file - `delete`

## Uploading data using ftp

- Expand the FTP instructions
- Log in to files.cqls.oregonstate.edu using ssh
- Navigate to the directory containing the reads
- Connect to NCBI over ftp using the given credentials:
  - `ftp -i ftp-private.ncbi.nlm.nih.gov`
    - The `-i` flag turns off interactive prompting, so multiple files transfer without confirming each one
- Change to the directory given to you on the website
- `mkdir` a new directory for this submission and `cd` into it
- Use `mput` to upload multiple files at once
  - `mput *.fastq.gz`
- Wait until the files are available to select in the web interface
  - May take **10+ minutes**
- Choose the directory in the `Select preload folder` dialog and click continue
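
Since the `filename` and `filename2` columns in the SRA metadata sheet have to match the uploaded files, it can help to generate that sheet from the file names themselves. The following is a minimal sketch, assuming the reads have already been renamed to the `sample_name_R1.fastq.gz` / `sample_name_R2.fastq.gz` pattern from the checklist above. The function names `sra_rows` and `write_tsv` are hypothetical helpers, not part of any NCBI tool. The column names and fixed values are copied from the `Minimum SRA metadata` checklist, so check them against the template you actually download.

```python
import csv

# Fixed values copied from the "Minimum SRA metadata" checklist
# (paired-end 16S amplicon data from an Illumina MiSeq).
FIXED = {
    "library_strategy": "AMPLICON",
    "library_source": "METAGENOMIC",
    "library_selection": "PCR",
    "library_layout": "paired",
    "platform": "ILLUMINA",
    "instrument_model": "Illumina MiSeq",
    "filetype": "fastq",
}

# Column order follows the checklist; the `delete` columns are left out.
COLUMNS = [
    "sample_name", "library_ID", "title",
    "library_strategy", "library_source", "library_selection",
    "library_layout", "platform", "instrument_model",
    "design_description", "filetype", "filename", "filename2",
]

R1_SUFFIX = "_R1.fastq.gz"
R2_SUFFIX = "_R2.fastq.gz"


def sra_rows(filenames, title, design_description):
    """Pair R1/R2 file names and return one metadata dict per sample."""
    names = set(filenames)
    rows = []
    for r1 in sorted(n for n in names if n.endswith(R1_SUFFIX)):
        sample = r1[: -len(R1_SUFFIX)]
        r2 = sample + R2_SUFFIX
        if r2 not in names:
            # A paired layout needs both read files for every sample.
            raise ValueError(f"missing reverse read file for {sample}: {r2}")
        rows.append({
            "sample_name": sample,   # must match the biosample name
            "library_ID": sample,    # allowed to match sample_name
            "title": title,
            **FIXED,
            "design_description": design_description,
            "filename": r1,
            "filename2": r2,
        })
    return rows


def write_tsv(rows, handle):
    """Write the rows as a tab-delimited table with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
```

To use it, pass the names of the `*.fastq.gz` files in the reads directory to `sra_rows`, along with a title such as `16S metabarcoding of Mus musculus: feces` and one of the two design descriptions, then hand the result to `write_tsv` with a file opened for writing (with `newline=""`). A sample with an R1 file but no R2 file raises an error rather than producing an incomplete row.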