NCBI datasets 20221007

Installed
This software should be available with no extra configuration.

ncbi-datasets-20221007

Note: syntax will be changing significantly in the version 14 release. Stay tuned…

Getting started

Welcome to NCBI Datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases. You have the choice of getting the data through three interfaces:

  • NCBI Datasets website
  • Command-line tools
  • API: accessible through our Python package, or in combination with other UNIX tools (such as wget and curl).

[Image: datasets_getting_started.png]

Data delivery

How is the data delivered?

NCBI Datasets delivers data and metadata as a cohesive data package contained in a zip archive. When unzipped, files can be found in the folder ncbi_dataset/data.
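As a rough sketch of working with that layout, the helpers below unpack a data package and list what was delivered under ncbi_dataset/data. The function names, the archive name, and the extraction directory are illustrative assumptions, not part of the datasets client, and unzip is assumed to be installed.

```shell
# Return the data folder of an unpacked NCBI Datasets package.
# $1: directory the archive was extracted into (hypothetical name).
package_data_dir() {
    # Strip one trailing slash so the joined path has no double slash.
    printf '%s/ncbi_dataset/data\n' "${1%/}"
}

# Unpack an archive and list the files it delivered.
# $1: zip archive, $2: extraction directory (both hypothetical names).
unpack_package() {
    unzip -o -q "$1" -d "$2" || return 1
    find "$(package_data_dir "$2")" -type f | sort
}

# Example (assumes the archive exists in the current directory):
# unpack_package ncbi_dataset.zip genome_package
```

Keeping the path logic in its own function means a script only has one place to change if the package layout ever differs from what is described here.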

What do we mean by “cohesive”?

For all data packages, users can include multiple files associated with the requested accession. For example, users downloading the human reference genome assembly can select transcript, protein, GFF, GTF, GBFF, and metadata files in the same request. For more information about data packages and their contents, please see our Data packages page.

Where can I learn more about NCBI Datasets?

To learn more about using NCBI Datasets, see our How-to guides, which explain how to download data and metadata for genomes, genes, ortholog sets, and viruses. We also have an extensive documentation page for our API and detailed information about our command-line tools.


Location and version

$ which datasets
/local/cluster/bin/datasets
$ datasets version
13.43.2
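
Because the syntax is expected to change in version 14, a script may want to check which major version it is running against. The following is a minimal sketch; the function names are made up for illustration, and it assumes that datasets version prints a bare version string as shown above.

```shell
# Extract the major version from a version string such as "13.43.2".
major_version() {
    # Drop everything from the first dot onward.
    printf '%s\n' "${1%%.*}"
}

# Succeed if the given version predates the version 14 syntax change.
uses_v13_syntax() {
    [ "$(major_version "$1")" -lt 14 ]
}

# Example (assumes datasets is on PATH and prints only the version number):
# if uses_v13_syntax "$(datasets version)"; then
#     echo "pre-14 syntax"
# fi
```

If a later release prefixes the version output with extra text, the parsing in major_version would need to be adjusted.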

Help message

$ datasets help
datasets is a command-line tool that is used to query and download biological sequence data
across all domains of life from NCBI databases.

Refer to NCBI's [download and install](https://www.ncbi.nlm.nih.gov/datasets/docs/download-and-install/) documentation for information about getting started with the command-line tools.

Usage
  datasets [command]

Data Retrieval Commands
  summary              print a summary of a gene or genome dataset
  download             download a gene, genome or coronavirus dataset as a zip file
  rehydrate            rehydrate a downloaded, dehydrated dataset

Miscellaneous Commands
  completion           generate autocompletion scripts
  version              print the version of this client and exit
  help                 Help about any command

Flags
      --api-key string   NCBI Datasets API Key
  -h, --help             help for datasets
      --no-progressbar   hide progress bar

Use datasets help <command> for detailed help about a command.
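
The --api-key flag listed above can be supplied on every call through a small wrapper. This is only a sketch: the wrapper name datasets_cmd and the variable MY_NCBI_API_KEY are hypothetical names chosen for this example, and the flag is passed through exactly as it appears in the help text.

```shell
# Run datasets, adding --api-key when a key is available.
# MY_NCBI_API_KEY is a hypothetical variable name used only in this sketch.
datasets_cmd() {
    if [ -n "${MY_NCBI_API_KEY:-}" ]; then
        datasets --api-key "$MY_NCBI_API_KEY" "$@"
    else
        datasets "$@"
    fi
}

# Example (assumes datasets is on PATH):
# MY_NCBI_API_KEY="your-key-here"
# datasets_cmd help download
```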

software ref: https://www.ncbi.nlm.nih.gov/datasets/docs/v1/