Information on relevant sequence databases can be found
by following the links below. Additionally, the first issue every year of
Nucleic Acids Research contains status reports
from the curators of the major databases.
dbEST is the division of GenBank that contains
"single-pass" cDNA sequences,
or Expressed Sequence Tags, from a number of organisms.
Entries from the DNA Databank of Japan (DDBJ)
are wholly incorporated into GenBank.
The EMBL Nucleotide Sequence Database is a comprehensive database of DNA
and RNA sequences collected from the scientific literature and
patent applications, and submitted directly by researchers and
sequencing groups. Data collection is done in collaboration with
GenBank (USA) and the DNA Databank of Japan (DDBJ).
The Ensembl project produces genome databases for vertebrates and other eukaryotic species.
Ensembl is a joint project between
EMBL-EBI and the Wellcome Trust Sanger Institute.
GenBank is the NIH genetic sequence database, an annotated collection
of all publicly available DNA sequences. There are approximately
1,622,000,000 bases in 2,356,000 sequence records as of June 1998.
The complete release
notes for the current version of GenBank are available by
FTP. A new release is made every two months. GenBank is part of
the International Nucleotide Sequence Database Collaboration,
which comprises the DNA DataBank of Japan (DDBJ), the European
Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These
three organizations exchange data on a daily basis.
The International Protein Index (IPI)
provided a top-level guide to the main databases that described the proteomes of higher eukaryotic organisms.
The databases are no longer updated; the last releases were made on 27 September 2011. The suggested
replacements are the UniProt complete proteomes.
has not been updated since 2006 and should be considered obsolete.
NCBI maintains composite, non-identical protein and nucleic acid
databases for its search tools.
The entries in the protein database,
nr, have been compiled from GenBank CDS translations,
PIR, SWISS-PROT, PRF, and PDB. NCBI has made strong efforts to
cross-reference the sequences in these databases in order to avoid
duplication.
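The idea of a composite, non-identical database can be illustrated with a minimal Python sketch. It is not NCBI's actual procedure: the function name `merge_non_identical`, the source names, the accessions and the sequences are all made up for illustration. The sketch assumes that two entries count as identical only when their sequences match exactly, and it keeps every source accession as a cross-reference on the single merged entry.

```python
def merge_non_identical(sources):
    """Collapse identical sequences from several sources into one entry each.

    `sources` maps a source name to a list of (accession, sequence) pairs.
    Returns a dict mapping each distinct sequence to the list of
    (source, accession) cross-references that share it.
    """
    merged = {}
    for source, records in sources.items():
        for accession, sequence in records:
            # Assumption: case differences do not make sequences distinct.
            key = sequence.upper()
            merged.setdefault(key, []).append((source, accession))
    return merged


# Hypothetical toy records; none of these are real database entries.
toy_sources = {
    "SourceA": [("A001", "MKTAYIAK"), ("A002", "MSLLTEVE")],
    "SourceB": [("B100", "MKTAYIAK")],
}

merged = merge_non_identical(toy_sources)
for sequence, xrefs in merged.items():
    print(sequence, xrefs)
```

In this toy input three records collapse to two entries, because the sequence shared by both sources is stored once with two cross-references.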
OWL has not been updated since May 1999,
and should be considered obsolete.
The Brookhaven Protein Data
Bank (PDB) is a database of three-dimensional structures.
This means that entries are invariably well characterised, with
reliable sequence data which can also be found in the other databases.
Entries which are unique to PDB tend to be variant proteins, with
distorted structures, which were used to refine a structural determination.
The PIR (Protein Information Resource) database was initiated at the NBRF
in the early 1960s by the late Margaret O. Dayhoff as a collection
of sequences for the study of evolutionary relationships among
proteins. The database is now an international collaboration of
three data centers: the NBRF, the Munich Information Center for
Protein Sequences (MIPS), and the Japan International Protein
Information Database (JIPID). The three centers cooperate to produce
and distribute a single database of "wild-type" protein sequences.
The Protein Research Foundation
of Japan database contains protein sequences abstracted from scientific
The UniProt (Swiss-Prot & TrEMBL)
Protein Knowledgebase consists of two sections:
Swiss-Prot, which is manually annotated and reviewed, and
TrEMBL, which is automatically annotated and is not reviewed.
UniProt is a collaboration between the European
Bioinformatics Institute (EBI), the SIB Swiss
Institute of Bioinformatics and the Protein
Information Resource (PIR).
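In UniProt FASTA files, the two sections can be told apart by the header prefix: `sp|` marks a Swiss-Prot entry and `tr|` marks a TrEMBL entry. The Python sketch below shows one way to read that prefix. The function name `uniprot_section` and the example headers, including their accessions and entry names, are hypothetical and exist only to illustrate the format.

```python
def uniprot_section(header):
    """Return the UniProt section named by a FASTA header line.

    Expects a header of the form '>db|accession|entry_name description',
    where db is 'sp' (Swiss-Prot) or 'tr' (TrEMBL).
    """
    fields = header.lstrip(">").split("|")
    if len(fields) < 3:
        raise ValueError("not a UniProt-style FASTA header: %r" % header)
    prefix = fields[0]
    if prefix == "sp":
        return "Swiss-Prot (reviewed)"
    if prefix == "tr":
        return "TrEMBL (unreviewed)"
    raise ValueError("unknown database prefix: %r" % prefix)


# Hypothetical headers; the accessions and names are invented.
reviewed = uniprot_section(">sp|X00001|EXMP1_HUMAN Example protein 1 OS=Homo sapiens")
unreviewed = uniprot_section(">tr|X00002|X00002_HUMAN Example protein 2 OS=Homo sapiens")
print(reviewed)
print(unreviewed)
```

A check like this can be useful when a search result needs to be labelled as coming from a manually reviewed entry or from an automatically annotated one.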