Nucleic Acid Databases

Search Databases at NCBI (GenBank)
The NCBI provides facilities to search for sequences by text or by sequence similarity and to submit new sequences.

Search Databases at GSDB
The GSDB provides facilities to search for sequences by text or by sequence similarity and to submit new sequences.

Search Databases at DDBJ
The DDBJ provides facilities to search for sequences by text or by sequence similarity and to submit new sequences.

BLAST search of databases at NCBI
BLAST is a program that allows you to search for similarity between your query sequence and the gene sequences held at the NCBI.

BLAST search of databases at NCGR
BLAST is a program that allows you to search for similarity between your query sequence and the gene sequences held at the NCGR.

BLAST search of human chromosome databases
Allows the searching of a DNA database containing all human sequence data available from the Sanger Centre.

The sequence data contains:

Finished human genomic sequence
Unfinished human genomic sequence
CpG island sequences

BLASTula, the server of Blast servers
A group of pages offering single-point access to more than 40 different BLAST servers worldwide, each operating on its own set of sequences.

BLAST documentation
"classical" BLAST analyses
BLAST searches operating on specialised databases
enhanced BLAST analyses (Wu-Blast, Beauty-Blast, Prodom-Blast, ...)

BLAST2 search of databases at EMBL
BLAST2 is a program that allows you to search for similarity between your query sequence and the gene sequences held at EMBL. It is similar to the original BLAST program, but it allows gaps in the alignments.

INCA with BLAST / Entrez
Iterative Neighborhood Cluster Analysis

INCA is a Java applet that runs BLAST

INCA is a Java 1.02 applet. Give INCA a starter sequence and it finds related sequences. INCA runs BLAST on the starter sequence and then runs BLAST on the matching sequences. INCA keeps track of all the results. INCA originally accessed the Entrez predefined sequence neighbors. Now INCA uses the BLAST server to find sequence neighbors dynamically. Using BLAST instead of Entrez to find neighbors permits one to adjust search parameters as needed, and can improve search results.
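
The iterative search described above amounts to a breadth-first expansion over sequence neighbours. A minimal sketch of that idea follows. It is not INCA's code: `find_neighbors` is a hypothetical stand-in for one BLAST query, `max_rounds` is an assumed cap on how far the search expands, and the toy neighbour table is invented for illustration.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def iterative_neighbor_search(
    starter: str,
    find_neighbors: Callable[[str], Iterable[str]],
    max_rounds: int = 2,
) -> Dict[str, List[str]]:
    """Expand outward from a starter sequence ID, round by round.

    find_neighbors is a hypothetical stand-in for one BLAST search: it takes
    a sequence ID and returns the IDs of the matching sequences.
    Returns a dict mapping every searched ID to the hits it produced.
    """
    results: Dict[str, List[str]] = {}
    queue = deque([(starter, 0)])
    seen = {starter}
    while queue:
        seq_id, depth = queue.popleft()
        hits = list(find_neighbors(seq_id))
        results[seq_id] = hits
        if depth + 1 >= max_rounds:
            continue  # hits from the final round are recorded but not searched
        for hit in hits:
            if hit not in seen:
                seen.add(hit)
                queue.append((hit, depth + 1))
    return results


# Toy neighbour table in place of a real BLAST server (illustrative data only).
toy_hits = {"A": ["B", "C"], "B": ["A", "D"], "C": ["D"], "D": []}
clusters = iterative_neighbor_search("A", lambda s: toy_hits.get(s, []))
```

With the default of two rounds, the starter is searched first and then each of its matches, which mirrors the two steps in the description. The `seen` set keeps a sequence from being searched twice.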

Expressed Sequence Tags (dbEST)
dbEST (Expressed Sequence Tag) sequences are 'single pass' partial DNA sequences derived from clones randomly selected from cDNA libraries. dbEST is maintained by NCBI and included in the GenBank database. Because these data differ from traditional GenBank entries and thus require special processing and annotation, NCBI also makes them available in a separate database, dbEST. The full reports contain information on the availability of physical cDNA clones and mapping data in collaboration with the Genome Data Base at Johns Hopkins University.

dbGSS - Genome Survey Sequence
Contains contributor contact information, experimental conditions and genetic map locations for the Genome Survey Sequence division of GenBank/EMBL.

SRS-FASTA: Similarity Search of GenBank Subsets
This is a search of your query sequence against subsets of nucleic and protein databanks. These subsets are chosen by you with keyword selections in the sequence documentation.

There may be times when you will get better information by eliminating unwanted sections of the databanks before performing a sequence search. Given the large size and constant updates to the biosequence databanks, it is difficult to produce subsets of these data directly for similarity searching. By coupling similarity search software (FastA) with keyword selection software (SRS), one can provide such searches fairly efficiently.
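
The coupling described above can be pictured as two steps: select a subset by keyword, then search only that subset. The sketch below is an illustration under stated assumptions, not SRS or FastA. The record layout and function names are hypothetical, and the similarity score is a toy count of shared k-mers standing in for a real similarity search.

```python
from typing import Dict, List, Set, Tuple


def select_subset(records: Dict[str, Tuple[str, str]], keyword: str) -> Dict[str, str]:
    """Keep sequences whose documentation mentions the keyword (case-insensitive)."""
    kw = keyword.lower()
    return {
        rec_id: seq
        for rec_id, (description, seq) in records.items()
        if kw in description.lower()
    }


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_kmers(query: str, subset: Dict[str, str], k: int = 4) -> List[Tuple[str, int]]:
    """Toy similarity score: number of distinct k-mers shared with the query.

    This is a stand-in for a real similarity search, not the FastA algorithm.
    """
    q = kmers(query, k)
    scored = [(rec_id, len(q & kmers(seq, k))) for rec_id, seq in subset.items()]
    # Highest score first; ties broken by record ID for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Invented records: ID -> (documentation text, sequence).
records = {
    "X1": ("human beta-globin mRNA", "ATGGTGCACCTGACTCCTGAG"),
    "X2": ("mouse actin gene", "ATGGTGCACCTGACTCCTGAG"),
    "X3": ("human insulin precursor", "ATGGCCCTGTGGATGCGCCTC"),
}
human_only = select_subset(records, "human")
ranking = rank_by_shared_kmers("GTGCACCTGACT", human_only)
```

Because the keyword filter runs first, the record "X2" is never scored even though its sequence matches the query, which is the point of restricting the search to a chosen subset.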

Sequence Retrieval System (SRS)
A powerful search tool linking more than 20 molecular biology databases (EMBL, SwissProt, PIR, PDB, Prosite ...) and allowing complex searches.

WWW-Query - sequence data and multivariate analysis
This is a World-Wide Web server for accessing sequence collections indexed with ACNUC and for performing multivariate analyses on sequences. General collections like GenBank or EMBL can be accessed, as well as specialized data banks like Hovergen or NRSub.

Indexing with ACNUC makes it possible to build queries that use many criteria to retrieve sequences. Criteria are based on mnemonics, accession numbers, keywords, taxonomic data, bibliographic references, dates of insertion in the bank, the nature of the genome from which a sequence has been obtained, etc. Also, the notion of subsequence introduced in ACNUC allows independent retrieval of genomic fragments of biological interest such as CDS, tRNAs, rRNAs, snRNAs, etc.

The result of each query is a list of sequences, and this list is temporarily stored on our server. This makes it possible to re-use a previous list to build more complex queries or to perform treatments on a set of sequences. So far, these treatments consist mainly of programs for performing multivariate analyses on the CDS or the proteins. These methods are: Principal Component Analysis (PCA), Correspondence Analysis (COA), and Multiple Correspondence Analysis (MCA).

GeneNet
GeneNet is a meta-search system for the analysis of sequence similarity, designed to help biologists analyse sequences efficiently via the WWW. It also performs periodic searches, which spares biologists from repeatedly analysing the same sequence.

GeneNet can communicate simultaneously with four widely used databases (GenBank at NCBI, PDB, BLOCKS, and KEGG). For protein sequences, searches are performed against all four databases. For DNA sequences, only GenBank analysis is possible.

REBASE The Restriction Enzyme Database
REBASE is a collection of information about restriction enzymes, methylases, the microorganisms from which they have been isolated, recognition sequences, cleavage sites, methylation specificity, the commercial availability of the enzymes, and references to both published and unpublished observations.

Multi-Cut - A Data Base of Restriction Endonuclease Buffers
Multi-cut is a database of restriction endonuclease buffers. It finds compatible buffers for a list of enzymes that you want to use in a multiple restriction endonuclease digest. Multi-Cut searches through activity data from the catalogs of several major restriction endonuclease manufacturers and finds buffers that will work with all of the endonucleases in the reaction.
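
Finding a buffer that works for every enzyme in a digest is, in essence, a set intersection over per-enzyme activity data. The sketch below shows that idea only. It is not Multi-Cut's implementation: the buffer names, activity percentages and the 75% threshold are made up for illustration.

```python
from typing import Dict, List, Optional, Set

# Percent activity of each enzyme in each buffer.
# The enzyme names are real; the buffer names and numbers are invented.
ACTIVITY: Dict[str, Dict[str, int]] = {
    "EcoRI": {"Buffer A": 100, "Buffer B": 75, "Buffer C": 25},
    "BamHI": {"Buffer A": 50, "Buffer B": 100, "Buffer C": 100},
    "HindIII": {"Buffer A": 25, "Buffer B": 100, "Buffer C": 50},
}


def compatible_buffers(enzymes: List[str], min_activity: int = 75) -> List[str]:
    """Return buffers in which every listed enzyme reaches min_activity percent."""
    candidates: Optional[Set[str]] = None
    for enzyme in enzymes:
        good = {b for b, pct in ACTIVITY[enzyme].items() if pct >= min_activity}
        # Intersect with the buffers that suited all previous enzymes.
        candidates = good if candidates is None else candidates & good
    return sorted(candidates or [])


triple_digest = compatible_buffers(["EcoRI", "BamHI", "HindIII"])
```

Lowering `min_activity` widens the choice of buffers at the cost of slower or less complete digestion by some of the enzymes.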

Sequence Tag Alignment and Consensus Knowledgebase (STACK)
Aims to make the most comprehensive representation of the sequence of each of the expressed genes in the human genome.

Codon Usage Database
A query box is provided for searching the codon usage table of an organism. Searches can be made by Latin name or common name.

Alphabetical lists of all organisms, and lists of organisms with 100 or more CDSs in GenBank, are also available.
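
A codon usage table is built by counting in-frame codons across an organism's coding sequences. The sketch below shows one way to do that. It is an illustration, not the database's own procedure: the per-thousand normalisation and the skipping of codons with ambiguous bases are assumptions, and the two input sequences are invented.

```python
from collections import Counter
from typing import Dict, Iterable


def codon_usage(cds_sequences: Iterable[str]) -> Dict[str, float]:
    """Return codon frequencies per thousand codons across in-frame CDSs."""
    counts: Counter = Counter()
    for cds in cds_sequences:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = cds[i:i + 3]
            if set(codon) <= set("ACGT"):  # skip codons with ambiguous bases
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Two invented CDSs; the second contains an ambiguous base (N).
usage = codon_usage(["ATGGCTGCTTAA", "ATGGCNGCTTGA"])
```

Each sequence is read from its first base, so the input is assumed to start at the first position of the reading frame.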

ImMunoGeneTics Database (IMGT)
IMGT, the international ImMunoGeneTics database, is a high-quality integrated database specialising in Immunoglobulins (Ig), T cell receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species, created in 1989 by Marie-Paule Lefranc (Université Montpellier II, CNRS). IMGT, a European project since 1992, works in close collaboration with EBI. At present, IMGT includes two databases: IMGT/LIGM-DB, a comprehensive database of Ig and TcR from human and other vertebrates, with translation for fully annotated sequences, and IMGT/HLA-DB, a database of the human MHC referred to as HLA (Human Leucocyte Antigens). The IMGT server provides common access to all immunogenetics data.

EPD Eukaryotic Promoter Database
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of experimentally characterised eukaryotic POL II promoters.

The Tumor Gene Database
A database of genes associated with tumorigenesis and cellular transformation. This database includes oncogenes, proto-oncogenes, tumor suppressor genes/anti-oncogenes, regulators and substrates of the above, regions believed to contain such genes (such as tumor-associated chromosomal break points and viral integration sites), and other genes and chromosomal regions that seem relevant.

Nucleic Acid Database (NDB) Project
The goal of the Nucleic Acid Database Project is to assemble and distribute structural information about nucleic acids.

Structures may be selected by making choices based on a large variety of structural and experimental characteristics.

The user can then view the structure's coordinates in either NDB or PDB format, view the structure's full NDB entry, view the structure using either a local viewer or the remote viewer (RasMol), or display the structure's atlas entry.

DNA Patents Database
The DNA Patents Database, compiled by the National Academy of Sciences (USA), contains the full text of patents. It is set up to provide the key biological information about each patent: which genes are included, the techniques used in their discovery and the precise extent of the claims made in each patent.

Molecular Probe Data Base (MPDB)
Contains information on ca. 4000 synthetic oligonucleotides with a sequence of up to 100 nucleotides.

HIV Sequence Database
The HIV Sequence Database focuses on five primary goals:

Collecting HIV and SIV sequence data (since 1987)
Curating and annotating this data
Computer analysis of HIV and related sequences
Production of software for the analysis of (sequence) data
Publication of the data and analyses on this site and in a yearly printed publication, the HIV Sequence Compendium.

DB Search does a straightforward search on a large number of database fields. Output comes in the form of GenBank-style sequences.

HIV-MAP allows searching on fewer fields, but it can find all sequences that fully or partially overlap a region, optionally clip out that region from a longer sequence, and even produce an alignment of the selected sequences. This interface allows searches by subtype, country, and accession number or sequence name, and can produce output files in GenBank, Fasta, and Intelligenetics format.
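
Finding sequences that overlap a region and clipping them to it comes down to interval arithmetic on reference coordinates. The sketch below illustrates that step only and is not the HIV-MAP implementation. The entry layout and function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and each sequence is assumed to be gap-free relative to the reference.

```python
from typing import Dict, Tuple


def overlaps(seq_start: int, seq_end: int, region_start: int, region_end: int) -> bool:
    """True if two closed intervals share at least one position."""
    return seq_start <= region_end and region_start <= seq_end


def clip_to_region(sequence: str, seq_start: int, region_start: int, region_end: int) -> str:
    """Return the part of a sequence that falls inside the region.

    seq_start is the 1-based reference coordinate of the first base.
    """
    seq_end = seq_start + len(sequence) - 1
    lo = max(seq_start, region_start)
    hi = min(seq_end, region_end)
    if lo > hi:
        return ""  # no overlap at all
    return sequence[lo - seq_start: hi - seq_start + 1]


# Hypothetical entries: name -> (start coordinate, sequence).
entries: Dict[str, Tuple[int, str]] = {
    "s1": (1, "AAAACCCCGG"),  # covers positions 1-10
    "s2": (8, "GGTTTT"),      # covers positions 8-13
    "s3": (20, "ACGT"),       # covers positions 20-23
}
region = (5, 9)
hits = {
    name: clip_to_region(seq, start, *region)
    for name, (start, seq) in entries.items()
    if overlaps(start, start + len(seq) - 1, *region)
}
```

Here "s1" spans the whole region and "s2" overlaps it only partially, so both are returned, each clipped to the bases that fall inside positions 5 to 9.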

Sequence analysis
- HIV-BLAST runs a BLAST search against our database, which contains only HIV and SIV sequences
- TreeMaker can be used to produce simple trees; it is an interface to Joseph Felsenstein's DnaDist, Neighbor, and Drawtree/Drawgram programs.
- SeqPublish prepares an alignment for publication by replacing identical columns with dashes
- HXB2 Numbering Engine to find position numbers in HIV relative to HXB2
- RIP: Intersubtype Recombination Analysis, a program for detecting evidence of inter-subtype recombination.
- Search for hypermutation in a dinucleotide context
- Vespa: Signature Pattern Analysis, a program for identifying sites which are shared by one group of sequences, and are rare in another group
- SNAP: Synonymous-Nonsynonymous Analysis Program calculates syn and nonsyn values for an alignment
- Principal Coordinate Analysis (PCOORD), an interface to the program by Des Higgins for identifying patterns of correlated positions in an alignment

Other programs
- ODprep/ODfit calculate antibody titers based on concentration and optical density data.
- HMA gel analysis, our interface to HDent and HDdist, programs for analysing data from heteroduplex mobility and tracking assays.

Euchromatin Network
The Euchromatin Network is designed to help researchers and others interested in the latest developments in our studies of this most active part of the genome within the cell nucleus.

With increasing accuracy, resolution, and sensitivity, our cell biology methods are revealing new and important information about the role of active euchromatin in the life of the cell, during embryogenesis and cell differentiation, during the hormone response and the immune response, during neoplasia and organ regeneration.

Proteins have been described as the "agents" whereby the cell accomplishes its many metabolic functions. DNA has been described as the "library" whereby the cell stores the structural blueprints for each protein of that individual. RNA is now being recognized as the "spark" whereby the cell activates specific genes of the genome for expression as proteins in the cell. Such "riboregulators" are being recognized in the animal, the plant and even the bacterial world.

Euchromatin is that unique combination of DNA, RNA and proteins which allows this magnificent cellular program within the cell nucleus to proceed with accuracy, safety and flexibility.

Human Tumor Gene Index (hTGI)
The Human Tumor Gene Index (hTGI) has two major goals:

to identify genes expressed during development of human tumors
to discover new human genes

cDNA Libraries
- cDNA Library Browser
- cDNA Library Sources
- Summary Tables of Libraries, Genes and Sequences
cDNA Clones
- IMAGE Consortium
- Tumor Suppressor and Oncogene Directory
- Summary Tables of Libraries, Genes and Sequences
Genes
- Gene Discovery
- Tumor Suppressor and Oncogene Directory
- Summary Tables of Libraries, Genes and Sequences
- GeneExpress
Gene Expression
- Digital Differential Display
- cDNA Expression Profiler
- Serial Analysis of Gene Expression Map

Intron Sequence Information System (ISIS)
This contains information on spliceosomal introns. ISIS contains phylogenetic and protein homology categories, information about individual sequences and various bioinformatics analyses of taxonomical groupings of sequences using non-redundant subsets of the data.

Ares Lab Yeast Intron Database
This site contains information about the spliceosomal introns of the yeast Saccharomyces cerevisiae. This class of introns presents special problems for the annotation and analysis of eukaryotic genome sequences. Splice sites themselves are information-poor, and their recognition by the splicing apparatus is highly context-dependent. At present we do not understand splice site context well enough to predict which potential splice sites will be used, and thus how the genomic sequences will be expressed.

Exon-Intron Database
An exhaustive database of protein-coding intron-containing genes.

Any Comments, Questions? Support@hgmp.mrc.ac.uk
