Prokaryotic Genome Databases

E. coli Genome Center
The E. coli Genome Center is a laboratory of the Genetics Department, College of Agricultural and Life Sciences, at the University of Wisconsin - Madison Campus.

This is the complete sequence of the E. coli K-12 genome and several of the E. coli phages. The data is being analysed.

Functional search of the known genomes using E. coli
A tool for function keyword searching. Its authors have built a table that relates E. coli to Saccharomyces cerevisiae, Methanococcus jannaschii and Bacillus subtilis using BLAST with a Karlin-Altschul score of < 10E-17. The tool prints a list of every sequence identifier that is close to the E. coli gene of the cluster in which the keyword is found.
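The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the hit table, gene names, descriptions and sequence identifiers are all hypothetical, and the cutoff is simply the value as written in the text; the real tool's data and implementation are not described here.

```python
# Minimal sketch of a keyword search over a table of BLAST hits.
# All data below is hypothetical; the real table is not given in the source.

THRESHOLD = 10e-17  # cutoff as written in the text ("< 10E-17")

# Each row: (E. coli gene, gene description, identifier in another genome, score)
HITS = [
    ("geneA", "DNA recombination protein", "sce_0001", 1e-40),
    ("geneA", "DNA recombination protein", "mja_0002", 1e-20),
    ("geneA", "DNA recombination protein", "bsu_0003", 1e-5),   # too weak, dropped
    ("geneB", "ATP synthase subunit", "bsu_0004", 1e-30),
]


def search(keyword, hits=HITS, threshold=THRESHOLD):
    """Return identifiers clustered with E. coli genes whose description matches."""
    keyword = keyword.lower()
    clusters = {}
    # Group the sufficiently strong hits by the E. coli gene they relate to.
    for gene, description, other_id, score in hits:
        if score < threshold:
            entry = clusters.setdefault(gene, {"description": description, "members": []})
            entry["members"].append(other_id)
    # Collect the members of every cluster whose description contains the keyword.
    found = []
    for entry in clusters.values():
        if keyword in entry["description"].lower():
            found.extend(entry["members"])
    return found


print(search("recombination"))
```

Grouping hits per E. coli gene first, and matching the keyword against the gene description afterwards, mirrors the "cluster where the keyword is found" wording; other designs are equally plausible.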

EcoCyc: Encyclopedia of E. coli Genes and Metabolism
EcoCyc is a project to describe the genes and intermediary metabolism of the bacterium E. coli. It will describe each pathway and bioreaction of E. coli metabolism, and the enzyme that carries out each bioreaction, including its cofactors, activators, inhibitors, and the subunit structure of the enzyme. When known, the genes encoding the subunits of an enzyme will be listed, as well as the map position of each gene on the E. coli chromosome. In addition, the knowledge base (KB) will describe every chemical compound involved in each bioreaction, listing synonyms for the compound name, the molecular weight of the compound, and in many cases its chemical structure.

GenProtEC - E.coli genome and proteome database
GenProtEC is a database dedicated to the E. coli genome and proteome. It aims to provide biochemists with the most up-to-date and consolidated information about E. coli genes and proteins, resulting from both traditional experimental research and our computational analysis.

RegulonDB - a database on transcriptional regulation in Escherichia coli
RegulonDB is a database that integrates biological knowledge of the mechanisms that regulate transcription initiation in E. coli, as well as knowledge of the organization of the genes and regulatory signals into operons in the chromosome. The operon is the basic structure used in RegulonDB to describe the elements and properties of transcriptional regulation. The current version contains information on around 500 regulatory mechanisms, mainly for sigma 70 promoters.

Blast protein against the complete genomes of Saccharomyces, Methanococcus, E. coli and B. subtilis
This BLAST interface allows you to search against two subsets of the available putative open reading frames of these genomes using blastp.

Computational Functional Genomics
This site contains comparative E. coli and yeast genome, transcriptome, proteome, physiome and biome data.

Comparative genome sequences
E.coli in-frame genome engineering
Gene clustering
Yeast mRNA Abundance Data
Motif software
E.coli motifs
E.coli Proteomics: Subcellular localization, abundance, protein sequence, ESI-MS, 2D Gels
E.coli multiplex competition selections (phenome)

Genome Information Broker for Microbial Genomes
We compiled the DDBJ entries that had been submitted by Escherichia coli genome project teams worldwide into a preliminary database covering 0 min. to 100 min. It includes the sequences from 28 min. to 50 min. that the Japan Escherichia coli genome project team determined in 1996 and registered into DDBJ by December 18th.

The E. coli Index
These pages contain a comprehensive guide to information relating to the model organism Escherichia coli.

TubercuList - Mycobacterium spp. genome data
Its purpose is to collate and integrate various aspects of the genomic information from M. africanum, M. bovis, M. bovis BCG, M. canetti, M. microti, and above all, M. tuberculosis. TubercuList provides a complete dataset of DNA and protein sequences derived from the paradigm strain M. tuberculosis H37Rv, linked to the relevant annotations and functional assignments. It allows one to easily browse through these data and retrieve information, using various criteria (gene names, location, keywords, etc.).

The M. pneumoniae Genome Project
The genome has a length of 816394 bp with a G+C content of 40.01 %. We predicted 677 open reading frames (ORFs) with an average molecular weight of 39.5 kDa. Adding the number of ORFs to the number of RNAs (5S-, 16S-, 23S-rRNA, 33 tRNAs, 4.5S RNA, 10Sa RNA and RNaseP RNA) we define 716 coding regions (88.7 %) in the genome. Almost 6 % of the genome is occupied by the 4 repetitive sequences of the P1 operon of M. pneumoniae. The derived gene density is one gene per 1.14 kb. So far, 50 % of all ORFs/genes have shown significant sequence homology to defined ORFs/genes with known function from other bacteria.
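The counts quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes each named RNA species (5S, 16S and 23S rRNA, 4.5S RNA, 10Sa RNA, RNaseP RNA) is counted once; the variable names are illustrative only.

```python
# Re-derive the number of coding regions and the gene density from the quoted figures.
genome_length_bp = 816_394
orfs = 677

# Assumption: one copy of each named RNA, plus the 33 tRNAs.
rrnas = 3          # 5S, 16S, 23S
trnas = 33
other_rnas = 3     # 4.5S RNA, 10Sa RNA, RNaseP RNA

coding_regions = orfs + rrnas + trnas + other_rnas
kb_per_gene = genome_length_bp / coding_regions / 1000

print(coding_regions)          # 716
print(round(kb_per_gene, 2))   # 1.14
```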

SubtiList - Bacillus subtilis genome project
SubtiList is a database dedicated to the analysis of the Bacillus subtilis genome.

The purpose of this database is to integrate various aspects of the genomic information of B. subtilis, the paradigm of sporulating Gram-positive bacteria. As such, it provides a clean dataset of non-redundant DNA sequences of B. subtilis (strain 168), associated with relevant annotations and protein sequences. It allows one to easily browse through these data and retrieve information, using various criteria (gene names, keywords, location, etc.).

The data contained in SubtiList originates mainly from the B. subtilis genome sequencing project, but this dataset also benefits from the B. subtilis entries present in the EMBL/GenBank/DDBJ databanks.

Micado: MICrobial Advanced Database Organization
The database is primarily devoted to the Bacillus subtilis genome sequencing program. It links the genetic map of the microbe with its sequences, together with those of other bacteria. DNA sequences come from primary databank entries, plus data from the SubtiList database.

Haemophilus influenzae Rd Genome Database (HIDB)
The Haemophilus influenzae Rd genome is the first genome of a free-living organism to be completed. This page offers access to the latest versions of the sequence data and related annotation.

Mycoplasma genitalium Genome Database (MGDB)
The Mycoplasma genitalium genome is the first genome of a Gram-positive-like bacterium to be completed. This page offers access to the latest versions of the sequence data and related annotation.

TB Genomes analysis server
The Mycobacterium tuberculosis genome analysis server.

BLAST searches of predicted ORFs and annotated ORFs, and post-BLAST search tools
MYCdb web browser and data retrieval
Three levels of query complexity/views on the genome and its associated data
Links to other major sites and viewers
'Find a gene' using established gene names

The server is unique in that it provides an easy-to-use interface to browse MYCdb, with all public EMBL sequence entries of TB genome sequences included in MYCdb. The Web interface includes features to search MYCdb and retrieve files, and provides a graphical view of the data.

Links for retrieved sequences allow connection to the DDBJ javaserver in Japan.

Searches can be performed to retrieve genome sequence, predicted ORFs and annotated ORFs of both the Sanger-sequenced TB genome and the partial M. leprae genome.

CyanoBase
Synechocystis/CyanoBase provides an easy way of accessing the sequence and all-inclusive annotation data through image maps, keyword searches and the gene category list.

The cyanobacterium carries a complete set of genes for oxygenic photosynthesis, which is the most fundamental life process on the earth. This organism is also interesting from an evolutionary viewpoint, for it arose in a very ancient age and has survived in various environments. Chloroplasts are believed to have evolved from cyanobacterial ancestors which developed an endosymbiotic relationship with a eukaryotic host cell.

Chlamydia Genome Project
The goal of the Chlamydia Genome Project is to determine the DNA sequence of the chromosome of Chlamydia trachomatis, serovar D (D/UW-3/Cx), trachoma biovar, and L2/434/Bu, LGV biovar. The project is a collaborative effort involving scientists at the University of California at Berkeley and Stanford University.

Pseudomonas Genome Project
The bacterium Pseudomonas aeruginosa causes significant infections in humans. People with cystic fibrosis, burn victims, individuals with cancer, and patients requiring extensive stays in intensive care units are particularly at risk. Greater knowledge of the DNA sequence of the Pseudomonas genome will suggest directions for novel drug development and new therapeutic strategies for treating these infections.

ARCHAIC: ARCHAebacterial Information Collection
The aim of ARCHAIC is to analyze archaebacterial genomic DNA sequences that have been determined and will be determined by ourselves and by other groups, by the same standard in a consistent way, in order to understand the overall organization of these genomes and in order to permit comparison of different species on the basis of their genomic DNA sequences.

Ureaplasma urealyticum - The Complete Genomic Sequence

Data Analysis
Sequence Data
Contact Information

Pyrococcus horikoshii OT3 database
NITE first worked on a hyperthermophile found in the hot waters of the Okinawa Trench. (The microorganism can grow at high temperatures, favoring a temperature of 98 °C.) The DNA of this organism has 1.74 million base pairs. Among these, it is estimated that there are 2,061 genes for heat-resistant proteins and enzymes.

M. thermoautotrophicum gene classification table

Amino Acid Metabolism
Purine, Pyrimidine, Nucleoside and Nucleotide Metabolism
Sugars
Transcription and Translation
Cellular Processes and Cofactor Metabolism
Energy Metabolism
RNA products
Other

Methanococcus jannaschii Functions database
This page provides an update of the functional content of the first completely sequenced archaeal genome, that of Methanococcus jannaschii.

Sulfolobus solfataricus P2 Genome Project
Our goal has been to sequence the entire 3.0 Mbp genome of Sulfolobus solfataricus P2. We aimed at high quality sequencing with a low error rate, thus we have attempted to sequence the entire genome on both DNA strands. With minor exceptions, this goal has been achieved.

The WWW Virtual Library: Microbiology
This has links to many microbiology sites.

DOE Microbial Genome Program
Description of the DOE Microbial Genome Initiative, including which organisms are being sequenced and who is contracted to sequence them.

Genome Information Broker for Microbial Genomes
The GIB holds information on the following genomes:

Saccharomyces cerevisiae
Aquifex aeolicus
Bacillus subtilis
Borrelia burgdorferi
Chlamydia trachomatis
Escherichia coli
Haemophilus influenzae
Helicobacter pylori
Mycobacterium tuberculosis
Mycoplasma genitalium
Mycoplasma pneumoniae
Synechocystis PCC6803
Treponema pallidum
Archaeoglobus fulgidus
Methanobacterium thermoautotrophicum
Methanococcus jannaschii
Pyrococcus horikoshii

The following services and views of the data are available:

Genomic View: displays genome information in a diagram.
Retrieve Clone: retrieves clone information.
Retrieve ORF: retrieves ORF information.
Retrieve Gene: retrieves gene information.
Thumbnail sketch of this server: gives a brief overview of the server system.
Genome sequence FTP service: allows you to obtain nucleotide sequences.

TIGR Microbial Database
A listing of microbial genomes completed and in progress.

HOBACGEN : Homologous Bacterial Genes Database
HOBACGEN is a database system that contains all the protein sequences of bacteria organized into families. It allows one to select sets of homologous genes from bacterial species and to visualize multiple alignments and phylogenetic trees. Thus HOBACGEN is particularly useful for comparative genomics, phylogeny and molecular evolution studies on bacteria.

Microbial Genomics
These microbial genome pages were created as a reference for the community and contain a list of current or completed eubacterial, archaeal and eukaryotic genome sequencing projects. Each main page includes the name of the organism being sequenced, which sequencing group(s) are involved in the effort, background information on the organism, and its current evolutionary location.

The Archaeon Pyrobaculum aerophilum Genome Project
Pyrobaculum aerophilum is a rod-shaped hyperthermophilic archaeon that has recently been isolated from a boiling marine hole. The goal of the project is to complete sequencing and annotating its 2.3 Mbp genome.

Microbial genomes at NCBI
A collection of microbial genome information.

Genomes at LMB
Directories of the sequences of selected organisms.

Microbial Genomes at Sanger Centre
The Sanger Centre microbial sequencing effort is concentrated on pathogens and model organisms.

Data is accessible in a number of ways; for each organism there is a BLAST server, allowing you to search the sequences with your own query and retrieve the matching contigs. Sequences can also be downloaded directly by FTP. The Sanger Centre also provides an omniBLAST server, allowing a quick search of all the sequence databases provided here.

In addition, for those organisms being sequenced using a cosmid approach, finished and annotated cosmids are submitted to EMBL and other public databases (these continue to be accessible by BLAST and FTP from here). The annotation for genomes sequenced by whole genome shotgun will be released upon publication.

Any Comments, Questions? Support@hgmp.mrc.ac.uk

Welcome to the GenomeWeb Prokaryotic Genome Databases