Protein Family Databases

The AAA Protein Superfamily
The AAA (for ATPases Associated with various cellular Activities) protein superfamily is characterized by a highly conserved module of approximately 230 amino acid residues including an ATP binding consensus, present in one or two copies in the AAA proteins. AAA proteins are found in all organisms (Archaea, Eubacteria, Eukaryota: Protista, Fungi, Plants, Animals) and are essential for, e.g., cell cycle functions, vesicular transport, mitochondrial functions, peroxisome assembly, and proteolysis.

Aldehyde dehydrogenase (ALDH)
Aldehyde dehydrogenase (ALDH), often in tandem with alcohol dehydrogenase, detoxifies a wide variety of organic compounds, toxins and pollutants. Defects in ALDH lead to Sjogren-Larsson syndrome in humans.

CAZy - Carbohydrate-Active enZYmes

The CBS domain web page
http://www.sanger.ac.uk/Users/agb/CBS/CBS.html
The CBS domain is widespread and is found in all species. It is named after Cystathionine Beta Synthase. All CBS domains identified to date occur in the cytoplasm or nucleus. The domain is about 60 residues long and is usually found in two or four copies per protein.

The Chaperonin Home Page
The Chaperonin Home Page is designed to be a repository for information about the important class of heat shock proteins known as chaperonins. It includes the GroEL/GroES mutation databases.

Chromatin Structure & Function Page
This site is intended to disseminate information regarding the rapidly evolving and highly exciting field of chromatin structure, proteins that modify chromatin structure, and the effects that these modifications have on cell function.

other chromatin sites
other Chromatin-Associated Proteins sites
papers & meetings

Chromo shadow domain
The chromo domain was originally identified as a protein sequence motif common to the Drosophila chromatin proteins, Polycomb (Pc) and Heterochromatin protein 1 (HP1).

Cytochrome P450 family
The family referred to here as FAD-dependent pyridine nucleotide reductases (FADPNR) includes FAD flavoproteins belonging to the family of pyridine nucleotide-disulphide oxidoreductases (glutathione reductase, trypanothione reductase, lipoamide dehydrogenase, mercuric reductase, thioredoxin reductase, alkyl hydroperoxide reductase) and iron-sulphur protein reductases involved in the oxidative metabolism of a variety of hydrocarbons.

Cytokines
This provides information about cytokines and their receptors including topological, evolutionary and mechanistic relationships between the molecules, and illustrations of known three-dimensional structures.

Dictionary of Cytokines
A dictionary of cytokine alternative names, related factors, signal transducers, etc. Links to other cytokine-related sites.

Cytokines
This is a useful site for anyone interested in cytokines, adhesion molecules, growth factors and related agents.

The International Cytokine Society

society affairs
newsletters
The Cytokine-Interferon Open Forum

Cytokine Family cDNA Database (dbCFC)
The Cytokine Family cDNA Database (dbCFC) is a collection of EST (Expressed Sequence Tag) records of cytokines deposited in the NCBI GenBank. It provides information about the assignment of EST records to cytokine members, together with related data contained in other databases including GenBank, dbEST, GDB, Online Mendelian Inheritance in Man (OMIM), The Transgenic/Targeted Mutation Database (TBASE), Unique Human Gene Sequence Collection (UniGene), Anatomical Expression Database of Human Genes (BodyMap), Mouse Genome Database (MGD) and Human/Mouse Homology Relationships.

Cytokines Online Pathfinder Encyclopaedia (COPE)
COPE consists of 6000 WWW pages hypertexted with 43000 links and 14000 references and covers all aspects of Cytokine research.

The DExH/D protein family database
This is a database covering the putative RNA helicases of the DEAD, DEAH, and DExH proteins.

DExH/D proteins are essential in all aspects of RNA metabolism in the cell. They play important roles in:

transcription
pre-mRNA splicing
RNA export
RNA degradation
ribosome biogenesis
translation
mitochondrial RNA splicing
development
replication of many viruses

EF-Hand Calcium-Binding Proteins
The EF-Hand Calcium-Binding Proteins Data Library is a growing collection of published sequence, structural, functional, and other information about EF-hand calcium-binding proteins and their roles in cellular signal transduction.

Eph Receptor Tyrosine Kinases & Their Ligands
This is a database for the Eph family of receptor protein tyrosine kinases and their ligands, the ephrins. Much of the excitement regarding this new family results from their roles in developmental neurobiology as molecular guides for axons. However, they may be involved in many other processes: cancer, angiogenesis, haematopoiesis, and kidney development. The expression patterns of members of this family suggest that their functions during development and in the adult organism are still relatively unknown.

ESTHER - ESTerases and alpha/beta Hydrolase Enzymes and Relatives
ESTHER (for esterases, alpha/beta hydrolase enzymes and relatives) is a database that collects, in one information system, sequence data together with biological annotations and experimental biochemical results related to the structure-function analysis of the enzymes of the family.

FYVE finger
The FYVE finger is a novel zinc finger-like domain found in several proteins involved in membrane trafficking. The basic motif consists of 8 cysteines, 4 of which are part of the core motif R+HHC+XCG (where '+' is a positively charged residue and 'X' is any amino acid). This finger has only been observed as a single copy in each of the proteins, and it has been shown to bind 2 zinc ions per finger.
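
The core motif above can be read as a simple pattern to scan for in a protein sequence. The following is a minimal sketch only, not a tool offered by the FYVE finger site. It assumes that '+' means lysine (K) or arginine (R), leaving histidine out, and that 'X' is any of the 20 standard amino acids. The function name and the example sequence are made up for illustration.

```python
import re

# Assumption: '+' (positively charged) is read as K or R only; histidine is excluded.
# Assumption: 'X' is any one of the 20 standard amino acids.
FYVE_CORE = re.compile(r"R[KR]HHC[KR][ACDEFGHIKLMNPQRSTVWY]CG")


def find_fyve_core(sequence):
    """Return (start, matched_text) for each core-motif hit (0-based start)."""
    return [(m.start(), m.group()) for m in FYVE_CORE.finditer(sequence.upper())]


# Hypothetical sequence, invented for this example (not a real protein).
example = "MSTAAWVPDRRHHCRQCGNIFCAEC"
print(find_fyve_core(example))
```

A hit on the core motif alone does not establish a FYVE finger, since the full domain involves 8 cysteines; the sketch only locates candidate sites.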

G protein-coupled receptor database (GCRDb)
GCRDb was started in 1989 to keep track of all new sequence data for this biologically important class of proteins. The systematic collection of these data has been a large undertaking, aided by Amos Bairoch, Gert Vriend, Kevin Lynch and others.

Globin Gene Server
This provides access to sequence alignments and experimental results for the beta-like globin gene cluster of mammals.

Glucoamylases
Glucoamylase (also known as amyloglucosidase) is an important industrial enzyme used in saccharification steps in both starch enzymatic conversion and alcohol production.

Glycosyltransferases
This guide lists, and gives WWW links to sequence databases for:

1. Cloned eukaryotic glycosyltransferases involved in the biosynthesis of glycoproteins, glycolipids, glycosylphosphatidylinositols and other complex glycoconjugates (together with journal references)

2. Cloned prokaryotic glycosyltransferases involved in lipopolysaccharide biosynthesis

3. Cloned glucuronyltransferases and some yeast chitin synthases (SWISS-PROT links only)

However, this guide does not cover glycosyltransferases involved in the metabolism of sucrose, trehalose, glucan or many other polysaccharides, nor does it list the many expressed sequence tags (ESTs) or predicted Caenorhabditis elegans glycosyltransferase sequences.

Histone, Histone Sequence Database
Database of aligned histone protein sequences. Also contains sequences of proteins identified as containing the histone fold motif. Structures of all known histone and histone fold proteins.

Information is included on discrepancies between similar sequence entries in different source databases, along with multiple sequence alignments for each histone.

Homeobox
Information relevant to homeobox genes (in particular about classification/evolution).

HOX Pro database
HOX-Pro is aimed at:

analysis and classification of regulatory regions in diverse homeobox and related genes, the controllers of invertebrate and vertebrate development;
comparative analysis of the organisation of HOX clusters and "hox-based" genetic networks for C. elegans, sea urchins, Drosophila and vertebrates;
analysis of the phylogeny and evolution of homeobox genes and clusters.

The Integrin Page
Integrins are receptor proteins of crucial importance: they are the main way that cells both bind to and respond to the extracellular matrix.

InBase, The New England Biolabs Intein Database
Protein splicing is defined as the excision of an intervening protein sequence (the INTEIN) from a protein precursor and the concomitant ligation of the flanking protein fragments (the EXTEINS) to form a mature extein protein and the free intein (Perler 1994). Protein splicing results in a native peptide bond between the ligated exteins (Cooper 1993). Extein ligation differentiates protein splicing from other forms of autoproteolysis. Inteins are named with a 3 letter genus/species designation followed by the extein gene name. If more than 1 intein is present in an extein gene, the inteins are given a numerical suffix.

Ion Channel Network
The Ion Channel Network (ICN) is a pilot WWW site aimed at making distributed information about ion channel molecules more 'accessible' and 'systematic' in coverage.

Ion Channel Resources
Resources for ion channel research

researchers
ion channel toxins
human Kv sequences
recent articles
reference list
biophysical software tools
publications
ion channel basics
ion channel links

Inteins - protein introns
Inteins are proteins inserted in-frame and translated together with their host proteins. The precursor protein then undergoes protein splicing resulting in two products: the host protein and the intein. This reaction is autoproteolytic.

This site mainly focuses on intein sequence motifs and evolution.

The Kinesin Home Page
Kinesin is a mechanochemical protein capable of utilizing chemical energy from ATP hydrolysis to generate mechanical force. In the presence of ATP, kinesin can bind to and move on microtubules. The ability to translocate along the microtubule lattice has led to the classification of kinesin as a microtubule motor protein. Kinesin is unrelated in sequence to the other known class of microtubule motor proteins, the dyneins, and is thought to perform functions in the cell distinct from the dyneins.

Kinesins
Information on the kinesin protein family.

Labial Homeobox
The homeodomain sequences and references of over 40 putative labial genes among metazoan organisms, as well as several hexapeptide sequences, are presented in alignments.

Lipase Engineering Database
This provides information on sequence and structure of lipases to facilitate protein engineering.

MADS-box Gene
The MADS box is a highly conserved sequence motif found in a family of transcription factors. The conserved domain was recognized after the first four members of the family were identified: MCM1, AGAMOUS, DEFICIENS and SRF (serum response factor). The name MADS was constructed from the "initials" of these four "founders".

MEROPS - The Peptidase Database
This database employs a structure-based classification of peptidases by clan, and family. Each peptidase is given a unique MEROPS identifier. Links give access to database entries for the enzymology, protein and nucleic acid sequences, tertiary structures, genetics and more.

Metallothionein
Metallothioneins (MTs) are ubiquitous low molecular weight proteins and polypeptides of extremely high metal and sulfur content. They are thought to play roles in the intracellular fixation of the essential trace elements zinc and copper, in controlling the concentrations of the free ions of these elements, in regulating their flow to their cellular destinations, in neutralising the harmful influences of exposure to toxic elements such as cadmium and mercury, and in protection from a variety of stress conditions.

Olfactory Receptor DataBase
ORDB is a database of sequences of olfactory receptor proteins. It contains public and private sections which provide tools for investigators to analyze the functions of this very large gene family of G protein-coupled receptors. It also provides links to a local cluster of databases of related information, and to other relevant databases worldwide.

Pentapeptide repeat
The pentapeptide repeat is found in several bacterial proteins of uncertain function [1]. Pentapeptide repeat proteins contain a striking repeat of five residues which can be clearly seen in self dot-plots. The repeat can be approximately described as A(D/N)LXX, where X can be any amino acid. This family is found to have many members in the bacterial genome of the cyanobacterium Synechocystis sp.
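
The approximate description A(D/N)LXX can be turned into a search for tandem runs of the five-residue unit. The sketch below is illustrative only. It treats the pattern strictly, although the description above is approximate and real repeats may deviate from it. The function name, the min_repeats threshold and the example sequence are hypothetical choices.

```python
import re


def find_pentapeptide_runs(sequence, min_repeats=3):
    """Return (start, repeat_count) for each run of consecutive A(D/N)LXX units.

    min_repeats is an arbitrary, hypothetical threshold for calling a run.
    """
    unit = r"A[DN]L[A-Z]{2}"  # A, then D or N, then L, then any two residues
    pattern = re.compile(r"(?:%s){%d,}" % (unit, min_repeats))
    return [(m.start(), len(m.group()) // 5)
            for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence, invented for this example (not a real protein).
example = "MKADLSGANLREADLTQPPP"
print(find_pentapeptide_runs(example))
```

Raising min_repeats makes the search more conservative, at the cost of missing short or imperfect repeat regions.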

PROMISE - The Prosthetic groups and Metal Ions in Protein Active Sites Database
The PROMISE (Prosthetic centres and metal ions in protein active sites) database aims to gather together comprehensive sequence, structural, functional and bibliographic information on proteins which possess prosthetic centres, with an emphasis on active site structure and function.

Protein Kinase Database Project
The Protein Kinase Database Project at SDSC aims to create a system that is narrowly focused on kinases, phosphatases, and related molecules. It will integrate structural, genetic, and molecular biological data.

Protein Spotlight
Protein Spotlight is a periodical electronic review from the SWISS-PROT group of the Swiss Institute of Bioinformatics (SIB). It is published on a monthly basis and consists of articles focused on particular proteins of interest.

The Ribonuclease P Database
The RNase P Database is a compilation of RNase P sequences, sequence alignments, secondary structures, three-dimensional models, and accessory information. The database primarily contains information on the bacterial and archaeal enzymes, focusing on the RNA subunit. Some information is also included on the eucaryal and organellar RNase P RNAs.

SAND domain
The SAND domain adds to the burgeoning set of domains present in modular chromatin-associated proteins. The functions of most of these domains are not at all well understood, and gaining a better understanding will be one key to understanding how chromatin is assembled and regulated. The SAND domain appears in various nuclear contexts. Sp100/Sp140 are found in recently described nuclear bodies or dots, discrete structures within the nucleus that do not yet have known functions.

SANT domain
SANT domains are a repeated motif in N-CoR, the nuclear receptor co-repressor.

The Thyroid Hormone Receptor Resource
The TRR provides a variety of information on TRs as well as more general information shared with other NHRR sites.

Wnt and Frizzled gene Homepage
Wnt proteins are now recognized as one of the major families of developmentally important signaling molecules, with mutations in Wnt genes displaying remarkable phenotypes in the mouse, Caenorhabditis elegans, and Drosophila. Among functions provided by Wnt proteins are such intriguing processes as embryonic induction, the generation of cell polarity, and the specification of cell fate.

Any Comments, Questions? Support@hgmp.mrc.ac.uk

Welcome to the GenomeWeb Protein Family Databases