PROSITE
PrositeScan - search the PROSITE database with your sequence
ProfileScan - Search the profiles-entries in PROSITE with your sequence
Frame-ProfileScan - Search DNA sequence vs. a protein profile database
PatternFind - search a protein database with a pattern
PRINTS
Pfam
ProDom
Blocks
SBASE
MOTIF - Search for protein sequence motifs
ProClass
Clusters of Orthologous Groups (COGs)
MODULES in Proteins
SMART - Simple Modular Architecture Research Tool
3Dee - Database of Protein Domain Definitions
Proteome Analysis @ EBI
InterPro
CluSTr (Clusters of SWISS-PROT+TrEMBL proteins) database
CDD: A Conserved Domain Database and Search Service
PROSITE
PROSITE helps determine the function of uncharacterized proteins
translated from genomic or cDNA sequences. It consists of a database of
biologically significant sites, patterns and profiles that help to
reliably identify which known protein family (if any) a new sequence
belongs to.
PrositeScan - search the PROSITE database with your sequence
This allows you to search one or more sequences against the current
release of Amos Bairoch's PROSITE database.
ProfileScan - Search the profiles-entries in PROSITE with your sequence
This uses the pfscan program to search a single sequence against all
profile entries in the current release of PROSITE. The PROSITE
collection of protein sequence motifs contains a large number of
patterns and currently only a few profiles. The particular strength of
profiles is that they can be used to describe very divergent protein
motifs.
Frame-ProfileScan - Search DNA sequence vs. a protein profile database
This server uses the frame-search capabilities of pfscan to query the
collection of PROSITE profiles (including pre-release) with a single DNA
sequence. The six reading frames of the DNA query are inspected.
Coding frameshifts in the DNA sequence are supported. Since
frame-tolerant searches consume a lot of CPU time, DNA sequence length is
limited to about 2400 bases.
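As an illustration of what "six reading frames" means, the sketch below
translates a DNA sequence in the three forward and three reverse-complement
frames using the standard genetic code. It is not pfscan and it does not
model frameshift tolerance; the function names and frame labels (+1 to +3,
-1 to -3) are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Map hypothetical frame labels (+1..+3, -1..-3) to protein translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six translations could then be compared against a protein
profile in turn; a frameshift-tolerant search additionally allows an
alignment to move between frames, which this sketch leaves out.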
PatternFind - search a protein database with a pattern
This takes a user-defined pattern (PROSITE-format or regular expression)
and searches a protein database. It offers several useful output
options.
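To show how a PROSITE-format pattern relates to a regular expression, here
is a minimal sketch that converts the common pattern elements (x, [..],
{..}, repeat counts, and the < and > anchors) and scans a sequence. It is
not the PatternFind implementation; the function names are hypothetical,
and less common syntax such as an anchor inside square brackets is not
handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-format pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                    # any residue
            core = "."
        elif core.startswith("{"):         # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        # [ABC] and single letters are already valid regex as written.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return (start, matched_text) for non-overlapping matches, 0-based."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-glycosylation site pattern written in PROSITE syntax.
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_matches("N-{P}-[ST]-{P}.", "AANGSAPNPTA"))
```

Note that re.finditer reports non-overlapping matches only; a scanner that
must report overlapping sites would need a different search loop.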
PRINTS
PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to
characterise a protein family; its diagnostic power is refined by iterative scanning of OWL. Usually the
motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space.
Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single
motifs: the database thus provides a useful adjunct to PROSITE.
Pfam
Pfam is a high-quality comprehensive collection of protein domain families.
ProDom
ProDom is a comprehensive collection of protein families. It was
constructed by clustering all complete protein sequences in SWISS-PROT with the
clustering algorithm Domainer (Sonnhammer and Kahn, 1994).
The novelty of ProDom is that the
modular arrangement of proteins has been taken into account: whenever
domain boundaries were detected, the sequences were cut to produce consistent
families of domains.
Blocks
Blocks is operated by the Fred Hutchinson Cancer Research
Center. An aid to detection and verification of protein
sequence homologies, Blocks compares a protein or DNA sequence
to a database of protein blocks. Blocks are short multiply
aligned sequences corresponding to the most highly conserved
regions of proteins. The rationale behind searching a
database of blocks is that information from multiply aligned
sequences is present in a concatenated form, reducing
background and increasing sensitivity to distant
relationships.
SBASE
SBASE is a database of annotated protein domains. SBASE is searchable
by subfields, cross-referenced to Swiss-Prot, PROSITE and EMBL, MEDLINE,
MEDLARS, OMIM, PRODOM, PRINTS and BLOCKS.
There is an interface to a Blast mailserver.
MOTIF - Search for protein sequence motifs
Search for protein sequence motifs in PROSITE PATTERN, PROSITE PROFILE,
BLOCKS, ProDom, PRINTS, or a user-defined profile.
ProClass
The ProClass database is a non-redundant protein database organized according
to family relationships as defined collectively by PROSITE patterns and PIR
superfamilies. The ProClass database can facilitate protein family information
retrieval, unveil domain and family relationships, and classify multi-domained
proteins, by combining global and motif similarities into a single family
organization scheme.
Clusters of Orthologous Groups (COGs)
Clusters of Orthologous Groups (COGs) were delineated by comparing protein sequences encoded in 7
complete genomes, representing 5 major phylogenetic lineages. Each COG consists of individual proteins
or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.
MODULES in Proteins
The module pages contain information and research tools on mobile protein domains.
SMART - Simple Modular Architecture Research Tool
This does a search with your protein sequence against a database of domain
profiles and displays a nice diagram of the domains together with low complexity regions, transmembrane regions etc.
You can then optionally do a BLAST search of the regions of your sequence which did not match a known domain.
3Dee - Database of Protein Domain Definitions
This database contains definitions of structural domains for all protein
chains in the Brookhaven Protein Databank (PDB) that have 20 or more
residues and are not theoretical models. The domains have been
clustered on sequence similarity and structural similarity to form
families. The families are stored as a hierarchy.
Updating does not require complete regeneration of the database and is almost completely automated so we expect to be able to complete updates every 1-2 months.
Proteome Analysis @ EBI
The genome sequencing projects are providing a vast amount of sequence
data which remain largely unexploited. With access to whole genome
sequences from various organisms and imminent completion of many more,
the SWISS-PROT group at the European Bioinformatics Institute (EBI) has
decided to develop a research-oriented initiative in order to utilise
all the existing resources and provide comparative analysis of the
predicted protein coding sequences of all complete genomes. The two
main projects used in this proteome analysis effort, InterPro and
CluSTr, are aiming to give a new perspective on domain structure and
function, gene duplication and protein families in different genomes.
Proteome analysis has already been produced for a number of completely sequenced organisms.
InterPro
InterPro is an Integrated Resource of Protein Domains
and Functional Sites. InterPro rationalises the complementary
efforts of the PROSITE, PRINTS, Pfam and ProDom database projects.
Each combined InterPro entry includes functional descriptions and literature references, and links are made back to the relevant member database(s), allowing users to see at a glance whether a particular family or domain has associated patterns, profiles, fingerprints, etc. Merged and individual entries (i.e., those that have no counterpart in the companion resources) are assigned unique accession numbers. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL.
CluSTr (Clusters of SWISS-PROT+TrEMBL proteins) database
The CluSTr (Clusters of SWISS-PROT+TrEMBL proteins) database offers an
automatic classification of SWISS-PROT + TrEMBL proteins into groups of
related proteins. The clustering is based on analysis of all pairwise
comparisons between protein sequences. The database provides links to
InterPro, which integrates information on protein families, domains and
functional sites from PROSITE, PRINTS, Pfam and ProDom. CluSTr also has
cross-references to HSSP and PDB.
CluSTr is a useful resource for whole genome analysis and has already been used for the proteome analysis of a number of completely sequenced genomes.
CDD: A Conserved Domain Database and Search Service
Proteins often contain several modules or domains, each with a distinct
evolutionary origin and function. The CD-Search service may be used to
identify the conserved domains present in a protein sequence.
Computational biologists define conserved domains based on recurring sequence patterns or motifs. CDD currently contains domains derived from two popular collections, Smart and Pfam, plus contributions from colleagues at NCBI. The source databases also provide descriptions and links to citations. Since conserved domains correspond to compact structural units, CDs contain links to 3D-structure via Cn3D whenever possible.
To identify conserved domains in a protein sequence, the CD-Search service employs the reverse position-specific BLAST algorithm. The query sequence is compared to a position-specific score matrix prepared from the underlying conserved domain alignment. Hits may be displayed as a pairwise alignment of the query sequence with a representative domain sequence, or as a multiple alignment.
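The idea of scoring a query against a position-specific score matrix can be
sketched as follows. This is a toy, ungapped sliding-window scorer and not
the reverse position-specific BLAST algorithm itself, which uses BLAST
heuristics and alignment statistics. The matrix values, the default score
for unlisted residues, and the function name are all invented for
illustration.

```python
def best_pssm_hit(query, pssm, default=-4):
    """Slide the matrix along the query and return (start, score) of the
    best ungapped window, or None if the query is shorter than the matrix.

    pssm is a list with one dict per alignment column, mapping a residue
    to its score at that column; unlisted residues get `default`.
    """
    width = len(pssm)
    best = None
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        score = sum(col.get(res, default) for col, res in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    # Made-up three-column matrix; real matrices are derived from a
    # conserved domain alignment and cover all twenty residues per column.
    toy_pssm = [
        {"G": 5, "A": 2},
        {"K": 4, "R": 3},
        {"S": 5, "T": 4},
    ]
    print(best_pssm_hit("MAGKSTL", toy_pssm))
```

The key difference from an ordinary substitution matrix is that the score
for a residue depends on the column it is aligned to, so conserved positions
in the domain alignment can be weighted more heavily than variable ones.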
Any Comments, Questions? Support@hgmp.mrc.ac.uk