Protein Database Searching

SWISS-PROT
SWISS-PROT is a curated protein sequence database that strives to provide a high level of annotation (such as the description of a protein's function, its domain structure, post-translational modifications and variants), a minimal level of redundancy and a high level of integration with other databases.

SWISS-PROT was established in 1986 and has been maintained collaboratively since 1987 by the Department of Medical Biochemistry of the University of Geneva and the EMBL Outstation - The European Bioinformatics Institute (EBI).

OWL
OWL is a non-redundant composite protein sequence database produced from the following source databases: SWISS-PROT, PIR 1-3, GenBank (translations) and Brookhaven.
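
As a rough illustration of what "non-redundant composite" can mean, the sketch below merges several source databases and keeps only the first copy of each sequence. It assumes that redundancy means exact sequence identity and that the sources are listed in priority order; both are assumptions made for this example, and the function name, identifiers and sequences are hypothetical rather than taken from OWL itself.

```python
def build_composite(sources):
    """Merge (database_name, entries) pairs given in priority order.

    Each entries value is a list of (identifier, sequence) tuples.
    A sequence already contributed by an earlier source is skipped.
    """
    seen = set()
    composite = []
    for db_name, entries in sources:
        for identifier, sequence in entries:
            key = sequence.upper()  # compare sequences case-insensitively
            if key in seen:
                continue  # identical sequence already present: redundant
            seen.add(key)
            composite.append((db_name, identifier, sequence))
    return composite


# Hypothetical toy data: PIR_X duplicates SP_A and is therefore dropped.
sources = [
    ("SWISS-PROT", [("SP_A", "MKTAYIAK"), ("SP_B", "MSDNE")]),
    ("PIR", [("PIR_X", "MKTAYIAK"), ("PIR_Y", "MGLSD")]),
]
composite = build_composite(sources)
for db_name, identifier, sequence in composite:
    print(db_name, identifier, sequence)
```

Listing the most trusted source first means its annotation is the one retained when two databases hold the same sequence.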

PIR Protein Information Resource
PIR is an integrated protein information resource, utilizing information from the DNA Data Bank of Japan (DDBJ) and the Martinsried Institute for Protein Sequences (MIPS).

Search Databases at EBI (EMBL)
The EBI provides facilities to search for sequences by text or by sequence similarity and to submit new sequences.

Search Databases at NCBI (GenBank)
The NCBI provides facilities to search for sequences by text or by sequence similarity and to submit new sequences.

BLAST search of databases at NCBI
BLAST is a program that allows you to search for similarity between your query sequence and the gene sequences held at the NCBI.

BLAST search of databases at NCGR
BLAST is a program that allows you to search for similarity between your query sequence and the gene sequences held at the NCGR.

BLAST2 search of databases at EMBL
BLAST2 is a program that allows you to search for similarity between your query sequence and the gene sequences held at EMBL. It is similar to the original BLAST program, but it includes gaps in the alignments.

BLITZ search of databases at EBI
BLITZ does a very sensitive and extremely fast comparison of your protein sequences against the Swiss-Prot protein sequence database using the Smith and Waterman best local similarity algorithm.
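
To give a feel for the Smith and Waterman local similarity algorithm mentioned above, here is a minimal sketch that computes the best local alignment score between two sequences. It is not the BLITZ implementation: the simple match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, whereas production servers use amino acid substitution matrices and tuned gap costs.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the
            # alignment local: a poor region simply restarts the score.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("XXACDEXX", "YYACDEYY"))
```

Only two rows of the table are kept in memory because the sketch reports the score alone; recovering the aligned residues would require storing the full table and tracing back from the best cell.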

SAS - Sequence Annotated by Structure
SAS is a tool which aims to bridge the gap between protein sequence and structural analysis, and aid identification of a protein sequence, by using structural information to recognise distant homologues.

SAS can apply key structural features to the results of sequence or threading searches at a sequence level. Residues in the sequences of known structures are coloured by selected structural properties, and are displayed using a Web browser or downloadable PostScript file.

ProtoMap: An automatic hierarchical classification of all SWISS-PROT proteins
This site offers an exhaustive classification of all the proteins in the SWISS-PROT database into groups of related proteins.

The resulting classification splits the protein space into well-defined groups of proteins, most of which are closely correlated with natural biological families and superfamilies. The hierarchical organization may help to detect finer subfamilies that make up known families of proteins, as well as interesting relations between protein families.

PDB_ISL (intermediate sequence search)
PDB_ISL (intermediate sequence search) is a sensitive and fast search procedure. It is useful for finding sequences in the PDB protein structure database that are homologous to a sequence of unknown structure.

The server uses intermediate sequences collected from a larger sequence database. These sequences were found by searching the domains in the SCOP database against NRDB using PSI-BLAST.
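
The idea of an intermediate sequence search can be sketched as a two-step lookup: the query is matched against the intermediate sequences, and each matching intermediate points back to the structural domain it was originally found with. The code below is only an illustration of that linking step under simplified assumptions; the function name and all identifiers are hypothetical, and the similarity search itself is assumed to have been run already.

```python
def structures_via_intermediates(query_hits, intermediate_links):
    """Map structural domains to the intermediates that link them to the query.

    query_hits: intermediate sequence IDs found similar to the query.
    intermediate_links: dict from intermediate ID to the domain IDs it was
    collected for.
    """
    found = {}
    for intermediate in query_hits:
        for domain in intermediate_links.get(intermediate, ()):
            found.setdefault(domain, []).append(intermediate)
    return found


# Hypothetical toy data: int9 is a hit that links to no known structure.
intermediate_links = {
    "int1": ["dom1"],
    "int2": ["dom1", "dom2"],
    "int3": ["dom3"],
}
query_hits = ["int1", "int2", "int9"]
links = structures_via_intermediates(query_hits, intermediate_links)
print(links)
```

A domain supported by several independent intermediates (dom1 here) is a more convincing candidate than one reached through a single link.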

GeneNet
GeneNet is a meta-search system for the analysis of sequence similarity, designed to help biologists analyse sequences efficiently via the WWW. It also performs periodic searches, which spares biologists from repeatedly analysing the same sequence.

GeneNet can communicate simultaneously with four widely used databases (GenBank at NCBI, PDB, BLOCKS and KEGG). Protein sequences are searched against all four databases; for DNA sequences, only GenBank analysis is possible.

iProClass - Integrated Protein Classification Database
The iProClass is an integrated resource that provides comprehensive family relationships and structural/functional features of proteins, with rich links to various databases.

It currently consists of non-redundant proteins organized with superfamilies, domains, motifs, post-translational modification sites, and links to more than 30 databases of protein families, structures, functions, genes, genomes, literature, and taxonomy. Protein and superfamily summary reports provide rich annotations, including membership information with length, taxonomy, and keyword statistics, full family relationships, comprehensive enzyme and PDB cross-references, and graphical feature display. The iProClass can facilitate classification-driven annotation for protein sequences and complete genomes, and support structural/functional genomics and proteomics research.

EMBL Sequence Alerting System
The sequence alerting system searches several databases each day for news on (homologues of) "your" sequence and will inform you by email if it has detected a new relative.

DNA and protein sequences are accepted as queries, but only protein databases are screened. You can specify some parameters of the searches to optimise them for your particular problem.

Expasy Sequence Alerting System
New protein sequences, which are added to the Swiss-Prot database on a weekly basis, can be scanned using a user-defined query. The searches are performed on the current non-cumulative weekly additions only. This allows researchers to be aware of new protein sequences related to their interests before the actual database release.

All results will be returned to you via e-mail.

You can specify peptide sequences, Prosite-style patterns, or keywords in the various annotation fields.
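
For readers unfamiliar with Prosite-style patterns, the sketch below translates the common parts of that syntax into a Python regular expression: `x` for any residue, `[ST]` for alternatives, `{P}` for excluded residues, `(n)` or `(n,m)` for repetition, and `<` or `>` to anchor a pattern at either end of the sequence. The function name is hypothetical and rarer constructs (such as `>` inside square brackets) are not handled, so treat it as an illustration rather than a complete parser.

```python
import re

# One pattern element with an optional repetition count, e.g. "x(2,4)".
_ELEMENT = re.compile(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a Prosite-style pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # "[ST]" and single letters are already valid regex as written.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(prefix + core + suffix)
    return "".join(parts)


# The N-glycosylation site pattern, written in Prosite style.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)
print(bool(re.search(regex, "AANGSAA")), bool(re.search(regex, "AANPSAA")))
```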

EC Enzyme Database
This is a repository of information on the nomenclature of enzymes. It is primarily based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB) and it describes each type of characterized enzyme for which an EC (Enzyme Commission) number has been provided.
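
An EC number is a four-level hierarchical code, and the first level gives the top-level enzyme class. The sketch below shows one way to split such a number into its levels; the function name is hypothetical, a `-` is read as an unspecified level (as in partial numbers like 3.4.-.-), and class 7 (translocases) is a later addition to the nomenclature that older releases of the database will not contain.

```python
# Top-level enzyme classes defined by the IUBMB nomenclature.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec_number):
    """Split an EC number such as 'EC 3.4.21.4' into a 4-tuple of levels.

    Levels written as '-' are returned as None.
    """
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("an EC number has four levels: %r" % ec_number)
    return tuple(None if part == "-" else int(part) for part in parts)


levels = parse_ec("EC 3.4.21.4")
print(levels, EC_CLASSES[levels[0]])
```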

MIPS - a database for protein sequences and complete genomes
This gives quick access to the results of analysing the proteins in various genomes:

YEAST
ATHALIANA
PROTFAM
ATLAS
ALERT
HPT-Search
FASTA
FETCH
ALIGN
PEDANT
Human cDNA project
MITOP
ORPHEUS

Kabat Database of Proteins of Immunological Interest
This includes amino acid sequences, related nucleotide sequences and aligned sequences. It is maintained by Dr. C. Wu, Technological Institute, Northwestern University.

Protein Research Foundation (PRF)
Comprehensive bibliographies of:

biologically active peptides
chemistry of amino acids, peptides and proteins
primary structures of proteins
higher order structures of proteins
functional sites and motifs in proteins
protein mutagenesis and mutational diseases
Literature Database (PRF/LITDB)
Peptide/Protein Sequence Database (PRF/SEQDB)
Synthetic Compounds Database (PRF/SYNDB)

HotMolecBase
HotMolecBase is a collection of biomedically interesting molecules such as p53, prion protein, huntingtin, presenilin-1, and others. The entries for those molecules cover details about their cellular and molecular biology that cannot be found anywhere else on the web in this concentrated form. Their involvement in diseases, together with medical applications (progress in diagnosis and treatment), is also included. HotMolecBase focusses on especially promising molecules that are potential targets for drug development.

2-D PAGE Databases
The Danish Centre for Human Genome Research's 2-D PAGE Databases contain data on proteins identified on various 2-D PAGE reference maps.

Databases for the study of global cell regulation and skin diseases
- Human keratinocytes-IEF Database
- Human keratinocytes-NEPHGE Database
Procedures for preparing 2-D gels
Databases for the study of bladder cancer
- Human transitional cell carcinomas IEF database
- Human transitional cell carcinomas NEPHGE database
Human Urine - IEF database
Other 2-D PAGE Human Protein Databases
- Human MRC-5 Fibroblasts IEF Database
- Human MRC-5 Fibroblasts NEPHGE Database
2-D Gel Gallery of Human Cell Types and Fluids
- Cultured Cell Types
- Non Cultured Cell Types
- Fluids

NRL_3D Protein Sequence Structure Database
This is a protein sequence database derived from high-resolution X-ray structures of proteins deposited in the Brookhaven National Laboratory's Protein Data Bank (PDB). It is distributed by the Protein Information Resource (PIR) at the National Biomedical Research Foundation.

O-GlycBase - O-glycosylated proteins
O-GLYCBASE is a revised database of O-glycosylated proteins. Version 2.0 has 127 glycoprotein entries containing 627 O-glycosylation sites. The criterion for inclusion is at least one experimentally verified O-glycosylation site. The terminal sugar linked to serine or threonine is cited when known. The database is non-redundant in the sense that it contains no identical sequences. Mucins, however, have tandem repeat sequences that are O-glycosylated, which results in some redundancy of the O-glycosylation sites.

PROLYSIS - protease and protease inhibitors
This is a resource for those interested in proteases and their natural or synthetic inhibitors.

PhosphoBase - a database of phosphorylation sites
PhosphoBase is a revised database of phosphorylation sites in proteins. Information about the position of phosphorylated serines, threonines, or tyrosines and relevant kinetic parameters are presented.

Amino Acid Information

General information
Suggested amino acid substitutions
Chemical properties
Structural properties
Genetic properties

PROWL - Databases, Knowledgebases, Software
Databases

MassBank
ProteinInfo
ProFound
PepFrag
MatrixDepot
MassRef

Knowledgebases

Amino acids
Peptides
Protocols
MS contaminants
EC Numbers
Protease families

Software

Archive
PAWS
M/Z
Peptide calculator
DNA calculator
Sugar calculator

BIND - Biomolecular Interaction Network Database
We have designed and implemented a new database encompassing the growing network of protein and other biomolecular interactions, called BIND (Biomolecular Interaction Network Database).

This database will span the complexity of interaction information gathered through experimental studies of biomolecular interactions. Interaction information will come from the literature, submitters and other databases.

BIND contains interaction, molecular complex and pathway records.

Any Comments, Questions? Support@hgmp.mrc.ac.uk
