Welcome to the GenomeWeb
Protein 3D Structure Analysis

This is a collection of protein 3D structure analysis and database sites.

[info] A Guide to Structure Prediction
[info] PDB Protein Data Bank
[info] Structural Classification of Proteins (SCOP)
[info] CATH, The CATH Protein Structure Classification
[info] CPE Protein Structure Prediction Pages
[info] UCLA-DOE Structure Prediction Server
[info] Database of Comparative Protein Structure Models (ModBase)
[info] Protein Sequence Analysis (PSA)
[info] Predicting Protein-3D structures based on homologous sequence search
[info] Sacch3D - Structural Information for Yeast Proteins
[info] The Protein Structure Database (PSdb)
[info] Image Library of Biological Macromolecules
[info] A Library of Proteins Family Cores (LPFC)
[info] Protein Topology Home Page
[info] VAST - Vector Alignment Search Tool
[info] DALI - compare protein structures in 3D
[info] GETAREA - Predicted Solvent Accessible Surface Areas
[info] Biomer - molecular modeling program
[info] 3 Dee - Database of Protein Domain Definitions
[info] Genome-Structure Spider
[info] Folding@home
[info] The CPHmodels Server

Detailed information on the above options

A Guide to Structure Prediction
This is a summary of a general approach to the problem of structure prediction.

The assumption is that you have a protein sequence that you want to know more about. Before you start, remember that this approach will not always provide satisfying or complete answers. However, it is increasingly rare for the techniques described here to shed no light at all on a protein sequence. Spending just a little time analysing a sequence can save time and money by aiding experimental design.

PDB Protein Data Bank
The PDB is a database of crystallographic protein structures, maintained at the Brookhaven National Laboratory, Upton, NY. It contains atomic coordinates for the 3-dimensional structure of biomolecules obtained using x-ray, electron or neutron diffraction, nuclear magnetic resonance or molecular modelling.

Structural Classification of Proteins (SCOP)
SCOP aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.

CATH, The CATH Protein Structure Classification
CATH is a hierarchical classification of protein domain structures which clusters proteins at four major levels: class (C), architecture (A), topology (T) and homologous superfamily (H). Hyperlinks are provided to several secondary sources such as PDB summary files and OWL.
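
As a rough illustration of the four-level hierarchy, the sketch below splits a dotted classification code into its class, architecture, topology and homologous superfamily numbers. The dotted four-number form and the names `CathCode`, `parse_cath_code` and `same_level` are assumptions made for this example; they are not described on this page.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """Address of one node in the four-level hierarchy (hypothetical helper type)."""
    cath_class: int      # C: class
    architecture: int    # A: architecture
    topology: int        # T: topology
    superfamily: int     # H: homologous superfamily


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted code such as '3.40.50.300' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def same_level(a: CathCode, b: CathCode, depth: int) -> bool:
    """True if two codes agree on the first `depth` levels (1 = C ... 4 = H)."""
    return a[:depth] == b[:depth]
```

Two domains that share the first three numbers but differ in the fourth would, under this reading, have the same topology but belong to different homologous superfamilies.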

CPE Protein Structure Prediction Pages
This holds information related to research on Protein Structure Prediction, i.e. attempts to solve "the folding problem", and particularly information about recent and forthcoming structure prediction competitions, meetings and network services.

UCLA-DOE Structure Prediction Server
The UCLA-DOE Fold-Recognition server is a project that aims to help with the computational analysis and prediction of structure from amino acid sequences. It provides easy access to the results from various programs. These include various methods developed in this lab as well as other methods from around the world. Rather than a set of programs and WWW links, it is a comprehensive package providing users with computation time, storage and collection of data, and organization of the results for easy analysis.

Database of Comparative Protein Structure Models (ModBase)
ModBase is a queryable database of many annotated comparative protein structure models. The models consist of coordinates for all non-hydrogen atoms in the modeled part of a protein. They are derived by an automated modeling pipeline relying mainly on the program MODELLER.

The database also includes fold assignments and alignments on which the models were based. In addition, special care is taken to assess the overall quality of the models and their accuracy at the residue level.

Protein Sequence Analysis (PSA)
The PSA server analyzes your protein sequence and determines which of 209 sequence-structure models, spanning 15 different protein folding classes, are the most probable explanations of your sequence. The analysis results (PostScript files depicting the folding-class probabilities and secondary-structure probabilities) are returned to you by e-mail.

The PSA e-mail server is particularly suited for analyzing novel sequences that are unlike any others in the sequence databanks.

Predicting Protein-3D structures based on homologous sequence search
This server is dedicated to finding PDB sequences homologous to a given query sequence. It uses a version of NRDB that includes all the PDB entries (excluding the BRK_MOD sequences and sequences only containing 'X's). Sequences are compared to this database with PSI-BLAST using an e-value cutoff of 0.001, and a maximum of five iterations.

Coiled-coil, transmembrane and low-complexity regions are automatically filtered out of the query sequence, using COILS, TMpred and SEG, respectively. A graphical overview is given of the regions matched between the query sequence and the hit sequences found.

The accuracy of the prediction was estimated to be above 98%, based on the results from a test set of 685 PDB sequences extracted with PDB-select that have less than 25% identity to each other.
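
The search settings described above could be approximated with the NCBI BLAST+ `psiblast` program. The sketch below only builds the command line and does not run it. The file name `query.fasta` and the database name `nrdb_with_pdb` are placeholders, and the server's actual invocation is not documented on this page. It is also an assumption that the quoted e-value cutoff corresponds to the `-evalue` option.

```python
import shlex


def build_psiblast_command(query_fasta, database, evalue=0.001, iterations=5):
    """Return an argument list for BLAST+ psiblast (the command is not executed).

    query_fasta and database are placeholder names, not the server's own files.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),              # e-value cutoff quoted in the text (assumed mapping)
        "-num_iterations", str(iterations),  # maximum of five iterations
    ]


command = build_psiblast_command("query.fasta", "nrdb_with_pdb")
print(shlex.join(command))
```

Keeping the arguments as a list rather than a single string makes it straightforward to pass them to `subprocess.run` later without shell quoting problems.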

Sacch3D - Structural Information for Yeast Proteins
Sacch3D is a facility offered by the Saccharomyces Genome Database to present structural information about yeast proteins. Here you can find text, graphics, and interactive 3D images to help you explore the structure and function of yeast proteins.

The Protein Structure Database (PSdb)
The Protein Structure Database (PSdb) relates secondary (e.g. Helix, Sheet, Turn, Random Coil), supersecondary (e.g., helix-helix interactions), and tertiary information (e.g. Solvent accessibility, internal relative distances, and ligand interactions) to the primary structure. The data for each protein is supplied on a residue by residue basis and encoded in a series of flat ASCII files.

Relationships between the various levels of structure (primary, secondary, tertiary) can be investigated visually using PSdbView, a graphical tool provided to view the information within the PSdb. This tool allows for side-by-side comparison of residue-based data and includes a variety of standard mechanisms for visualizing protein data including Ramachandran plots, C(alpha)-C(alpha) distance plots, and differences in solvent accessible molecular surface area graphs (e.g., differences in the exposed surface with and without including either the ligands, metal ions or buried waters in the computations).

RELIBase is an archive for structural data about receptor/ligand complexes.

The main purpose of RELIBase is to provide selective and efficient access to the receptor/ligand complexes currently deposited in the Brookhaven Protein Databank (PDB) and to make the enormous wealth of information contained in the receptor/ligand structures available for structure-based drug design studies.

The public web-accessible RELIBase database and search tools accept a substructure search object entered as text, as a SMILES string, or through an interactive Java-based molecule editor, and the system can perform the following functions:

Image Library of Biological Macromolecules
Access to graphical structural information on biological macromolecules. The Image Library contains structural images of RNA, DNA and proteins deposited at PDB and NDB.

A Library of Proteins Family Cores (LPFC)
Core structures computed from structural alignments of protein families

Protein Topology Home Page
Users supply a target protein domain, which can be compared with a representative set of 3000 domains or with the entire PDB (15300 domains, as of April 1998). You can upload a file containing the description of your target protein, either as a PDB format file or as a Tops file. The system will email you the domains in the representative set, ordered by distance from your target protein, annotated with their CATH codes (where known) and also with the distance measure. A larger distance measure indicates a more remote topological relationship.

VAST - Vector Alignment Search Tool
VAST Search is a service offered by the NCBI Structure Group that lets you search for structure neighbors starting from 3D coordinates that you supply. This service is meant to be used with newly determined protein structures which are not yet part of MMDB. Structure neighbors for proteins already in MMDB can be looked up from MMDB's structure summary pages.

Protein structure neighbors in Entrez are determined by direct comparison of 3-dimensional protein structures with the VAST algorithm. Each of the more than 15,000 domains in MMDB is compared to every other one. From the MMDB structure summary pages, retrieved by Entrez, structure neighbors are available with the click of a button.

DALI - compare protein structures in 3D
With a rapidly growing pool of known tertiary structures, the importance of protein structure comparison parallels that of sequence alignment. We have developed a novel algorithm (DALI) for optimal pairwise alignment of protein structures. The three-dimensional coordinates of each protein are used to calculate residue-residue (Calpha-Calpha) distance matrices. The distance matrices are first decomposed into elementary contact patterns, e.g., hexapeptide-hexapeptide submatrices. Then, similar contact patterns in the two matrices are paired and combined into larger consistent sets of pairs. A Monte Carlo procedure is used to optimize a similarity score defined in terms of equivalent intramolecular distances. Several alignments are optimized in parallel, leading to simultaneous detection of the best, second-best and so on solutions. The method allows sequence gaps of any length, reversal of chain direction, and free topological connectivity of aligned segments. Sequential connectivity can be imposed as an option. The method is fully automatic and identifies structural resemblances and common structural cores accurately and sensitively, even in the presence of geometrical distortions. An all-against-all alignment of over 200 representative protein structures results in an objective classification of known 3D folds in agreement with visual classifications. Unexpected topological similarities of biological interest have been detected, e.g., between the bacterial toxin colicin A and globins, and between the eukaryotic POU-specific DNA-binding domain and the bacterial lambda repressor.
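
The first step described above, building a residue-residue (Calpha-Calpha) distance matrix from three-dimensional coordinates, can be sketched as follows. This is only that preprocessing step under simple assumptions (PDB-format ATOM records, one model, no alternate locations). It is not the DALI alignment itself, and the function names are invented for the example.

```python
import math


def ca_coordinates(pdb_lines):
    """Collect (x, y, z) for every C-alpha atom in PDB-format ATOM records."""
    coords = []
    for line in pdb_lines:
        # Fixed columns of the PDB format: atom name in 13-16, x/y/z in 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise C-alpha distances."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Because the matrix depends only on intramolecular distances, it does not change when the molecule is rotated or translated, which is what makes comparing two such matrices a reasonable basis for structure comparison.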

GETAREA - Predicted Solvent Accessible Surface Areas
This calculates the solvent accessible surface area (SASA) of molecules. To calculate the SASA of a protein, supply the name of the local file containing atomic coordinates in PDB format. There are options to calculate solvation energy and to include molecules other than proteins.
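
To give a feel for what a solvent accessible surface area is, the sketch below estimates it with the Shrake-Rupley point-sampling approximation. This technique is swapped in for illustration and is not presented as GETAREA's own method, which is not described on this page. The atomic radii, the 1.4 angstrom probe radius and the function names are assumptions.

```python
import math


def sphere_points(n):
    """Spread n points roughly evenly over a unit sphere (golden-spiral layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        phi = k * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def sasa(atoms, probe=1.4, n_points=200):
    """Approximate per-atom solvent accessible surface area in square angstroms.

    atoms is a list of (x, y, z, radius) tuples; the radii and the probe size
    are illustrative assumptions, not values taken from GETAREA.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        reach = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + reach * ux, yi + reach * uy, zi + reach * uz
            # A test point is buried if it lies inside any other expanded atom.
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * reach ** 2 * exposed / n_points)
    return areas
```

An isolated atom keeps the full area of its probe-expanded sphere, while atoms packed against neighbors lose the fraction of test points that fall inside those neighbors. The double loop is quadratic in the number of atoms, so a real protein would call for a neighbor list.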

Biomer - molecular modeling program
Biomer should be useful as an educational tool as well as a means to generate structures quickly and easily. It is written in Java, and has the following features:

3 Dee - Database of Protein Domain Definitions
This database contains definitions of structural domains for all protein chains in the Brookhaven Protein Databank (PDB) that have 20 or more residues and are not theoretical models. The domains have been clustered on sequence similarity and structural similarity to form families. The families are stored as a hierarchy.

Genome-Structure Spider
The Structure Spider pages are the results of an Entrez-based Autonomous Agent or "spider". This is a program which follows a path in search of information, which it records in the form of these tables. The Structure Spider does not itself perform any kind of sequence or structure analysis; it uses the precomputed analyses which are stored in the Entrez system.

The spider starts with a list of genomic DNA sequence accession numbers corresponding to a complete genome. This list represents the link between the Entrez Genomes division and the Entrez Nucleotide sequence division. The spider then uses the Entrez Applications Programming Interface (API) to find the proteins associated with each nucleotide sequence. For each of these protein sequences, ITSY requests the protein neighbors (100 maximum) using an Entrez API neighbor request function.

The list of neighbors is returned to the ITSY client over the network in the form of a list of accession codes (GI numbers). This list is then used as an argument to another call to the Entrez API to request linked structures from Entrez's structure database, MMDB. If the accession codes of any structures are returned, ITSY makes a row in the table and reports the query protein sequence, the first structure found, and the list of other related structures.

If no structures are found in the first round, then the list of protein sequence neighbors found in the first round is used with another call to the Entrez API to ask for "neighbors of neighbors". This second round of neighboring returns another, longer list of protein sequences. Again this list is passed back to the Entrez API to find out whether any of these sequences are derived from the structure database. If any are found, a row is made in the table.

At no time does the spider actually look at a sequence. It works entirely with sequence identifiers. The protein-protein comparisons are precomputed using BLAST, and stored in the Entrez database which holds the results as an NxN comparison of the database against itself. These comparisons are updated daily.
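
The two-round lookup described above can be sketched with plain dictionaries standing in for the Entrez calls. Everything here is hypothetical: the function name, the dictionary layout and the identifiers are invented for illustration, and no real Entrez API is used.

```python
def find_structures(protein, neighbors, structures, max_neighbors=100):
    """Return (round_number, structure_ids) for a protein identifier, or None.

    neighbors maps a protein ID to its precomputed neighbor IDs and structures
    maps a protein ID to linked structure IDs; both are stand-ins for the
    Entrez lookups and use invented identifiers.
    """
    def linked(ids):
        found = []
        for pid in ids:
            for sid in structures.get(pid, []):
                if sid not in found:
                    found.append(sid)
        return found

    # Round 1: direct neighbors, capped at the 100-neighbor limit.
    first = neighbors.get(protein, [])[:max_neighbors]
    hits = linked(first)
    if hits:
        return 1, hits

    # Round 2: neighbors of neighbors, only if round 1 found no structure.
    second = []
    for pid in first:
        for nid in neighbors.get(pid, [])[:max_neighbors]:
            if nid != protein and nid not in first and nid not in second:
                second.append(nid)
    hits = linked(second)
    return (2, hits) if hits else None
```

As in the description, the sketch works only with identifiers and never inspects a sequence. The round number is returned alongside the hits so that weaker second-round relationships can be flagged in the resulting table.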

The second round of neighboring is made necessary by the 100-neighbor limit in the current Entrez API. However, it has been found that by doing this second round of neighboring, ITSY finds many very weak relationships that seem to be substantiated by known motifs or similar functions. It is now well established that structural similarities may be inferred from short sequence motifs alone. Nonetheless, some false hits may be found. If the second-round neighbors are related directly by BLAST HSPs (high-scoring pairs), the structural relationship may be genuine.

Folding@home
Understanding how proteins self-assemble ("protein folding") is a holy grail of modern molecular biophysics. What makes it such a great challenge is its complexity, which renders simulations of folding extremely computationally demanding and difficult to understand.

Our group has developed a new way to simulate protein folding ("distributed dynamics") which should remove the previous barriers to simulating protein folding. However, this method is extremely computationally demanding and we need your help. We have already demonstrated that our distributed dynamics technique can fold small protein fragments and protein-like synthetic polymers. The next step is to apply these methods to larger, considerably more important and complicated proteins. Unfortunately, larger proteins fold more slowly and thus we need more computers to simulate their folding. While the alpha helix folds in 100 nanoseconds, proteins just a little larger fold 100x more slowly (10 microseconds). Thus, while 10-100 processors were enough to simulate the helix, we will need many more to simulate these larger, more interesting proteins.

To achieve a significant speedup, we need lots of processors in a given run. Also, since a single run does not tell us much, we need to simulate several runs (10 runs would be a good start) per protein. Thus, we need lots of processors. By running our client that uses the Mithral CS-SDK, you can lend us your machine for as long as you like. The client allows you to run for as little or as long as you like. Even a single day's worth of running is helpful to us.

The CPHmodels Server
CPHmodels is a collection of methods and databases developed to predict protein structures. It currently consists of the following tools:

Any Comments, Questions? Support@hgmp.mrc.ac.uk