Welcome to the GenomeWeb
Protein Multiple Sequence Analysis

This is a collection of protein multiple sequence analysis sites.

[info] SIM - Alignment Tool for protein
[info] AMAS - Analyse Multiply Aligned Sequences
[info] ToPLign: Toolbox for Protein Alignment
[info] MSA
[info] DIALIGN
[info] AllAll
[info] PredictProtein
[info] GeneBee
[info] Match-Box
[info] Protein Structure Prediction
[info] Sequence Alignment and Modeling System
[info] THE MEME SYSTEM - Multiple EM for Motif Elicitation
[info] Meta-MEME motif-based hidden Markov models
[info] DbClustal - global multiple alignments by database searches

Detailed information on the above options

SIM - Alignment Tool for protein
SIM is a program which finds a user-defined number of best non-intersecting alignments between two protein sequences or within a sequence.

Once the alignment is computed, you can view it with LALNVIEW, a graphical viewer for pairwise alignments.
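
As a rough illustration of the kind of local alignment SIM reports, the sketch below computes the score of the single best local alignment between two sequences with the classic Smith-Waterman recurrence. It is not SIM's own algorithm, which returns several non-intersecting alignments. The function name and the match, mismatch and gap values are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Score of the best local alignment of a and b (Smith-Waterman, linear gaps).

    The scoring values are illustrative assumptions, not SIM's defaults.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]              # column 0 is always 0 for local alignment
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("XXACGTXX", "ACGT"))
```

Finding further non-intersecting alignments, as SIM does, would require repeating the search while excluding the cells already used by earlier alignments.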

AMAS - Analyse Multiply Aligned Sequences
Visualization of physico-chemical properties of the columns of an alignment

ToPLign: Toolbox for Protein Alignment
Computing, analysis and visualization of pairwise, multiple, threading, and parametric alignments.

MSA
(Close-to-) optimal alignments using the Carrillo-Lipman bound.

DIALIGN
While standard alignment methods rely on comparing single residues and imposing gap penalties, DIALIGN constructs alignments by comparing whole segments of the sequences. No gap penalty is employed. This approach is especially suitable if sequences are not globally related but share only local similarities, as is the case in genomic DNA sequences and in many protein families.
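
The idea of comparing whole gap-free segments can be sketched as follows. This minimal example scans every diagonal of the comparison matrix and returns the best-scoring gap-free segment pair. The +1/-1 identity scoring and the function name are assumptions made for illustration; DIALIGN itself uses its own segment weighting and then assembles a consistent set of segments, which this sketch does not attempt.

```python
def best_gapfree_segment(a, b, match=1, mismatch=-1):
    """Return (score, start_a, start_b, length) of the best gap-free segment pair.

    Illustrative scoring only: +1 per identical residue, -1 per mismatch.
    """
    best = (0, 0, 0, 0)
    # Each offset is one diagonal: position i in a is paired with i - offset in b.
    for offset in range(-(len(b) - 1), len(a)):
        i, j = max(offset, 0), max(-offset, 0)
        run_score, run_start = 0, 0
        k = 0
        while i + k < len(a) and j + k < len(b):
            if run_score <= 0:
                # A non-positive run cannot help; start a new segment here.
                run_score, run_start = 0, k
            run_score += match if a[i + k] == b[j + k] else mismatch
            if run_score > best[0]:
                best = (run_score, i + run_start, j + run_start, k - run_start + 1)
            k += 1
    return best


if __name__ == "__main__":
    score, sa, sb, n = best_gapfree_segment("XXXHEAGAWGHEEYYY", "QQHEAGAWGHEE")
    print(score, "XXXHEAGAWGHEEYYY"[sa:sa + n], "QQHEAGAWGHEE"[sb:sb + n])
```

No gap penalty appears anywhere in the sketch: gaps would only arise implicitly, as the unaligned stretches between the segments that are chosen.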

AllAll
Calculate phylogenetic trees, alignments, dSplits, probabilistic ancestral sequences, {Kabat-Wu, probability, maximum likelihood} variation indices, prediction of surface/interior/active sites, and prediction of parse regions.

PredictProtein
PredictProtein (PP) is an automatic service for protein database searches and the prediction of aspects of protein structure. You send an amino acid sequence and PP returns the results; further features are available upon request.

GeneBee
GeneBee multiple alignment: pairwise motifs to multiple motifs to "supermotifs" to construction of the multiple alignment.

Match-Box
The Match-Box multiple sequence alignment method circumvents the gap penalty requirement: in the Match-Box method the gaps are the result of the alignment and not a governing parameter of the matching procedure.

The method produces reliable results, as assessed by the tests performed on protein families of known structures and of low sequence similarity.

A reliability score is computed in relation with a threshold of similarity progressively raised to extend the aligned regions to their maximal length. The score obtained at each position of the final alignment is printed below the sequences and allows a discriminant reading of each aligned region.

Several additional outputs present pairwise similarity analyses in order to allow delineation of relevant subsets of related sequences and to avoid alignment of unrelated sequences.

Protein Structure Prediction
This is a hidden Markov model (HMM) protein structure prediction server.

The server has used UCSC's SAM-T98 method to create a library of HMMs, one per PDB structure (about 2500 HMMs total). You can search this database of HMMs with a protein sequence.

Sequence Alignment and Modeling System
The Sequence Alignment and Modeling system (SAM) is a collection of flexible software tools for creating, refining, and using linear hidden Markov models for biological sequence analysis. The model states can be viewed as representing the sequence of columns in a multiple sequence alignment, with provisions for arbitrary position-dependent insertions and deletions in each sequence. The models are trained on a family of protein or nucleic acid sequences using an expectation-maximization algorithm and a variety of algorithmic heuristics. A trained model can then be used to both generate multiple alignments and search databases for new members of the family. SAM is written in the C programming language for Unix machines and MasPar parallel computers, and includes extensive documentation.

THE MEME SYSTEM - Multiple EM for Motif Elicitation
MEME is a tool for discovering motifs in a group of related DNA or protein sequences.

A motif is a sequence pattern that occurs repeatedly in a group of related protein or DNA sequences. MEME represents motifs as position-dependent letter-probability matrices which describe the probability of each possible letter at each position in the pattern. Individual MEME motifs do not contain gaps. Patterns with variable-length gaps are split by MEME into two or more separate motifs.
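
To make the matrix representation concrete, the sketch below estimates a position-dependent letter-probability matrix from a handful of gap-free, equal-width motif occurrences and uses it to score candidate sites. The example sites, the pseudocount and the function names are assumptions for illustration; this shows only the representation, not MEME's statistical procedure for discovering motifs.

```python
from collections import Counter


def letter_probability_matrix(sites, alphabet, pseudocount=1.0):
    """Estimate one probability distribution over `alphabet` per motif column."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same width (motifs have no gaps)")
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        # The pseudocount keeps unseen letters from getting probability zero.
        matrix.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return matrix


def site_probability(matrix, site):
    """Probability of `site` under the motif: product of per-column probabilities."""
    p = 1.0
    for column, letter in zip(matrix, site):
        p *= column[letter]
    return p


if __name__ == "__main__":
    example_sites = ["ACGT", "ACGA", "ACGT", "TCGT"]  # hypothetical occurrences
    pwm = letter_probability_matrix(example_sites, "ACGT")
    print(site_probability(pwm, "ACGT"), site_probability(pwm, "TTTT"))
```

Because each column is an independent distribution, the matrix has a fixed width, which is why a pattern with variable-length gaps has to be represented as separate motifs.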

MEME takes as input a group of DNA or protein sequences (the training set) and outputs as many motifs as requested. MEME uses statistical modeling techniques to automatically choose the best width and description for each motif.

Meta-MEME motif-based hidden Markov models
Meta-MEME is a software toolkit for building and using motif-based hidden Markov models of DNA and proteins. The input to Meta-MEME is a set of similar protein sequences, as well as a set of motif models discovered by MEME. Meta-MEME combines these models into a single, motif-based hidden Markov model and uses this model to produce a multiple alignment of the original set of sequences and to search a sequence database for homologs.

DbClustal - global multiple alignments by database searches
The server will:

This method was published by Thompson J.D. et al. in Nucleic Acids Research (Vol. 28, pp. 2919-2926).

For more information about Ballast and the anchors, see http://igbmc.u-strasbg.fr:8080/ballast.html

Comments or questions? Support@hgmp.mrc.ac.uk