The Bioperl Project
The Bioperl Project is an
international association of
developers of public domain Perl
tools for computational molecular
biology.
Bioperl Documentation
Documentation on the Bioperl modules.
XML Bioinformatics
Technical description of the XML bioinformatics standards
BioXML
Technical description of the XML bioinformatics standards
Genome Annotation Markup Elements (GAME)
The motivation for GAME is a desire to provide a syntax, together with
some simple tools, that will facilitate the exchange of genomic
annotations. It will enable genome centres, model organism databases,
and individual researchers to clearly specify the conclusions they have
drawn from their analyses of primary sequence data and share these XML
descriptions with one another. The development of GAME was necessary to
allow the Drosophila Genome Project to coordinate their efforts with
Celera, which required a stable and expressive interchange format.
Distributed Sequence Annotation System (DAS)
The solution that we advocate allows sequence annotation to be
decentralized among multiple third-party annotators and integrated on an
as-needed basis by client-side software. A single server is designated
the "reference server." It serves essential structural information about
the genome: the physical map which relates one entry to another (where
an "entry" is an arbitrary segment of the sequence, such as a sequenced
BAC or a contig), the DNA sequence for each entry, and the standard
authorship information. Multiple sites then act as third-party
"annotation servers." Using a web browser-like application, researchers
can interrogate one or more annotation servers to retrieve features in a
region of interest. The servers return the results using a standard
data format, allowing the sequence browser to integrate the annotations
and display them in graphical or tabular form. No attempt is made to
automatically resolve contradictions between different third-party
annotations. Indeed, it is the ability to facilitate comparison among
different centers' annotations that distinguishes this proposal.
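The client-side integration described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the DAS wire protocol or any real DAS client: the names `Feature`, `AnnotationServer`, `features_in`, and `collect_features` are hypothetical, and inclusive coordinates are an assumption. It shows a client querying several annotation servers for one region and keeping every answer, contradictions included.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    """One annotation on an entry (hypothetical model, not a DAS type)."""
    entry: str    # e.g. a sequenced BAC or a contig
    start: int    # inclusive start coordinate (assumed convention)
    end: int      # inclusive end coordinate (assumed convention)
    label: str    # what the annotator claims is there
    source: str   # which annotation server made the claim


class AnnotationServer:
    """Stand-in for a third-party annotation server holding its own features."""

    def __init__(self, name: str, features: List[Feature]) -> None:
        self.name = name
        self._features = list(features)

    def features_in(self, entry: str, start: int, end: int) -> List[Feature]:
        """Return the features on `entry` that overlap the range start..end."""
        return [
            f for f in self._features
            if f.entry == entry and f.start <= end and f.end >= start
        ]


def collect_features(servers: List[AnnotationServer],
                     entry: str, start: int, end: int) -> List[Feature]:
    """Gather features from every server for one region of interest.

    Nothing is merged or reconciled: overlapping or contradictory
    annotations from different sources are all kept, sorted by position
    so they can be shown side by side.
    """
    collected: List[Feature] = []
    for server in servers:
        collected.extend(server.features_in(entry, start, end))
    return sorted(collected, key=lambda f: (f.start, f.end, f.source))
```

Because each feature carries its source, a browser built on such a model could display competing annotations of the same region next to each other rather than choosing between them.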
GFF: a proposed exchange format for gene-finding features
GFF (Gene-Finding Features) is a format specification for describing
genes and other features associated with genomic sequences. This page is
a starting-point for finding out about this format and its use in
bioinformatics. In particular, since its proposal a considerable amount
of software has been developed for use with GFF and this page is intended
as a focus for the collation of this software, whether developed in the
Sanger Centre or elsewhere.
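As a rough illustration of the format, a GFF record is a single tab-separated line whose fields are seqname, source, feature, start, end, score, strand, frame, and an optional attributes column. The Python sketch below parses one such line; the names `GffFeature` and `parse_gff_line` are hypothetical and do not come from any GFF software listed here, and the sketch assumes that a score of "." means "no score".

```python
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    """One parsed GFF record (hypothetical helper type)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    frame: str
    attributes: str


def parse_gff_line(line: str) -> Optional[GffFeature]:
    """Parse one GFF line; return None for blank lines and '#' comments."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame = fields[:8]
    # The ninth column is optional; keep it as raw text.
    attributes = fields[8] if len(fields) > 8 else ""
    return GffFeature(
        seqname=seqname,
        source=source,
        feature=feature,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),  # "." taken as "no score"
        strand=strand,
        frame=frame,
        attributes=attributes,
    )
```

The attributes column is left unparsed on purpose, since its internal syntax differs between versions of the format.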
NEXUS file format
Technical description of the NEXUS file format
Genesafe - Gene prediction data sets
Genesafe was created to help gene predictors collaborate on training and testing
sets. Genesafe is about making and distributing common datasets for gene finding.
It consists of this set of web pages, a mailing list and a set of data on the FTP site.
BANBURY CROSS - Site for Gene Identification Software Benchmarking
This Benchmark site is intended to be a forum for scientists working in the
field of gene identification and anonymous genomic sequence annotation,
with the goal of improving current methods in the context of very large
genomic sequences, vertebrate ones in particular.
GASP1
The goal of this experiment is to obtain an in-depth and objective
assessment of the current state of the art in gene and functional site
predictions in genomic DNA. To this end, participants will predict as
much as possible about a sample genomic region that has been studied
intensively in the past. All participants will be provided with
datasets that can be used to help make predictions or to train
computational methods. There will be no winners or losers. We are
interested in seeing what level of genome annotation is achievable when
the community works together. Results of the experiment will be made
available through this web site after the ISMB '99 meeting.
EST-Confirmed Human Splice Sites
Bibliography on Features, Patterns, Correlations in DNA and Protein Sequences
This bibliography started out with a narrow focus: non-trivial long-range
statistical correlations in DNA sequences. Gradually, I have been collecting
papers on other topics as well. Now I have a collection of papers studying
the most basic features of DNA and protein sequences, those concerning
these sequences as symbolic strings.
A Bibliography on Computational Gene Recognition
The topic of computational gene recognition has become more and more important as long
stretches of DNA are sequenced in the Human Genome Project. How do we know where the genes are
located from the sequence information alone? The papers listed in this bibliography are an
accumulation of more than 15 years of research in computational molecular biology on
this topic.
Linking Biological Databases using CORBA
The objective of this work is to combine the data and services of a number of European
partners using CORBA. These partners will provide access to a wide range of distributed data
sources (EMBL nucleotide sequence database, SwissProt, PIR, MSD, GDB, TRANSFAC, P53 and
RHdb). Client applications will be developed that make use of this provision and build on it
to provide integrated views of these data. These integrated views will enable access to the
data at a higher level, in which the data are assembled into compound objects that hide the
unnatural partitions in these data and represent our understanding of biology more
adequately.
Forsdyke's Bioinformatics background papers
Introduction to bioinformatics theory:
Any Comments, Questions? Support@hgmp.mrc.ac.uk