About the EBI
The European Bioinformatics Institute (EBI) lies in the fifty-five acres of landscaped parkland in rural Cambridgeshire that make up the Wellcome Trust Genome Campus. The Campus also houses the Sanger Centre and the UK Medical Research Council Human Genome Mapping Project Resource Centre (HGMP). Together, the three institutes provide one of the world's largest concentrations of expertise in genomics and bioinformatics. The mission of the EBI is to ensure that the growing body of information from molecular biology and genome research is placed in the public domain and is accessible freely to all facets of the scientific community in ways that promote scientific progress. The EBI serves researchers in molecular biology, genetics, medicine and agriculture from academia, and the agricultural, biotechnology, chemical and pharmaceutical industries. The EBI does this by building, maintaining and making available databases and information services relevant to molecular biology, as well as carrying out research in bioinformatics and computational molecular biology.
History of the EBI
The European Bioinformatics Institute is a non-profit academic organisation that forms part of the European Molecular Biology Laboratory (EMBL). The EMBL is an international network of research institutes funded by contributions from fifteen countries and dedicated to research in molecular biology. The roots of the EBI lie in the EMBL Nucleotide Sequence Data Library, which was established in 1980 at the EMBL laboratories in Heidelberg, Germany and was the world's first nucleotide sequence database. The original goal was to establish a central computer database of DNA sequences, rather than have scientists submit sequences to journals. What began as a modest task of abstracting information from the literature soon became a major database activity with direct electronic submissions of data and a need for highly skilled informatics staff. The task grew in scale with the start of the genome projects, and grew in visibility as the data became relevant to research in the commercial sector. It soon became apparent that the EMBL Nucleotide Sequence Data Library needed better financial security to ensure its long-term viability and to cope with the sheer scale of the task. There was also a need for research and development to provide services, to collaborate with global partners to support the project, and to provide assistance to industry. To this end, in 1992, the EMBL Council voted to establish the European Bioinformatics Institute and to locate it at the Wellcome Trust Genome Campus in the United Kingdom, where it would be in close proximity to the major sequencing efforts at the Sanger Centre and HGMP Resource Centre. From 1992 to 1995, the activities in Heidelberg were gradually transferred, until in September 1995 the EBI occupied its current location on the Wellcome Trust Genome Campus.
Funding for the EBI is provided largely by the Member States of EMBL, but in recent years the EBI has had substantial support from the Commission of the European Union. Other projects are supported by contributions from the pharmaceutical and biotech industry. In addition, although its contribution to the running costs of the EBI is modest, the Wellcome Trust generously provides the facilities for the EBI on its Genome Campus at Hinxton.
The EBI is organised under three Programmes: Service, Research and Industry. As the names suggest, each programme emphasises a different aspect of work at the EBI. However, staff from all three programmes are active in providing services, performing original research and supporting industry.
The EBI Service Programme
The Service Programme of the EBI focuses on building, maintaining and providing biological databases and information services to support data deposition and exploitation. Research and Development within the Service Programme investigates the latest methods in database design and interoperability with a view to providing the best possible information services.
The EBI Research Programme
The EBI Research Programme has both pure and applied research activities at the leading edges of computational molecular biology. These activities include the study of molecular evolution, genome comparison, gene prediction, protein motifs, metabolic pathways, sequence-structure relationships, the application of parallel computing in molecular biology, the analysis of biomolecular sequences and 3D structures, new biological databases, and navigation tools for linking databases.
The EBI Industry Programme
The EBI Industry Programme was established to meet the special needs of the biotechnology, chemical and pharmaceutical industries while remaining consistent with the public domain policy of the EBI. The programme aims to help industry adapt quickly to, and maximise benefits from, innovations in bioinformatics. It comprises training and education through regular workshops on leading-edge topics in both biology and computing, plus the development of databases and services, with a special emphasis on the promotion and development of standards.
Resources Available from the EBI
The Internet has had a huge impact on molecular biology and bioinformatics, and most of the information services of the EBI are provided through this medium. For full details of the analysis tools and databases currently available at the EBI, please see the EBI WWW pages.
Analysis Tools and Database Access
The EBI maintains versions of all the major public domain sequence database searching and analysis tools, e.g. FASTA (Pearson & Lipman, 1988), BLAST (Altschul et al., 1990), CLUSTALW (Thompson et al., 1994) and implementations of the Smith & Waterman algorithm (Smith & Waterman, 1981). The EBI also hosts tools such as DALI (Holm & Sander, 1997), a service for comparing protein structures in three dimensions and revealing biologically interesting similarities, and GeneQuiz, a system for highly automated analysis of protein sequences for the prediction of biochemical function.
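To give a feel for what a Smith & Waterman search computes, the following is a minimal sketch of the local alignment score between two sequences. It is an illustration only, not the implementation run at the EBI: the function name, the scoring values (match, mismatch) and the simple linear gap penalty are assumptions chosen for brevity, whereas production tools typically use substitution matrices and more elaborate gap models.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not defaults of any real tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of any local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at zero lets an alignment restart anywhere, which is
            # what makes the alignment local rather than global.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The two example sequences share only the segment ACG, so the score reflects that short common region rather than the poorly matching flanks.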
A major utility is the SRS system that was developed at EMBL Heidelberg and EBI and is now deployed at sites around the world. SRS is a program for the indexing and cross-referencing of databases of textual information and provides unified access to molecular biology databases, integration of analysis tools and advanced parsing tools for disseminating and reformatting information stored in ASCII text.
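The idea of indexing and cross-referencing flat-file text can be sketched as follows. This is not SRS itself, whose parsing and indexing machinery is not described here; it is a simplified illustration that assumes an EMBL-style layout (a two-letter line code, an `AC` accession line, `DR` cross-reference lines and a `//` terminator). The sample entries, accession numbers and function names are invented for the example.

```python
from collections import defaultdict

# Invented sample data in a simplified EMBL-style flat-file layout.
SAMPLE = """\
ID   ENTRY1
AC   DEMO0001;
DR   SWISS-PROT; PDEMO01; EXAMPLE1.
//
ID   ENTRY2
AC   DEMO0002;
DR   SWISS-PROT; PDEMO01; EXAMPLE1.
DR   SWISS-PROT; PDEMO02; EXAMPLE2.
//
"""


def parse_entries(text):
    """Yield one dict per entry, mapping each line code to its list of values."""
    entry = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-entry marker
            if entry:
                yield dict(entry)
            entry = defaultdict(list)
        elif len(line) > 5:              # code in columns 1-2, data from column 6
            entry[line[:2]].append(line[5:].strip())


def build_indexes(text):
    """Return (entries by accession, cross-reference -> set of accessions)."""
    by_accession = {}
    xref_index = defaultdict(set)
    for entry in parse_entries(text):
        accession = entry["AC"][0].rstrip(";")
        by_accession[accession] = entry
        for dr in entry.get("DR", []):
            database, target = [part.strip() for part in dr.split(";")[:2]]
            xref_index[(database, target)].add(accession)
    return by_accession, xref_index


if __name__ == "__main__":
    entries, xrefs = build_indexes(SAMPLE)
    print(sorted(xrefs[("SWISS-PROT", "PDEMO01")]))
```

Keeping the cross-reference index separate from the entry index means a query can start from either database and follow the link in a single lookup.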
The services that the EBI develops and offers centre around the major databases of molecular biology information that it maintains. These are the EMBL Nucleotide Sequence Database, the TrEMBL and SWISS-PROT protein sequence databases, the Macromolecular Structure Database (EBI-MSD) of 3D co-ordinates of biological macromolecules, and the RHdb database of radiation hybrid maps. Aside from the major database projects, the EBI is involved in the preparation and distribution of over 70 other databases dedicated to particular areas of molecular biology.
In order to remain state-of-the-art, the EBI services must be constantly reviewed and upgraded. This requires an understanding of the latest developments and trends in biological research and information technology. The work of the EBI Services R&D teams aims to exploit the newest technologies in order to improve services and meet the demands of an ever more sophisticated end-user community. This has included a significant move towards object-oriented technology and providing direct access to databases using CORBA technology.
EMBL Nucleotide Sequence Database
In December 1998, the EMBL Nucleotide Sequence Database (usually referred to as EMBL) contained over 2.3 billion nucleotide base pairs in more than two million entries that contain sequence information and associated annotation. It is produced in close collaboration with GenBank in the USA and DDBJ in Japan. Together, these three sites constitute the global deposition sites for nucleotide sequence information, and every twenty-four hours the three databases exchange information. This information exchange requires carefully co-ordinated protocols. EMBL also contains all the sequence data from the European patent literature. The EBI has developed automated methods to propagate updates to remote copies of the database, making it easy for users to maintain a complete and up-to-date local copy of the EMBL database. Future challenges include developing the EMBL database to reflect the deposition of whole genomes, and the ability to enhance existing sequence information with new annotation. A major emphasis of the EMBL Nucleotide Sequence Database is on the collection of information from international genome centres. Collaboration with the Sanger Centre, which is located on the Wellcome Trust Genome Campus next to the EBI, has enabled the EBI to automate much of the import of data from genome projects into the central EMBL database.
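The text does not describe how updates are propagated to remote copies, so the following is only a sketch of one plausible scheme: each update carries an accession, a version number and the record text, and a local copy accepts an update only if it is newer than what it already holds. The function name, the tuple layout and the versioning rule are all assumptions made for illustration.

```python
def apply_updates(local, updates):
    """Apply (accession, version, record) updates to a local copy in place.

    `local` maps accession -> (version, record). Returns the accessions that
    actually changed. This versioning scheme is a hypothetical illustration.
    """
    changed = []
    for accession, version, record in updates:
        held = local.get(accession)
        # Accept new entries and strictly newer versions; ignore stale ones.
        if held is None or held[0] < version:
            local[accession] = (version, record)
            changed.append(accession)
    return changed


if __name__ == "__main__":
    copy = {"A": (1, "old")}
    batch = [("A", 2, "new"), ("B", 1, "b"), ("A", 1, "stale")]
    print(apply_updates(copy, batch))
    print(apply_updates(copy, batch))  # re-applying the same batch changes nothing
```

Because stale or repeated updates are ignored, the same batch can be applied more than once without harm, which is a useful property when copies are synchronised on a regular schedule.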
SWISS-PROT and TrEMBL Protein Sequence Databases
The SWISS-PROT protein sequence database was begun at the University of Geneva in 1986 and, since 1987, has been produced in collaboration with the EBI. Detailed collaborations with world experts have given SWISS-PROT accurate, comprehensive and high-quality annotation. SWISS-PROT pioneered the notion of providing extensive cross-references between biological information sources, which help to create a network of interacting databases. However, proteins and nucleotide coding regions are being deposited at a faster rate than even the SWISS-PROT teams can handle. In response, the EBI developed TrEMBL as a supplement to SWISS-PROT. TrEMBL is an automatically generated, computer-annotated database of the translations of coding domains in EMBL that are not currently in SWISS-PROT. The protein sequence annotation is derived from annotations of the nucleotide sequence, analogies with already understood proteins, plus references to patterns and motifs characteristic of particular protein functions.
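The core idea of a translated supplement can be sketched in a few lines: translate each coding region and keep only those proteins that are not already in the curated set. This is a deliberately simplified illustration, not the TrEMBL production pipeline. It assumes the standard genetic code, compares by exact protein sequence (the real procedure for deciding what is already in SWISS-PROT is not described here), and uses invented identifiers and function names.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3].upper()]
        if amino_acid == "*":            # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


def supplement(cds_by_id, curated_proteins):
    """Return translations of coding regions absent from the curated set.

    Exact sequence comparison is an assumption made for this sketch.
    """
    result = {}
    for identifier, cds in cds_by_id.items():
        protein = translate(cds)
        if protein not in curated_proteins:
            result[identifier] = protein
    return result


if __name__ == "__main__":
    coding_regions = {"cds1": "ATGGCCTAA", "cds2": "ATGAAATGGTAG"}  # invented
    curated = {"MA"}  # stands in for proteins already in the curated database
    print(supplement(coding_regions, curated))
```

In this toy run the first coding region translates to a protein that is already curated, so only the second appears in the supplement.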
Macromolecular Structure Database
The Macromolecular Structure Database Project at EBI (EBI-MSD) provides the European centre for the management of data on biological macromolecules. The EBI-MSD is the European deposition site for macromolecular structures and jointly manages the world archive of data on macromolecular structure with the US-RCSB (Research Collaboratory for Structural Bioinformatics), which manages the Protein Data Bank (PDB) in the USA. The group at EBI has developed a new database of biological macromolecules and, following extensive processing of the PDB, has populated the database with annotated data. In a move towards providing higher quality data in the future, a program suite has been developed at EBI to harvest experimental and computational data relevant for the final deposition of structural models. During 1999 EBI-MSD will introduce new tools developed at EBI for deposition and access to the macromolecular structure database.
References
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990). Basic local alignment search tool. J Mol Biol 215(3), 403-10.
Bairoch, A. & Apweiler, R. (1999). The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res 27(1), 49-54.
Etzold, T., Ulyanov, A. & Argos, P. (1996). SRS: Information Retrieval System for Molecular Biology Data Banks. Methods in Enzymology 266, 114.
Holm, L. & Sander, C. (1997). Dali/FSSP classification of three-dimensional protein folds. Nucleic Acids Res 25(1), 231-4.
Pearson, W. R. & Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85(8), 2444-8.
Rodriguez-Tomé, P. & Lijnzaad, P. (1997). The Radiation Hybrid Database. Nucleic Acids Res 25(1), 81-84.
Smith, T. F. & Waterman, M. S. (1981). Identification of common molecular subsequences. J Mol Biol 147(1), 195-7.
Stoesser, G., Tuli, M. A., Lopez, R. & Sterk, P. (1999). The EMBL Nucleotide Sequence Database. Nucleic Acids Res 27(1), 18-24.
Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22(22), 4673-80.