A genome sequence is a long string written in a four-letter code: 3 billion letters in the case of a human genome. But what does it mean, and how is the code deciphered? Traditionally this is left to professional annotators, who use information from a number of sources (for instance, knowledge about similar genes in other organisms) to work out where a gene starts, where it stops, and what it does. Even this "gold standard" of professional annotation is an exceptionally slow process. However, new technology may provide a faster solution.
The Public Library of Science is harnessing the power of the Internet to improve access to information and to facilitate the discussion and understanding of science. In this week’s issue of PLoS Biology, we are very pleased to present information on an independent project working towards the same goals. Andrew Su, John Huss III and colleagues describe their efforts to establish a ‘Gene Wiki’, an online repository of information on human genes stored within Wikipedia. They envision a network of articles, created by a computer program and enhanced by user comments, which will describe the relationships and functions of all human genes.
There is a lot of potential information about any given gene: its name, sequence, position on a chromosome, the protein(s) it encodes, the other gene(s) it interacts with, and so on. Presenting this information is referred to as ‘gene annotation’. Because the information may come from many different researchers working independently, it is important that resources exist to bring it together. Existing annotation libraries include gene portals and model organism databases. However, the information stored in these is considered to be definitive, which requires constant updates by specific experts and a formal presentation of information. The work reported in this week’s PLoS Biology is intended to allow a much more flexible, organic accumulation of scientific knowledge, with all readers also able to edit and add to the Gene Wiki pages.
To stimulate the development of this Wikipedia-based resource, Andrew Su and colleagues developed a system that automatically posts information from existing databases as ‘stub’ articles on Wikipedia. A computer program downloads information from a source database, formats it according to wiki markup and the ‘stub’ template that the authors have designed, and, if a page does not already exist for that gene, posts the information on Wikipedia. The authors are confident that their stubs will seed the posting of more detailed information from scientists who encounter them on Wikipedia, and they report that, so far, they appear to be succeeding: the absolute number of edits on mammalian gene pages has doubled.
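For readers curious about what such a program might look like, the download, format and post-if-missing steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' software: the template text, the `gene-stub` tag, the record fields and the function names are all assumptions made for the example, and an in-memory dictionary stands in for Wikipedia so that nothing is actually posted.

```python
from typing import Dict, List

# Hypothetical stub layout. The authors' real template is not reproduced in
# this article; the wording and the "gene-stub" tag are placeholders.
STUB_TEMPLATE = (
    "'''{name}''' ({symbol}) is a human gene on chromosome {chromosome}.\n"
    "\n"
    "{summary}\n"
    "\n"
    "{{{{gene-stub}}}}"  # str.format turns this into a literal {{gene-stub}}
)


def format_stub(record: Dict[str, str]) -> str:
    """Render one database record as wiki markup using the stub template."""
    return STUB_TEMPLATE.format(**record)


def post_if_missing(wiki: Dict[str, str], title: str, text: str) -> bool:
    """Create the page only if no page with that title exists yet.

    `wiki` is a plain dict standing in for Wikipedia (an assumption of this
    sketch); existing pages are never overwritten.
    """
    if title in wiki:
        return False
    wiki[title] = text
    return True


def seed_stubs(records: List[Dict[str, str]], wiki: Dict[str, str]) -> List[str]:
    """Format every record and post it as a stub; return the titles created."""
    created = []
    for record in records:
        title = record["symbol"]
        if post_if_missing(wiki, title, format_stub(record)):
            created.append(title)
    return created


if __name__ == "__main__":
    # Made-up records, standing in for data downloaded from a gene database.
    records = [
        {"symbol": "GENE1", "name": "Example gene 1", "chromosome": "1",
         "summary": "Placeholder summary."},
        {"symbol": "GENE2", "name": "Example gene 2", "chromosome": "2",
         "summary": "Placeholder summary."},
    ]
    wiki = {"GENE1": "A page that a scientist has already written."}
    print(seed_stubs(records, wiki))
```

The check in `post_if_missing` mirrors the rule described above: a stub is posted only when no page exists for that gene, so pages already written by contributors are left untouched.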
[Natalie Bouaravong @ Public Library of Science]