The article discusses how the technologies created in Dr. Wilkinson's laboratory will help biological researchers automate the analysis of the large volumes of data produced by high-throughput biological tools.
With the price of high-throughput biological analysis decreasing daily, there is now a growing gap between the amount of data biologists can produce (even in small laboratories) and their training and expertise in managing and statistically analyzing data at such a massive scale. As a result, in the "big data" world, there is now evidence that biomedical research is in crisis. Recent surveys show that upwards of 50% of peer-reviewed and published experiments cannot be reproduced, sometimes even when starting from the same dataset. This reveals that not only are researchers, in many cases, unable to properly analyze their data and/or accurately report how they did those analyses, but also that the primary check-and-balance system at the core of science - peer review - is failing in the "big data" world. There are other symptoms of this crisis, including the phenomenon of assertion drift, where a researcher cites another's work but slightly changes the statement, leading to a gradual shift in the interpretation of key biological observations over several years, even with no additional evidence. Finally, the Web has emerged as one of the key tools by which biologists explore and share their research output at near-instantaneous speeds. Thus, these errors are rapidly disseminated and, unfortunately, reused by other researchers.
In this paper, Dr. Mark Wilkinson discusses how several of the technologies created in his laboratory will help researchers automate the analysis of their "big data" by pulling in the expert techniques and methodologies used by highly trained statisticians and bioinformaticians worldwide. These analytical approaches are automatically fine-tuned by the biologists' computers, resulting in a personalized analytical pipeline that nevertheless follows rules and guidelines created by global experts. The resulting analysis can not only be run "at the touch of a button", but it also creates a complete record of everything it did during that analysis, allowing both the researcher and the peer reviewers to know exactly what analyses were done, and how. Moreover, the entire analytical approach is reusable and can be shared as an executable file with other researchers doing the same kind of study. The paper discusses how we will design and evaluate tools that make this process simple and useful to biological researchers, and also how these technologies align with, and support, other emergent global initiatives related to scientific publishing, such as NanoPublications.
This paper by Mark Wilkinson was awarded the Prize for the Best Peer-reviewed Article at the International Academy, Research, and Industry Association's Service Computation 2013 Conference (Valencia, Spain, 27 May - 1 June 2013), based on the final manuscript, the reviewers' comments, and the quality of the platform presentation at the conference.