BIOLOGICAL INFORMATICS

Group leader: Mark Wilkinson - Senior Researcher, Isaac Peral Programme

Tel.: +34 913364592 / 914524900 Ext: 25550 (Bioinformatics Lab)

Visit our lab homepage for more details


 

Reproducibility through Semantics

Layered on top of the natural complexity of biology itself is the technical complexity of biological datasets and their analysis. Biological data, and the tools and algorithms designed to analyse them, have been generated by thousands of independent research groups worldwide and are published across a wide array of uncoordinated websites, often in non-standardized formats or databases. Given the interconnections between biological systems, researchers may therefore find themselves relying on an unfamiliar data type or analytical tool developed for another species to answer a biological question in their own species of interest.

 

 

Our SADI (Semantic Automated Discovery and Integration) project extends these core principles into the domain of analytical algorithms and tools. SADI requires that every analytical tool describe (using semantic technologies) the kinds of biological entities it is capable of analysing and the inter-entity relationships it discovers as a result of its analysis. This allows machines to automatically match any given dataset with a set of tools capable of analysing that dataset to generate a biological relationship of interest to the researcher.
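The matching idea described above can be illustrated with a minimal sketch. This is not the SADI implementation: real SADI services describe their inputs and outputs with semantic technologies, whereas the sketch below uses plain Python strings. The class `ToolDescription`, the function `find_tools`, and all tool, entity, and relationship names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDescription:
    """Hypothetical stand-in for a tool's self-description."""
    name: str
    input_class: str        # kind of biological entity the tool consumes
    produced_relation: str  # relationship the tool attaches to that entity


def find_tools(registry, entity_class, wanted_relation):
    """Return every tool that accepts `entity_class` and yields `wanted_relation`."""
    return [
        tool for tool in registry
        if tool.input_class == entity_class
        and tool.produced_relation == wanted_relation
    ]


# Illustrative registry; the entries are invented for this example.
REGISTRY = [
    ToolDescription("gene_to_protein", "Gene", "encodes"),
    ToolDescription("protein_to_pathway", "Protein", "participatesIn"),
    ToolDescription("gene_to_ortholog", "Gene", "hasOrtholog"),
]

# A researcher holding Gene records asks for orthologs.
matches = find_tools(REGISTRY, "Gene", "hasOrtholog")
print([tool.name for tool in matches])
```

Because each tool declares what it consumes and what it produces, the lookup needs no tool-specific knowledge: adding a new tool to the registry makes it discoverable without changing the matching code.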

 

Our SHARE (Semantic Health and Research Environment) project takes this one step further. We propose that scientific hypotheses can be formally modeled using the OWL language. This model is then decomposed into its individual assertions - some derived from prior knowledge, some representing the core hypothetical proposition. From there, we automatically match each of those assertions to a tool on the Web capable of retrieving or discovering data matching that assertion. SHARE then automatically "pipelines" those tools together and executes the analysis, thereby generating a result dataset that attempts to meet the criteria in the hypothetical model. This comes close to achieving our desired end-point, where the biological researcher need only pose the question in order to obtain the answer, with all technical steps in between being automated. Moreover, the resulting analysis is entirely transparent and reproducible, since all steps are selected, recorded, and executed without manual intervention, and each step is associated with a specific logical assertion in the initial hypothesis.
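The decompose, match, pipeline, and record steps described above can be sketched as follows. This is a toy illustration under stated assumptions, not the SHARE implementation: the hypothesis is given as an already-decomposed list of assertions rather than an OWL model, the tools are local Python functions rather than Web services, and every name (`Assertion`, `Tool`, `plan_pipeline`, `execute`, the lookup tables) is hypothetical.

```python
from typing import Callable, NamedTuple


class Assertion(NamedTuple):
    """One statement of the hypothesis: subject class, relation, object class."""
    subject_class: str
    relation: str
    object_class: str


class Tool(NamedTuple):
    """Hypothetical tool description plus a callable that performs the analysis."""
    name: str
    input_class: str
    relation: str
    output_class: str
    run: Callable[[set], set]


def plan_pipeline(assertions, tools, start_class):
    """Pair each assertion with a matching tool, ordered so inputs exist before use."""
    plan, available, remaining = [], {start_class}, list(assertions)
    while remaining:
        progressed = False
        for a in list(remaining):
            if a.subject_class not in available:
                continue  # its input is not produced yet; retry on a later pass
            tool = next(
                (t for t in tools
                 if t.input_class == a.subject_class
                 and t.relation == a.relation
                 and t.output_class == a.object_class),
                None,
            )
            if tool is None:
                raise LookupError(f"no tool satisfies {a}")
            plan.append((a, tool))
            available.add(a.object_class)
            remaining.remove(a)
            progressed = True
        if not progressed:
            raise LookupError(f"cannot reach inputs for {remaining}")
    return plan


def execute(plan, seed, start_class):
    """Run the planned tools in order and log which assertion each step served."""
    data, log = {start_class: set(seed)}, []
    for assertion, tool in plan:
        result = tool.run(data[assertion.subject_class])
        data.setdefault(assertion.object_class, set()).update(result)
        log.append((tool.name, assertion))
    return data, log


# Invented lookup tables standing in for remote data sources.
GENE_PROTEIN = {"geneA": "protA", "geneB": "protB"}
PROTEIN_PATHWAY = {"protA": "pathway1"}

tools = [
    Tool("protein_to_pathway", "Protein", "participatesIn", "Pathway",
         lambda ps: {PROTEIN_PATHWAY[p] for p in ps if p in PROTEIN_PATHWAY}),
    Tool("gene_to_protein", "Gene", "encodes", "Protein",
         lambda gs: {GENE_PROTEIN[g] for g in gs if g in GENE_PROTEIN}),
]

# Assertions are deliberately listed out of execution order.
hypothesis = [
    Assertion("Protein", "participatesIn", "Pathway"),
    Assertion("Gene", "encodes", "Protein"),
]

plan = plan_pipeline(hypothesis, tools, "Gene")
data, log = execute(plan, {"geneA", "geneB"}, "Gene")
print([name for name, _ in log], data["Pathway"])
```

The log pairs every executed step with the assertion it was selected to satisfy, which is the property the text relies on for transparency and reproducibility.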

 

FAIR Data - Findable, Accessible, Interoperable and Reusable

Our lab is a lead participant in the FAIR Data initiative. In addition to being co-authors of the FAIR Principles, we are exploring how these principles can be used to make science more transparent. When data and knowledge are FAIR, they become easier to find, and therefore easier to validate against prior biological knowledge and data. We examine how FAIR publication of scientific assertions might allow them to be automatically compared to similar assertions in the scholarly literature. This would provide a means both to explore the likelihood that a given assertion is true and to assemble a richer collection of citations, ensuring that all relevant scholars are properly credited.

 

Representative Publications

Wilkinson, MD; Sansone, S-A; Schultes, E; Doorn, P; Bonino da Silva Santos, LO; Dumontier, M. 2018. "A design framework and exemplar metrics for FAIRness". Scientific Data. DOI: 10.1038/sdata.2018.118.

Townend, GS; Ehrhart, F; Kranen, HJ; Wilkinson, M; Jacobsen, A; Roos, M; Willighagen, EL; Enckevort, D; Evelo, CT; Curfs, LMG. "MECP2 variation in Rett syndrome ‐ an overview of current coverage of genetic and phenotype data within existing databases". Human Mutation. DOI: 10.1002/humu.23542.

Roos, M; López Martin, E; Wilkinson, MD. 2017. "Preparing Data at the Source to Foster Interoperability across Rare Disease Resources", p. 165-179. In M. Posada de la Paz, D. Taruscio, and S. C. Groft (eds.), Rare Diseases Epidemiology: Update and Overview. Springer International Publishing, Cham. DOI: 10.1007/978-3-319-67144-4_9.

Illana, A; Marconi, M; Rodríguez-Romero, J; Xu, P; Dalmay, T; Wilkinson, MD; Ayllón, MÁ; Sesma, A. 2017. "Molecular characterization of a novel ssRNA ourmia-like virus from the rice blast fungus Magnaporthe oryzae". Archives of Virology. DOI: 10.1007/s00705-016-3144-9.

Wilkinson, MD; Verborgh, R; Bonino da Silva Santos, LO; Clark, T; Swertz, MA; Kelpin, FDL; Gray, AJG; Schultes, EA; van Mulligen, EM; Ciccarese, P; Kuzniar, A; Gavai, A; Thompson, M; Kaliyaperumal, R; Bolleman, JT; Dumontier, M. 2017. "Interoperability and FAIRness through a novel combination of Web technologies". PeerJ Computer Science. DOI: 10.7717/peerj-cs.110.

Mons, B; Neylon, C; Velterop, J; Dumontier, M; da Silva Santos, LOB; Wilkinson, MD. 2017. "Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud". Information Services & Use. DOI: 10.3233/isu-170824.

Rodríguez Iglesias, A; Rodríguez González, A; Irvine, AG; Sesma, A; Urban, M; Hammond-Kosack, KE; Wilkinson, MD. 2016. "Publishing FAIR Data: an exemplar methodology utilizing PHI-base". Frontiers in Plant Science. DOI: 10.3389/fpls.2016.00641.

Wilkinson, MD; Dumontier, M; Aalbersberg, IJ; Appleton, G; Axton, M; Baak, A; Blomberg, N; Boiten, J-W; da Silva Santos, LB; Bourne, PE; Bouwman, J; Brookes, AJ; Clark, T; Crosas, M; Dillo, I; Dumon, O; Edmunds, S; Evelo, CT; Finkers, R; Gonzalez-Beltran, A; Gray, AJG; Groth, P; Goble, C; Grethe, JS; Heringa, J; ’t Hoen, PAC; Hooft, R; Kuhn, T; Kok, R; Kok, J; Lusher, SJ; Martone, ME; Mons, A; Packer, AL; Persson, B; Rocca-Serra, P; Roos, M; van Schaik, R; Sansone, S-A; Schultes, E; Sengstag, T; Slater, T; Strawn, G; Swertz, MA; Thompson, M; van der Lei, J; van Mulligen, E; Velterop, J; Waagmeester, A; Wittenburg, P; Wolstencroft, K; Zhao, J; Mons, B. 2016. "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data. DOI: 10.1038/sdata.2016.18.

Aranguren, ME; Wilkinson, MD. 2015. "Enhanced reproducibility of SADI web service workflows with Galaxy and Docker". GigaScience. DOI: 10.1186/s13742-015-0092-3.

Nakada, T; Boyd, JH; Russell, JA; Aguirre-Hernández, R; Wilkinson, MD; Thair, SA; Nakada, E; McConechy, MK; Fjell, CD; Walley, KR. 2015. "VPS13D gene variant is associated with altered IL-6 production and mortality in septic shock". Journal of Innate Immunity. DOI: 10.1159/000381265.

Pawluczyk, M; Weiss, J; Links, MG; Egaña Aranguren, M; Wilkinson, MD; Egea-Cortines, M. 2015. "Quantitative evaluation of bias in PCR amplification and next-generation sequencing derived from metabarcoding samples". Analytical and Bioanalytical Chemistry. DOI: 10.1007/s00216-014-8435-y.

Marconi, M; Rodriguez-Romero, J; Sesma, A; Wilkinson, MD. 2014. "Bioinformatics tools for Next-Generation RNA sequencing analysis", p. 371-391. In A. Sesma and T. von der Haar (eds.), Fungal RNA Biology. Springer International Publishing Switzerland. DOI: 10.1007/978-3-319-05687-6_15.

Katayama, T; Wilkinson, M; Aoki-Kinoshita, K; Kawashima, S; Yamamoto, Y; Yamaguchi, A; Okamoto, S; Kawano, S; Kim, J-D; Wang, Y; Wu, H; Kano, Y; Ono, H; Bono, H; Kocbek, S; Aerts, J; Akune, Y; Antezana, E; Arakawa, K; Aranda, B; Baran, J; Bolleman, J; Bonnal, R; Buttigieg, P; Campbell, M; Chen, Y; Chiba, H; Cock, P; Cohen, K; Constantin, A. 2014. "BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains". Journal of Biomedical Semantics 5:5.

Dumontier, M; Baker, C; Baran, J; Callahan, A; Chepelev, L; Cruz-Toledo, J; Del Rio, N; Duck, G; Furlong, L; Keath, N; Klassen, D; McCusker, J; Queralt-Rosinach, N; Samwald, M; Villanueva-Rosales, N; Wilkinson, M; Hoehndorf, R. 2014. "The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery". Journal of Biomedical Semantics 5:14.

Samadian, S; McManus, B; Wilkinson, M. 2014. "Automatic detection and resolution of measurement-unit conflicts in aggregated data". BMC Medical Genomics 7:S12.

Egana Aranguren, M; Rodriguez Gonzalez, A; Wilkinson, MD. 2014. "Executing SADI services in Galaxy". Journal of Biomedical Semantics. DOI: 10.1186/2041-1480-5-42.

Luciano, JS; Cumming, GP; Kahana, E; Wilkinson, MD; Brooks, EH; Jarman, H; McGuinness, DL; Levine, MS. 2014. "Health Web Science". Foundations and Trends® in Web Science. DOI: 10.1561/1800000019.

Rodríguez González, A; Callahan, A; Cruz-Toledo, J; Garcia, A; Egaña Aranguren, M; Dumontier, M; Wilkinson, M. 2014. "Automatically exposing OpenLifeData via SADI semantic Web Services". Journal of Biomedical Semantics. DOI: 10.1186/2041-1480-5-46

Egaña Aranguren, M; Fernández-Breis, JT; Antezana, E; Mungall, C; Rodríguez González, A; Wilkinson, MD. 2013. "OPPL-Galaxy, a Galaxy tool for enhancing ontology exploitation as part of bioinformatics workflows". Journal of Biomedical Semantics. DOI: 10.1186/2041-1480-4-2.

Katayama, T; Wilkinson, MD; Micklem, G; Kawashima, S; Yamaguchi, A; Nakao, M; Yamamoto, Y; Okamoto, S; Oouchida, K; Chun, HW; Aerts, J; Afzal, H; Antezana, E; Arakawa, K; Aranda, B; Belleau, F; Bolleman, J; Bonnal, RJ; Chapman, B; Cock, PJ; Eriksson, T; Gordon, PM; Goto, N; Hayashi, K; Horn, H; Ishiwata, R; Kaminuma, E; Kasprzyk, A; Kawaji, H; Kido, N; Kim, YJ; Kinjo, AR; Konishi, F; Kwon, KH; Labarga, A; Lamprecht, AL; Lin, Y; Lindenbaum, P; McCarthy, L; Morita, H; Murakami, K; Nagao, K; Nishida, K; Nishimura, K; Nishizawa, T; Ogishima, S; Ono, K; Oshita, K; Park, KJ; Prins, P; Saito, TL; Samwald, M; Satagopam, VP; Shigemoto, Y; Smith, R; Splendiani, A; Sugawara, H; Taylor, J; Vos, RA; Withers, D; Yamasaki, C; Zmasek, CM; Kawamoto, S; Okubo, K; Asai, K; Takagi, T. 2013. "The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies". Journal of Biomedical Semantics. DOI: 10.1186/2041-1480-4-6.

Luciano, JS; Cumming, GP; Wilkinson, MD; Kahana, E. 2013. "The emergent discipline of health web science". Journal of Medical Internet Research. DOI: 10.2196/jmir.2499.

McCarthy, L; Vandervalk, B; Wilkinson, M. 2012. "SPARQL Assist language-neutral query composer". BMC Bioinformatics, vol. 13, Suppl. 1, p. S2.

Samadian, S; McManus, B; Wilkinson, MD. 2012. "Extending and encoding existing biological terminologies and datasets for use in the reasoned semantic web". Journal of Biomedical Semantics, vol. 3, no. 1, p. 6.

Rodríguez-González, A; Torres-Niño, J; Mayer, MA; Alor-Hernandez, G; Wilkinson, MD. 2012. "Analysis of a multilevel diagnosis decision support system and its implications: a case study". Computational and Mathematical Methods in Medicine, vol. 2012, pp. 1-9.

Wood, I; Vandervalk, B; McCarthy, L; Wilkinson, M. 2012. "OWL-DL Domain-Models as Abstract Workflows", pp. 56-66. In T. Margaria and B. Steffen (eds.), Leveraging Applications of Formal Methods, Verification and Validation. Applications and Case Studies. Springer, Berlin/Heidelberg.

Centro de Biotecnología y Genómica de Plantas UPM – INIA Parque Científico y Tecnológico de la U.P.M. Campus de Montegancedo
Autopista M-40, Km 38 - 28223 Pozuelo de Alarcón (Madrid) Tel.: +34 91 4524900 ext. 1806 / +34 91 3364539 Fax: +34 91 7157721.
