Several weeks ago, a broad group of stakeholders in the area of research data usability, including researchers, scholarly publishers, administrative and governing agencies, and funding agencies, published a clear and concise set of principles intended to guide how scientific output is published in the future (doi:10.1038/sdata.2016.18). The FAIR Principles hold that scholarly research outputs of all kinds - not only peer-reviewed articles, but also the data and analytical workflows upon which those articles are based - should be Findable, Accessible, Interoperable, and Reusable. While the principles provide high-level guidance and a means of measuring the “FAIRness” of your publications, they avoid making any technological or methodological recommendations, and therefore do not suggest how FAIRness could be achieved.
In this Frontiers Technical Advance article, we provide what is, to our knowledge, the first formal proposal of tools and methodologies for publishing data with the specific aim of meeting or exceeding each of the FAIR principles. Our target dataset was the plant-specific portion of the Pathogen Host Interaction Database (PHI-Base). The data was re-modeled to enhance interoperability with external resources such as PubMed and UniProt, and was extensively annotated both with domain information, to enhance discoverability of relevant data, and with citation and license information, to aid scholarly re-use. The resulting dataset was then re-published using an open standard and made available for public query. The paper provides some demonstrative queries that span PHI-Base, UniProt, and PubMed to show how the use of open, interoperable standards can both reduce the burden on data owners, by reducing the amount of third-party data they need to host, and improve the ability of the research community to automate integration over disparate big-data resources.
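To give a flavour of what a query spanning several resources might look like, here is a minimal sketch. The summary above does not name the open standard used, so the sketch assumes RDF data queried with SPARQL 1.1 federated queries. The `ex:` vocabulary (`ex:hostSpecies`, `ex:pathogenProtein`, `ex:citedIn`) and the function name are hypothetical placeholders, not the actual PHI-Base schema; only the UniProt endpoint URL and the `up:mnemonic` property come from UniProt's public SPARQL service.

```python
from textwrap import dedent

# Public SPARQL endpoint of UniProt.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_federated_query(host_species: str, limit: int = 10) -> str:
    """Build a SPARQL 1.1 federated query string (illustrative only).

    The local part of the query uses a hypothetical `ex:` vocabulary;
    the real PHI-Base model will differ. The SERVICE block delegates the
    protein lookup to UniProt instead of copying UniProt data locally.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    # Escape backslashes and double quotes so the value is a valid
    # SPARQL string literal.
    safe = host_species.replace("\\", "\\\\").replace('"', '\\"')

    return dedent(f"""\
        PREFIX ex: <http://example.org/phi-schema#>
        PREFIX up: <http://purl.uniprot.org/core/>

        SELECT ?interaction ?protein ?mnemonic ?article
        WHERE {{
          # Local data (hypothetical predicates): the interaction, the
          # pathogen protein involved, and the article that reports it
          # (for example, a PubMed record URI).
          ?interaction ex:hostSpecies "{safe}" ;
                       ex:pathogenProtein ?protein ;
                       ex:citedIn ?article .

          # Remote data: resolved at query time by UniProt's endpoint.
          SERVICE <{UNIPROT_ENDPOINT}> {{
            ?protein up:mnemonic ?mnemonic .
          }}
        }}
        LIMIT {limit}
        """)


if __name__ == "__main__":
    print(build_federated_query("Triticum aestivum", limit=5))
```

The point of the `SERVICE` clause is the one made above: the data owner hosts only the interaction data and links out to third-party resources by identifier, while the query engine fetches the third-party details when they are needed.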