
Classifying the certainty of scholarly text

The grammatical structures that researchers use to express their ideas convey varying degrees of certainty or speculation. Several categorization systems for academic certainty exist; however, none has been objectively validated, especially in terms of how a reader interprets a statement's certainty, as opposed to the level of certainty the author intended to convey. In this study, we asked researchers, over a series of questionnaires, to report their perception of various scientific statements using three different certainty classification systems. We found three distinct categories of certainty, forming a spectrum that ranges from highest to lowest. We demonstrate that these categories can be detected automatically using a machine-learning model, with an accuracy of 89.2% on a corpus classified by the author and an accuracy of 82.2% against the (raw) questionnaire results. This provides a mechanism to capture the degree of certainty expressed by a published statement when it is automatically text-mined, ensuring that the subtle linguistic cues of certainty (or doubt) can be preserved in databases. Finally, we provide the output of our system as a Nanopublication. "NanoPubs" are a formal, machine-readable representation of scholarly knowledge, which allows us to reliably transmit this certainty information from one machine to another.


Original Paper:

Prieto, M., Deus, H., Waard, A. de, Schultes, E., García-Jiménez, B., Wilkinson, M.D. 2020. Data-driven classification of the certainty of scholarly assertions. PeerJ 8, e8871. DOI: 10.7717/peerj.8871

