Genomic selection is a breeding tool that uses genomic information to predict an individual's phenotypic value. Its effectiveness depends heavily on the quality of the training population used to build the prediction model. Numerous studies have described algorithms for training population selection, but a systematic comparison among them is lacking, which hinders their implementation.
We compared a wide range of algorithms for optimizing training set composition across seven datasets from six crops, with varying levels of population structure, genetic architecture, and trait heritability. We also describe a novel genetic-based approach for determining the optimal size of the training population. The study provides insights for establishing benchmark guidelines for constructing training populations in genomic selection. CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set of 50-55% of the candidate set (targeted optimization) or 65-85% (untargeted optimization) is needed to reach 95% of the maximum accuracy.
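As an illustration of how a size threshold of this kind can be read off an accuracy curve, the following minimal Python sketch returns the smallest training set size whose accuracy reaches a given fraction of the maximum observed accuracy. It is not the approach described in the paper. The function name `smallest_size_reaching` and the example numbers are hypothetical and chosen only to show the arithmetic.

```python
def smallest_size_reaching(sizes, accuracies, fraction=0.95):
    """Return the smallest size whose accuracy is at least
    `fraction` of the maximum accuracy on the curve."""
    if not sizes or len(sizes) != len(accuracies):
        raise ValueError("sizes and accuracies must be non-empty and of equal length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    best = max(accuracies)
    if best <= 0:
        raise ValueError("maximum accuracy must be positive")
    threshold = fraction * best
    # Walk the curve from the smallest size upwards and stop at the
    # first point that meets the threshold.
    for size, accuracy in sorted(zip(sizes, accuracies)):
        if accuracy >= threshold:
            return size


# Hypothetical curve: training set size as a fraction of the candidate
# set, and the prediction accuracy obtained at each size (made-up values).
example_sizes = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
example_accuracies = [0.30, 0.41, 0.48, 0.53, 0.56, 0.575, 0.585, 0.59, 0.595, 0.60]

print(smallest_size_reaching(example_sizes, example_accuracies))  # prints 0.6
```

In this made-up curve the maximum accuracy is 0.60, so the 95% threshold is about 0.57, and the first size that meets it is 60% of the candidate set.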
Fernández-González, J., Akdemir, D., Isidro y Sánchez, J. 2023. A comparison of methods for training population optimization in genomic selection. Theoretical and Applied Genetics 136, 30. DOI: 10.1007/s00122-023-04265-6