A comparison of methods for training population optimization in genomic selection
We compared a wide range of algorithms for optimizing the composition of the training set across 7 datasets from 6 different crops, covering varying levels of population structure, genetic architecture, and trait heritability. We also describe a novel genetic-based approach to determine the optimal size of the training population. The study provides a basis for benchmark guidelines on constructing training populations in genomic selection. CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set size of 50-55% of the candidate set (targeted optimization) or 65-85% (untargeted optimization) is needed to reach 95% of the maximum accuracy.
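As a rough illustration of the size guideline, the sketch below converts the quoted percentage ranges into numbers of genotypes. It assumes the percentages refer to the size of the candidate set; the function name, the dictionary, and the rounding up to whole genotypes are hypothetical choices made for this example, not part of the paper.

```python
# Percentage ranges quoted in the summary (assumed to be relative to the
# candidate set size). Integers are used to avoid floating-point rounding.
SIZE_RANGES_PCT = {
    "targeted": (50, 55),
    "untargeted": (65, 85),
}


def training_set_size_range(n_candidates: int, scenario: str) -> tuple[int, int]:
    """Return (min, max) training set sizes for a candidate set.

    Hypothetical helper: rounds up to whole genotypes.
    """
    if n_candidates < 0:
        raise ValueError("n_candidates must be non-negative")
    low_pct, high_pct = SIZE_RANGES_PCT[scenario]
    # Integer ceiling division: ceil(a / b) == -(-a // b) for positive b.
    low = -(-n_candidates * low_pct // 100)
    high = -(-n_candidates * high_pct // 100)
    return low, high


if __name__ == "__main__":
    for scenario in SIZE_RANGES_PCT:
        print(scenario, training_set_size_range(1000, scenario))
```

For a candidate set of 1000 genotypes, this gives 500 to 550 genotypes for targeted optimization and 650 to 850 for untargeted optimization.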
Original Paper:
Fernández-González, J., Akdemir, D., Isidro y Sánchez, J. 2023. A comparison of methods for training population optimization in genomic selection. Theoretical and Applied Genetics 136, 30. DOI: 10.1007/s00122-023-04265-6