Comparing the efficiency of sampling strategies to establish a representative in the phenotypic-based genetic diversity core collection of orchardgrass (Dactylis glomerata L.)
Establishing a core collection that represents the genetic diversity of the entire collection with minimal loss of its original diversity and minimal redundancy is an important problem for gene bank curators and crop breeders. In this paper, we assess how well the original genetic diversity is represented in core collections that consist of one-tenth of the entire collection and are obtained according to 23 sampling strategies. The study was performed using the Polish orchardgrass (Dactylis glomerata L.) germplasm collection as a model. The representativeness of the core collections was validated by the difference of means (MD%) and the difference of mean squared Euclidean distance (d‒D%) for the studied traits between the core subsets and the entire collection. In this way, we compared the efficiency of simple random sampling and 22 stratified sampling strategies (20 cluster-based and 2 direct cluster-based). Each cluster-based stratified sampling strategy combines one of 2 clusterings, one of 5 allocations and one of 2 methods of sampling within a group. We used the accession genotypic predicted values for 8 quantitative traits tested in field trials. A sampling strategy is considered more effective for establishing core collections if the means of the traits in a core are maintained at the same level as the means in the entire collection (i.e., the mean of MD% in the simulated samples is close to zero) and, simultaneously, the overall variation in a core collection is greater than in the entire collection (i.e., the mean of d‒D% in the simulated samples is greater than that obtained for the simple random sampling strategy). Both cluster analyses (the unweighted pair group method with arithmetic mean, UPGMA, and Ward's method) were similarly useful in constructing sampling strategies capable of establishing representative core collections. Among the allocation methods, the most useful for constructing efficient sampling strategies were proportional and D2 (which includes variation).
Within the Ward clusters, random sampling performed better than cluster-based sampling; this was not the case within the UPGMA clusters.
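To make one of the strategies described above concrete, the following is a minimal Python sketch of proportional allocation combined with random sampling within each cluster, drawing a core of one-tenth of the collection. It is an illustration under stated assumptions, not the procedure used in the study: the function names, the toy accession labels, the largest-remainder rounding rule and the fixed random seed are all hypothetical, and the cluster labels are assumed to come from some prior clustering (for example UPGMA or Ward).

```python
import random
from collections import defaultdict


def proportional_allocation(cluster_sizes, core_size):
    """Split core_size across clusters in proportion to cluster size.

    Largest-remainder rounding is an assumption of this sketch; it makes
    the per-cluster allocations sum exactly to core_size.
    """
    total = sum(cluster_sizes.values())
    quotas = {k: core_size * n / total for k, n in cluster_sizes.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = core_size - sum(alloc.values())
    # Give the remaining slots to the clusters with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def stratified_random_core(labels, fraction=0.10, seed=0):
    """Proportional allocation followed by random sampling within each cluster.

    labels maps an accession id to its cluster label.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for accession, cluster in labels.items():
        members[cluster].append(accession)
    core_size = round(fraction * len(labels))
    sizes = {k: len(v) for k, v in members.items()}
    alloc = proportional_allocation(sizes, core_size)
    core = []
    for cluster, n in alloc.items():
        core.extend(rng.sample(members[cluster], n))
    return sorted(core)


# Toy data: 200 hypothetical accessions in clusters of size 100, 60 and 40.
labels = {
    f"acc{i:03d}": ("A" if i < 100 else "B" if i < 160 else "C")
    for i in range(200)
}
core = stratified_random_core(labels)
```

With the toy labels, the core holds 20 accessions split 10, 6 and 4 across the three clusters. Replacing `rng.sample` with a different within-group selection rule, or `proportional_allocation` with another allocation rule, would give other members of the family of cluster-based strategies.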