Taxonomical evaluation of plant genetic markers by Bayesian Classifier
DNA barcodes are standardized sequences, 400 to 800 bp in length, that vary at different taxonomic levels and make it possible to identify individuals of species that have previously been assigned taxonomically. Several barcodes have been identified in different groups of the tree of life. However, some groups lack an accurate DNA marker and, even more so, accurate strategies for verifying their taxonomic affiliation. Several DNA barcodes have been postulated for plants; nonetheless, their classification potential has not been evaluated for metabarcoding, and as a result none of them appears to outperform the others in this area. One tool that has recently gained traction is the Naïve Bayesian Classifier, which is based on the assumption of independence among attributes and on the allocation of categories in each context. The present study aims to evaluate the classification power of several plant genetic markers that have been proposed as barcodes (trnL, rpoB, rbcL, matK, psbA-trnH and psbK) using a Naïve Bayesian Classifier, in order to determine which markers perform best at different taxonomic levels for metabarcoding analysis and to identify genera that are problematic during species assignment. We propose matK and trnL as potential candidates for assignment up to the genus level. Some problematic genera (Aegilops, Gueldenstaedtia, Helianthus, Oryza, Shorea, Thysananthus and Triticum) within certain families in a sample could lead to misclassification regardless of the marker used. Finally, we offer recommendations for performing taxonomic identification of plants in samples containing multiple individuals.
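To make the independence assumption behind a Naïve Bayesian Classifier concrete, the following is a minimal sketch of one common way to apply it to sequences: each sequence is split into overlapping k-mers, and each k-mer is treated as an independent attribute of the taxon. It is not the classifier used in this study. The k-mer length (k=4), the Laplace smoothing constant, the class and function names, the synthetic sequences and the labels "genus_A" and "genus_B" are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=4):
    """Return the overlapping k-mers of a DNA sequence (k=4 is an assumed value)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class NaiveBayesKmerClassifier:
    """Toy multinomial naive Bayes over k-mer counts (hypothetical, illustrative only)."""

    def __init__(self, k=4, alpha=1.0):
        self.k = k
        self.alpha = alpha                        # Laplace smoothing (assumed value)
        self.kmer_counts = defaultdict(Counter)   # taxon -> k-mer counts
        self.seq_counts = Counter()               # taxon -> number of training sequences
        self.vocab = set()                        # all k-mers seen in training

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            words = kmers(seq, self.k)
            self.kmer_counts[label].update(words)
            self.seq_counts[label] += 1
            self.vocab.update(words)
        return self

    def log_scores(self, seq):
        """Log prior plus the sum of log k-mer likelihoods, per taxon."""
        total_seqs = sum(self.seq_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, counts in self.kmer_counts.items():
            total = sum(counts.values())
            score = math.log(self.seq_counts[label] / total_seqs)
            for word in kmers(seq, self.k):
                # Independence assumption: every k-mer contributes its own factor.
                p = (counts[word] + self.alpha) / (total + self.alpha * vocab_size)
                score += math.log(p)
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Synthetic sequences and placeholder taxon labels, not real barcode data.
train_seqs = ["ACGTACGTACGTACGT", "ACGTACGAACGTACGT",
              "TTGGCCAATTGGCCAA", "TTGGCCTATTGGCCAA"]
train_labels = ["genus_A", "genus_A", "genus_B", "genus_B"]

clf = NaiveBayesKmerClassifier(k=4).fit(train_seqs, train_labels)
print(clf.predict("ACGTACGTACGA"))
```

In a sketch like this, taxa whose reference sequences share most of their k-mers receive nearly identical scores, which is one plausible way closely related genera could end up misassigned whichever marker is used.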