An assessment of the taxonomic reliability of DNA barcode sequences in publicly available databases

Abstract: 

DNA barcoding has a wide range of applications, such as elucidating cryptic species and phylogenetic relationships in taxonomic studies and analyzing environmental samples for biodiversity monitoring and conservation assessments of species. After DNA barcode sequences are obtained, sequence similarity-based homology analysis is commonly used; that is, the obtained barcode sequences are compared against DNA barcode reference databases. This bioinformatic analysis implies that the overall quantity and quality of the reference databases must be stringently monitored so that they do not adversely affect the accuracy of species identification. With the development of next-generation sequencing techniques, a large number of DNA barcode sequences have been produced and stored in online databases, but their validity, accuracy, and reliability have not been extensively investigated. In this study, we investigated the amount and types of erroneous barcode sequences deposited in publicly accessible databases. Over 4.1 million sequences were investigated in three large-scale DNA barcode databases (NCBI GenBank, Barcode of Life Data System [BOLD], and Protist Ribosomal Reference database [PR2]) for four major DNA barcodes (cytochrome c oxidase subunit 1 [COI], internal transcribed spacer [ITS], ribulose bisphosphate carboxylase large chain [rbcL], and 18S ribosomal RNA [18S rRNA]); approximately 2% of the barcode sequences were found to be erroneous, and their taxonomic distribution was uneven. Our findings provide compelling evidence of data quality problems, along with insufficient and unreliable annotation of taxonomic data, in DNA barcode databases. Therefore, we suggest that if ambiguous taxa arise during barcoding analysis, further validation with other DNA barcode loci or morphological characters should be mandated.

Author(s): 
Soyeong Jin
Kwang Young Kim
Min-Seok Kim
Chungoo Park
Keywords: 
18S rRNA
COI
DNA barcoding
ITS
rbcL
taxonomic databases
Article Source: 
Algae 2020; 35(3): 293-301
Category: 
Seaweed composition