DNA barcoding

DNA barcoding is a taxonomic method that uses a short genetic marker in an organism's DNA to identify it as belonging to a particular species. It differs from molecular phylogeny in that the main goal is not to determine classification but to identify an unknown sample in terms of a known classification. Although barcodes are sometimes used in an effort to identify unknown species or assess whether species should be combined or separated, the utility of DNA barcoding for these purposes is subject to debate.

Applications include, for example, identifying plant leaves even when flowers or fruit are not available, identifying insect larvae (which typically have fewer diagnostic characters than adults), identifying the diet of an animal based on stomach contents or faeces, and identifying products in commerce (for example, herbal supplements or wood).
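The identification task described above, assigning an unknown sample to an already known species, can be sketched as a nearest-match lookup against a reference library. This is only a minimal illustration under stated assumptions: the sequences are made-up 20-bp fragments rather than real barcodes, they are assumed to be already aligned, and the uncorrected proportion of differing sites (p-distance) is used as the similarity measure. The names `p_distance`, `identify` and `reference_library` are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify(query: str, reference: dict[str, list[str]]) -> tuple[str, float]:
    """Return (species, distance) for the reference barcode closest to the query."""
    best_species, best_distance = None, float("inf")
    for species, barcodes in reference.items():
        for barcode in barcodes:
            d = p_distance(query, barcode)
            if d < best_distance:
                best_species, best_distance = species, d
    if best_species is None:
        raise ValueError("reference library is empty")
    return best_species, best_distance


# Toy reference library: invented fragments, not real barcode data.
reference_library = {
    "Species A": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"],
    "Species B": ["ACGTTCGTACCTACGTTCGT"],
}

# The query differs from the first "Species A" barcode at a single site.
species, distance = identify("ACGTACGTACGTACGAACGT", reference_library)
print(species, f"{distance:.2f}")
```

A nearest match always returns some species, even when the true species is missing from the library, so in practice a match would also need to be checked against a distance cut-off.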

Choice of locus
A desirable locus for DNA barcoding should be standardized (so that large databases of sequences for that locus can be developed), present in most of the taxa of interest and sequenceable without species-specific PCR primers, and short enough to be easily sequenced with current technology. It should also vary substantially between species yet relatively little within a species.

Although several loci have been suggested, a common set of choices is:

 * For animals and many other eukaryotes, the mitochondrial COI gene (also written CO1)
 * For land plants, the concatenation of the rbcL and matK chloroplast genes

Mitochondrial DNA
DNA barcoding is based on a relatively simple concept. Most eukaryote cells contain mitochondria, and mitochondrial DNA (mtDNA) has a relatively fast mutation rate, which results in significant variation in mtDNA sequences between species and, in principle, a comparatively small variance within species. A 658-bp region of the mitochondrial cytochrome c oxidase subunit I (COI) gene was proposed as a potential 'barcode'.

However, because all mtDNA genes are maternally inherited (although direct evidence for recombination in mtDNA is available in some bivalves such as Mytilus, and recombination is suspected to be more widespread), any occurrence of hybridization, male-killing microorganisms, cytoplasmic incompatibility-inducing symbionts (e.g., Wolbachia), horizontal gene transfer (such as via cellular symbionts), or other "reticulate" evolutionary phenomena in a lineage can lead to misleading results: two different species may share mtDNA, or one species may exhibit more than one mtDNA sequence among different individuals.

As of 2009, databases of COI sequences included at least 620,000 specimens from over 58,000 species of animals, larger than databases available for any other gene.

Identifying flowering plants
Kress et al. (2005) suggest that the use of the COI sequence “is not appropriate for most species of plants because of a much slower rate of cytochrome c oxidase I gene evolution in higher plants than in animals”. A series of experiments was then conducted to find a more suitable region of the genome for use in the DNA barcoding of flowering plants (or the larger group of land plants). One 2005 proposal was the nuclear internal transcribed spacer region and the plastid trnH-psbA intergenic spacer; other researchers advocated other regions such as matK.

In 2009, a collaboration of a large group of plant DNA barcode researchers proposed two chloroplast genes, rbcL and matK, taken together, as a barcode for plants. Jesse Ausubel, a DNA barcode researcher not involved in that effort, suggested that standardizing on a sequence was the best way to produce a large database of plant sequences, and that time would tell whether this choice would be sufficiently good at distinguishing different plant species.

Vouchered specimens
DNA sequence databases like GenBank contain many sequences that are not tied to vouchered specimens (for example, herbarium specimens, cultured cell lines, or sometimes images). This is problematic in the face of taxonomic issues such as whether several species should be split or combined, or whether past identifications were sound. Therefore, best practice for DNA barcoding is to sequence vouchered specimens.

Origin
The use of nucleotide sequence variations to investigate evolutionary relationships is not a new concept. Carl Woese used sequence differences in ribosomal RNA (rRNA) to discover archaea, which in turn led to the redrawing of the evolutionary tree, and molecular markers (e.g., allozymes, rDNA, and mtDNA) have been successfully used in molecular systematics for decades. DNA barcoding provides a standardised method for this process via the use of a short DNA sequence from a particular region of the genome to provide a 'barcode' for identifying species. In 2003, Paul D.N. Hebert from the University of Guelph, Ontario, Canada, proposed the compilation of a public library of DNA barcodes that would be linked to named specimens. This library would “provide a new master key for identifying species, one whose power will rise with increased taxon coverage and with faster, cheaper sequencing”.

Identification of birds
In an effort to find a correspondence between traditional species boundaries established by taxonomy and those inferred by DNA barcoding, Hebert and co-workers sequenced DNA barcodes of 260 of the 667 bird species that breed in North America (Hebert et al. 2004a). They found that every one of the 260 species had a different COI sequence. 130 species were represented by two or more specimens; in all of these species, COI sequences were either identical or were most similar to sequences of the same species. COI variation between species averaged 7.93%, whereas variation within species averaged 0.43%. In four cases there were deep intraspecific divergences, indicating possible new species. Three of these four polytypic species are already split into two by some taxonomists. Hebert et al.'s (2004a) results reinforce these views and strengthen the case for DNA barcoding. Hebert et al. also proposed a standard sequence threshold to define new species. This threshold, the so-called "barcoding gap", was defined as 10 times the mean intraspecific variation for the group under study.
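The threshold rule described above (10 times the mean intraspecific variation for the group under study) can be sketched as follows. This is an illustrative calculation only: the specimens are invented 20-bp fragments, the sequences are assumed to be aligned, and uncorrected p-distance stands in for whatever distance measure a real study would use. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_distances(specimens: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (mean intraspecific, mean interspecific) distance over all specimen pairs."""
    intra, inter = [], []
    for (sp1, seq1), (sp2, seq2) in combinations(specimens, 2):
        target = intra if sp1 == sp2 else inter
        target.append(p_distance(seq1, seq2))
    return mean(intra), mean(inter)


# Toy data: (species label, aligned sequence). Not real barcodes.
specimens = [
    ("Species A", "ACGTACGTACGTACGTACGT"),
    ("Species A", "ACGTACGTACGTACGTACGA"),
    ("Species B", "TCATAGGAACCTTCGAAGGT"),
    ("Species B", "TCATAGGAACCTTCGAAGGT"),
]

mean_intra, mean_inter = mean_distances(specimens)
threshold = 10 * mean_intra  # the "10 times mean intraspecific variation" rule

print(f"mean intraspecific: {mean_intra:.3f}")
print(f"mean interspecific: {mean_inter:.3f}")
print(f"threshold:          {threshold:.3f}")
```

Because the threshold is derived from the intraspecific mean of the sampled group, it changes with the taxa and specimens included, which is why a value computed for one group need not transfer to another.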

Delimiting cryptic species
The next major study into the efficacy of DNA barcoding focused on the neotropical skipper butterfly Astraptes fulgerator at the Área de Conservación Guanacaste (ACG) in north-western Costa Rica. This species was already known as a cryptic species complex, due to subtle morphological differences as well as an unusually large variety of caterpillar food plants. However, several years would have been required for taxonomists to completely delimit species. Hebert et al. (2004b) sequenced the COI gene of 484 specimens from the ACG. This sample included “at least 20 individuals reared from each species of food plant, extremes and intermediates of adult and caterpillar color variation, and representatives” from the three major ecosystems where Astraptes fulgerator is found. Hebert et al. (2004b) concluded that Astraptes fulgerator consists of 10 different species in north-western Costa Rica. These results, however, were subsequently challenged by Brower (2006), who pointed out numerous serious flaws in the analysis and concluded that the original data could support no more than the possibility of three to seven cryptic taxa rather than ten cryptic species. This highlights that the results of DNA barcoding analyses can depend on the choice of analytical methods used by the investigators, so the process of delimiting cryptic species using DNA barcodes can be as subjective as any other form of taxonomy.

A more recent example used DNA barcoding for the identification of cryptic species included in the ongoing long-term database of tropical caterpillar life generated by Dan Janzen and Winnie Hallwachs in Costa Rica at the ACG. In 2006 Smith et al. examined whether a COI DNA barcode could function as a tool for identification and discovery for the 20 morphospecies of Belvosia parasitoid flies (Tachinidae) that have been reared from caterpillars in ACG. Barcoding not only discriminated among all 17 highly host-specific morphospecies of ACG Belvosia, but it also suggested that the species count could be as high as 32 by indicating that each of the three generalist species might actually be arrays of highly host-specific cryptic species.

In 2007 Smith et al. expanded on these results by barcoding 2,134 flies belonging to what appeared to be the 16 most generalist of the ACG tachinid morphospecies. They encountered 73 mitochondrial lineages separated by an average of 4% sequence divergence. Because these lineages are supported by collateral ecological information and, where tested, by independent nuclear markers (28S and ITS1), the authors viewed them as provisional species. Each of the 16 initially apparent generalist species was categorized into one of four patterns: (i) a single generalist species, (ii) a pair of morphologically cryptic generalist species, (iii) a complex of specialist species plus a generalist, or (iv) a complex of specialists with no remaining generalist. In sum, there remained 9 generalist species classified among the 73 mitochondrial lineages analyzed.

However, also in 2007, Whitworth et al. reported that flies in the related family Calliphoridae could not be discriminated by barcoding. They investigated the performance of barcoding in the fly genus Protocalliphora, known to be infected with the endosymbiotic bacteria Wolbachia. Assignment of unknown individuals to species was impossible for 60% of the species, and if the technique had been applied, as in the previous study, to identify new species, it would have underestimated the species number in the genus by 75%. They attributed the failure of barcoding to the non-monophyly of many of the species at the mitochondrial level; in one case, individuals from four different species had identical barcodes. The authors went on to state: "The pattern of Wolbachia infection strongly suggests that the lack of within-species monophyly results from introgressive hybridization associated with Wolbachia infection. Given that Wolbachia is known to infect between 15 and 75% of insect species, we conclude that identification at the species level based on mitochondrial sequence might not be possible for many insects."

Marine biologists have also considered the value of the technique in identifying cryptic and polymorphic species and have suggested that the technique may be helpful when associations with voucher specimens are maintained, though cases of "shared barcodes" (i.e., non-unique barcodes) have been documented in cichlid fishes and cowries.
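The "shared barcode" problem described above, where individuals of different species carry identical sequences, can be checked for in a reference set with a simple grouping step. The sketch below is illustrative only: the specimen records are invented, and the function name `shared_barcodes` is hypothetical.

```python
from collections import defaultdict


def shared_barcodes(specimens: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each barcode found in more than one species to the sorted species names."""
    species_by_barcode = defaultdict(set)
    for species, sequence in specimens:
        species_by_barcode[sequence].add(species)
    return {
        sequence: sorted(names)
        for sequence, names in species_by_barcode.items()
        if len(names) > 1
    }


# Toy records: (species label, barcode sequence). Not real data.
specimens = [
    ("Species A", "ACGTACGTAC"),
    ("Species B", "ACGTACGTAC"),  # identical to the Species A barcode
    ("Species C", "ACGTTCGTAC"),
    ("Species C", "ACGTTCGTAC"),  # same species, so not a conflict
]

conflicts = shared_barcodes(specimens)
print(conflicts)
```

Any barcode reported here cannot, on its own, assign a specimen to a single species. This check only catches exactly identical sequences; non-monophyly among similar but non-identical sequences would need a tree-based analysis.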

Cataloguing ancient life
Lambert et al. (2005) examined the possibility of using DNA barcoding to assess the past diversity of the Earth's biota. The COI gene of a group of extinct ratite birds, the moa, was sequenced using 26 subfossil moa bones. As with Hebert's results, each species sequenced had a unique barcode and intraspecific COI sequence variance ranged from 0 to 1.24%. To determine new species, a standard sequence threshold of 2.7% COI sequence difference was set. This value is 10 times the average intraspecies difference of North American birds, which is inconsistent with Hebert's recommendation that the threshold value be based on the group under study. Using this value, the group detected six moa species. In addition, a further standard sequence threshold of 1.24% was also used. This value resulted in 10 moa species, which corresponded with the previously known species with one exception. This exception suggested a possible complex of species that was previously unidentified. Given the slow rate of growth and reproduction of moa, it is probable that the interspecies variation is rather low. On the other hand, there is no set value of molecular difference at which populations can be assumed to have irrevocably started to undergo speciation. It is safe to say, however, that the 2.7% COI sequence difference initially used was far too high.

The Moorea Biocode Project
The Moorea Biocode Project is a barcoding initiative to create the first comprehensive inventory of all non-microbial life in a complex tropical ecosystem, the island of Moorea in French Polynesia. Supported by a grant from the Gordon and Betty Moore Foundation, the Moorea Biocode Project is a 3-year project that brings together researchers from the Smithsonian Institution, UC Berkeley, France’s National Center for Scientific Research (CNRS), and other partners. The outcome of the project is a library of genetic markers and physical identifiers for every species of plant, animal and fungus on the island that will be provided as a publicly available database resource for ecologists and evolutionary biologists around the world.

The software back-end to the Moorea Biocode Project is Geneious Pro and two custom-developed plugins from the New Zealand-based company Biomatters. The Geneious Biocode LIMS and GenBank Submission plugins have been made freely available to the public, and users of the free Geneious Basic software will be able to access and view the Biocode database upon completion of the project, while a commercial copy of Geneious Pro is required for researchers involved in data creation and analysis.

Criticisms
DNA barcoding has met with spirited reaction from scientists, especially systematists, ranging from enthusiastic endorsement to vociferous opposition. For example, many stress the fact that DNA barcoding does not provide reliable information above the species level, while others indicate that it is inapplicable at the species level, but may still have merit for higher-level groups. Others resent what they see as a gross oversimplification of the science of taxonomy. And, more practically, some suggest that recently diverged species might not be distinguishable on the basis of their COI sequences. Due to various phenomena, Funk & Omland (2003) found that some 23% of animal species are polyphyletic if their mtDNA data are accurate, indicating that using an mtDNA barcode to assign a species name to an animal will be ambiguous or erroneous some 23% of the time (see also Meyer & Paulay, 2005). Studies with insects suggest an equal or even greater error rate, due to the frequent lack of correlation between the mitochondrial genome and the nuclear genome or the lack of a barcoding gap (e.g., Hurst and Jiggins, 2005; Whitworth et al., 2007; Wiemers & Fiedler, 2007). Problems with mtDNA arising from male-killing microorganisms and cytoplasmic incompatibility-inducing symbionts (e.g., Wolbachia) are also particularly common among insects. Given that insects represent over 75% of all known organisms, this suggests that while mtDNA barcoding may work for vertebrates, it may not be effective for the majority of known organisms.

Moritz and Cicero (2004) have questioned the efficacy of DNA barcoding by suggesting that other avian data are inconsistent with Hebert et al.'s interpretation, namely, Johnson and Cicero's (2004) finding that 74% of sister species comparisons fall below the 2.7% threshold suggested by Hebert et al. These criticisms are somewhat misleading considering that, of the 39 species comparisons reported by Johnson and Cicero, only 8 actually use COI data to arrive at their conclusions. Johnson and Cicero (2004) have also claimed to have detected bird species with identical DNA barcodes; however, these 'barcodes' refer to an unpublished 723-bp sequence of ND6, which has never been suggested as a likely candidate for DNA barcoding.

The DNA barcoding debate resembles the phenetics debate of decades gone by. It remains to be seen whether what is now touted as a revolution in taxonomy will eventually go the same way as phenetic approaches, for which exactly the same claim was made decades ago, but which were all but rejected when they failed to live up to overblown expectations. Controversy surrounding DNA barcoding stems not so much from the method itself, but rather from extravagant claims that it will supersede or radically transform traditional taxonomy. Other critics fear a "big science" initiative like barcoding will make funding even more scarce for already underfunded disciplines like taxonomy, but barcoders respond that they compete for funding not with fields like taxonomy, but instead with other big science fields, such as medicine and genomics. Barcoders also maintain that they are being dragged into long-standing debates over the definition of a species and that barcoding is less controversial when viewed primarily as a method of identification, not classification.

The current trend appears to be that DNA barcoding needs to be used alongside traditional taxonomic tools and alternative forms of molecular systematics so that problem cases can be identified and errors detected. Non-cryptic species can generally be resolved by either traditional or molecular taxonomy without ambiguity. However, more difficult cases will only yield to a combination of approaches. And finally, as most of the global biodiversity remains unknown, molecular barcoding can only hint at the existence of new taxa, but not delimit or describe them (DeSalle, 2006; Rubinoff, 2006).

DNA barcoding software
Software for DNA barcoding requires integration of a field information management system (FIMS), a laboratory information management system (LIMS), sequence analysis tools, workflow tracking to connect field data and laboratory data, database submission tools, and pipeline automation for scaling up to ecosystem-scale projects. Geneious Pro can be used for the sequence analysis components, and the two plugins made freely available through the Moorea Biocode Project, the Geneious Biocode LIMS and GenBank Submission plugins, handle integration with the FIMS, the LIMS, workflow tracking and database submission.