Base Composition, Speciation, and Barcoding


Donald R. Forsdyke (2013) Trends in Ecology and Evolution 28, 73-74. 

[This version closely approximates the copyright holder's (Elsevier Ltd.) final version, and is posted here as a "retained right" for scholarly purposes.]

Richard Grantham, 1922-2009

Abstract  Cryptic mutations producing no observable effect on phenotype can both spark divergence into species and assist species identification. Unlike most protein-encoding sequences, a barcode sequence of great utility in phylogenetic analysis displays few interspecies differences at amino acid-determining codon positions (static conventional phenotype), but many at third codon positions (mobile genome phenotype).

The discoveries of genome base composition “dialects,” and of apparently “selfish” DNA elements, enforced a distinction between the conventional phenotype (“organismal phenotype”) that addresses the external environment, and a more inward-looking genome phenotype (“intragenomic phenotype”) that addresses the internal environment [1, 2]. Elaborating on his “genome hypothesis,” Richard Grantham in 1986 supposed the genome phenotype to be of “importance in speciation and systematics in general.” On one hand there are sibling species that are reproductively isolated from each other; they would seem to differ in their genome phenotypes, yet display no obvious phenotypic differences (i.e. static conventional phenotypes) [3]. On the other hand there are within-species varieties, which are not reproductively isolated from each other, yet can display gross differences in their conventional phenotypes (e.g. bulldog and dachshund). Such observations “necessarily decouple morphological diversification from speciation” [4]. Thus, polar and brown bears – long geographically isolated – have recently admixed, as revealed by mitochondrial heteroplasmy. Their reproductive isolation appears incomplete.

That differences (i.e. mobility) in the genome phenotype might have sparked an initial sympatric divergence that preceded conventional phenotypic differentiation was suggested in 1886 by George Romanes [5], and has recently been reinvoked [6, 7]. Venditti, Meade and Pagel note that “the gradual genetic and other changes that normally accompany speciation may often be consequential to the event that promotes the reproductive isolation, rather than causal themselves” [8]. They found that lineages with many branch points (speciation events) have accumulated more third codon position base mutations than lineages with few branch points (fewer speciation events). Such differences in base composition (GC%) were found to distinguish species by Chargaff in 1951, and were connected with reproductive failure in ciliates by Sueoka in 1961 [5]:

“DNA base composition is a reflection of phylogenetic relationship. Furthermore, it is evident that those strains which mate with one another (i.e. strains within the same ‘variety’) have similar base compositions. Thus strains of variety 1 …, which are freely intercrossed, have similar mean GC content.”

There is now evidence both that the complementary pairing of parental chromosomes at meiosis in their offspring can involve direct DNA-DNA interactions, and that exceedingly small differences in base composition (internal environment) should suffice to alter the extrusion of stem-loops from duplex DNA, such that this pairing would fail [5, 9]. Thus, although the parents might be able to produce an offspring, that offspring would be sterile. In an evolutionary sense, the parents would be reproductively isolated from each other, but not from other individuals of the same base composition.

It seems that, in identifying GC% as the base composition “dialect” (or ‘accent’) of DNA, which varies between species, Chargaff may unknowingly have uncovered the ‘holy grail’ of speciation postulated by Romanes, and later elaborated by William Bateson and Richard Goldschmidt [5, 9]. Romanes drew attention to what we would now call non-genic variations (germ line mutations that usually do not affect gene products). Manifest as the sterility of hybrid offspring, these cryptic variations would have tended to isolate a parent reproductively from most members of the species to which its close ancestors had belonged, but not from members that had undergone the same non-genic variation; with these it would be reproductively compatible, producing fertile offspring. Romanes held that, in the general case, this isolation was an essential precondition for the preservation of the anatomical and physiological characters (gene encoded) that would become distinctive of the new species. He referred to his holy grail (speciating factor) as an “intrinsic peculiarity” of the reproductive system. Bateson described his as an abstract “residue”. Goldschmidt’s was a chromosomal “pattern” caused by “systemic mutations.” Variations in GC% would satisfy these postulates.  

As predicted by Grantham [1], the importance of GC% variations to systematics has recently become evident in barcoding for rapid species identification. But why one DNA segment should serve for barcoding and another not, still escapes general attention. Min and Hickey [10] studied the base composition of a popular mitochondrial barcode sequence. In stark contrast to most protein-coding sequences that determine the conventional phenotype, here there is little difference between species at second positions of codons (the major amino acid-determining position). Thus, this DNA segment is cleansed of the differences – potential distractors – that govern the conventional phenotype. In this respect the sequence is evolutionarily static. The differences that remain, largely affecting base composition at third positions of codons (that do not determine amino acids), could be reflective of the way organisms fundamentally differ as species (i.e. differences in their states of reproductive isolation from each other). Whether such mitochondrial differences could ever have sparked species divergence is problematic. But we should note the high mutation rate in mitochondria, and the growing number of reports of paternal leakage and mitochondrial heteroplasmy.
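
The contrast drawn above, few interspecies differences at second codon positions but many at third positions, can be illustrated with a short calculation. The Python sketch below is only a minimal illustration, not the method used by Min and Hickey [10]. The function names and the two toy sequences are hypothetical, and the sequences are assumed to be aligned, of equal length, and in reading frame.

```python
def gc_by_codon_position(seq):
    """Return GC% at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    result = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this codon position
        gc = sum(base in "GC" for base in bases)
        result.append(100.0 * gc / len(bases) if bases else 0.0)
    return tuple(result)


def differences_by_codon_position(seq_a, seq_b):
    """Count mismatches at codon positions 1, 2 and 3 between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    counts = [0, 0, 0]
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a != b:
            counts[i % 3] += 1  # i % 3 == 2 is the third codon position
    return tuple(counts)


# Toy sequences invented for illustration; they are not real barcode data.
species_a = "ATGGCCTTAGGA"
species_b = "ATGGCTTTGGGC"

print(differences_by_codon_position(species_a, species_b))
print(gc_by_codon_position(species_a))
print(gc_by_codon_position(species_b))
```

In this toy pair all differences fall at third codon positions, so the two sequences differ in third-position GC% while first and second positions are unchanged. Real barcode comparisons would of course require properly aligned sequences from many species.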

Acknowledgement  Queen’s University has hosted my evolution education webpages since 1998.


  1. Grantham, R., Perrin, P. and Mouchiroud, D. (1986) Patterns of codon usage of different kinds of species. Oxf. Surv. Evol. Biol. 3, 48-8

  2. Orgel, L. E., Crick, F. H. C. and Sapienza, C. (1980) Selfish DNA. Nature 288, 645-646

  3. Sáez, A. G. and Lozano, E. (2005) Body doubles. Nature 433, 111

  4. Venditti, C., Meade, A. and Pagel, M. (2011) Multiple routes to mammalian diversity. Nature 479, 393-396

  5. Forsdyke, D. R. (2001) The Origin of Species Revisited. A Victorian who Anticipated Modern Developments in Darwin’s Theory. McGill-Queen’s University Press

  6. Venditti, C. and Pagel, M. (2009) Speciation as an active force in promoting genetic evolution. Trends Ecol. Evol. 25, 14-20

  7. Lanfear, R., Ho, S. Y. W., Love, D. and Bromham, L. (2010) Mutation rate is linked to diversification in birds. Proc. Natl. Acad. Sci. USA 107, 20423-20428

  8. Venditti, C., Meade, A. and Pagel, M. (2010) Phylogenies reveal new interpretation of speciation and the Red Queen. Nature 463, 349-352

  9. Forsdyke, D. R. (2011) Evolutionary Bioinformatics. 2nd Edition, Springer, New York

  10. Min, X. J. and Hickey, D. A. (2007) DNA barcodes provide a quick preview of mitochondrial genome composition. PLoS ONE 2, e325


Established August 2012 and last edited on 23 Jan 2013 by Donald Forsdyke