ABSTRACT
Homeobox cluster genes (Hox genes) are highly conserved and can be usefully employed to study phyletic relationships and the process of evolution itself. A phylogenetic survey of Hox genes shows an increase in gene number in some more recently evolved forms, particularly in vertebrates. The gene increase has occurred through a two-step process involving, first, gene expansion to form a cluster and, second, cluster duplication to form multiple clusters. We also describe data suggesting that non-Hox genes may be preferentially associated with the Hox clusters and raise the possibility that this association may have an adaptive biological function. Hox gene loss may also play a role in evolution. Hox gene loss is well substantiated in the vertebrates, and we identify additional possible instances of gene loss in the echinoderms and urochordates based on PCR surveys. We point out the possible adaptive role of gene loss in evolution, and urge the extension of gene mapping studies to relevant species as a means of its substantiation.
INTRODUCTION
The homeobox system of genes is generally recognized as having useful attributes for the study of evolution (Shashikant et al., 1991; Kappen et al., 1993). Most prominent among these is its high level of conservation, exemplified in the homeobox sequences and the structural organization of the Hox gene clusters. These properties make it possible to identify sequence motifs, genes, and gene clusters that are homologous, and thus can be compared both quantitatively and qualitatively with confidence over a broad spectrum of metazoans. That the homeobox genes play a fundamental role in metazoan development also suggests that they may themselves be important to the evolutionary process.
In a recent review (Ruddle et al., 1994), we showed that homeobox genes have been reported for all the major phyla; this is also true for the clustered Hox genes, with the exception of the sponges (Seimiya et al., 1992). Hox gene clusters have been directly demonstrated in Caenorhabditis elegans (Burglin et al., 1991; Burglin and Ruvkun, 1993), Tribolium castaneum (Beeman, 1987), Drosophila pseudoobscura (Randazzo et al., 1993), D. melanogaster (Lewis, 1978; Kaufman et al., 1990), Branchiostoma floridae (Holland et al., this volume; Pendleton et al., 1993), Petromyzon marinus (Pendleton et al., 1993), and all jawed vertebrates so far examined. Moreover, good evidence has been presented for Hox cluster duplication giving rise to four clusters, each on a different chromosome, in all amniotes (Kappen et al., 1989; Schughart et al., 1989).
In addition to the clustered Hox genes there exist a number of related homeobox genes which are divergent with respect to the homeobox and other features (Kappen et al., 1993). These fall into a number of groups based on similarity, as for example the Paired, Caudal, and Distal-less type genes. We will refer to these genes as non-clustered or diverged homeobox genes. The homeobox genes have been shown to undergo duplication by either cis (laterally within a chromosome) or trans (chromosome duplication within a genome) processes (Kappen and Ruddle, 1993). Cis duplication can arise by unequal crossing over and trans duplication by polyploidization, although other mechanisms are also possible.
Ohno has suggested that gene duplication by polyploidy serves an important role in evolution (Ohno, 1970). It is argued that developmentally relevant genes become integrated into developmental pathways that are hierarchical and highly interdependent, and thus they cannot readily mutate or take on new functions without disrupting the overall developmental plan. Gene duplication provides a way around this impasse by the retention of old developmental interrelations and the incorporation of newly duplicated genes into new pathways and relationships. The genomic and functional conservation of the homeobox system is consistent with this view (Ruddle et al., 1994). It is interesting that gene duplication is often cited as being adaptive, because it introduces genetic redundancy into developmental systems. However, as viewed here, redundancy might simply be a consequence of developmental conservatism. Developmental genes and particularly homeotic genes are interactive transcriptionally and have been shown to have the properties of a combinatorial system (Wagner, 1994). The insertion of new genes into a network of genes can be expected to introduce new degrees of freedom, and thereby multiple possible avenues of evolutionary divergence. In this respect, the increase in Hox clusters and the duplication of many non-clustered homeobox genes in the forerunners of the vertebrates can be postulated to have had a profound effect on their evolution and possibly to have contributed to vertebrate radiation (Gould and Eldredge, 1993).
Previous studies have provided support for the idea that the vertebrate Hox cluster gene family has arisen by means of a two-step process: first, the expansion of the cluster by lateral gene duplication, and second, the duplication of the clusters by large domain duplication events, as for example chromosome or genome duplication (Ruddle, 1989; Kappen et al., 1989; Schughart et al., 1989). Sequence comparisons of the homeobox domain indicate a relatedness between paralogy groups 1-3, 4-8, and 9-13*. These groups have been termed anterior, medial, and posterior, respectively, based on their patterns of expression along the anterior/posterior axis (Kappen et al., 1989; Schubert et al., 1993; Ruddle et al., 1994). These relationships have suggested that these three groups of genes have arisen from an ancestral cluster of three genes (Schubert et al., 1993).
In this report, we will discuss two systems that relate to chordate evolution and gene duplication. One deals with the amplification of non-homeobox genes which are in linkage with the Hox clusters. The second deals with the possible loss of Hox genes in echinoderms and tunicates.
PARALOGOUS GENES IN LINKAGE WITH THE Hox GENE CLUSTERS
An examination of genes in the vicinity of the Hox clusters shows that many are paralogous and map to two or more of the four Hox gene cluster chromosomes (Rabin et al., 1986; Ferguson-Smith and Ruddle, 1988; Schughart et al., 1988; Schughart et al., 1989; Ruddle, 1989; Craig and Craig, 1991, 1992; Hart et al., 1992; Lundin, 1993; Bentley et al., 1993). In order to study these genes more systematically, we have tabulated all the mouse genes that are members of gene families and of which at least one member maps to the chromosomes bearing Hox gene clusters, namely chromosomes 2, 6, 11 and 15 (Silver, 1993; Siracusa and Abbot, 1993; Moore and Elliott, 1993; Buchberg and Camper, 1993; Mock et al., 1993). The Hox gene clusters themselves are not included in the sample to avoid bias. The identification of gene families is based on sequence similarity. Seventy-four families with a total of 323 genes were identified using these criteria. A representative sample of these genes including 30 families and 203 genes is shown in Fig. 1. A statistical analysis was used to test whether there was excess clustering of members of the gene families on these four chromosomes†. The hypothesis of no excess clustering could be rejected at the 0.01 level of confidence. Four other chromosomes selected on the basis of size similarity to the Hox cluster-bearing chromosomes and number of mapped loci involved, namely mouse chromosomes 4, 9, 12, and 16, were subjected to the same analysis. In this instance the null hypothesis could not be rejected (P = 0.11).
The Hox gene clusters are estimated to have undergone duplication minimally 350 million years ago (Kappen et al., 1989). This figure may represent a gross underestimate, since Forey and Janvier (1993) have determined the divergence date between lamprey and gnathostomes to be 435 million years. Sufficient time has elapsed since the duplication event to randomize genes throughout the genome. We base this supposition on the demonstrated randomization of linkage relationships between the mouse and human genomes over a period of approximately 100 million years (Nadeau, 1989, 1991). However, our analysis indicates a proclivity of genes linked to the clusters to retain their primordial linkage relationships. This relationship is all the more striking when one limits consideration to genes closely linked to the Hox gene clusters. In an extension of this study, we confined our analysis to genes mapping within approximately 30 centiMorgans (cM) centered on the Hox clusters. This distance represents approximately one half of chromosome 15, the smallest of the four chromosomes bearing Hox clusters. Paralogous genes linked within 15 cM to each side of the clusters showed a highly significant association with the clusters (P = 2 × 10⁻⁵). The probability score for paralogous genes outside this region on the same chromosomes was not significant (P = 0.18). Hence, the excess clustering of gene families initially observed for chromosomes 2, 6, 11, and 15 is due to specific clustering around the Hox gene complexes.
A possible explanation for these findings is that the linkage of genes to the Hox clusters is a simple structural remnant of the ancestral linkage pattern prior to cluster duplication. A second explanation is that the linkage of paralogous genes is conserved because it serves an important biological function. This adaptive point of view is strengthened by the fact that many of the genes in linkage to the Hox cluster genes also serve a developmental function, such as growth factors, receptors, members of signaling pathways, and structural proteins having a developmental role such as the cytokeratins and collagens (see Fig. 1). It is also of interest and of possible significance that genes bearing a sequence or functional similarity to the mammalian genes in linkage to the Hox clusters are also located in proximity to the Hox gene clusters in Caenorhabditis (Table 1) and Drosophila (Table 2).
Assuming that the linkage pattern described above is indeed adaptive, one might speculate on its biological basis. One possibility is that the expression of the linked genes is regulated by cis regulatory effects that extend over large distances throughout the region bounding the Hox gene cluster. This is consistent with findings that enhancers can modulate gene activity over distances in the range of a hundred kilobases (Forrester et al., 1987; Qian et al., 1991). A second possibility is that products of genes in the Hox cluster domains interact functionally, and thus are co-adaptive and require coordinated evolutionary modification in the sense of positive epistatic interactions. One can postulate that this can be most efficiently accomplished if the genes are in linkage and tend to assort together in populations. At present these and other notions must be regarded as highly speculative, but can serve as the bases of hypotheses to be tested by experimentation.
Hox CLUSTERS IN ECHINODERMS AND LOWER CHORDATES
The structure of the four Hox gene clusters in vertebrates implies that some Hox genes were lost following duplication (Kappen and Ruddle, 1993; Ruddle et al., 1994). Several possibilities exist concerning the history of cluster duplication (Kappen and Ruddle, 1993), and one of the simplest models involves the two-fold duplication of a single Hox gene cluster containing ancestral representatives of all 13 paralogy groups, followed by relatively rapid loss of individual genes or cluster segments. Sequence conservation within the homeodomain allows us to estimate the number of Hox cluster genes (paralogy groups 1-10) in a species by the polymerase chain reaction (PCR; Frohman et al., 1990; Murtha et al., 1991; Pendleton et al., 1993). In cases where the number of Hox cluster genes is known, such PCR surveys have shown a recovery rate of more than 85% in a single sampling (Pendleton et al., 1993; and unpublished data).
Surveys of Hox cluster gene number in primitive chordates and echinoderms have provided some curious insights concerning the relationship between vertebrates and other deuterostomes. The hemichordate Saccoglossus kowalevskii most likely contains a single Hox gene cluster, with representatives from each of paralogy groups 1-9 (Pendleton et al., 1993). The Hox cluster number in the cephalochordate Branchiostoma floridae is more difficult to assess. Data from a PCR survey revealed the presence of 11 Hox cluster genes in amphioxus (Pendleton et al., 1993). On the basis of amino acid sequence similarity, the data predicted that these Hox genes would be distributed over two clusters. Due to the short sequence (82 bp) amplified by PCR and the high similarity between some paralogy groups, paralogy assignments based on homology are necessarily speculative. Detailed linkage analysis of B. floridae Hox genes (Holland et al., this volume; Garcia-Fernandez and Holland, personal communication) reveals a single Hox cluster containing representatives from each of paralogy groups 1-10. Comparison of the sequences from both data sets would be quite informative. Our data could be consistent with a single Hox cluster in B. floridae, although an argument could also be made for the existence of two genes in each of paralogy groups 1 and 3. Nineteen different Hox cluster genes were sampled in the agnathan Petromyzon marinus, suggesting that the closest extant relatives of the true vertebrates, the jawless fish, have at least two and most likely three or four Hox clusters (Pendleton et al., 1993).
Considering the important regulatory role that Hox cluster genes play during animal development (Shashikant et al., 1991; Ruddle et al., 1994), it is appropriate to examine Hox cluster structure in phyla that exhibit unique developmental qualities. The echinoderms comprise a deuterostome phylum that shows early developmental affinities with the hemichordates, a phylum suspected to have close affinities with the chordates, but they have a radically different adult body plan that includes secondarily derived radial symmetry. Four Hox cluster genes have been previously isolated from the Hawaiian sea urchin Tripneustes gratilla using hybridization techniques (Dolecki et al., 1986, 1989; Wang et al., 1990). Three of these are related to ‘medial’ paralogy groups (Table 3), and one is most likely a member of ‘posterior’ paralogy group 9. Our own PCR survey of the sea urchin species Strongylocentrotus purpuratus and Lytechinus variegatus identified homologs of these T. gratilla genes, as well as three other Hox cluster genes (Table 3). One of these, Hbox9, is most likely a fourth homolog of ‘medial’ group genes. Two others, Hbox7 and Hbox10, are most highly homologous to ‘posterior’ group 9, and do not appear to be related to ‘posterior’ paralogy groups 10-13. Interestingly, Hbox7 and Hbox10 are more closely related to one another than either is to posterior paralogy group 9. Paralogy groups 10-13 in amniotes are thought to have arisen by serial tandem duplication events beginning from paralogy group 9 (Kappen et al., 1993; Schubert et al., 1993). This mechanism might also explain the origin of Hbox7 and Hbox10 in echinoderms, but the lack of homology between these sequences and paralogy groups 10-13 in amniotes suggests that such duplications took place independently after the divergence of echinoderms and other deuterostomes.
Another unique distinction of sea urchin Hox clusters appears to be the curious absence of genes from ‘anterior’ paralogy groups 1-3. ‘Anterior’ Hox cluster genes have been reported for all other metazoan species examined (except sponges, where no Hox cluster genes have yet been identified; Ruddle et al., 1994). One possible reason for failing to detect Hox cluster sequences in genomic DNA by PCR could be the presence of introns in the homeodomain. Introns that disrupt the homeodomain are rare, but the Drosophila ‘anterior’ Hox cluster genes labial and proboscipedia do contain homeodomain introns. None of the known vertebrate Hox cluster genes have introns in the homeodomain. The absence of ‘anterior’ Hox cluster genes in the echinoderms is also supported by the isolation of only four non-‘anterior’ genes by hybridization in T. gratilla (see above). Also compelling is the failure to identify ‘anterior’ Hox cluster genes in sea urchin RNA (P. Martinez, personal communication and unpublished data).
The urochordates are a large and varied subphylum, but share distinct developmental affinities with the chordates. The ascidians, for example, have a tadpole larval stage in which they quite literally resemble what could be described as primitive chordates. The larvae are free-swimming and contain structures such as a notochord, dorsal nervous system, a primitive brain with paired sensory organs, mesenchyme cells, etc. (Jeffery and Swalla, 1992). After the larvae attach to a substrate, metamorphosis occurs, generating a sessile, filter-feeding adult form rather unique among coelomates. A previously reported search for Hox cluster genes by hybridization in the ascidian Halocynthia roretzi (Saiga et al., 1991) failed to isolate any Hox cluster genes, as only one very diverged non-Hox cluster gene, AHox1, was described.
We have surveyed four different ascidian species for Hox cluster sequences by PCR. In each case, only one Hox cluster gene sequence could be identified (Table 4). Even more intriguing is the fact that the single Hox gene sequence detected in one ascidian species is entirely different from the ones present in other species. Ciona intestinalis surveys show one Hox cluster sequence related to ‘medial’ (Hox paralogs 4-8) class paralogy groups (Table 4), while Styela clava shows a clearly different ‘medial’ class Hox cluster sequence. Survey data from the more distantly related genus Molgula relate an even more curious tale. Molgula oculata represents a urodele ascidian species with a tailed larva from which a single Hox cluster gene related to ‘posterior’ paralogy group 10 was identified. The closely related species Molgula occulta represents an anural ascidian which displays a tailless larval form. No Hox cluster sequences were detected in M. occulta, while the same non-Hox cluster genes were found that were present in the survey of M. oculata (unpublished data). Hybrids can be formed between these two Molgula species resulting in a hatched larva with a short tail (Swalla and Jeffery, 1990). In a review article on ascidian development, Jeffery and Swalla (1992) comment that anural development has probably evolved more than once in ascidians, suggesting that it may be the consequence of a relatively small number of loss-of-function mutations.
The urochordate Hox cluster data are indeed puzzling. A PCR survey of the pelagic (non-sessile) tunicate Oikopleura dioica (Holland et al., this volume) again is compatible with the presence of a single Hox cluster sequence, and in this case it is most closely related to ‘anterior’ paralogy group 1. It therefore appears that Hox cluster genes from ‘anterior’, ‘medial’ and ‘posterior’ paralogy groups are represented in this phylum. Have the bulk of urochordate Hox cluster gene sequences diverged beyond detection? Have wholesale changes in genomic organization (e.g. intron insertions) occurred? These scenarios seem unlikely since a different Hox cluster gene is present for each genus examined. One possible model considers that the forerunner of the urochordates had a single, complete Hox gene cluster. By any of a number of mechanisms of adaptation, the requirement for Hox cluster function in this species was lost. The single Hox cluster genes that remain in the different urochordate species may have been co-opted for different roles in the newly evolved developmental mechanisms. An example of the kind of role taken by these single Hox cluster genes may have already been alluded to: could the single Molgula Hox cluster gene described above be necessary for formation of the larval tail, with its loss resulting in anural development? Do any extant species exist that contain more than one Hox cluster gene? More rigorous examination of Hox cluster structure in the urochordates will be necessary to provide answers to these speculations.
DISCUSSION
In surveys for Hox cluster genes, representatives have been recovered for all of the major phyla with the sole exception of sponges. Moreover, a general correlation can be made between Hox gene number and the more recently evolved phyla (Ruddle et al., 1994). This result is consistent with the postulated conservative notion of development espoused by Ohno (1970). We refer to this process as the ‘gene freeze’ hypothesis. This hypothesis states that evolutionary innovations are facilitated by gene duplication of developmentally significant genes which allows the retention of old developmental functions and the introduction of new. The duplicated gene(s) then becomes integrated into the developmental plan of the organism and likewise becomes constrained with respect to change and is ‘frozen’. This view of the developmental-evolutionary process has additional interesting properties. If we assume that developmental control is combinatorial and that developmental genes (e.g., Hox genes) are interactive in the form of gene networks, then the addition of genes by duplication may increase developmental possibilities geometrically. In other words, the introduction of a single gene may introduce a broad range of developmental possibilities and corresponding evolutionary options. Increasingly, evidence supports the view that evolution progresses in spurts (Gould and Eldredge, 1993). We submit that a discontinuous rate of evolutionary change is consistent with the gene freeze model. In this respect, the duplication of clusters of Hox genes in the antecedents of the vertebrates may have had important immediate consequences with respect to the vertebrate radiation.
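The geometric increase mentioned above can be illustrated with a deliberately simple counting argument. The assumption, introduced here only for illustration and not taken from the cited work, is that each of n interacting developmental genes is either expressed or not expressed in a given region, so that N(n) counts the distinct combinatorial expression states:

```latex
N(n) = 2^{n}, \qquad \frac{N(n+1)}{N(n)} = 2
```

Under this toy model each added gene doubles the number of available combinations, whatever the starting number of genes. Real regulatory networks are constrained and will realize only a subset of these states.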
Gene loss may also play an important role in the evolution of developmental mechanisms. In vertebrates, duplication events that created the four Hox gene clusters were presumably quickly followed by gene loss until extant cluster structures became ‘frozen’ (Kappen and Ruddle, 1993). As yet no remnants of the lost Hox cluster genes (e.g. pseudogenes) have been detected. It will be of great interest to examine the vertebrate classes in a detailed fashion with respect to the presence and absence of Hox cluster genes, since such data may possibly reveal the detailed patterns of gene loss, which in turn can give insight into class affinities. PCR surveys have provided compelling evidence that Hox cluster gene loss may also have occurred in other deuterostomes, particularly in the echinoderms and urochordates. Sea urchins appear to lack ‘anterior’ Hox cluster genes when they would be predicted to contain them on the basis of their phylogenetic position; they share a common ancestor with arthropods and chordates in which ‘anterior’ Hox cluster genes have been identified. It will be of interest to determine the extent to which ‘anterior’ Hox cluster gene loss, if true, has contributed to the unique anatomical characteristics shared by echinoderms. The ascidian Hox cluster gene data inspire curious speculation considering the central position accorded them with respect to vertebrate origins (Berrill, 1955). One historically prominent theory posits that the ascidian larva, by means of a neotenic process, represents the ancestral form for vertebrate evolution (Garstang, 1894). The apparent loss of most Hox cluster genes and associated functions, if true, suggests that the urochordates are a derived group, diverging rather early from the stem lineage leading to the vertebrates. In addition, Hox cluster gene loss in urochordates, as speculated for the echinoderms, may have had profound consequences regarding developmental pathways and adaptive adult morphology.
The Hox clusters provide an exceptional system for the study of developmental processes, especially with regard to morphology. We have demonstrated the great potential of various deuterostome phyla as model systems to study the role of Hox cluster genes in the evolution of morphology, and the advancement to more complex forms. The detailed study of Hox cluster structure and its evolution will provide new and exciting insights into the origins of the vertebrates and the mechanism of their development.
ACKNOWLEDGEMENTS
We thank several individuals for supplying materials used in these studies: specifically, Joe Minor for Strongylocentrotus DNA, Bill Klein for Lytechinus DNA, Tom Meedel for Ciona sperm, Billie Swalla for Styela and Molgula DNA, and Pedro Martinez for personal communications and insightful discussion. Thanks to S. Pafka for graphics and photography. This work was supported in part by NIH grant GM09966 to FHR.
REFERENCES
† First, we calculated the percentage of total mapped loci that fall on each chromosome. For example, the percentages for chromosomes 2, 6, 11, and 15 are 7.4%, 4.6%, 6.3% and 3.8%, respectively. For a gene family, a hit on a particular chromosome is defined as at least one member of the family mapping to that chromosome. For each gene family, there can be a total of 1 to 4 hits, depending on the number of chromosomes containing hits. There are four single-hit possibilities (one for each chromosome), six possible two-hit combinations, four possible three-hit combinations, and one four-hit combination. The probability of each of these outcomes can be calculated directly from the percentages given above. We note that this probability needs to be corrected for the fact that the family must have contained at least one hit to be ascertained; hence, each multi-hit outcome probability is divided by the probability of at least one hit. This correction is similar, in spirit, to that used in segregation analysis for human recessive diseases, where families are ascertained through at least one affected child (Elandt-Johnson, 1971).
From these probabilities, we then calculate the expected number of single, double, triple, and quadruple hits, and compare these with the observed numbers. From the probability distribution for number of hits, we calculate the exact probability of obtaining the actual observed number of hits or greater using simulation; it is these P values that we report. A significant excess of hits over expected indicates clustering of gene families.
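The calculation described in this note can be sketched in a few lines of Python. This is a minimal illustration and not the original analysis code. It assumes that a family hits each chromosome independently with probability equal to that chromosome's share of mapped loci, and it ignores family size. The function names and the observed hit total used in the example are hypothetical; only the four chromosome percentages and the count of 74 families come from the text.

```python
import itertools
import random

# Share of all mapped loci on chromosomes 2, 6, 11 and 15 (from the text).
SHARES = {2: 0.074, 6: 0.046, 11: 0.063, 15: 0.038}


def hit_count_distribution(shares):
    """Return P(k hits | at least one hit) for k = 1..len(shares).

    Assumption (not from the paper): hits on different chromosomes are
    independent, each with probability equal to the chromosome's share.
    """
    chroms = list(shares)
    probs = {k: 0.0 for k in range(1, len(chroms) + 1)}
    # Enumerate all hit/no-hit patterns over the chromosomes.
    for pattern in itertools.product([0, 1], repeat=len(chroms)):
        k = sum(pattern)
        if k == 0:
            continue  # a family with no hit would never be ascertained
        p = 1.0
        for hit, chrom in zip(pattern, chroms):
            p *= shares[chrom] if hit else 1.0 - shares[chrom]
        probs[k] += p
    # Ascertainment correction: divide by P(at least one hit).
    at_least_one = sum(probs.values())
    return {k: p / at_least_one for k, p in probs.items()}


def simulated_p_value(dist, n_families, observed_total_hits,
                      n_sim=10000, seed=0):
    """Estimate P(total hits >= observed) under the null by simulation."""
    rng = random.Random(seed)
    ks = list(dist)
    weights = [dist[k] for k in ks]
    exceed = 0
    for _ in range(n_sim):
        total = sum(rng.choices(ks, weights=weights, k=n_families))
        if total >= observed_total_hits:
            exceed += 1
    return exceed / n_sim


dist = hit_count_distribution(SHARES)
n_families = 74                 # number of families reported in the text
observed_total_hits = 100       # hypothetical value, for illustration only
expected = {k: n_families * p for k, p in dist.items()}
print("expected families with k hits:", expected)
print("simulated P:", simulated_p_value(dist, n_families,
                                        observed_total_hits, n_sim=2000))
```

A more faithful treatment would place each family member on a chromosome by a multinomial draw, so that larger families are more likely to produce multiple hits; the independent-hit shortcut is used here only to keep the sketch short.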