ABSTRACT
Alphoid satellite DNA is a family of sequences with an approximately 170 bp periodicity which is found near the centromere of all human chromosomes. The structure of the human Y-chromosome alphoid DNA has been studied in two somatic cell hybrids, 3E7 and 853 (Tyler-Smith & Brown, 1987). The 170 bp alphoid subunits are tandemly repeated and are organized into units approximately 5·7 kb long. A few variant units on the 3E7 Y chromosome contain two extra 170 bp subunits and are approximately 6·0 kb long; the variant units are present in two clusters at least 90 kb apart on the chromosome. On each Y chromosome there is a single major block of alphoid DNA: on the 3E7 Y chromosome it is approximately 440 kb long and on the 853 Y chromosome it is approximately 540 kb long. A long-range restriction map of the 853 block has been constructed covering approximately 1·1 Mb of DNA. The distribution of restriction sites suggests that the sequences on one side of the alphoid block may be typical euchromatic DNA, while the sequences on the other side may be another satellite sequence.
(A) Introduction
The centromere is the part of the chromosome that interacts with the spindle apparatus and ensures correct segregation of the chromosome at cell division. During the last 7 years, centromeric DNA from several chromosomes of Saccharomyces cerevisiae has been analysed in detail (Clarke & Carbon, 1980; reviewed by Murray & Szostak, 1985). More recently, centromeric DNA from Schizosaccharomyces pombe has been characterized (Nakaseko et al. 1986). However, the DNA sequences that function as centromeres in mammalian chromosomes have not yet been identified. In both yeast species, an important step in the identification of the centromeric sequences was the construction of a physical map of a region of the chromosome which included the centromere. We would like to construct such maps for human chromosomes. The development of the technique of pulsed-field gel electrophoresis for analysing DNA molecules up to approximately 2000 kb in size (Schwartz & Cantor, 1984; Carle & Olson, 1984) has made it possible to begin this task. This paper will discuss the DNA sequences which have been found near the centromere of the human Y chromosome and will present a restriction map covering approximately 1·1 Mb of DNA in this region.
(B) Tandemly repeated sequences found near human centromeres
The genomes of mammals are more than 100 times larger than those of yeasts and the currently available genetic and physical maps are less detailed. Consequently, known genes or anonymous unique sequences have not provided a good starting point for mapping mammalian centromeric region DNA. However, it has been known since the time of the first in situ hybridization experiments to mammalian chromosomes (Pardue & Gall, 1970; Jones & Corneo, 1971) that large blocks of tandemly repeated ‘satellite’ or ‘simple sequence’ DNA are commonly found near mammalian centromeres. These sequences can therefore provide probes with which we can begin to map these regions.
In human chromosomes, three families of tandemly repeated DNA sequences have been found in the centromeric regions. They are listed in Table 1 and each will be discussed briefly.
Alphoid DNA is characterized by a subunit size of approximately 170 bp. It has been found by in situ hybridization at or close to the centromere of all the chromosomes (Manuelidis, 1978; Mitchell, Gosden & Miller, 1985). Manuelidis (1978) reported hybridization to the centromeres of all the autosomes and to the X chromosome when she used a purified but uncloned 340 bp EcoRI fragment as a probe. The extent of hybridization differed considerably on different chromosomes: 1, 3, 7, 10 and 19 were heavily labelled, while the Y chromosome was virtually unlabelled. Using a cloned alphoid probe of unknown chromosomal origin called p82H, Mitchell et al. (1985) found a more even distribution of the sequence family: all chromosomes, including the Y, were labelled.
Satellites 2 and 3 are the major simple sequence components of the classical satellites II and III/IV respectively (Prosser, Frommer, Paul & Vincent, 1986). Both consist of tandemly repeated variants of the 5 bp sequence TTCCA, although they differ in the extent to which they have diverged from this prototype repeat. Their similarities are more striking than their differences and here they will be considered as the single family satellite 2, 3. Using gradient-purified satellite DNA, Gosden et al. (1975) found two major sites of hybridization, in the pericentric region of chromosome 9 and on the long arm of the Y chromosome, and minor sites near the centromeres of chromosomes 1, 5, 7, 10, 12, 13, 14, 15, 16, 17, 20, 21 and 22. Using a cloned sequence from chromosome 15, Higgins et al. (1985) found hybridization to the pericentric regions of chromosomes 1, 9, 13, 14, 15, 16, 21 and 22, and to the Y-chromosome long arm.
Satellite 1 is the major simple sequence component of the very AT-rich classical satellite I and consists in part of variants of 17 bp and 25 bp units (Prosser et al. 1986). Its chromosomal distribution was determined using gradient-purified DNA (Gosden et al. 1975) and was similar to that of satellite 2, 3, although satellite 1 did not hybridize to chromosome 16. The hybridization of satellite 1 to the Y-chromosome long arm may be due to a second component present in this uncloned probe: the 2·4 kb sequence (Frommer, Prosser & Vincent, 1984), which is known to be located in this region of the genome (Cooke, Schmidke & Gosden, 1982). Thus the in situ hybridization experiments show that all the human chromosome centromeric regions share some features: they all contain alphoid DNA close to the primary constriction. However, they differ in the amounts of satellites 1 and 2, 3 that are present nearby: the satellites may form a large cytologically visible C band such as that on chromosome 9 or may be undetectable by in situ hybridization. Since the Y chromosome falls into the latter category, investigations of the Y-chromosome centromeric region DNA have begun with the study of the alphoid sequences. The Y chromosome has been particularly suitable for this work because it contains a relatively small amount of alphoid DNA.
(C) Y-chromosome alphoid DNA: short-range structure
The use of somatic cell hybrids containing single human chromosomes on a rodent background has made it possible to study the chromosome-specific organization of repeated sequences in detail (Beauchamp, Mitchell, Buckland & Bostock, 1979). The structure of Y-chromosome alphoid DNA has been studied in this way (Willard, 1985; Wolfe et al. 1985; Tyler-Smith & Brown, 1987) and the results are summarized as a diagram in Fig. 1. The basic structure is the 170 bp subunit, which is tandemly repeated 34 times to form an approximately 5·7 kb unit, which in turn is tandemly repeated to form an array several hundred kb long. This section will discuss the structures of the 170 bp subunit, the 5·7 kb unit and their variants; section D will discuss the structure of the whole block.
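The hierarchy described above can be checked with simple arithmetic. The following is a minimal sketch: the subunit length and the number of subunits per unit are taken from the text, whereas the 500 kb array length is a purely illustrative value chosen to stand for "several hundred kb" and is not a measurement.

```python
# Nominal values quoted in the text.
SUBUNIT_BP = 170          # alphoid subunit length
SUBUNITS_PER_UNIT = 34    # subunits per higher-order unit

# Length of one higher-order unit, in bp (about 5.7 kb).
unit_bp = SUBUNIT_BP * SUBUNITS_PER_UNIT


def units_in_array(array_kb: float, unit_length_bp: int = unit_bp) -> float:
    """Approximate number of tandem units in an array of the given size."""
    return array_kb * 1000 / unit_length_bp


# Hypothetical array length, for illustration only.
example_array_kb = 500
print(f"unit length: {unit_bp} bp")
print(f"units in a {example_array_kb} kb array: {units_in_array(example_array_kb):.1f}")
```

Because the 170 bp figure is itself approximate, the product is only a nominal unit length, which is why the text quotes the unit as approximately 5·7 kb.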
(1) Cloning the alphoid DNA
The somatic cell hybrids used in the study of the Y-chromosome alphoid DNA were 3E7 (Marcus et al. 1976) and 853 (originally called 7631; Burk, Ma & Smith, 1985), independently derived cell lines each of which contains a different Y chromosome as its major but not sole human component. Wolfe et al. (1985) constructed a cosmid library from 3E7 DNA and used an X-chromosome-derived alphoid probe to identify two cosmids containing the major Y-chromosome alphoid sequence. Similarly, Tyler-Smith & Brown (1987) probed a 3E7 cosmid library with p82H (Mitchell et al. 1985) and identified nine cosmids containing the major alphoid sequence.
(2) The 170 bp subunit
Restriction enzyme mapping of the cloned DNA with HaeIII showed a repeating structure with a periodicity of approximately 170 bp and sequencing of about 2 kb of DNA from each of two different units revealed the details of the structure (Tyler-Smith & Brown, 1987). The sequence data reported by Wolfe et al. (1985) contained errors and will not be discussed further. Individual 170 bp subunits were between 76 and 86% homologous to one another. A Y-chromosome consensus sequence was derived: individual subunits were between 87 and 92% homologous to the consensus. The consensus sequence was compared with alphoid sequences from other chromosomes. For example, it was about 82% homologous to a chromosome-7 sequence (Jorgensen, Bostock & Bak, 1986) and was about 89% homologous to the X-chromosome consensus sequence (Waye & Willard, 1985).
These results show that the Y-chromosome alphoid subunits are heterogeneous, but are generally more similar to one another than they are to alphoid sequences from other chromosomes.
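The percentage homology figures quoted above are pairwise sequence identities. A minimal sketch of such a calculation follows, assuming the two sequences have already been aligned to equal length. The function name and the toy sequences are hypothetical and are not alphoid data, and the original study may have scored gaps or ambiguous bases differently.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences agree.

    Assumes the sequences are pre-aligned and of equal length; a gap
    character in either sequence simply counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


# Toy 10-base sequences differing at one position (illustrative only).
toy_subunit = "ACGTACGTAC"
toy_consensus = "ACGTTCGTAC"
print(percent_identity(toy_subunit, toy_consensus))
```

Applied to every pair of subunits and to each subunit against a consensus, a calculation of this kind would yield ranges of the sort reported in the text.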
(3) The 5·7 kb unit
Restriction enzyme mapping of both cosmid clones and of genomic DNA showed that the 170 bp subunits were organized into a higher order structure with a periodicity of approximately 5·7 kb. Higher order structure has been described on other chromosomes: for example, the X-chromosome unit is approximately 2 kb in size and consists of 12 subunits (Waye & Willard, 1985). Three pieces of evidence demonstrated that there was very little sequence divergence between different Y-chromosome 5·7 kb units. (i) Partial sequencing of the same 220 bp region from 14 independently cloned units showed sequence differences at only two positions. (ii) More extensive sequencing of >2 kb of DNA from two different units showed only one difference outside an extra region of 340 bp present in one of the units, discussed below. (iii) Long-range restriction site mapping of the 853 alphoid block (section D & Fig. 3) showed that a site that was absent from one unit was absent from all units: no enzyme cut just once or a few times within the block. These pieces of evidence lead to the conclusion that different units are >99% homologous to one another.
(4) Variant units
Nevertheless, two variant regions of the unit were found in the 3E7 cell line. About one tenth of the units were approximately 6·0 kb long. Sequencing of the extra region showed that it consisted of two additional 170 bp subunits of typical Y-chromosome alphoid DNA. Restriction enzyme mapping with AvaII revealed that most of the 6·0 kb units contained an AvaII site, while all of the 5·7 kb units lacked the site. The site was not situated in the two extra subunits, but was a point mutation nearby. Mapping of both cosmid clones and genomic DNA showed that the variant units were organized into two clusters at least 90 kb apart on the chromosome. Neither of the two variants was detected on the 853 Y chromosome or on the additional Y chromosomes examined by Wolfe et al. (1985).
(D) Y-chromosome alphoid DNA: long-range structure
Initial studies of the long-range structure of the alphoid DNA in 3E7 and 853 cells were reported by Tyler-Smith & Brown (1987). Five restriction enzymes were identified which did not cut within either block of alphoid DNA; a sixth enzyme, AvaII, cut within the 3E7 alphoid DNA at two clusters of sites but did not cut within the 853 alphoid DNA. These enzymes were used to digest high molecular weight DNA and the products of digestion were analysed by pulsed-field gel electrophoresis. The five non-cutting enzymes each produced a single major alphoid DNA fragment. This suggested that most of the alphoid DNA was organized as a single block or possibly as multiple blocks with identical restriction site patterns. However, the quantitative estimate of the amount of alphoid DNA per Y chromosome reported by Wolfe et al. (1985), together with the finding that alphoid DNA is not interspersed with other sequences, suggested that there was almost certainly only a single block per Y chromosome. An upper limit to the size of this block was provided by the size of the smallest single restriction fragment. This was estimated at approximately 475 kb for the 3E7 Y chromosome and 575 kb for the 853 Y chromosome. Thus there was a size difference of about 100 kb between the two blocks.
Sites outside the block were mapped by carrying out double digests. The resulting maps suggested that the flanking sequences were similar on the two chromosomes. A striking feature of these maps was the lack of some restriction sites which would be expected to occur frequently in human DNA. For example, on one side of the alphoid DNA there was between 85 and 100 kb of DNA lacking both a BamHI and a BclI site. Their absence suggested that the flanking DNA was not typical human DNA, but might be another simple sequence. This is discussed further in section E. However, the true nature of the flanking sequences was not determined since no cosmid clones extended into them.
The observation of a large difference in size between the alphoid DNA blocks on the two Y chromosomes examined suggests that the tandemly repeated sequences may be very variable on a large scale in the human genome. The AvaII site variants provide evidence for variability on a smaller scale, similar to that found on chromosomes 17 and X by Willard et al. (1986).
(1) Detailed restriction site mapping of the 853 alphoid block
The long-range structure of the 853 alphoid DNA has recently been mapped in more detail. A total of 36 restriction enzyme specificities that do not occur within the alphoid DNA were identified. Genomic digests with 30 of these enzymes were analysed by pulsed-field gel electrophoresis. The analysis was facilitated by the use of an apparatus designed by E. M. Southern and known as a ‘waltzer’ (Anand, 1986; Southern, Anand, Brown & Fletcher, 1987). This apparatus has the advantage over the original designs of Schwartz & Cantor (1984) and Carle & Olson (1984) that the DNA tracks are straight. Consequently, the mobilities of DNA fragments in different tracks can be easily and accurately compared.
DNA samples digested singly with each of 26 restriction enzymes, including 24 different specificities, are shown in Fig. 2A,B. In most tracks, a single major alphoid band is seen. The sizes estimated for these bands range from approximately 540 kb for the AvaII digest to approximately 1200 kb for the NaeI digest. Sizes measured using this apparatus were commonly 5 to 10% smaller than those of the same fragment measured using a Carle & Olson apparatus (Tyler-Smith & Brown, 1987). Some restriction enzymes, such as ClaI, produced multiple bands. The pattern did not change when the enzyme:DNA ratio was increased (results not shown); double digests showed that there were no sites within the alphoid block. The multiple bands may therefore reflect partial blockage at some sites, e.g. by DNA methylation. Other enzymes, such as NotI, did not produce any bands within the limit of size fractionation of the gel. These enzymes therefore lack an accessible pair of sites within the region analysed.
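The 5 to 10% difference between the two types of apparatus offers a way to compare the block sizes quoted in different parts of this paper. The sketch below applies that correction to the earlier upper limits of approximately 475 kb and 575 kb and asks whether the figures of approximately 440 kb and 540 kb fall in the resulting range. This is only a consistency check under the assumption that the correction applies uniformly; the text does not state that the smaller figures were derived in this way, and the function name is hypothetical.

```python
def waltzer_size_range(carle_olson_kb: float,
                       min_shrink: float = 0.05,
                       max_shrink: float = 0.10) -> tuple[float, float]:
    """Range of sizes expected on the waltzer for a fragment whose size was
    estimated on a Carle & Olson apparatus, assuming a 5 to 10% reduction."""
    return (carle_olson_kb * (1 - max_shrink),
            carle_olson_kb * (1 - min_shrink))


# (earlier estimate, figure quoted elsewhere in the paper), both in kb.
estimates = {"3E7": (475, 440), "853": (575, 540)}

for cell_line, (earlier_kb, quoted_kb) in estimates.items():
    low, high = waltzer_size_range(earlier_kb)
    consistent = low <= quoted_kb <= high
    print(f"{cell_line}: expected {low:.0f}-{high:.0f} kb, "
          f"quoted {quoted_kb} kb, consistent: {consistent}")
```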
Double digests were carried out with pairs of restriction enzymes and the sizes of the resulting alphoid DNA fragments were measured. Examples of the double digests are shown in Fig. 2C,D. From these and other such digests, a map of 1·1 Mb of DNA was constructed, on which sites for 20 enzymes could be placed (Fig. 3). An additional seven enzymes were shown to lack even one accessible site within this region (Fig. 2C). The map shown in Fig. 3 provides some indirect evidence about the nature of the sequences flanking the alphoid DNA. Its implications will be discussed in the next section.
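The logic of mapping by double digestion can be illustrated with a small forward model: given site positions for enzymes that do not cut within the alphoid block, it predicts the size of the alphoid-containing fragment in single and double digests. All coordinates, enzyme labels and the function name below are hypothetical and are not taken from Fig. 3.

```python
from typing import Optional, Sequence


def alphoid_fragment_kb(sites_kb: Sequence[float],
                        block_start_kb: float,
                        block_end_kb: float) -> Optional[float]:
    """Size of the fragment spanning the block, bounded by the nearest cut
    site on each side. Returns None if a flanking site is missing, as for
    an enzyme that lacks an accessible pair of sites in the region."""
    left = [s for s in sites_kb if s <= block_start_kb]
    right = [s for s in sites_kb if s >= block_end_kb]
    if not left or not right:
        return None
    return min(right) - max(left)


# Hypothetical map: a 540 kb block lying between 300 and 840 kb.
block = (300.0, 840.0)
enzyme_a_sites = [250.0, 900.0]   # illustrative positions only
enzyme_b_sites = [100.0, 860.0]   # illustrative positions only

single_a = alphoid_fragment_kb(enzyme_a_sites, *block)
single_b = alphoid_fragment_kb(enzyme_b_sites, *block)
double_ab = alphoid_fragment_kb(enzyme_a_sites + enzyme_b_sites, *block)
print(single_a, single_b, double_ab)
```

In this toy case the double-digest fragment is smaller than either single-digest fragment, which indicates that the two enzymes supply the nearest site on opposite sides of the block. Comparing many such pairs is what allows the sites to be ordered relative to the block.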
(E) What other DNA sequences are present in the Y-chromosome centromeric region?
(1) The left-hand flanking region
The restriction site map shown in Fig. 3 can be divided into three regions: the alphoid block and its left-hand and right-hand flanking sequences (Fig. 4). In the left-hand flanking region, a site for 13 of the 15 enzymes that are expected to cut frequently in human DNA occurs within approximately 20 kb of the alphoid-nonalphoid boundary, forming a striking cluster. The two exceptions are SacI and HpaI; these sites are approximately 35 and 55 kb, respectively, from the boundary. Sites for some enzymes that are expected to cut less frequently in human DNA are dispersed over about 80 kb (ClaI, SalI and SfiI). This region of DNA, therefore, has a restriction site distribution similar to that expected in a stretch of typical human DNA consisting of a mixture of unique sequences and interspersed repeated sequences. It is possible that the left-hand boundary is the boundary between euchromatic and heterochromatic DNA.
(2) The right-hand flanking region
In contrast, the right-hand flanking region does not have the restriction site distribution expected for typical human DNA. For example, the closest ApaI site to the boundary is approximately 400 kb away, while the closest KpnI site is approximately 200 kb away. These enzymes are expected to cut on average every 6·4 and 8·5 kb, respectively (Drmanac, Petrovic, Glisin & Crkvenjakov, 1986). This region is therefore likely to consist of a simple sequence. The dispersion of sites suggests that the sequence is less homogeneous than the alphoid DNA. The small cluster of sites at the boundary might indicate a small stretch of more typical DNA or, alternatively, that the proposed simple sequence is rich in these sites throughout.
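The force of this argument can be made concrete with a rough calculation. The sketch below assumes that, in typical DNA, sites for an enzyme occur at random with the quoted average spacing (a Poisson model, which is an assumption made here and not part of the original analysis), and takes the site-free distances as roughly 400 kb for ApaI and 200 kb for KpnI. Under that model the probability of finding no site in a stretch of length L is exp(-L/mean spacing).

```python
import math


def prob_no_site(gap_kb: float, mean_spacing_kb: float) -> float:
    """Probability that a stretch of gap_kb contains no site, if sites
    occur at random with the given average spacing (Poisson assumption)."""
    return math.exp(-gap_kb / mean_spacing_kb)


# Enzyme: (site-free distance from the boundary, expected mean spacing), kb.
observations = {"ApaI": (400.0, 6.4), "KpnI": (200.0, 8.5)}

for enzyme, (gap_kb, spacing_kb) in observations.items():
    p = prob_no_site(gap_kb, spacing_kb)
    print(f"{enzyme}: P(no site in {gap_kb:.0f} kb) = {p:.1e}")
```

Both probabilities are vanishingly small, so gaps of this size are very unlikely to arise by chance in DNA of typical composition. Real site distributions are not strictly random, so the numbers should be read only as an order-of-magnitude illustration.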
What could be the identity of this putative block of simple sequence DNA? The most obvious candidates are the satellites 2, 3 and 1, since they are known to be located near alphoid DNA on other chromosomes.
They were not detected by in situ hybridization near the centromere of the Y chromosome, but this failure is easily explained: the very strong signal from the long arm of the Y chromosome would mask a weak signal from the centromere. Fig. 5 shows an experiment that investigates the possibility that the flanking sequence is related to satellite 2, 3. 3E7 DNA or 853 DNA was digested with BclI, ApaI or SfiI, enzymes that produce DNA fragments in which the alphoid DNA is linked to its right-hand flanking sequence. One half of the gel was probed with an alphoid sequence and the other half with λHS5 (Cooke & Hindley, 1979), a satellite 2, 3 sequence. In each track, there is some satellite 2, 3 hybridization at the position of the alphoid fragment. This result is consistent with the hypothesis that satellite 2, 3 DNA may be physically linked to alphoid DNA. However, it does not demonstrate linkage conclusively because two different DNA fragments, one alphoid and one satellite, might comigrate by chance in each case. Additional experiments are required to resolve this issue.
(F) Conclusions
A complete map of a human centromeric region would be expected to extend from the euchromatic DNA on one arm of the chromosome, through the heterochromatic DNA, to the euchromatic DNA on the other arm. The technique of pulsed-field gel electrophoresis has made it possible to construct a restriction site map of the Y-chromosome centromeric region, which covers about 2·0% of the chromosome, and to identify three distinct areas on the map, including a block of alphoid DNA and a block of a second simple sequence. Thus, a substantial portion of the Y-chromosome centromeric region has been mapped. This map will form the basis for a more detailed structural analysis of the region and a systematic search for functional mammalian centromeric sequences.
ACKNOWLEDGEMENTS
I thank Lesley Taylor for growing cells, Martin Johnson for constructing the pulsed-field gel electrophoresis apparatus, Rakesh Anand for protocols for the preparation and analysis of high molecular weight DNA, Rakesh Anand and Ed Southern for reading the manuscript and Frank Caddick for photography. This work was funded by the Cancer Research Campaign.