The ultimate informativeness of the zebrafish mutations described in this issue will rest in part on the ability to clone these genes. However, the genetic infrastructure required for positional cloning in zebrafish is still in its infancy. Here we report a reference cross panel of DNA, consisting of 520 F2 progeny (1040 meioses), that has been anchored to a zebrafish genetic linkage map by 102 simple sequence length polymorphisms. This reference cross DNA provides: (1) a panel of DNA from the cross that was used to construct the genetic linkage map, upon which polymorphic genes and genetic markers can be mapped; (2) a fine order mapping tool, with a maximum resolution of 0.1 cM; and (3) a foundation for the development of a physical map (an ordered array of clones each containing a known portion of the genome). This reference cross DNA will serve as a resource enabling investigators to relate genes or genetic markers directly to a single genetic linkage map and avoid the problem of integrating different maps with different genetic markers, as must currently be done when using randomly amplified polymorphic DNA markers, or as has occurred with human genetic linkage maps.
Genetics, and in particular genetic linkage maps, are helping investigators locate genes involved in essential biological processes. Linkage maps have greatly accelerated the localization of genes in agricultural crops and animals, human, mouse, rat and most recently zebrafish. As outlined by several reviews (Kimmel, 1993; Mullins and Nusslein-Volhard, 1993; Driever et al., 1994; Fishman and Stainier, 1994; Nusslein-Volhard, 1994; Kimmel et al., 1995; Kuwada, 1995; Driever, 1995) and demonstrated in this issue, the zebrafish is an extraordinary genetic model system and it will play a major role in the identification of genes involved in normal and pathological processes of vertebrate development.
However, the question now is: how are the genes responsible for these mutants going to be identified? Some of the mutants will be caused by mutations in known genes having similar effects in other organisms, but the majority of the mutated genes will be found using various positional cloning techniques (Collins, 1992). Success of the positional cloning projects will depend on: (1) a large number of genetic markers (spaced every centiMorgan) whose order is well defined, (2) several different physical mapping tools: yeast artificial chromosome (YAC) libraries, bacterial artificial chromosome (BAC) libraries, somatic cell hybrid panels, (3) radiation hybrid panels, and (4) a little luck.
The first step in positional cloning is mapping the phenotype of interest to as small a genetic interval as possible, typically about 1 centiMorgan (cM) or 625 kb for zebrafish (Postlethwait et al., 1994). Since the zebrafish genome is in 25 chromosomes (Endo and Ingalls, 1968), with an estimated genetic size of approx. 2635 cM (cf. Postlethwait et al., 1994) and a physical size of 1.7×10^9 bp (Postlethwait et al., 1994), approximately 2000 genetic markers will be required. These genetic markers need to be transferable between crosses with a high degree of fidelity. The construction of the first genetic linkage map based on randomly amplified polymorphic DNAs (RAPDs) by Postlethwait et al. (1994) and the commercial availability of these markers have set the foundation for the genetic map and demonstrated the utility of a genetic map for zebrafish. RAPD markers, while necessary and useful at this early stage of zebrafish genomics, are of limited utility until the loci are cloned and primers designed to selectively amplify these loci. Currently, RAPDs are difficult to use in diploids and it is necessary to generate a complete genetic linkage map every time a different cross is made. Proliferation of unique genetic linkage maps for every cross will make it difficult to integrate the various maps. Map integration is a challenging endeavor given that zebrafish strains are not inbred. The literature on genetic mapping in humans provides ample evidence of how difficult map integration can be, and of the amount of duplicate work it requires.
The European Backcross Collaborative Group for the mouse recently developed a panel of 982 backcross progeny that can be used for high resolution mapping, conserved order map comparisons and as the starting point for a physical map (Breen et al., 1994). Unfortunately, this cross was not the same one used to construct the genetic linkage map. Thus 1000 to 2000 genetic markers already placed on the genetic linkage map must now be mapped on the backcross, resulting in marker substitutions (since some markers are not polymorphic in the backcross) and a tremendous duplication of effort. Even so, this first attempt to assemble a large reference panel of genetically related progeny has shown the power of such a panel for placing markers into a ‘fine’ order (0.1 cM).
To prevent the need for multiple genetic maps, we have used a strategy similar to that used for the mouse, and developed a DNA resource for mapping. We report a reference cross DNA panel for the zebrafish, consisting of 520 F2 progeny (providing 1040 meioses) derived from a single set of F1 parents and anchored to the genetic linkage map with 102 simple sequence length polymorphisms (SSLPs). The 102 SSLPs selected from the framework map are a subset of the more than 500 markers we have now placed on the genetic linkage map. This initial set of SSLPs can now be used to map any cross (assuming the markers are polymorphic in the strains used). Since the genetic map cross is a subset of animals from the reference cross, map integration is automatic. We will provide DNA from this cross to the community, allowing for the placement of newly isolated genes and genetic markers on this cross. As genes are added to this cross, the degree of conserved gene segments with other organisms will become apparent and provide the foundation for identification of positional candidate genes.
MATERIALS AND METHODS
The strains used for the reference cross were derived as follows: AB strain fish (Chakrabarti et al., 1983) were obtained from the University of Oregon, Eugene, zebrafish facilities and maintained at Boston through four generations of inbreeding, and selected in each generation for absence of embryonic lethal phenotypes. A fourth generation female from this line was used for the cross. India (IN) strain fish originated from a collection of wild fish from the northeast of India in 1990. These IN fish were maintained through three generations of inbreeding, and a third generation male was used for the cross. Due to limited breeding and selection, IN fish are probably not free of mutations reducing larval viability.
Among the F1 progeny, one pair was selected to generate 790 F2 fish. This F1 pair was selected from ten tested pairs because their progeny were free from developmental defects and fewer than 25% of embryos or larvae died during the first 2 weeks of development. To generate the entire set of 790 F2 progeny, 11 egg clutches were obtained from crosses at weekly intervals, a total of about 1270 fertilized eggs. The balance of 480 represents fish that died at various stages, or could not be sexed unambiguously. All 790 F2 progeny were grown to adulthood and killed in accordance with the Massachusetts General Hospital Guidelines for Care and Use of Animals.
Each fish was homogenized in 2 ml of phosphate-buffered saline for 15 seconds using a polytron (Brinkmann). 12 ml of lysis buffer (50 mM Tris-HCl (pH 8.5), 100 mM EDTA (pH 8.0), 200 mM NaCl), 1 ml of Proteinase K (10 mg/ml), and 1 ml of 20% SDS were added.
Samples were mixed by inversion and incubated at 50°C overnight. DNAs were precipitated with isopropanol after an extraction step with phenol/chloroform (1:1 volume). The DNA was suspended in 3 ml of deionised H2O and the concentration of the nucleic acids assessed using spectrophotometry. For genotyping, the DNA was diluted to 4 ng/μl.
To protect against sample cross-contamination, every DNA sample was screened with two genetic markers that have four alleles segregating. Any sample that showed more than two alleles was removed from the data set and the DNA discarded. Duplicate DNA master plates (96 wells × 5 plates × 2 for the duplicate) were made and all PCR amplifications were derived from these plates. To ensure that there were no sample mix-ups between the primary plates and the duplicate plates, five genetic markers were typed in both sets and the genotypes were compared. All five genetic markers produced the same genotypes in both plates.
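The cross-contamination screen described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `flag_contaminated`, the marker names and the allele sizes are hypothetical, and the real screen was read from autoradiograms rather than from a data table. The logic is simply that a diploid animal can carry at most two alleles at a marker, so a third allele suggests mixed DNA.

```python
def flag_contaminated(sample_alleles, max_alleles=2):
    """Return the IDs of samples showing more than `max_alleles`
    distinct alleles at any screening marker (hypothetical helper)."""
    flagged = []
    for sample_id, markers in sample_alleles.items():
        # A diploid fish can carry at most two alleles per marker;
        # a third allele points to DNA from more than one animal.
        if any(len(set(alleles)) > max_alleles for alleles in markers.values()):
            flagged.append(sample_id)
    return sorted(flagged)


# Hypothetical allele sizes (bp) for two screening markers.
calls = {
    "F2_001": {"markerA": [120, 128], "markerB": [200, 204]},
    "F2_002": {"markerA": [120, 124, 128], "markerB": [200, 204]},
    "F2_003": {"markerA": [124], "markerB": [196, 208]},
}

print(flag_contaminated(calls))  # ['F2_002']
```

Markers with four segregating alleles are the most useful for such a screen, because a contaminating sample is then more likely to contribute an allele the animal itself does not carry.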
A zebrafish small insert genomic library was prepared as previously described (Jacob et al., 1991, 1995) with slight modifications. This library was prepared after digesting the genomic DNA from AB fish with one of the three restriction enzymes AluI, HaeIII, or RsaI (New England Biolabs). The digested DNAs were ligated to BstXI adapters (Invitrogen) and cloned into an M13mp19 vector (Boehringer Mannheim) modified to include two BstXI sites to avoid formation of chimeric inserts and self ligation. Oligos (CA)15 and (GT)15 were used as probes, end-labeled with [γ-32P]ATP (6000 Ci/mmol, Du Pont-New England Nuclear). The genomic library (approx. 250 plaques/150 mm Petri dish) was screened by plaque hybridization using Colony/Plaque Screen (Du Pont-New England Nuclear). Hybridization was carried out at 65°C in Church’s hybridization solution (Church and Gilbert, 1984) with the radiolabeled probes described above. Filters were washed at 65°C in 0.1×SSC (15 mM NaCl and 1.5 mM sodium citrate) and 0.1% SDS, and positive clones isolated.
Phage DNA was prepared using Qiagen columns (Qiagen). DNA sequencing was performed with an ABI 373A DNA Sequencer (Applied Biosystems) using the manufacturer’s Taq cycle sequencing protocol.
Selection of genetic markers
The 102 genetic markers genotyped on the DNA panel of the reference cross were selected based on the genetic linkage map (>500 genetic markers) for the zebrafish that we are constructing in our laboratory. The initial map was constructed using MAPMAKER (Lander et al., 1987). Linkage groups were determined using a two-point analysis. Local order was established by multipoint analysis. However, we have not yet employed the version of MAPMAKER designed to handle the three and four allele systems. Details of the construction of the genetic map will be published elsewhere (E. W. K., A. G., M. C. F., W. D., H. J. J. and others). For the map presented here, we selected markers from over 500 genetic markers on our SSLP map as close to 20 cM intervals as possible. This level of resolution provides investigators with (1) a starting point to use the markers immediately, while the genetic map is being completed, and (2) the opportunity to add genetic markers and genes to the map, thereby determining the genetic location relative to other genes and markers.
In a non-inbred organism the number of alleles identified is determined by the size of the general population and by the sample size studied. Since the various zebrafish strains have undergone several cycles of inbreeding and breeding within a closed colony, we selected three fish from each of four different strains: AB, IN, Tübingen (Tü) and Top Long-fin (TL). Allele sizes were then determined by genotyping each of the 102 markers on all 12 fish, and by comparing the amplified product sizes to a known size standard (pBR322 digested with MspI, New England Biolabs). The G0s of the AB × IN reference cross were characterized separately.
The progeny of the AB × IN F2 intercross was genotyped as previously described (Jacob et al., 1991, 1995) with a modification to the PCR protocol: initial denaturation at 94°C for 3 minutes, 27 cycles of 92°C for 1 minute, 58°C for 1 minute and 72°C for 1.5 minutes, and a final extension period at 72°C for 7 minutes.
Map construction for the reference cross
Linkage analysis was performed using the MAPMAKER computer package (Lander et al., 1987). Linkage groups were constructed using two-point analysis and local order was determined using multipoint analysis. This framework map was then compared to the genetic map. The command ‘ripple’ was used to check whether the order of the markers in the framework map was correct.
To minimize the number of errors, we utilized the automatic error detection package in MAPMAKER (Lincoln and Lander, 1992), which flags double crossovers. Potential errors were checked against the autoradiograms and, where necessary, genotypes were repeated. All films were read at least twice and data entry was checked for mistakes.
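To illustrate why double crossovers are treated as candidate genotyping errors, the sketch below flags a marker whose allele differs from both of its typed neighbours while those neighbours agree with each other. It is a deliberately simplified stand-in, not the procedure implemented in MAPMAKER: it assumes a single phase-known chromosome written as a string of hypothetical `A`/`B` calls in map order, with `-` for a missing genotype, and the function name is our own.

```python
def flag_double_crossovers(haplotype):
    """Return indices of markers that would require a crossover on
    both sides, i.e. whose call differs from two agreeing neighbours.

    `haplotype` is a string of calls in map order ('A', 'B', '-').
    """
    flagged = []
    for i in range(1, len(haplotype) - 1):
        left, mid, right = haplotype[i - 1], haplotype[i], haplotype[i + 1]
        if "-" in (left, mid, right):
            continue  # cannot judge a marker next to a missing call
        if left == right and mid != left:
            flagged.append(i)
    return flagged


print(flag_double_crossovers("AAABAAABBB"))  # [3]
```

With closely spaced markers a genuine double recombinant within such a short interval is rare, so a flagged call is more likely to be a scoring or data-entry error and is worth rechecking on the film.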
As part of a project to integrate the RAPD map with the SSLP map, we have been working with Drs. John Postlethwait, Will Talbot and Michael Gates. While these data will be presented elsewhere, the corresponding RAPD linkage groups are indicated here when applicable (Fig. 3).
All loci are SSLPs and are named in accordance with lab rules, where Z denotes the zebrafish and the number indicates the clone selected during the hybridization screen. It is important to bear in mind that these numbers are assay names only. Formally, once the map is completed and physically linked to each chromosome, each SSLP should receive a locus name. We suggest a system similar to that used in the dog, mouse, pig, and rat. The name consists of a species designation DR – for Danio rerio, followed by a D (for DNA marker), chromosome number, institution name (or lab name) and number based on the sequential placement of new markers for that chromosome by that group, e.g. DR-D1Mgh1 for the first Mgh marker on zebrafish chromosome 1. Where no confusion can result, the prefix DR– can be dropped. Gene names should follow the nomenclature rules for the human.
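The proposed naming scheme can be written as a small formatting rule. The following Python sketch is illustrative only: the function name `locus_name` and its parameters are hypothetical, and the chromosome range of 1 to 25 is taken from the chromosome number quoted earlier in the text.

```python
def locus_name(chromosome, lab_code, serial, species_prefix="DR", include_prefix=True):
    """Build a locus name such as DR-D1Mgh1 (hypothetical helper).

    chromosome -- chromosome number (1 to 25 for zebrafish)
    lab_code   -- institution or lab designation, e.g. "Mgh"
    serial     -- sequential number of the marker for that lab on that chromosome
    """
    if not 1 <= chromosome <= 25:
        raise ValueError("zebrafish chromosome number must be between 1 and 25")
    if serial < 1:
        raise ValueError("serial number must be positive")
    core = f"D{chromosome}{lab_code}{serial}"  # D = DNA marker
    return f"{species_prefix}-{core}" if include_prefix else core


print(locus_name(1, "Mgh", 1))                        # DR-D1Mgh1
print(locus_name(1, "Mgh", 1, include_prefix=False))  # D1Mgh1
```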
Strategy for constructing the reference cross
We have generated a collection of 790 F2 progeny from a single set of F1 parents derived from a G0 cross (AB × IN). The use of a single set of F1 parents removes the complications associated with generating maps in non-inbred organisms; therefore, a maximum of four alleles could segregate at any given marker (Fig. 1).
Our goal was a minimum of 500 F2 fish (1000 meioses) genotyped for each genetic marker; therefore, we selected 520 DNA samples. The 102 markers genotyped were selected from the mapping panel, which contains 44 F2s from the reference cross (Fig. 2). On average, we lost 20 genotypes per locus, yielding approx. 500 genotypes per locus (96% success rate). This translates to an equivalent of 1000 meioses with a maximum mapping resolution of 0.10 cM (100 cM/1000 meioses). On a per fish basis this means that on average 3.8 loci were not typed (3.9%) in any given animal, with one notable exception: sample 357 failed to produce a genotype in 99 out of 102 loci.
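The resolution figures follow from simple arithmetic: each F2 animal contributes two informative meioses, and a single recombinant among N meioses corresponds to 100/N cM. A minimal Python sketch of this calculation is given below; the function name is ours, and the 44-animal case anticipates the subset panel described later in the text.

```python
def max_resolution_cm(n_f2_animals):
    """Smallest resolvable genetic distance (cM) for an F2 panel,
    taken as one recombinant among all informative meioses."""
    meioses = 2 * n_f2_animals  # one meiosis from each F1 parent
    return 100.0 / meioses


for n in (500, 520, 44):
    print(n, "F2 animals:", round(max_resolution_cm(n), 2), "cM")
```

For 500 typed animals this gives 0.10 cM, and for the 44-animal subset about 1.1 cM.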
Although our stocks of DNA for F2s are not infinite, we obtained on average 0.5 mg of DNA per sample. Since the PCR protocol used requires a maximum of 10 ng of DNA per reaction, we are able to extend its use to at least 50,000 PCR amplifications.
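The estimate of 50,000 amplifications is the DNA yield divided by the amount used per reaction. A short Python check of the arithmetic, using the figures quoted above (the function name is hypothetical):

```python
def reactions_supported(total_dna_mg, ng_per_reaction):
    """Number of PCR reactions a DNA stock can support."""
    total_ng = total_dna_mg * 1_000_000  # 1 mg = 1,000,000 ng
    return int(total_ng // ng_per_reaction)


print(reactions_supported(0.5, 10))  # 50000
```

Because 10 ng is the maximum used per reaction, 50,000 is a lower bound on the number of amplifications per sample.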
Allele sizes of all 102 SSLPs were determined in three fish from four commonly used strains: AB, IN, Tü, TL, plus the G0s, as shown in Table 1. These allele sizes are relative and should only serve as a guide, since we did not use an internal size marker in each lane. The predicted size is based on the sequencing of the clone containing the SSR. The predicted size and primer sequences are shown in Table 2. In 11% of the cases the size of the amplified fragment denotes a new allele. It is important to note that while a large proportion of the markers are polymorphic between the four strains, there are also a large number of shared alleles amongst strains, about 30%. The degree to which the markers with shared alleles are informative depends upon the chance segregation of the common allele occurring in both G0s and the shared alleles being passed to both F1s.
As shown in Table 1, there are a large number of shared alleles between the different strains and the G0s. We chose not to use markers with shared alleles in the F1s (data not shown), thereby maximizing the map information content of the markers.
The location of 97 genetic markers on the framework map is shown in Fig. 3. Five genetic markers remain unlinked. Collectively the map and the unlinked markers cover 2520 cM (using the Kosambi map function). The markers on the map were chosen to provide a comprehensive representation of the known linkage groups in the genetic linkage map of over 500 SSLPs. However, this genetic linkage map has not been completed to the point of attaching the markers physically to the chromosomes. This limitation prevents us from knowing, with complete certainty, whether each linkage group represents one of the 25 chromosomes. Therefore, the map presented here must be considered to be an approximate framework map.
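For readers unfamiliar with the Kosambi map function, it converts a recombination fraction r between two markers into a map distance that allows for crossover interference: d = 25 ln((1 + 2r)/(1 − 2r)) cM. The Python sketch below implements this standard formula and its inverse; it is a general illustration, not an extract from the MAPMAKER analysis, and the function names are ours.

```python
import math


def kosambi_cm(r):
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm):
    """Inverse: recombination fraction for a map distance in cM."""
    return 0.5 * math.tanh(d_cm / 50.0)


print(round(kosambi_cm(0.10), 2))  # 10.14
print(round(kosambi_r(20.0), 3))   # 0.19
```

At small recombination fractions the map distance is close to 100 × r, while at the approx. 20 cM marker spacing used here the expected recombination fraction is about 0.19.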
Coverage of the genome, in cM, can be estimated from the sum of the intervals plus the sum of coverage provided by the genetic markers at the ends of each linkage group and by estimating the coverage of the five unlinked markers (typically approx. 20 cM on either side). The coverage between the intervals totalled 1320 cM and the coverage at the ends of each linkage group (2 × 25 linkage groups × 20 cM) totalled 1000 cM, plus 200 cM (5 markers × 2 sides × 20 cM) covered by the 5 unlinked markers, yielding a total coverage of 2520 cM. Postlethwait et al. estimated the zebrafish genome to be approx. 2635 cM (Postlethwait et al., 1994). Consequently this SSLP map, although not complete, provides excellent coverage of the genome. The orientation of the linkage groups is arbitrary. As would be expected, the maps constructed for the reference cross DNA panel were in complete agreement with the >500 SSLPs map, with one exception: linkage groups 8 and 21 were linked to one another. However, since the genetic map with a greater density of genetic markers did not attach these groups together, we ignored this linkage. Five markers (Z374, Z536, Z644, Z1296, Z5075) could not be linked together in the framework map, and the evidence for the linkage groups to which they were assigned was not strong. We used these markers because they will eventually tie in with one of the existing linkage groups. The framework map reported here, covering the majority of the genome, provides a set of anchor markers. These markers will provide precise integration points between the complete genetic linkage map and any marker or gene placed on the reference cross DNA panel. In addition, the reference cross DNA panel will provide investigators with a set of recombinant chromosomes that can be used for fine structure mapping.
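The coverage estimate can be reproduced with the figures given in the text. The short Python sketch below adds the three components and compares the total with the genome size of approx. 2635 cM; the variable names are ours, and the 20 cM flank is the approximation stated above rather than a measured value.

```python
FLANK_CM = 20             # assumed coverage on either side of a marker
INTERVAL_SUM_CM = 1320    # summed distances between linked markers
N_LINKAGE_GROUPS = 25
N_UNLINKED_MARKERS = 5
GENOME_CM = 2635          # estimate cited from Postlethwait et al., 1994

ends_cm = 2 * N_LINKAGE_GROUPS * FLANK_CM        # both ends of each group
unlinked_cm = N_UNLINKED_MARKERS * 2 * FLANK_CM  # both sides of each marker
total_cm = INTERVAL_SUM_CM + ends_cm + unlinked_cm

print(total_cm)                        # 2520
print(round(total_cm / GENOME_CM, 2))  # 0.96
```

On these assumptions the framework map covers roughly 96% of the estimated genetic length of the genome.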
Availability of data and DNA
The data underlying this map, including the primer sequences, allele sizes, and genotypes for each locus, are available electronically from a server maintained at the Cardiovascular Research Center. The data can be obtained from http://zebrafish.mgh.harvard.edu. Primer sequences have also been deposited with Research Genetics (Huntsville, Alabama), where they are stocked in aliquots appropriate for mapping experiments.
A subset of 48 DNAs (44 F2 animals, 2 F1s and 2 G0s) is available to the community for the initial placement of genes and genetic markers. Once the gene or marker has been mapped, DNA from the recombinant animals in the interval of interest may then be requested. Precise rules and requirements for obtaining DNA from the reference cross are also available on the zebrafish webserver at cvrc, mgh or via request. Three major requirements must be met before DNA can be released: (1) that the gene or marker be mapped using PCR, (2) that a marker has a demonstrated polymorphism between the AB and IN strains used to construct the reference cross, (3) that all mapped genes and genetic markers be posted in the reference cross database. The reference cross database will contain the actual genotypes and information regarding the gene, the primer sequences and the investigator who developed the primers. At the discretion of the investigator, the release of this information to the general public can be withheld until publication. Other guidelines, such as adherence to nomenclature rules will also need to be met.
The determination that a particular genetic marker is polymorphic is somewhat problematic given that the G0 and F1 DNA is relatively limited. Therefore, we will use the remaining approx. 270 F2 zebrafish as test samples, enabling investigators to develop a polymorphic marker without having to use any of the DNA from the cross.
We have developed a genetic mapping DNA resource for the zebrafish community that will provide fine resolution mapping, prevent the need for map integration, and provide the foundation for a physical map. In addition we will provide the DNA from zebrafish known to be recombinant in any interval between the anchored markers, providing a maximum mapping resolution of 0.10 cM. Postlethwait et al. have estimated that 1 cM is approximately 625 kb (Postlethwait et al., 1994). Therefore, this DNA panel has the capacity to provide resolution up to 62.5 kb. At this resolution, one could contemplate making a physical map out of any large insert library (such as a P1 library). Thus the DNA panel will provide an efficient means of obtaining a precise map position for a gene or marker and for constructing a physical map in a region of interest.
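The conversion from genetic to physical resolution is a single multiplication, shown here as a Python sketch. It assumes, as the text does, a uniform ratio of 625 kb per cM across the genome; in practice recombination rates vary between regions, so the result should be read as an average. The names are ours.

```python
KB_PER_CM = 625.0  # genome-wide average cited from Postlethwait et al., 1994


def physical_resolution_kb(genetic_resolution_cm, kb_per_cm=KB_PER_CM):
    """Approximate physical distance (kb) spanned by a genetic distance (cM),
    assuming a uniform kb/cM ratio."""
    return genetic_resolution_cm * kb_per_cm


print(round(physical_resolution_kb(0.10), 1))  # 62.5
```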
The high resolution mapping potential and direct relationship of the reference cross DNA panel to the genetic linkage map will facilitate determination of the conserved gene segments between the zebrafish and other organisms. The ability to order physically close genes (within 62.5 kb) will provide essential information with regard to conserved gene sequences between the genome of the zebrafish and the genomes of other organisms. The markers used to anchor the reference cross were selected from the genetic linkage map consisting of SSLPs that we are currently constructing. However, it is important to bear in mind three points about the map reported here. First, the genetic linkage map with SSLPs is not yet complete and none of the linkage groups have been physically assigned to a chromosome. Consequently, the linkage groups have been arbitrarily assigned as independent. Second, since the genetic map has not been completed, marker order may change as the genetic map is resolved and physically linked to the chromosomes. However, this does not detract from the opportunity to place genes and genetic markers relative to the order reported here. Since the 44 F2 animals used to construct the genetic linkage map are a subset of the reference cross, any changes in the map can be easily corrected in the reference cross map. We will continually update the map for both the reference cross and the genetic map itself. This information will be displayed on our Web site. Third, there are likely to be errors in this data set (53,040 genotypes) and the genetic map data set. All map data sets contain some errors, even after extensive error checking. The most common error is the misassignment of a marker to the wrong linkage group. Even when a somatic cell hybrid panel is available, marker assignments have changed in the initial rat and mouse genetic maps (Howard J. Jacob unpublished data and Eric Lander personal communication).
In the case of the zebrafish, linkage groups are determined by probability alone – leaving chances for more errors. The errors will be detected and corrected as additional markers are added to the map. However, the majority of the data is sound and will provide investigators with the opportunity to place genetic markers. It is important to note that the number of double recombinants is low for the approximate framework map presented here, demonstrating that the linkage groups do minimize recombinants. All recombinants were checked and in selected cases the assays were repeated to ensure that the recombinants were real. Where possible we have identified the linkage groups relative to the RAPD map (Fig. 3).
Investigators wishing to receive the DNA plate will need to contact our Web site and request a copy of the guidelines. Once the guidelines have been agreed to, the DNA plate will be sent. Due to the limited amount of DNA, only one plate can be sent to the laboratory. Investigators wishing to map a gene on the reference cross will be sent DNA from the F2 progeny of this cross that were not used to construct the reference cross, and 44 DNAs which were genotyped for anchor markers. The investigator will develop a polymorphic marker using test DNA and upon identification of a segregating polymorphism, will genotype 44 F2 animals as well as F1s and G0s from the reference cross DNA panel. The 44 animals will enable the investigator to map the gene or marker to a maximum resolution of 1.1 cM (100 cM/88 meioses). The genotype information will be incorporated into the map and into the reference cross data set. Note that the investigator will be able to use the data from the Web site to do the analysis. After analysis, the investigator will be able to request the DNAs from all recombinant animals in the interval containing the marker. These DNAs will allow the investigator the opportunity to map other closely linked markers or help in constructing a physical map of the interval. By using this approach the DNA from the reference cross will provide a long term resource (approx. 50,000 PCR amplifications per sample).
In conclusion, we have developed a resource that can be used to map genes and genetic markers to a high resolution and also provides the foundation for a future physical mapping project. This reference cross DNA panel has been anchored by 102 SSLPs. The SSLPs and the reference cross DNA panel are now available to the community. The utility of the reference cross will continue to increase as more markers are mapped and the density of the map improves. The framework map is tentative but usable, while work proceeds on completing the genetic map consisting of SSLPs and physically anchoring it to the chromosomes. The map and the reference cross will also play an important role in establishing conserved gene segments between the zebrafish genome and genomes of different organisms.
We would like to thank John Postlethwait, Will Talbot and Michael Gates for their help with integrating the two maps. M.S. is supported by the Fukuzawa Memorial Fund; S. W-Z. is supported by American Heart, Massachusetts Affiliate (13-426-934); M.G.M. is supported by the Stanley J. Sarnoff Endowment for Cardiovascular Science; L. P. and C. F. are supported by PROTECH; J.S.S. is supported by an individual NRSA grant (HL08968); M.C.F. is supported by a grant from NHLBI (HL-49579) and a sponsored research agreement with Bristol Myers-Squibb; W. D. is supported in part by grants from NIH (HD29761), NSF (IBN-931469) and a sponsored research agreement with Bristol Myers-Squibb; H. J. J. is supported in part by grants from National Institute of Diabetes, Digestive and Kidney disease (DK46612), National Center for Research Resources (RR08888) and a sponsored research agreement with Bristol Myers-Squibb.