To describe the set of mRNAs and proteins (the sialome) expressed in the salivary glands of the tick Ixodes scapularis, we randomly sequenced 735 clones of a full-length salivary gland cDNA library of this arthropod and performed Edman degradation of protein bands from salivary gland homogenates (SGH) and saliva separated by SDS-PAGE. The sequences were grouped into 410 clusters, of which 383 are not associated with known I. scapularis sequences. Fifteen protein bands from the saliva gel and 17 from the salivary gland gel yielded amino-terminal information. We attributed 19 of these sequences to translation products of the cDNA library. Full-length sequences were obtained for 87 clones. Among these protein sequences are several protease inhibitors of distinct classes, metalloproteases, novel proteins with histamine-binding domains, and several peptide families of unknown function displaying different conserved cysteine residues, many of which contain single Kunitz domains. This work provides insight into the diversity of messages expressed in the salivary glands of I. scapularis, describes novel sequences that may be responsible for known biological activities, indicates further biological activities that may be present in I. scapularis saliva, and identifies novel vaccine targets that may be used in Lyme disease prevention.
Saliva of blood-sucking arthropods contains a large array of antihemostatic, anti-inflammatory and immunomodulatory components (Ribeiro, 1995). Tick saliva has been proposed to be important for formation and maintenance of the feeding cavity in host skin (Ribeiro, 1989; Wikel et al., 1994; Wikel, 1996). The tick Ixodes scapularis, the main vector of Lyme disease in the eastern US, has a salivary apyrase (Ribeiro et al., 1985) that destroys ADP, a main agonist of platelet aggregation. Inhibitors of platelet-activating factor- and collagen-induced platelet aggregation also exist in I. scapularis saliva (Ribeiro et al., 1985), as do anticlotting agents (Ribeiro et al., 1985) including Ixolaris, an inhibitor of tissue factor/FVIIa (Francischetti et al., 2002), inhibitors of neutrophil activation (Ribeiro et al., 1990) and inhibitors of T-cell activation (Ribeiro et al., 1985). The latter activity is mediated, at least in part, by an undefined protein that binds IL-2 (Gillespie et al., 2001). A salivary kininase in Ixodes (Ribeiro and Mather, 1998) destroys bradykinin, a mediator of pain and edema (Regoli and Barabe, 1980). The effects of inflammatory anaphylatoxins are also blocked, perhaps by the same kininase enzyme or other carboxypeptidases (Ribeiro and Spielman, 1986). I. scapularis saliva also has an inhibitor of the alternative complement pathway, Isac, which was recently characterized molecularly (Ribeiro, 1987; Valenzuela et al., 2000). There is also evidence for the presence of salivary prostacyclin (Ribeiro et al., 1988) and prostaglandin E2 (Ribeiro et al., 1985). Prostaglandins, in particular E2 and F2α, have been described in saliva of other ticks (Dickinson et al., 1976; Higgs et al., 1976; Ribeiro et al., 1992); these prostaglandins are both vasodilators of skin vasculature and immunomodulators. Other than prostaglandins, Isac, the salivary anticomplement of I. scapularis (Valenzuela et al., 2000), and the anticlotting Ixolaris (Francischetti et al., 2002), no other pharmacologically active molecule in I. scapularis saliva has been molecularly characterized.
Tick saliva is also important in transmission of tick-borne pathogens for several reasons: it may enhance pathogen transmission, hypersensitivity to saliva may modify the site of inoculation of pathogens, and it may promote non-viremic transmission of viruses by cofeeding (Jones et al., 1987, 1990; Nuttall et al., 2000; Wikel et al., 1994; Wikel, 1996). A protein of unknown function (named SALP16) has been characterized by immunoscreening an expression salivary gland cDNA library obtained from I. scapularis nymphs (Das et al., 2000), as have 13 other immunodominant proteins from I. scapularis (Das et al., 2001).
The composition of I. scapularis saliva is of interest for the study of the biology of parasite-host relationships, the discovery of novel biologically active components, and the identification of novel vaccine targets against I. scapularis-vectored diseases. Toward these goals, we constructed a salivary gland cDNA library from blood-feeding I. scapularis and randomly sequenced 735 clones that yielded 410 cDNA clusters. Based on BLAST homology to other proteins in the non-redundant (NR) database, the presence of conserved domains of the SMART (Schultz et al., 2000) or Pfam (Bateman et al., 2000) databases, and the presence of a signal peptide indicative of secretion in these clones (Nielsen et al., 1997), we identified 100 clusters that are probably associated with secretory products. From these, we obtained full-length information on 87 different clones, reported here; expression of 19 of these was confirmed by identification of their amino-terminal sequence in PVDF-transferred salivary proteins separated by SDS-PAGE. While descriptive in nature, this paper raises many hypotheses about the compositional diversity of the saliva of blood-sucking arthropods and identifies several novel sequences that could have biological activity and possibly serve as vaccine targets.
Materials and methods
Water and organic compounds
All water used was of 18 MΩ quality and was produced by a MilliQ apparatus (Millipore, Bedford, MA, USA). Organic compounds were obtained from Sigma Chemical Corporation (St Louis, MO, USA) or as stated.
Ticks and tick saliva
Tick saliva was obtained by inducing partially engorged adult female I. scapularis (3-4 days post-attachment to a rabbit) to salivate into capillary tubes using the modified pilocarpine induction method (Valenzuela et al., 2000). Tick salivary gland extracts were prepared by collecting glands from partially engorged female I. scapularis as described (Valenzuela et al., 2000). Glands were stored frozen at -75°C until needed.
Salivary gland cDNA library construction
I. scapularis salivary gland mRNA was isolated from 25 salivary gland pairs taken from adult females at days 3 and 4 after attachment to a rabbit host. The Micro-Fast Track mRNA isolation kit (Invitrogen, San Diego, CA, USA) was used to isolate mRNA, which was reverse transcribed to cDNA using Superscript II RNase H- reverse transcriptase (Gibco-BRL, Gaithersburg, MD, USA) and the CDS/3′ primer (Clontech, Palo Alto, CA, USA). Second-strand synthesis was performed using a polymerase chain reaction (PCR)-based protocol with the SMART III primer (Clontech) as the sense primer and the CDS/3′ primer as the antisense primer. These two primers create SfiI A and B sites at the ends of the nascent cDNA. Double-stranded cDNA was immediately treated with proteinase K (0.8 μg μl-1) and washed three times with water using Amicon filters with a 100 kDa cutoff (Millipore). The double-stranded cDNA was then digested with SfiI and fractionated using columns provided by the manufacturer (Clontech). Fractions containing cDNA of more than 400 base pairs (bp) were pooled, concentrated and washed three times with water using an Amicon filter with a 100 kDa cutoff. The cDNA was concentrated and ligated into a λ TriplEx2 vector (Clontech). The resulting ligation reaction was packaged using the Gigapack Gold III from Stratagene/Biocrest (Cedar Creek, TN, USA) following the manufacturer's specifications. The library thus obtained was plated by infecting log-phase XL1-blue cells (Clontech), and the proportion of recombinants was determined by PCR using vector primers flanking the inserted cDNA, with products visualized on agarose gels stained with ethidium bromide. For more details, see Valenzuela et al. (2002).
Sequencing of the Ixodes scapularis cDNA library
The salivary gland cDNA library was plated to approximately 200 plaques per plate (150 mm diameter Petri dish). Randomly picked plaques were transferred to a 96-well polypropylene plate containing 100 μl of water per well. The bacteriophage sample (5 μl) was used as a template for a PCR reaction to amplify random cDNA using PT2F1 (5′-AAG TAC TCT AGC AAT TGT GAG C-3′), which is positioned upstream from the cDNA of interest (5′ end), and PT2R1 (5′-CTC TTC GCT ATT ACG CCA GCT G-3′), which is positioned downstream from the cDNA of interest (3′ end). Platinum Taq polymerase (Gibco-BRL) was used for these reactions. After removal of primers, the PCR product was used as a template for a cycle-sequencing reaction using the DTCS labeling kit from Beckman Coulter Inc. (Fullerton, CA, USA). The primer used for sequencing (PT2F3) is upstream from the inserted cDNA and downstream from primer PT2F1. After cycle sequencing, the samples were cleaned using the multiscreen PCR 96-well plate cleaning system from Millipore. Dried samples were immediately resuspended in 25 μl of deionized ultrapure formamide (J. T. Baker, Phillipsburg, NJ, USA) and one drop of mineral oil was added to the top of each sample. Samples were sequenced immediately on a CEQ 2000 DNA sequencer (Beckman Coulter Inc.) or stored at -30°C.
A detailed description of the bioinformatic treatment of the data can be found elsewhere (Valenzuela et al., 2002). Briefly, primer and vector sequences were removed from the raw sequences, which were then compared against the GenBank non-redundant (NR) protein database using the standalone BlastX program found in the executable package at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/ (Altschul et al., 1997) and searched against the Conserved Domains Database (CDD) (found at ftp://ftp.ncbi.nlm.nih.gov/pub/mmdb/cdd/), which includes all Pfam (Bateman et al., 2000) and SMART (Schultz et al., 2000) protein domains. The predicted translated proteins were searched for a secretory signal through the SignalP server (Nielsen et al., 1997). Sequences were clustered using the BlastN program (Altschul et al., 1990) as detailed before (Valenzuela et al., 2002), and the data are presented in the format of Table 1 in this paper. The electronic version of the table has additional hyperlinks to ClustalX (Jeanmougin et al., 1998) alignments as well as FASTA-formatted sequences for all clusters. The electronic table is available upon request; e-mail: firstname.lastname@example.org.
Full-length sequencing of selected cDNA clones
A sample (4 μl) of the λ-phage containing the cDNA of interest was amplified using the PT2F1 and PT2R1 primers (same conditions as described above). The PCR samples were cleaned using the multiscreen PCR 96-well filtration system (Millipore). Cleaned samples were sequenced first with PT2F3 primer and subsequently with custom primers. Full-length sequences were again compared with databases as indicated for the nucleotide sequences above, and the data displayed as in Table 2, which has hyperlinks in its electronic version (available upon request to email@example.com).
SDS-polyacrylamide gel electrophoresis
NuPAGE 10% gels, 1 mm thick (Invitrogen), using reducing MES buffer, were electrophoresed according to the manufacturer's recommendations to resolve proteins in 60 μl of tick saliva. Salivary gland homogenates (SGH; 1.0 pairs per lane) were run in 12% gels under non-reducing conditions with Bis-Tris buffer. To estimate the molecular mass of detected proteins, SeeBlue™ markers from Invitrogen (myosin, bovine serum albumin, glutamic dehydrogenase, alcohol dehydrogenase, carbonic anhydrase, myoglobin, lysozyme, aprotinin and insulin, chain-B) were used. Samples were treated with NuPAGE LDS sample buffer (Invitrogen). For amino-terminal sequencing of the salivary proteins, the gels were transferred to PVDF membrane using 10 mmol l-1 Caps, pH 11.0, 10% methanol as the transfer buffer on a blot module for the XCell II Mini-Cell (Invitrogen). The membrane was stained with 0.025% Coomassie Blue in the absence of acetic acid. Stained bands were cut from the PVDF membrane and subjected to Edman degradation in a Procise sequencer (Perkin-Elmer Corporation). To find the cDNA sequences corresponding to the amino acid sequences obtained by Edman degradation, we wrote a search program that checked these amino acid sequences against the three possible protein translations of each cDNA sequence obtained in the mass sequencing project. A more detailed account of this program is found elsewhere (Valenzuela et al., 2002).
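The search described above, matching an Edman-derived peptide against the three forward translations of a cDNA, can be illustrated with a minimal sketch. This is not the authors' program (which is described in Valenzuela et al., 2002); the function names and the example cDNA fragment are hypothetical, and the standard genetic code and exact string matching are assumed.

```python
# Minimal sketch, assuming the standard genetic code and exact matching.
# All names and the example sequence are hypothetical illustrations.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Codons are enumerated with T, C, A, G order at each of the three positions.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame):
    """Translate one forward reading frame (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def find_peptide(dna, peptide):
    """Return (frame, residue_index) for every exact occurrence of the peptide."""
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        pos = protein.find(peptide)
        while pos != -1:
            hits.append((frame, pos))
            pos = protein.find(peptide, pos + 1)
    return hits


# Hypothetical fragment encoding Asn-Gly-Thr-Arg-Pro in the second frame.
example_cdna = "GAACGGTACCCGTCCG"
print(find_peptide(example_cdna, "NGTRP"))
```

Real Edman reads often contain ambiguous or unassigned positions, so a practical search would need to tolerate mismatches; exact matching is used here only to keep the sketch short.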
Characterization of the library by DNA sequencing of randomly selected clones
To investigate the transcriptome of the salivary glands of feeding adult female Ixodes scapularis ticks, we randomly sequenced 735 clones from our unidirectionally cloned library. After clustering these sequences using BlastN with a cutoff of 1E-60, we found 410 unique clusters. All sequences within each cluster were compared with the NR protein database using the BlastX program (Altschul et al., 1997) and with the CDD database, containing all Pfam and SMART motifs (Bateman et al., 2000; Schultz et al., 2000), using the RPSblast program (Altschul et al., 1997). The three possible reading frames of each sequence were inspected for long open reading frames with an initial methionine residue followed by at least 40 residues; these were submitted to the SignalP server for verification of a secretory signal peptide. The results for each cluster were compiled as shown in Table 1, which displays the 30 most abundant clusters of this cDNA collection. Thirteen of the 30 clusters are possibly related to secretory products, as they display a signal secretion peptide signature (Nielsen et al., 1997). Five clusters have indications of being related to membrane-anchored or cytoplasmic proteins, while the remaining eight clusters give no conclusive indication of a leader signal peptide, probably due to diminished sequence quality at the 5′ end. Notably, seven of the 30 clusters have Kunitz domains, found in many protease inhibitors such as anticlotting proteins. Of these 30 clusters, six had highly significant matches to five previously published I. scapularis salivary proteins, all of which are from clusters having a predicted signal secretory peptide sequence. When all 16 known salivary protein sequences of I. scapularis (as of September 20, 2001) were compared with the complete cDNA library described in this paper (using tBlastN), 13 were found in the library with a confidence value of 1E-30 or better, indicating they corresponded to the same or very closely related proteins.
The three reported protein sequences not found in the translation of our library are: the SALP9 protein (gi|15428346), which matched the amino-terminal sequence of one of the clones and appears to be a signal sequence, yielding an E value of 1E-5 for the match; the salivary gland 16 kDa protein SALP16 (gi|12002008), which identifies four cDNA with varying scores ranging from 2E-5 to 3E-15, the best alignments indicating 40% sequence identity; and finally, the 26 kDa salivary protein B (gi|15428306), which has no matches to our database.
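The reading-frame screen used in this section (an initial methionine followed by at least 40 residues before any stop) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the input is assumed to be an already translated frame with stops written as '*', and each candidate is taken from the first methionine after a stop.

```python
import re


def candidate_orfs(translated_frame, min_following=40):
    """Return stretches that start with Met and continue for at least
    `min_following` residues before a stop codon ('*') or the sequence end."""
    pattern = re.compile(r"M[^*]{%d,}" % min_following)
    return [m.group(0) for m in pattern.finditer(translated_frame)]


# Hypothetical translated frame: one long candidate and one short one.
frame = "KR*" + "M" + "A" * 40 + "*" + "M" + "A" * 10
for orf in candidate_orfs(frame):
    print(len(orf), orf[:10])
```

Candidates that pass this screen would then be submitted to a signal peptide predictor, as the text describes for SignalP.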
The complete Table 1 (available electronically; e-mail: firstname.lastname@example.org), containing 410 clusters, was annotated to indicate whether each of the clusters is associated with a possibly secreted protein, a probable housekeeping protein, or one of unknown function. These annotations and function assignments were based both on similarities to the NR or CDD databases and on whether the sequences appear to code for a secretory signal peptide. We thus found 102 clusters possibly associated with secretory products. These 102 clusters account for a total of 310 sequences, or 42% of the cDNA database. Table 2 indicates the clusters possibly associated with secretory products, sorted alphabetically. The electronic version of the manuscript contains the tables for the clusters associated with probable housekeeping and unknown functions, as well as links to all sequences, alignments and BLAST results.
Table 2 shows that, in addition to the 13 proteins indicated above, there are several clusters associated with anti-protease sequences or domains, such as α-2-macroglobulin and cystatin, and 28 clusters having the Kunitz domain found in soybean trypsin inhibitor. Two serpins were also found, one of which matches a previously reported I. ricinus sequence. One additional cluster has the SMART TIL signature of trypsin inhibitors. Possible inhibitors of platelet aggregation include disintegrins (four clusters) and thrombospondin (five clusters). Three clusters code for proteins having similarity to tick histamine-binding proteins, one of which has already been described in I. scapularis.
A sequence matching the antimicrobial defensin was found, but this clone is truncated at the 5′ end and lacks the starting methionine. Proteins or peptides with similarity to collagen or gap junction proteins are also represented, but their function is unknown. A serine carboxypeptidase, two serine proteases and metalloproteases appear to be secreted. More than 35 clusters are associated with proteins that are possibly secreted, but their function in tick feeding is not readily apparent. Also evident from Table 2 is the existence of several related proteins. Indeed, when the clustering of the database is done with a cutoff value of 1E-20 rather than 1E-60, several of these clusters collapse (for example, those labeled short proteins or those containing Kunitz domains), although the alignments indicate that these are composed of several different, but related, gene products (results not shown; see below).
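The effect of the cutoff on cluster membership can be illustrated with a small sketch. It assumes single-linkage grouping of pairwise hits (any two sequences joined by a hit at or below the cutoff end up in the same cluster); the source does not state the exact clustering rule, and the identifiers and E values below are invented for illustration only.

```python
def cluster(ids, hits, cutoff):
    """Single-linkage clustering with union-find.
    hits: iterable of (id_a, id_b, e_value); pairs with e_value <= cutoff are merged."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, e_value in hits:
        if e_value <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sequences and pairwise E values.
ids = ["c1", "c2", "c3", "c4"]
hits = [("c1", "c2", 1e-75), ("c2", "c3", 1e-30), ("c3", "c4", 1e-5)]

print(cluster(ids, hits, 1e-60))  # stricter cutoff keeps c3 separate
print(cluster(ids, hits, 1e-20))  # looser cutoff merges c3 with c1 and c2
```

With the stricter cutoff only the near-identical pair is grouped; relaxing the cutoff pulls in the related but distinct sequence, which mirrors the collapse of related clusters noted above.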
Table 2A, available on request from the author (e-mail: email@example.com), contains information on clusters of sequences probably associated with housekeeping function. Three of these clusters, each containing only one sequence, code for proteins of the 5′-nucleotidase family, a family previously associated with the secreted salivary apyrase of mosquitoes. Also of interest was the finding of a sulfotransferase and an alkyl hydroperoxide reductase that could be linked to synthesis of sulfated products of secretion and salivary prostanoids, respectively.
Full-length sequence information on 87 clones
To obtain more information on this transcriptome collection, with emphasis on the messages possibly associated with secreted proteins (the sialome set), we obtained the full-length sequences of 87 clones, the properties of which are summarized in Table 3. Sixty-two of these sequences belong to seven distinct groups, obtained by comparing the sequences against themselves using the BlastP program with a cutoff value of 1E-20 (see Materials and methods for more detail).
Peptide group 1
Peptides from group 1 consist of 22 sequences (Table 3) representing the most abundant family of messages in the salivary gland library (cluster 1; Table 1). These sequences have high similarity to the 14 kDa protein of I. scapularis (gi|15428308), but have no other significant matches to the NR database. No motifs were found when they were compared to the CDD database. All sequences have a putative signal peptide indicative of secretion, which ends in the tripeptide Ala-Ala-Ala (Fig. 1). Alignment of these 22 novel sequences with the 14 kDa protein (gi|15428308) (Fig. 1) indicates these proteins belong to three closely related families, ETC, HNC and HDC, named for the amino-terminal sequence of the predicted mature peptides (see cladogram in Fig. 1). These proteins have a mature molecular mass predicted to vary from 9.1 to 11.5 kDa; most are basic in nature due to a lysine-rich carboxy-terminal region. They all possess a conserved sequence Asn-Gly-Thr-Arg-Pro, starting at position 5 of the putative mature protein, which was detected twice by Edman degradation of protein bands excised from SDS-PAGE gels of tick salivary proteins (see below). Except for one sequence (ISL 1342), all have six conserved cysteine residues. The function of these proteins remains elusive.
Peptide group 2
Peptides from group 2 (Table 3) are putative mature proteins varying in molecular mass from 6.5 to 8.4 kDa, of both basic and acidic pI. Four of the 13 proteins gave significant matches to Kunitz domains, indicating they may be protease inhibitors or otherwise interact with other protein domains. Most of the peptides gave BlastP matches to the NR protein database, indicating similarity to proteins annotated as protease inhibitors. Cysteine residues are conserved in most peptides of this group, as is an N-X-T motif preceding the third conserved cysteine of the mature peptides (Fig. 2). Remarkably, there is significant conservation of the predicted signal peptide. In the first 24 amino acid positions, there are 12 positions that are identical or conserved (excluding the initial methionine), whereas for the remaining 63 ungapped positions there are 13 conserved positions. The χ2-test indicates these ratios to be significantly different at P=0.0223. This conservation of the signal peptide was observed earlier in a family of antimicrobial peptides of frog skin (Charpentier et al., 1998), and in semenogelins, a family of mammalian semen proteins (Lundwall and Lazure, 1995).
To further investigate the nature of peptide group 2, we built a hidden Markov model based on the alignment shown in Fig. 2, using the -f switch to allow for the presence of multiple domains in the resulting model. A search of the NR database produced six matches with an E value of 5.4E-5 or lower, three of which are the mouse, the rabbit and the human anticlotting protein, tissue factor pathway inhibitor (TFPI). TFPI is a blood coagulation inhibitor containing three tandem Kunitz domains; two of these domains have been demonstrated to interact with Factor VIIa or Factor Xa (Girard et al., 1989). Single Kunitz molecules with specificity for Factor VIIa or elastase have also been characterized in libraries from phage display (Dennis and Lazarus, 1994) and from extracts of the parasitic worm Ancylostoma ceylanicum (Milstone et al., 2000), respectively. The model also recognized another I. scapularis salivary protein, SALP10, but with a higher (less significant) E value of 1.9E-4.
Peptide group 3
Group 3 cDNA sequences code for short peptides of mature molecular mass ranging from 3.5 to 4.8 kDa, of both basic and acidic nature (Table 3). All sequences are relatively glycine- and proline-rich. Some sequences give weak matches to proteins in the NR database annotated as collagen; these possess two conserved cysteine residues in the mature peptide and remarkable conservation of the secretory signal peptide (Fig. 3). All amino acid sites of the predicted signal secretory peptide are conserved, against 18 of 35 sites on the mature peptide. A χ2-test is significant at P=0.0422. It is possible some of these sequences are alleles of an extremely polymorphic locus or, alternatively, that they represent different conserved loci. The possible function of these peptides remains elusive.
Peptide group 4
Group 4 sequences code for putative mature peptides having four conserved cysteine residues, molecular mass 7.9-8.7 kDa, of both basic and acidic nature. All display strong similarity (BlastP against NR database) to a protein from I. scapularis named SALP10 (gi|15428348), and weak similarities to mammalian tissue factor pathway inhibitor (TFPI) and bungarotoxin. Nineteen of the first 21 amino acids (excluding the initial methionine) are conserved (Fig. 4), as compared to 33 of the 69 amino acids of the mature peptide. This difference is highly significant (χ2-test, P<0.001), indicating higher conservation of the signal peptide than of the mature protein. A hidden Markov model made from the alignment shown in Fig. 4 retrieved only SALP10 from the NR database, with an E value of 1.9E-70, and no other significant matches.
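The comparison of conserved positions in the signal peptide (19 of 21) and the mature peptide (33 of 69) can be laid out as a 2x2 contingency table. The sketch below computes the uncorrected Pearson χ2 statistic with one degree of freedom; whether the authors applied a continuity correction is not stated, so the absence of one here is an assumption, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]].
    Returns (statistic, p_value) for one degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: signal peptide, mature peptide. Columns: conserved, not conserved.
chi2, p = chi_square_2x2(19, 21 - 19, 33, 69 - 33)
print(round(chi2, 2), p)
```

Under these assumptions the statistic is roughly 12, above the 10.83 critical value for P=0.001 at one degree of freedom, which agrees with the significance level reported in the text.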
Peptide group 5
Three sequences in group 5 (Table 3) code for proteins of mature molecular mass ranging from 33.7 to 35.5 kDa, of a basic nature, and having 24 conserved cysteine residues (Fig. 5). Comparisons with the NR protein database using BlastP indicate similarities to proteins annotated as protease inhibitors, including TFPI and the protein Ixolaris, an inhibitor of Factor VIIa (Francischetti et al., 2002). ISL228_Cluster344 has a Kunitz domain, as indicated by the SMART database. These sequences probably code for anti-clotting compounds.
Peptide group 6
Group 6 represents sequences showing similarity to histamine-binding proteins (Table 3, Fig. 6). ISL1040_cluster233 has no matches to the NR protein database but has a significant match by RPSBlast to the Pfam histamine-binding domain, whereas ISL1276_cluster363 has no such match but instead has similarity to the histamine-binding protein of the tick Rhipicephalus appendiculatus found in the NR protein database. These two proteins are mildly acidic and have a mature molecular mass of 32.6 kDa and 34.6 kDa, respectively. It is probable that these proteins function by binding histamine or other small ligands.
Peptide group 7
The two sequences in group 7 match a sequence deposited in the NR database from I. scapularis and annotated as thrombospondin. The two predicted mature sequences, with eight conserved cysteine residues, code for two peptides of molecular mass 10.2 and 11.6 kDa, one basic and the other acidic in nature. Their similarities to thrombospondin proteins are not apparent. Both sequences have weak similarities to disintegrin metalloproteases, and ISL373_cluster33 has the cysteine-rich domain of ADAM proteases as predicted by the Pfam database. No RGD domains found in disintegrins are observed in these sequences, nor in any of the other sequences reported in Table 3. Fig. 7 shows the alignment of the two proteins with the Ixodes thrombospondin found in the NR database. The role of the cysteine-rich domain of ADAM proteases is not known but it is postulated to interact with integrins and/or other attachment motifs of cells and matrix proteins (Hooper, 1994). Accordingly, these peptides could be involved in disruption of platelet aggregation, cell-matrix interactions and/or inhibition of angiogenesis (Roberts, 1996).
The remaining 24 novel sequences presented in this paper can be grouped as: (i) similar to previously reported I. scapularis salivary proteins; (ii) a novel, shorter protein with a Pfam histamine-binding motif, but not similar to other HBP found in the NR database (when compared by a BlastP search); (iii) five novel proteins coding for different inhibitors of proteolytic activity; (iv) six enzymes; and (v) ten proteins probably secreted and with unknown function.
Messages coding for proteins similar, but not identical, to previously reported I. scapularis sequences
ISL1083_cluster9 is 95% identical to the previously reported 25 kDa proteins of I. scapularis (not shown) and may represent an allele of a highly polymorphic gene or another closely related gene. ISL1083_cluster9, which does not display a histamine-binding motif, is highly similar to two other proteins found in the NR database that are also from Ixodes scapularis salivary gland cDNA libraries and annotated as histamine-binding, 17 kDa proteins. The alignment of the four sequences shows highly conserved areas, including the putative secretory signal peptide (Fig. 8). The mature 17 kDa protein is a truncated version of the other three proteins, containing two conserved cysteine residues in the mature form, while the remaining proteins have an additional four cysteine residues. These proteins may have a function in blood feeding by binding small mediator molecules involved in hemostasis or inflammation.
Novel putative protein containing the histamine-binding domain
ISL868_cluster49 has no similarities to proteins in the NR database but has a histamine-binding motif and a predicted signal peptide; the molecular mass of the mature protein is 23.3 kDa. This molecular mass is similar to those of ISL1083_cluster9 analyzed above, I. scapularis 25 kDa protein A, and I. scapularis histamine-binding protein, to which ISL868 may be distantly related.
Sequences coding for different protease inhibitors
Five predicted proteins appear to function as protease inhibitors. ISL1095_cluster291, a truncated α-2-macroglobulin clone with highest similarity to the Limulus protein, also shows very high similarity to vertebrate proteins. These protein inhibitors are very large and entrap the proteases that they inhibit; they may also bind to cytokines (Armstrong and Quigley, 1999; Borth, 1992). Because the clone we describe in this paper covers only the truncated carboxy-terminal region, we do not know whether there is a signal peptide indicative of secretion coded in this message.
ISL888_cluster62 codes for a secreted peptide with mature molecular mass of 11.9 kDa containing the cystatin domain of cysteine protease inhibitors; 15 kDa cystatin has been described previously in several nematodes (Dainichi et al., 2001; Hartmann et al., 1997; Manoury et al., 2001). These nematode cystatins inhibit the lymphocyte asparaginyl endopeptidase involved in class II antigen processing in human B cells and inhibit T-cell proliferation. A similar function may be served by ISL888_cluster62.
ISTA397_cluster68 is similar to the I. scapularis TFPI-like molecule Ixolaris (alignment in Fig. 9), a molecule containing one complete and one incomplete Kunitz domain (Francischetti et al., 2002). ISTA397_cluster68 has the same number of cysteine residues in the first and second Kunitz domains as does Ixolaris. ISTA397_cluster68 may accordingly also work as a TFPI, or inhibit some other proteases such as chymotrypsin or trypsin (Petersen et al., 1996).
ISL1156_cluster318 codes for a 10 kDa peptide with a Kunitz domain, having considerable similarity to other proteins from the NR database annotated as protease inhibitors of both vertebrate and invertebrate origins.
Finally, ISL1268_cluster360 codes for a mature protein of 20.8 kDa with a serpin motif, highly similar to Limulus coagulation inhibitor and to other serine protease inhibitors of both vertebrate and invertebrate origins. Interestingly, the mRNA has two open reading frames, both of which code for serpins, one with a typical secretory peptide, the other apparently leading to an intracellular protein. The specificity and activity of these putative protease inhibitors remain to be determined.
Sequences coding for different enzymes
Six clones are reported to code for enzymes. ISL1194_5nuc codes for a protein with high similarity to invertebrate and vertebrate 5′-nucleotidases and apyrases. 5′-nucleotidases have a signal peptide indicative of secretion, which causes the protein to be expressed extracellularly, and a carboxy terminus in which a GPI anchor fixes the protein to the extracellular side of the membrane (Ogata et al., 1990). The GPI anchor is attached to a conserved serine residue, followed by a stretch containing 15 or 16 hydrophobic amino acid residues. Neither mosquito salivary apyrase, a secreted enzyme, nor a 5′-nucleotidase of sand fly saliva has this conserved serine. These enzymes also lack the hydrophobic carboxy terminus, allowing the enzyme to be secreted (Champagne et al., 1995; Charlab et al., 1999). Analysis of the carboxy terminus of ISL1194_5nuc (Fig. 10) shows that it does not have the conserved serine found in mammalian and constitutive tick 5′-nucleotidases. Instead of 15-16 hydrophobic residues, it contains only eight such residues. Furthermore, it contains four charged (K+E) and three polar (T+S) residues, making the carboxy terminus unlikely to be intramembranous. ISL1194_5nuc is thus possibly responsible for the previously described salivary apyrase of I. scapularis (Ribeiro et al., 1985), or may code for a secreted 5′-nucleotidase.
ISL1316_cluster379 codes for a serine carboxypeptidase containing a signal peptide indicative of secretion. The specificity of this putative carboxypeptidase is unknown. It probably does not code for the previously described kininase activity of I. scapularis saliva, which has kinetic characteristics of another family of peptidases, the angiotensin converting enzymes (ACE) (Ribeiro and Mather, 1998). ISL1316_cluster379 carboxypeptidase could, however, be the salivary enzyme described previously to inactivate the serum anaphylatoxins C3a and C5a (Ribeiro and Spielman, 1986).
ISL812_cluster188 codes for a protein with high similarity to proteins from the NR database annotated as chymotrypsin, elastase, enterokinase and enteropeptidase. The best match is to a protease from the tick Haemaphysalis longicornis (Mulenga et al., 1999). The ISL812_cluster188 putative protein has a strong signal anchor as determined by the SignalP program. It is thus probably not secreted and likely serves a housekeeping function.
ISL1033_cluster65 and ISL1324_cluster383 have very high similarity to a hypothetical protein from the tick I. ricinus and to other proteins in the NR database annotated as disintegrins and metalloproteases. Both have the Pfam reprolysin motif indicative of a zinc metalloprotease family, most commonly found in snake venoms (Hooper, 1994). Neither has a signal sequence indicative of secretion; however, the amino-terminal sequences for both were found in protein bands of one-dimensional electrophoresis of saliva samples (see below).
Finally, ISL939_cluster238 has very high similarity to Drosophila melanogaster NADH-ubiquinone oxidoreductase, a typical mitochondrial enzyme ranging in molecular mass from 69 to 75 kDa, and to other proteins annotated as deoxyguanosine/deoxyadenosine kinases, consistent with the finding of a deoxynucleoside kinase (DNK) motif from the Pfam database. DNK are 44 to 56 kDa enzymes described in both mitochondria and cytosol (http://brenda.bc.uni-koeln.de). ISL939_cluster238 codes for a putative protein containing a signal peptide indicative of secretion, with a mature molecular mass of 45.8 kDa. It is thus possible that ISL939_cluster238 codes for a DNK secreted in saliva, with an unknown function in the tick feeding process.
Sequences coding for proteins of unknown function
Eleven additional clones were fully sequenced, either because they represented abundant clones or because their partial sequence contained a signal peptide indicative of secretion. Although all of these full-length clones code for putative proteins displaying a signal peptide indicative of secretion, no function was indicated when their sequences were compared to the NR or CDD database. ISTB418_cluster179 codes for a 4.3 kDa basic peptide with similarity to human and murine proteins of unknown function. ISL942_cluster53 has similarity to a Borrelia burgdorferi protein (E value 1E-4) and weak similarity to a tick histamine-binding protein (E value 0.006). This putative protein and that coded by ISL1270_cluster22 have predicted mature molecular masses of 22.5-22.6 kDa, similar to the protein described in Table 3 as histamine-binding, not group 5 (ISL868_cluster49). Alignments of these three putative proteins reveal no obvious similarities (not shown).
Initial characterization of the proteome set of Ixodes scapularis
To obtain further information on the salivary proteome set of I. scapularis, saliva and SGH were separated by one-dimensional SDS-PAGE, the proteins were transferred to PVDF membranes and stained with Coomassie Blue, and the excised bands were submitted to Edman degradation. Useful sequence information was obtained from 15 bands of the saliva gel and 19 bands of the salivary gland gel (Figs 11, 12). With the exception of one larger molecular mass band in the saliva gel (FEVGKDYYY...), and three sequences on the SGH gel, we tentatively assigned all other sequences to a gene product, as follows.
Sequences originating from proteins in saliva included two matching rabbit albumin and one matching the α-chain of rabbit hemoglobin. Similarly, the SGH-derived sequences included both the α- and β-chains of rabbit hemoglobin as well as a sequence with high similarity to Ig-κ light chain.
Amino-terminal sequences matching putative proteins coded by cDNA sequences from cluster 1
Two sequences in each of the two gels fractionating saliva and SGH matched putative proteins belonging to the most abundant cluster of cDNA sequences. The observed amino-terminal sequences matched those predicted by the SignalP program. Mature sequences from group 1 peptides start with either HX or ET, followed by C-[QKRQ]-NGTRPAS (see above and Fig. 1). Accordingly, the sequences HNXQNGTRPASEENREGXDY and HKXQNGTRPASEKNREGXDY were obtained from protein bands of saliva separated by SDS-PAGE, corresponding to the sequences of clones ISL1129 and TB222. Gels from SGH yielded the Edman degradation products HNXQDGTRPASE and HNXKNGTRPASE, matching clones ISTA48 and TA379, for which we do not have full-length sequences. Notably, although proteins from group 1 (Table 3) vary in molecular mass from 9.3 to 11.5 kDa, they are all located in the 20-24 kDa region in both gels. It is thus possible that the proteins of this cluster form dimers through disulfide bridges even when the samples are run under reducing conditions or, alternatively, they may carry post-translational modifications such as glycosylation.
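Assigning an Edman read to a predicted mature sequence, as done above, amounts to a prefix comparison in which X (a cycle with no residue call) matches anything. The following is a minimal sketch, not the procedure actually used in this work. The two reads are quoted from the text, but the "predicted" sequences and their names (`clone_A`, `clone_B`) are hypothetical: they were made by replacing each X with C and appending arbitrary residues.

```python
def edman_matches(read: str, mature: str) -> bool:
    """True if an Edman read agrees with the start of a mature sequence.

    'X' in the read marks a cycle with no residue call and matches anything.
    """
    if len(read) > len(mature):
        return False
    return all(r == "X" or r == m for r, m in zip(read, mature))


def assign_reads(reads, predicted):
    """Map each read to the names of all predicted sequences it matches."""
    return {
        read: [name for name, seq in predicted.items()
               if edman_matches(read, seq)]
        for read in reads
    }


# Reads quoted in the text (saliva gel).
reads = ["HNXQNGTRPASEENREGXDY", "HKXQNGTRPASEKNREGXDY"]

# Hypothetical predicted mature sequences: X replaced by C (an assumption)
# and an arbitrary tail appended. These are NOT the real clone sequences.
predicted = {
    "clone_A": "HNCQNGTRPASEENREGCDYAAA",
    "clone_B": "HKCQNGTRPASEKNREGCDYAAA",
}

print(assign_reads(reads, predicted))
```

Each read is assigned to exactly one hypothetical sequence here because the two reads differ at called positions (N versus K at position 2, E versus K at position 13), not only at X positions.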
Amino-terminal sequences matching putative proteins coded by cDNA sequences from cluster 14
Two proteins belonging to cluster 14 were also represented in both gels and, in both cases, represented by the pair of sequences from clones ISTB346 and ISL914. The observed amino-terminal sequences are in agreement with the mature peptide sequence predicted by the SignalP program. Although the mature peptide predicted by ISL914_cluster14 is 7 kDa, it was found in the 10-12 kDa regions of the reduced saliva gel and in the 30 kDa region of the non-reduced SGH gel, indicating that these molecules may form multimers through disulfide bridges. Alternatively, this peptide may have a compact structure in its oxidized state that precludes sufficient binding of SDS, leading to less charge and apparently higher molecular mass in the gel experiment (Pitt-Rivers and Impiombato, 1968). No Asn glycosylation sites were found in ISL914_cluster14 or in ISTB346.
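A search for Asn glycosylation sites of the kind mentioned above is commonly done by scanning for the N-X-[S/T] sequon, where X is any residue except proline. The sketch below assumes that consensus; the tool actually used in this work is not stated here, and the function name is illustrative.

```python
import re

# N-X-[S/T] with X != P. The lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sites(protein: str) -> list:
    """Return 1-based positions of Asn residues in N-X-[S/T] sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Start of a group 1 mature peptide as described in the text, with the
# uncalled third residue assumed to be cysteine.
print(n_glyc_sites("HNCQNGTRPAS"))
# A proline in the middle position does not count as a sequon.
print(n_glyc_sites("ANPSA"))
```

Under this consensus, the conserved NGTRPAS stretch quoted for group 1 peptides contains a candidate sequon (NGT), whereas a sequence with no N-X-[S/T] pattern returns an empty list.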
Amino-terminal sequence matching the tick anticomplement protein, Isac
The sequence SEDGLE... obtained from saliva run in the SDS-PAGE gel and the tripeptide SED on the SGH gel were found in a location with an apparent molecular mass of 48 kDa (Figs 11, 12), matching the previously reported inhibitor of the C3 convertase, Isac (Valenzuela et al., 2000). Isac has a molecular mass of 18.5 kDa but behaves in gel chromatography as though it has a larger molecular mass than predicted (Valenzuela et al., 2000).
Amino-terminal sequences from salivary proteins matching putative proteins within the metalloprotease reprolysin domain
Two amino-terminal sequences were obtained from the gel used to separate tick saliva that match metalloproteases having the reprolysin domain. These two clones (ISL1324 and ISL1033) were fully sequenced as described above. ISL1033_cluster65 codes for a 44.1 kDa protein, while ISL1324_cluster383 codes for a 46.1 kDa protein. The SignalP program does not predict these proteins to be secreted. The observed amino-terminal sequences begin unusually far from the starting methionine residue, at positions 49 and 72, predicting mature proteins of 36.7 and 38.2 kDa, compatible with their migration on gels (Fig. 11, Table 3). These proteins may be secreted by a different pathway from the other proteins, perhaps as a product of apocrine secretion (Aumuller et al., 1999). They may also be the result of proteolytic processing of a pro-enzyme. It is also possible that both clones are truncated at their 5′-end, where a conserved stretch of 169 residues is sandwiched between the pre- and proproteinase in snake venom metalloproteases (Jia et al., 1996). Indeed, ISL1033_cluster65 is very similar to a hypothetical protein of I. ricinus (gi|5911708), which contains a longer predicted amino terminus. These metalloproteases may be involved in digestion of skin matrix constituents or fibrinogen, like the hemorrhagic metalloproteases of snake venoms (Leonardi et al., 1999; Tortorella et al., 1998).
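The predicted mass of a mature protein, given the position at which the observed amino terminus begins, can be estimated by summing average residue masses over the retained residues and adding one water. The sketch below uses approximate standard average residue masses. The function names and the toy precursor are hypothetical, and no sequence from the library is reproduced.

```python
# Approximate average residue masses in Da (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain for the free termini


def average_mass_da(sequence: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[r] for r in sequence.upper()) + WATER


def mature_mass_kda(precursor: str, first_mature_position: int) -> float:
    """Mass (kDa) of the chain starting at a 1-based position of the precursor."""
    mature = precursor[first_mature_position - 1:]
    return average_mass_da(mature) / 1000.0


# Hypothetical toy precursor, used only to show the calculation.
toy_precursor = "MKLLFVALLA" + "SEDGLEQKTR" * 3
print(round(average_mass_da(toy_precursor) / 1000.0, 2),
      round(mature_mass_kda(toy_precursor, 11), 2))
```

For the metalloproteases discussed above, `first_mature_position` would be 49 or 72, applied to the full predicted translation of each clone. Disulfide bonds and post-translational modifications are ignored in this estimate.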
Presence in saliva of the peptide coded by clones TA242, ISL1014 and ISL818
These clones were classified as being of unknown function because they did not produce any significant matches when compared with protein NR or CDD databases. Their amino-terminal sequences, as predicted by the SignalP program, were found in protein bands of saliva separated by SDS-PAGE.
Calreticulin sequences of SGH proteins
The sequences DPTVYFK... and DPAIYFK..., found in protein bands from SDS-PAGE-separated SGH, match the secreted calreticulin of the tick Amblyomma americanum (gi|3924593) and rat calreticulin (gi|11693172), respectively (Fig. 11). We have not found any sequence matching calreticulin in our own library, which appears to be underrepresented for cDNA sequences coding for proteins of molecular mass greater than 50 kDa. These two amino-terminal sequences indicate that calreticulins, abundant intracellular proteins (Nash et al., 1994), are probably produced in I. scapularis salivary glands, although their secretory nature is not obvious.
Housekeeping and other protein sequences found in SGH proteins
The sequence AKDFIAGGVA matches those from cluster 64 with very high similarity to the mitochondrial carrier enzyme ATP/ADP translocase. The sequence MQIFV..., matching the cDNA clone ISL844 from cluster 201, has very high similarity to ubiquitin. The amino-terminal sequence DPIMGYT... was not found in the possible translations of our cDNA library but does match putative oxidoreductases found in the NR protein database. Finally, the sequence NEDLIL... does not match any possible translation product of our cDNA library but does match the SALP17 protein from I. scapularis (gi|15428298) at position 112. The protein sequence ARXDAYDNXSGIRARLH matched clone TB210.
We constructed a PCR-based cDNA library from the salivary glands of the tick I. scapularis, sequenced 735 random clones, clustered the cDNA sequences based on a BLAST algorithm, and obtained full-length information on 87 novel proteins and peptides, most of which appear to be secreted in saliva. Further, we collected information on amino-terminal sequences of proteins from saliva and SGH separated by SDS-PAGE. We confirmed expression for 19 proteins, including four members of the most abundant cDNA population (cluster 1), two members of another abundant cDNA cluster (cluster 14), two secreted zinc metalloproteases of the reprolysin family, the previously identified anticomplement peptide, and three proteins of unknown function. Several host proteins were found in both saliva and SGH. While the possible function and structure of the sequences obtained are described in Results, two additional items remain to be discussed: (i) observation of a large redundancy of related sequences and (ii) origin of host proteins in saliva and SGH.
Our library contains a remarkably large degree of redundancy, as shown by the many related mRNAs, most of which are too different to be alleles from polymorphic loci. In addition to those shown in Figs 1-10, the previously reported salivary anticomplement protein (gi|8896135) is 82% identical to SALP20 (gi|5428300) (Das et al., 2001). The long evolutionary history of ticks may be responsible for this complex plethora of related proteins. Indeed, when we sequenced similar salivary cDNA libraries from sand flies (Charlab et al., 1999; Valenzuela et al., 2001) and mosquitoes (Valenzuela et al., 2002), we found far less diversity of related molecules. This variability in the tick salivary cDNA library is consistent with the reported high polymorphism of salivary proteins among individual ticks analyzed by SDS-PAGE (Wang et al., 1999). The adaptive role of this gene-duplication phenomenon may derive from divergence of functions in duplicated genes. For example, a Kunitz-containing protease inhibitor might evolve into another protease inhibitor of different specificity, thus targeting another protease of the host blood-clotting pathway. Another possible adaptive role for gene duplication is the generation of different antigenic epitopes within molecules of the same function, allowing the tick to better evade host immune responses. It is interesting to speculate whether each of these protein variants would have a differential temporal expression. Because our cDNA library was made from 25 adult female tick salivary glands removed from the tick 3-4 days after host attachment, and because ticks vary up to 2 days in their total feeding time (5-7 days from attachment to a rabbit), it is likely that our library represents an average of messages translated within a broad range of physiologic ages.
A microarray experiment with messages obtained from ticks at different times post-attachment could be used to detect individual messages produced at unique times by individual ticks, thus testing the hypothesis of temporal switching of similar salivary proteins in I. scapularis.
With regard to the related messages found in the salivary gland cDNA library of I. scapularis, the higher conservation of signal peptides found in peptide groups 2-4, compared with the remaining protein sequences, is remarkable. This pattern was also found in secreted peptide families of vertebrates (Charpentier et al., 1998; Lundwall and Lazure, 1995). Faster evolution of the secreted moieties than of the signal peptides indicates possible conservation of a 'secretion signal cassette' or strong evolutionary pressure for variation of the secreted moiety, consistent with an antigenic variation scenario.
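The contrast between conserved signal peptides and variable secreted moieties can be quantified by computing mean pairwise identity separately on each side of the predicted cleavage site in an alignment. The sketch below is illustrative only: the three aligned sequences and the cleavage column are hypothetical toy data, not members of groups 2-4, and a real analysis would start from a proper multiple alignment.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def region_identity(aligned, cleavage_column: int):
    """Mean pairwise identity (signal region, mature region) of an alignment.

    `aligned` holds equal-length rows; columns before `cleavage_column`
    (0-based) are treated as signal peptide, the rest as mature peptide.
    """
    signal, mature = [], []
    for a, b in combinations(aligned, 2):
        signal.append(percent_identity(a[:cleavage_column], b[:cleavage_column]))
        mature.append(percent_identity(a[cleavage_column:], b[cleavage_column:]))
    return sum(signal) / len(signal), sum(mature) / len(mature)


# Hypothetical toy alignment: 10 signal columns followed by 10 mature columns.
toy_alignment = [
    "MKLLFVALLA" + "HNCQNGTRPA",
    "MKLLFVALLA" + "ETCKDGSRPE",
    "MKLLFIALLA" + "HKCQNGARPS",
]

signal_id, mature_id = region_identity(toy_alignment, cleavage_column=10)
print(f"signal: {signal_id:.1f}%  mature: {mature_id:.1f}%")
```

A markedly higher value for the signal region than for the mature region, as in this toy case, is the pattern described above for peptide groups 2-4.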
This diversity of related salivary proteins, whether they vary from tick to tick or temporally within individual ticks, will certainly pose an additional burden in the attempts to develop a vaccine against tick salivary antigens that may protect against tick-borne pathogens (Valenzuela et al., 2001). Defining invariant antigens, and/or using a cocktail vaccine approach, will be important for a successful vaccine development strategy.
With regard to the finding of host proteins in tick saliva and SGH, we cannot rule out contamination by host blood trapped in the tick mouthparts, by tick regurgitation during saliva collection, or by tick-gut contents during salivary gland dissection. Although our cDNA library did not contain a single rabbit sequence match, and the tick mouthparts were thoroughly washed before saliva collection, this does not eliminate the possibility of regurgitation. Secretion of host Ig in tick saliva has been reported before in other ticks that possess Ig-binding proteins (IGBP) (Wang and Nuttall, 1995a,b, 1999), which are postulated to carry this host protein through the tick midgut and salivary gland epithelia. The biological reason for tick IGBP may be related to counteracting the possible noxious effects of host Ig against midgut or hemocoel targets; any explanation for the seemingly wasteful secretion of host albumin and hemoglobin is not immediately apparent. It is interesting to speculate whether these host proteins are modified by the tick by glycosylation or by other additions. Incorporation of such antigenic epitopes into self molecules may be a strategy for tick suppression of host immunity against potentially antigenic carbohydrate determinants. Further, hemoglobin degradation leads to formation of hemorphins, opioid peptides active in the immune system and in pain reception (Nyberg et al., 1997). Hemoglobin-derived peptides may also have antimicrobial activities (Fogaca et al., 1999).
The functions of most tick sequences described in this paper are unknown. Some, such as group 2, are relatively short peptides with single Kunitz domains (Fig. 2, Table 3). When compared with snake dendrotoxins, which are also small peptides containing a single Kunitz domain (Harvey, 2001), similarities are apparent (Fig. 13) not only in the typical conservation of the Kunitz cysteine residues but also in conserved glycine-rich and basic amino acid-rich regions. These peptides may function as dendrotoxins that variously affect membrane functions. These and other peptides are of a size amenable to either direct synthesis or production by recombinant methods, and will eventually be tested for their biological activities in various bioassays. Other biological activities, such as the several antiproteases and metalloproteases, can be identified with different enzyme assays. Our ongoing studies should increase our understanding of how ticks successfully evade the hemostatic and immune responses of their hosts.
The authors thank Drs Robert Gwadz, Louis Miller and Thomas Kindt for encouragement and support, and Nancy Schulman for editorial assistance.