Birdsong and human speech share many features with respect to vocal learning and development. However, the vocal production mechanisms have long been considered to be distinct. The vocal organ of songbirds is more complex than the human larynx, leading to the hypothesis that vocal variation in birdsong originates mainly at the sound source, while in humans it is primarily due to vocal tract filtering. However, several recent studies have indicated the importance of vocal tract articulators such as the beak and oropharyngeal–esophageal cavity. In contrast to most other bird groups, parrots have a prominent tongue, raising the possibility that tongue movements may also be of significant importance in vocal production in parrots, but evidence is rare and observations often anecdotal. In the current study we used X-ray cinematographic imaging of naturally vocalizing monk parakeets (Myiopsitta monachus) to assess which articulators are possibly involved in vocal tract filtering in this species. We observed prominent tongue height changes, beak opening movements and tracheal length changes, which suggests that all of these components play an important role in modulating vocal tract resonance. Moreover, the observation of tracheal shortening as a vocal articulator in live birds has to our knowledge not been described before. We also found strong positive correlations between beak opening and amplitude as well as changes in tongue height and amplitude in several types of vocalization. Our results suggest considerable differences between parrot and songbird vocal production while at the same time the parrot's vocal articulation might more closely resemble human speech production in the sense that both make extensive use of the tongue as a vocal articulator.
In recent years birdsong has become the focus of many scientists interested in the cognitive, neural, genetic and physiological mechanisms underlying human speech and language. The fact that songbirds and humans exhibit many parallels in vocal learning and perception (e.g. Doupe and Kuhl, 1999; Ohms et al., 2010a; Ohms et al., 2011; Beckers, 2011) has established songbirds as an excellent model system in which to study the underlying mechanisms in both birds and humans (Bolhuis et al., 2010). Also, cognitive mechanisms related to syntax detection might be comparable in humans and songbirds, although the results are controversial (Gentner et al., 2006; van Heijningen et al., 2009).
However, while there are numerous analogies there are differences too, especially regarding vocal production. In humans the primary sound source is the larynx and voiced speech sounds are produced by a pair of vibrating vocal folds (Titze, 2000). The acoustic signal generated is subsequently dynamically filtered by shaping the vocal tract using different articulators such as the tongue and lips (Ladefoged, 2006). This leads to the amplification of different frequency regions, called formants, within the broad-band spectrum of human speech sounds.
In contrast, the vocal organ of most birds, the syrinx, is located at or near the base of the trachea in the interclavicular air sac (Suthers and Zollinger, 2004) and in the case of Oscine songbirds consists of two sets of vibrating labia located at the cranial end of each of the primary bronchi (Goller and Larsen, 1997), which are capable of independent motor control (Suthers, 1990). This enables songbirds to sing with two voices simultaneously or to switch between the two sets of labia while singing, depending on the frequencies produced (Suthers, 1990; Suthers et al., 1994; Suthers et al., 2004; Zollinger and Suthers, 2004). The greater complexity of the vocal organ of songbirds initially led to the hypothesis that acoustic variation predominantly arises at the sound source and that, in contrast to human speech, acoustic filtering by the vocal tract only plays a minor role in birdsong production (Greenewalt, 1968).
Most bird species studied produce relatively narrow-band, tonal songs that lack the complex formant patterns prominent in human speech. It has been shown, however, that the sound generated at the source can exhibit harmonic overtones (Beckers et al., 2003) and that cyclical movements of the hyoid skeleton or expansion of the cervical esophagus filter these out of the signal by tuning the oropharyngeal–esophageal cavity (OEC) to the fundamental frequency of the song (Riede et al., 2004; Riede et al., 2006; Riede and Suthers, 2009). Additionally, in zebra finches (Taeniopygia guttata), which produce a wide range of broad-band note types, expansion of the OEC has also been found to affect frequency patterns by shifting energy to relatively lower frequencies while amplitude generally increases (Ohms et al., 2010b). Other articulators involved in avian vocal tract filtering include beak movements and gape widths (Hoese et al., 2000; Podos et al., 2004; Nelson et al., 2005). Clearly, there is increasing evidence for the importance of vocal tract filtering in the production of avian vocalizations.
Interestingly, observations of naturally vocalizing and speech-imitating parrots, which have a simpler syrinx with only one pair of vibrating membranes (Larsen and Goller, 2002), suggest that tongue movements play an important role in vocal production too (Nottebohm, 1976; Patterson and Pepperberg, 1994). The parrot tongue is morphologically very different from that of songbirds, in that it contains many intrinsic muscles and its surface is more like that of the human tongue: a fleshy, rather flexible structure (Homberger, 1986) that can be moved in a horizontal and vertical plane within the oral cavity. So far, however, evidence on this subject is rare and observations are often anecdotal. Studies on a speech-imitating African grey parrot (Psittacus erithacus) have suggested that, similar to humans, this bird can adjust the front–back position of its tongue in order to imitate human articulatory patterns while, unlike humans, it lacks extensive control over the high–low dimension (Patterson and Pepperberg, 1994; Warren et al., 1996). Evidence from X-ray and infrared videos, however, could only be obtained from two vowel sounds: /a/ and /i/. In both cases the tongue is not visible and therefore likely rests low within the oral cavity. The authors did find a clear difference in tracheal configuration and beak opening, though, with the trachea being protracted and the beak being closed during the production of /a/ but not /i/ (Warren et al., 1996). Another experimental approach evaluating the significance of tongue movements in monk parakeet (Myiopsitta monachus Bonaparte 1854) vocalizations has demonstrated that moving the tongue horizontally in the mouth cavity can lead to frequency and amplitude changes in acoustic resonance patterns (Beckers et al., 2004). Specifically, the frequency of the third formant (F3) increases the more the tongue moves back whereas the frequency of the first formant (F1) slightly decreases. High–low manipulations of tongue position have a general attenuating effect on formant amplitude. However, no direct observations of tongue movements in naturally vocalizing parrots exist to date, nor is it known whether parrots, like songbirds, exhibit a cyclical movement of the hyoid skeleton causing an expansion of the OEC.
In the current study we addressed these questions using X-ray cinematographic imaging of the vocal tract during natural vocalizations of monk parakeets. We measured movements of structures that previously have been identified or proposed to act as vocal articulators in parrots, including beak opening, tongue height and tracheal and laryngeal movement. We discuss the outcome of these measurements in comparison with songbird vocal production and parrot speech imitation.
MATERIALS AND METHODS
The monk parakeets used in this study were obtained from a US Department of Agriculture pest control program in Florida and were housed in pairs or individually in metal cages (43 cm deep×44.5 cm wide×50 cm tall) in the same room under a 14 h light:10 h dark schedule prior to the experiment. During the experiment all birds were moved in their home cages to the room that contained the X-ray apparatus to stimulate the respective focal bird to vocalize. Food and water were provided ad libitum and wooden toys in the cages served as enrichment. X-ray recordings were obtained from four monk parakeets of which three fulfilled our criteria for good lateral views and were included in further analysis.
All animal procedures were reviewed and approved by the Institutional Animal Care and Use Committee and the Radiation Safety Office of Indiana University, and comply with the ‘Principles of animal care’, publication no. 86-23, revised 1985, of the National Institutes of Health.
X-ray cinematography and song recordings
A Series 9800 mobile C-arm and 1 k×1 k neurovascular work station (OEC Medical Systems, Inc., GE Healthcare, Piscataway, NJ, USA) was used to obtain X-ray videos of spontaneously vocalizing monk parakeets. This apparatus generated a digital signal of 30 pulses s⁻¹ and a 1000×1000 image resolution. Each digital image was produced by a 10 ms X-ray pulse. The focal bird was transferred into a metal cage of the same dimensions as the home cage in which two opposite sides of the cage were replaced by Plexiglas panels and enabled recording of the bird in a lateral view with the bird's head being about 5 cm in front of the intensifier screen. The digital signal of the X-ray apparatus was recorded on a Sony GVD-1000 NTSC digital video cassette recorder, mini DV format. Sound was simultaneously recorded using a directional microphone (Audio Technica model AT835b, Stow, OH, USA) which was positioned about 0.5 m from the bird. Afterwards, relevant sequences of the X-ray movies were digitized and rendered at 30 frames s⁻¹ (video) and concurrent vocalizations were digitized at 48 kHz sampling rate using the software Vegas Video, version 5.0 (Sonic Foundry, Madison, WI, USA). All data files were corrected for a recording delay of approximately 114 ms in the video relative to the audio.
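The correction for the recording delay can be illustrated with a short sketch that maps a video frame to the audio samples it corresponds to. This is not the procedure carried out in Vegas Video; the function name is hypothetical, and the sign convention (the video lags the audio, so the delay is subtracted from the frame time) is an assumption based on the wording above.

```python
FRAME_RATE_HZ = 30        # video frame rate stated in the text
AUDIO_RATE_HZ = 48_000    # audio sampling rate stated in the text
VIDEO_DELAY_S = 0.114     # approximate video delay stated in the text


def frame_to_audio_samples(frame_index, delay_s=VIDEO_DELAY_S):
    """Return (start, end) audio sample indices covered by one video frame.

    Assumption: the video lags the audio, so a frame stamped at time t
    shows an event that occurred at audio time t - delay_s.
    """
    start_time = frame_index / FRAME_RATE_HZ - delay_s
    end_time = (frame_index + 1) / FRAME_RATE_HZ - delay_s
    start = round(start_time * AUDIO_RATE_HZ)
    end = round(end_time * AUDIO_RATE_HZ)
    return start, end
```

Under these assumptions each frame spans 1600 audio samples. Frames whose corrected window starts before the beginning of the audio file yield negative indices and would have to be discarded.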
In all four birds a 1.6 mm diameter stainless steel ball (Type 316, Small Parts Inc., Seattle, WA, USA) was inserted dorsally under the skin of the neck. This sphere provided a size reference when measuring anatomical distances from the X-ray videos. Additionally, two of the monk parakeets (birds 2 and 3) were anesthetized and the trachea was exposed through a small mid-ventral incision in the skin of the neck and two pieces of silver wire (ca. 2 mm long×0.16 mm diameter) (Engelhard Fine Wire, Engelhard Industries Inc., Newark, NJ, USA) were attached with tissue adhesive (3M Vetbond) to two tracheal rings. These markers were ∼13 mm apart in bird 2 and ∼10 mm in bird 3. In order to better follow tongue movements during X-ray recordings, we implanted a short piece (ca. 1.5 mm) of the same silver wire into the ventral surface of the tongue about 1.5 mm from the tip of the tongue of bird 1. The wire was inserted into the hole made by a 26 gauge hypodermic needle and the incision was sealed with a micro-drop of tissue adhesive. All of the described procedures were performed under isoflurane anesthesia administered with a calibrated anesthetic gas vaporizer (Fluotec) through a mask at a concentration of ∼1.5–2.0% in air.
Only those video sequences in which the birds' heads were clearly laterally oriented towards the X-ray beam were used for measuring anatomical distances during sound production. The distances measured (Fig. 1) were as follows. (1) Beak opening (BO), represented by the distance between the dorsal point of the beak–skull transition and the ventral point of the lower mandible where the gnathotheca starts. It should be noted that we did not measure beak gape as used in the songbird literature, i.e. the distance between the tip of the upper and lower mandible, because of the morphology of the parrot beak where the tip of the upper mandible projects beyond and curves below the tip of the lower mandible. (2) Tongue height (TH), defined as the distance between the tongue's ventral surface measured ∼1.5 mm from the tip of the tongue and the same point of the lower mandible as measured in beak opening. (3) Tracheal shortening (TS), determined by changes in the distance between the tracheal markers. (4) Laryngeal movement (LM), the distance measured between the center of the larynx and the dorsal beak–skull transition. These measurements were performed using MaxTRAQ Lite+, version 220.127.116.11 (Innovision Systems Inc., Columbiaville, MI, USA) by manually selecting points of interest in each successive frame. From the coordinates of each selected point, distances were automatically calculated between the points. Ten repeated measures of both beak movement and the distance measured between two metal bars in the same frame had a standard deviation of 0.1 mm.
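How digitized point coordinates translate into the distances listed above can be sketched as follows. The internal calculation used by MaxTRAQ is not described here, so this is only an illustration under simple assumptions: points are (x, y) pixel coordinates, the image scale is uniform, and the 1.6 mm reference sphere sets that scale. All function names and pixel values are hypothetical.

```python
import math

SPHERE_DIAMETER_MM = 1.6  # diameter of the implanted reference ball (from the text)


def mm_per_pixel(sphere_diameter_px):
    """Scale factor derived from the sphere's apparent diameter in pixels."""
    return SPHERE_DIAMETER_MM / sphere_diameter_px


def distance_mm(point_a, point_b, scale):
    """Euclidean distance between two (x, y) pixel coordinates, in mm."""
    return math.dist(point_a, point_b) * scale


def displacement_from_rest(distances_mm, rest_mm):
    """Per-frame change of a distance relative to its resting value."""
    return [d - rest_mm for d in distances_mm]


# Hypothetical example: the sphere spans 16 pixels in the image, and the two
# landmarks defining beak opening were clicked at these pixel coordinates.
scale = mm_per_pixel(16)
beak_opening = distance_mm((100, 200), (130, 240), scale)
```

Applying `distance_mm` to the same pair of landmarks in each successive frame gives a time series such as BO or TH, and `displacement_from_rest` expresses it relative to the resting configuration.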
RESULTS
Adult monk parakeets produce nine different call types in various contexts, e.g. territorial defense, pair bonding and flock integration, which differ in temporal as well as spectral parameters (Martella and Bucher, 1990). In the current study, however, only a subset of these vocalizations was uttered during recording sessions, consisting of contact and greeting calls as well as chatter sounds.
The most common call type produced by the monk parakeets in our study was the contact call (Fig. 2A), a short (180.7±9.2 ms, mean ± s.d., between animals), strongly frequency-modulated (FM) call with discrete, harmonically related frequency bands, which is uttered in many contexts by both sexes (Martella and Bucher, 1990). We recorded several instances of contact calls of three birds that met the criteria specified in Materials and methods for inclusion in the analysis.
The second most common call produced by the monk parakeets in this study was the greeting call (Fig. 3A), which is considerably longer and more variable in duration (455.7±234.4 ms, mean ± s.d. between individuals) and does not exhibit the fast FM typical of contact calls. It consists of a spectrally complex pattern with amplified frequency bands that are indicative of formants (Beckers et al., 2004) and that exhibit some FM, especially at the beginning of a call.
Furthermore, each of the parakeets produced several sounds that are referred to as chatter (Martella and Bucher, 1990). These sounds are mostly characterized by short harmonic stacks that at times exhibit some FM. In the case of bird 1 these short harmonic sounds alternate with notes that exhibit fast FM (Fig. 4A).
All monk parakeets in this study generally showed the same articulatory movements of beak and tongue when producing contact and greeting calls. Although these call types differed from each other in acoustic structure, no obvious differences in the movement patterns of the tongue and beak were detected that could explain the acoustic variation and FM between call types.
Beak opening increased substantially before the onset of a contact call and the tongue, which usually rests high in the oral cavity so that it touches the upper mandible, moved downwards and retracted a bit, thereby creating a large oral resonance cavity (Fig. 2B, Fig. 5). Just after call onset both beak opening and tongue height reached their maximum mean displacement, with beak displacement ranging from 5.6 to 6.7 mm and tongue displacement ranging from 2.9 to 4.3 mm (Table 1). This position was maintained for the duration of the call, after which both articulators returned to their original position.
The movement patterns for beak and tongue during greeting calls were rather similar to those described in contact calls. However, in longer greeting calls the initial beak opening movement proceeded more gradually than for contact calls, reaching its maximal displacement towards the end of the call, while tongue height decreased faster at the beginning of the greeting call and remained low throughout its duration (Fig. 3B). Additionally, beak opening did not increase as much as it did during contact calls, with a mean maximum displacement ranging from 3.0 to 4.8 mm, whereas tongue displacement seemed to be higher in one of the birds (Table 2, bird 1). Also, the standard deviations were greater in greeting calls than in contact calls, which can be explained by the fact that greeting calls were produced over a wide range of intensities and we found a strong positive relationship between acoustic power and the magnitude of articulatory displacement in two of the birds (Fig. 6C,D; see below).
Fig. 4 shows the cyclical movements of the beak and tongue during the production of two alternating chatter sounds. It is apparent that the magnitude of change of both beak opening and tongue height was lower in the second and fourth note than in the first and third. Examining the corresponding video (supplementary material Movie 1) revealed a strikingly opposite pattern of cyclical tongue movement between these two note types. During the production of the first and third note the tip of the tongue and antero-dorsal part of the tongue body first moved caudally following the movement of the lower mandible while the postero-dorsal part of the tongue body remained higher on a vertical axis. However, during the second part of the sound, which consisted of upward FM sweeps, the postero-dorsal part of the tongue body pushed downwards, forming a horizontal plane with the rest of the structure before the anterior part of the tongue moved rostrally to its resting position high up in the mouth cavity touching the upper mandible. In the second and fourth note this pattern was reversed with the postero-dorsal part of the tongue body moving caudally just before the onset of the note. During the first part of the note the rest of the tongue then completed its caudal movement and again formed a horizontal plane with the postero-dorsal part of the tongue body, which was lifted a bit during the second part of the note before the tongue as a whole moved rostrally to its resting position.
Cyclical changes of the larynx
We also observed and quantified cyclical movements of the hyoid apparatus, which lowered the larynx during calls and presumably increased the volume of the oropharynx (Figs 1, 2, 3, 4, Tables 1, 2, 3). Judging from the X-ray videos it seems that, unlike songbirds, monk parakeets do not expand the cervical end of the esophagus to form a large OEC but that the larynx, similarly to the tongue, moves downwards. Consistent with this observation, when we obtained silicone casts of the oral cavity from dead monk parakeets, no silicone entered the esophagus, whereas the cranial part of the trachea and the glottis were filled with silicone.
Changes in tracheal length
In birds 2 and 3 we implanted silver wire markers onto the trachea that could be traced during sound production. In bird 2 these markers were attached to the trachea 18 and 31 mm from the glottis. In bird 3 the markers were implanted 24 and 34 mm from the larynx. The total length of the trachea from glottis to syrinx was 55 mm in bird 2 and 65 mm in bird 3. In all three types of vocalization the distance between these markers changed substantially over the course of call production, with a mean maximum shortening ranging from 5.7 mm in bird 2 to 3.4 mm in bird 3 during contact calls, from 1.5 mm in bird 2 to 2.6 mm in bird 3 during greeting calls and from 0.9 mm in bird 2 to 1.4 mm in bird 3 during chatter sounds (Tables 1–3). Post-mortem investigation of the trachea revealed that it had very little resistance to substantial changes in length in both birds. Calculation of the predicted resonance of the tracheas modeled as stopped tubes yielded resonance at 1570 and 1330 Hz, respectively, for birds 2 and 3. Both of these values fall within the range of spectral peaks measured over the course of greeting calls.
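The predicted resonances quoted above are consistent with the standard quarter-wavelength formula for a tube closed at one end, f = c/(4L). The sketch below applies it to the two tracheal lengths. The speed of sound used for the published values is not stated; 345 m/s is an assumption that reproduces them to within rounding, and the function name is hypothetical.

```python
def stopped_tube_resonance_hz(length_mm, speed_of_sound_m_s=345.0):
    """First resonance of a tube closed at one end: f = c / (4 * L).

    The default speed of sound is an assumed value, not one given in the text.
    """
    length_m = length_mm / 1000.0
    return speed_of_sound_m_s / (4.0 * length_m)


# Tracheal lengths from glottis to syrinx reported for birds 2 and 3.
for bird, length_mm in (("bird 2", 55), ("bird 3", 65)):
    print(bird, round(stopped_tube_resonance_hz(length_mm)), "Hz")
```

Because the resonance scales inversely with length in this simple model, any shortening of the tube raises its predicted resonance frequency.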
Relationship between articulators and acoustic power
The fast FM patterns characteristic of contact calls are likely to be caused by the sound source and only marginally influenced by articulatory movements of the upper vocal tract as (1) in contact and greeting calls tongue and beak movements as well as tracheal contraction are comparable and (2) changes in articulatory configurations are slow compared with FM. Changes in resonance patterns of greeting calls, however, are likely to be influenced by articulator movements. Unfortunately, it was not possible to establish clear relationships between articulator configuration and formant changes because it is not clear how the sound source behaves in this species, which therefore precludes extracting the filter characteristics. However, we detected positive correlations between articulator movements (beak displacement, tongue displacement and tracheal contraction) and acoustic power for greeting calls and chatter sounds in several birds (Fig. 6C–F; Table 4). We did not find a correlation between beak displacement and power and tongue displacement and power for contact calls (Fig. 6A,B; Table 4), although this might be due to the fact that contact calls are generally rather loud calls and there is little variation in acoustic power.
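The kind of relationship tested here can be illustrated with a plain Pearson correlation between per-call articulator displacement and acoustic power. The statistic actually used for Table 4 is not specified in this section, so Pearson's r is an assumption, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # With no variation in one variable the coefficient is undefined.
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

The guard for constant input mirrors the caveat raised for contact calls: when acoustic power barely varies between calls, a correlation with displacement is difficult or impossible to establish.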
DISCUSSION
Our study is the first to investigate vocal tract articulation in a naturally vocalizing parrot species using X-ray cinematographic imaging. Our results demonstrate that monk parakeet vocalizations are accompanied by prominent changes in beak opening, tongue position and tracheal length. We also observed cyclical downward movement of the larynx during vocalization. These findings are partly consistent with what has been previously reported for an African grey parrot imitating speech (Warren et al., 1996). While previous studies have indicated that retraction and extension of the tongue between back and front positions, respectively, seem to be particularly important in mimicking human speech (Warren et al., 1996) and modulating formant patterns (Beckers et al., 2004), our results show that monk parakeets especially manipulate the high–low dimension when vocalizing, although they may be able to move their tongue in a horizontal plane more than they actually do when communicating naturally. Given that monk parakeets can mimic human speech, which seems to require extensive control over the front–back position of the tongue, one wonders why they do not use this dimension as much in their own vocalizations. We could only record three of the nine different calls uttered by adult monk parakeets, so it cannot be ruled out that in some of the other call types these birds use the front–back dimension more heavily. However, most of the different call types are structurally rather similar to those we recorded and only differ in duration and repetition rate (Martella and Bucher, 1990), which makes it unlikely that the calls are produced by different articulatory patterns.
Beak gape has been found to correlate with frequency changes in many bird species (Hausberger, 1991; Westneat et al., 1993; Hoese et al., 2000; Podos et al., 2004; Goller et al., 2004). In the current study we detected beak displacement in vocalizing monk parakeets of up to 6.7 mm although we could not establish a quantitative relationship with frequency patterns for several reasons. Contact calls exhibit fast FM patterns while articulator movements are slow and therefore cannot cause FM in these calls. In greeting calls and chatter sounds, in contrast, formants are often poorly defined and it was not possible to extract the filter characteristics. However, it seems that beak opening and tongue position can change independently of each other at least to a certain degree, as we observed prominent tongue movements in softer greeting calls while beak opening changed only slightly. Therefore, we can conclude that tongue position is not merely incidental to beak opening, a question that arose in a previous study (Warren et al., 1996).
Furthermore, the strong tracheal shortening in the range of 9% to 44% that we observed provides convincing evidence for a new type of vocal articulator in birds. The contraction is accompanied by a caudal movement of the lower mandible and the hyoid skeleton and although it might be a passive process resulting from the movement of the larynx and tongue it is very likely to have an effect on the sound produced. A previous study (Daley and Goller, 2004) investigating tracheal length changes in singing zebra finches found that at the beginning of a song bout and between motifs tracheal length decreased. While the initial contraction was actively mediated by syringeal muscles, the shortening within the motif seemed to be the result of pressure changes in the interclavicular air sac and could not be related to frequency patterns of the song. However, length changes were small (<0.2 mm) within a song and represented only about 3% of the length of the trachea, and therefore are unlikely to have a strong effect on resonance patterns. Even within the family Psittacidae, the degree to which the trachea can contract seems to vary noticeably between species and even between individuals. An initial study on African grey parrots found variation in tracheal length ranging from 77% to 130% (Patterson et al., 1997) whereas a subsequent study found that the trachea of African grey parrots can stretch only about 10% (Pepperberg et al., 1998). However, both studies were on excised tracheas and it was unclear how the trachea behaves in a live bird. In our monk parakeets the trachea showed very little resistance to tracheal shortening, both during vocalizing and in post-mortem investigation, providing strong evidence in favor of the hypothesis that the trachea might act as vocal articulator in parrots (Patterson et al., 1997; Pepperberg et al., 1998). Additional research is required to reveal exactly how acoustic features of vocalizations are influenced by tracheal length changes.
We also found a significant positive correlation between beak displacement and amplitude in greeting calls in two of the three birds. The same significant correlation was found for tongue displacement and amplitude in the greeting calls of the same birds. The analysis revealed further positive correlations between beak displacement and sound amplitude for chatter sounds in some individuals but not for contact calls, likely because contact calls were generally rather loud and showed little variation in amplitude (Fig. 6A,B). These findings largely agree with earlier reports on zebra finches producing loud notes with large beak gapes (Ohms et al., 2010b).
The vocal tract filter of songbirds depends on two mechanisms to maintain an inverse relationship between the volume of the upper vocal tract and the fundamental or dominant frequency of their song. During high- and mid-range frequencies, vocal tract dimensions are controlled by ventral and caudal movement of the hyoid and larynx, which enlarges the oropharyngeal cavity. At low frequencies the volume of the vocal tract is further increased by opening the cervical end of the esophagus to form a large OEC (Riede et al., 2006; Riede and Suthers, 2009; Ohms et al., 2010b). We have shown that monk parakeets also lower their hyoid and larynx during call production. This movement presumably enlarges the oropharyngeal cavity, but its acoustic importance, if any, in monk parakeets remains to be determined.
In contrast to what has been reported for songbirds (Riede et al., 2006; Riede and Suthers, 2009; Ohms et al., 2010b), monk parakeets do not seem to expand the cervical end of the esophagus to form a large OEC and it presently remains unclear whether the esophagus contributes to vocal tract filtering at all.
Overall, we have shown that monk parakeets use several articulators when producing species-specific sounds with tongue height changes, beak opening and tracheal length changes being the most obvious movements. However, tongue movements in the horizontal direction, although less prominent, are also likely to affect sound production while other possible articulators such as glottal opening still have to be identified. Experimentally manipulating such structures and obtaining cineradiographic data on mimicking parrots would provide further insight into the mechanisms underlying vocal production and would be of great interest for comparing the role of the tongue in human speech production and in parrot speech imitation.
This work was supported by the Netherlands Organization for Scientific Research (NWO) [grant number 815.02.011 to C.t.C.] and the National Institutes of Health (NIH) [grant number NINDS R01 NS029467 to R.A.S.]. Deposited in PMC for release after 12 months.
We thank Amy Coy for assistance conducting the experiment, Kenneth Kragh Jensen for general discussion, Inge van Noortwijk for artwork in Fig. 1 and two anonymous referees for valuable comments.