Vocal emission requires coordination with the respiratory system. Monitoring the increase in laryngeal pressure, which is needed for vocal production, allows detection of transitions from quiet respiration to vocalization-supporting respiration. Characterization of these transitions could be used to identify preparation for vocal emission and to examine the probability of it manifesting as an actual vocal production event. Specifically, overlaying the subject's respiration with conspecific calls can highlight events of call initiation and suppression, as a means of signalling coordination and avoiding jamming. Here, we present a thermal imaging-based methodology for synchronized respiration and vocalization monitoring of free-ranging meerkats. The sensitivity of this methodology is sufficient for detecting transient changes in the subject's respiration associated with the exertion of vocal production. The differences in respiration are apparent not only during vocal output but also prior to it, marking the potential time frame of respiratory preparation for calling. A correlation between conspecific calls and elongation of the focal subject's respiration cycles could be related to fluctuations in attention levels or in the motivation to reply. This framework can be used to examine the capability for enhanced respiration control in animals during modulated and complex vocal sequences, to detect ‘failed’ vocalization attempts and to investigate the role of respiration cues in the regulation of vocal interactions.

Vocal communication is a multi-stage process. The basic vocal production-and-reply exchange can be described as: (1) the reception and neural translation of an audible signal; (2) informational integration (Belin et al., 2000); (3) the activation of a behavioural response (Chen et al., 2009); (4) the motor preparation of the vocal apparatus (Winkworth et al., 1995); and finally, (5) sound emission. However, most current research on vocal exchanges focuses mainly on the first and the last stages – the acoustic stimulus and its effect on the receiver's behaviour (e.g. the presence or absence of a vocal response; Herbinger et al., 2009; Schulz et al., 2008; Seyfarth and Cheney, 2003; Sugiura, 1998). Such an approach creates a binary link between stimulus and response, largely ignoring potentially intervening factors that might modify the communication event. Monitoring neural (Hage and Nieder, 2013) and/or physiological processes in the gap between a stimulus and a response would enable a more detailed examination of communication events. For example, a perceived stimulus that is ignored as irrelevant, or that deliberately does not trigger a vocal response as a tactical decision (e.g. confrontation avoidance), would register at a neural/physiological level and could be distinguished from an undetected stimulus, even though neither condition manifests as a behavioural reaction. Additionally, quantifiable physiological changes triggered by a vocal input and preceding observable behaviour could reflect modulation of response motivation and allow detection of response thresholds that depend on signal intensity and type. Furthermore, monitoring the timing and dynamics of physiological changes in the gap between receiving and responding to a call might allow estimation of the extent of preparation for vocal emission (Rochet-Capellan and Fuchs, 2014), and perhaps of the level of volitional control over the initiation and duration of vocalizations.

Because communication is a socially driven process (Blumstein and Armitage, 1997), observing it in natural group settings is needed to generate insights into its ecologically relevant dynamics. While recording neural activity during stimulus perception (Aubie et al., 2014) and vocal production (West and Larson, 1993; Wild, 1997) is possible, it is not easily implemented outside laboratory conditions. Monitoring of physiological processes, on the other hand, is less invasive and more suitable for field-based studies (Elliott, 2016). One candidate physiological process that is directly related to vocal emission is respiration (Riede, 2011; Smotherman et al., 2006). Vocal production relies on a flow of gases within the respiratory tract, generating sufficient subglottal pressure for laryngeal tissue vibration and emission of sound (Herbst, 2016). The orientation of the vocal folds in most terrestrial mammals allows efficient sound production during the expiration phase. Phonation on an inspiratory air stream is also possible, but rarely occurs (Fitch, 2006). The duration of continuous vocalization and the rate of vocal unit production are limited by the caller's lung capacity and by thoracic muscle activity, the latter helping to maintain subglottal pressure beyond the point of elastic lung recoil (MacLarnon and Hewitt, 1999). The pressure requirements of vocal production, together with call structure and duration, alter the ‘quiet’ respiration pattern (Häusler, 2000; Hernandez et al., 2017).

Studies of respiration patterns in the context of vocal communication have traditionally focused on human conversations. Stereotypic breathing changes have been detected before the initiation of a speaking turn, allowing better timing of speaker exchange and overlap avoidance. Speaking duration is positively correlated with inspiration amplitude, potentially indicating early planning of the speaking turn. Additionally, conversation partners use breathing patterns as cues indicating an intention to speak (MacLarnon and Hewitt, 1999; McFarland, 2001; Rochet-Capellan and Fuchs, 2014). Animals have been claimed to have less sophisticated control over their respiration in the context of vocal emission (MacLarnon and Hewitt, 2004; MacLarnon and Hewitt, 1999). However, several studies have demonstrated a disruption of the respiration cycle after (Laplagne, 2018) and before vocal onset (Häusler, 2000; Hernandez et al., 2017; Riede et al., 2020) in animals, suggesting that claims of complex vocalization-correlated respiratory movements (VCRMs) being uniquely human (Häusler, 2000) are not well justified.

Here, we establish a proof-of-concept methodology for remote respiration monitoring in a free-ranging animal, as a tool for tracing VCRMs. We further demonstrate the ability to characterize changes in respiration preceding vocal production and to detect the extent of the preparation phase that an individual needs to produce a call. Correlating the vocalization-preparatory phase with external social events could also reveal stimuli that fail to reach the threshold for an audible response but might still stimulate the receivers’ motivation to reply. Conversely, partial vocalization events, in which an individual is interrupted before the active vocalization phase, could be detected. Both conditions could be reflected in respiratory changes typical of vocal preparation that do not result in call emission.

To identify VCRMs, we observed meerkats [Suricata suricatta (Schreber 1776)], a mammalian species with well-understood social and communication systems (Clutton-Brock and Manser, 2016; Manser et al., 2014). Meerkats are social mongooses living in the Kalahari Desert region of Southern Africa. They have an extensive vocal repertoire and most of their behaviours are accompanied by vocalizations (Gall and Manser, 2017; Manser, 1999; Townsend et al., 2010). This species has been intensively studied for more than 20 years and the wild meerkat population monitored by researchers at the Kalahari Research Centre (KRC), South Africa, is well habituated to human presence (Clutton-Brock and Manser, 2016). Close access to the animals allows individual audio recording and a clear distinction between calls produced by the recorded focal individual and calls produced by neighbouring conspecifics (Demartsev et al., 2018).

While individual audio recording is widely applied in field studies, respiration monitoring is more challenging. Solutions for externally tracing respiration outside laboratory settings mostly rely on wearable sensors that detect rib cage expansion and contraction (Chu et al., 2019). A non-invasive alternative is thermographic imaging of the external airway openings (Pereira et al., 2016). Changes in the surface temperature of the airways arise from the temperature difference between inspired and expired air: under most circumstances, cooler, drier air from the environment is inhaled and warmer, moist air from the lungs is exhaled (Pereira et al., 2019). These temperature modulations have been shown to reliably represent respiration and to correlate not only with the respiration waveform but also with tidal volume (Vainer, 2018).

In this work, we present the methodological procedures for recording respiration using thermographic imaging of non-restricted subjects in natural settings. We demonstrate the method's sensitivity for detecting VCRMs and discuss the potential for identifying call intention by breathing patterns as well as the effect of conspecific calls on the respiration rate of focal individuals.

Field procedures

Data were collected in June 2019 at the Kalahari Research Centre, in the Kuruman River Reserve, Northern Cape, South Africa. Each day at dawn, one meerkat group was observed during its morning emergence from the sleeping burrow. Following emergence, meerkats often spend time in a bipedal posture, warming up in the sun before beginning to forage. This behaviour is accompanied by the emission of ‘sunning calls’, a short and soft call type that potentially functions as a social bonding signal (Demartsev et al., 2018). During this time window, a focal animal located >0.5 m from its nearest neighbour was simultaneously audio recorded and filmed. The focal subjects were not restrained and could freely move, change location and change posture. Each continuous individual recording lasted up to ∼5 min. The daily recording session was terminated once the majority of the group had moved away from the sleeping burrow and started foraging.

The focal animal's vocalizations were recorded using a Marantz PMD-661 solid state digital recorder (Marantz, Japan) and a directional Sennheiser ME66 microphone with K6 power module (Sennheiser electronic, Germany), at a sampling rate of 44.1 kHz and 16-bit resolution. A FLIR T1030 (FLIR Systems, USA) thermal camera with a 12° lens was mounted on a tripod in front of the focal animal at a distance of 1–1.5 m. The camera was positioned so as not to obstruct the animal's field of vision while providing a horizontal view of its nasal region. Audio and thermal video data were collected simultaneously for each individual. To synchronize the audio and video streams, a piezo-electric lighter was activated three times in front of the camera and close to the microphone. The distinctive sound, coupled with the appearance of a heat signature, provided the markers required for accurately aligning the audio and video tracks.

All procedures were approved by the ethical committees of the University of Pretoria, South Africa (permit: EC011-10) and the Northern Cape Department of Environment and Nature Conservation (permit: FAUNA 1020/2016).

Audio and video processing and synchronization

Collected audio files were analysed in Avisoft-SASLab Pro v. 5.2.13 (Avisoft Bioacoustics, Germany) and all recorded calls were manually marked. Focal (recorded individual) and non-focal (NF; conspecific neighbours) calls were identified by their relative amplitude levels and labelled accordingly (Fig. S1). Recordings without clearly distinguishable amplitude differences between focal and non-focal calls were omitted from further analysis.
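Focal and non-focal calls were separated manually by relative amplitude. The sketch below illustrates one way such a split could be automated, assuming each marked call is available as a floating-point sample array in [-1, 1]. The largest-gap rule, the function names and the 10 dB `min_gap_db` threshold are hypothetical choices for illustration, not values or procedures used in this study.

```python
import numpy as np


def rms_dbfs(segment):
    """RMS level of one marked call, in dB relative to full scale (samples in [-1, 1])."""
    seg = np.asarray(segment, dtype=float)
    rms = np.sqrt(np.mean(seg ** 2))
    return 20.0 * np.log10(max(rms, 1e-12))  # floor avoids log10(0) for silent segments


def label_focal_nf(levels_db, min_gap_db=10.0):
    """Split call levels at the largest gap: the louder cluster is labelled focal.

    Returns None when no gap of at least min_gap_db exists, i.e. the recording
    would be treated as ambiguous (hypothetical rule and threshold).
    """
    levels = np.asarray(levels_db, dtype=float)
    if levels.size < 2:
        return None
    ordered = np.sort(levels)
    gaps = np.diff(ordered)
    k = int(np.argmax(gaps))
    if gaps[k] < min_gap_db:
        return None
    cut = ordered[k]  # loudest level within the quieter (NF) cluster
    return ["focal" if lv > cut else "NF" for lv in levels]


# Example: two loud (focal) and two faint (NF) calls in one recording.
print(label_focal_nf([-12.0, -40.0, -10.0, -42.0]))  # ['focal', 'NF', 'focal', 'NF']
```

A recording containing only focal calls would also be flagged as ambiguous by this rule, so such cases would still need a manual check.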

FLIR video files (CSQ format) were converted to AVI format (using FFMPEG) while retaining the original frame rate, pixel dimensions and colour palette. The conversion was performed using ThermImageJ plugins for the Fiji (ImageJ) environment (https://github.com/gtatters/ThermImageJ) and carried out as outlined in Tattersall et al. (2020). The conversion process first split each video file into its constituent FLIR file format (FFF) image frames using a custom perl script (available in ThermImageJ; https://github.com/gtatters/ThermImageJ). The underlying raw thermal data from each FFF file were extracted using Exiftool's (https://exiftool.org/) raw thermal image extraction option, which created a series of lossless greyscale JPEG-LS files in which pixel intensity corresponded to the raw sensor data from the thermal imaging camera. These files were subsequently converted into a portable video file (AVI) using FFMPEG software and used for synchronization with the audio recordings. Matching audio (WAV) and video (AVI) files were imported into SHOTCUT (Meltytech, LLC, USA) video editing software. The synchronization markers in the audio and video tracks were manually aligned and a conversion table between the video and audio timelines was generated. No time drift between the AV tracks was detected across different parts of the recordings, and a single triplet of synchronization markers was sufficient for the duration (≤5 min) of the recorded files (Movie 1).
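Because no drift was detected, the conversion table between audio and video timelines reduces to a constant offset. A minimal sketch, assuming the three lighter markers have been located as times (s) in the audio track and as frame indices in the 30 frames s−1 video; the function names are illustrative.

```python
import numpy as np

FPS = 30.0  # thermal video frame rate


def sync_offset(audio_marker_s, video_marker_frames, fps=FPS):
    """Return (mean offset in s, spread in s) mapping audio time to video time.

    The spread across the marker triplet is a simple consistency check:
    a large value would suggest a mis-identified marker or clock drift.
    """
    audio = np.asarray(audio_marker_s, dtype=float)
    video = np.asarray(video_marker_frames, dtype=float) / fps
    offsets = video - audio
    return float(offsets.mean()), float(offsets.max() - offsets.min())


def audio_time_to_frame(t_audio_s, offset_s, fps=FPS):
    """Video frame index at which an audio event (e.g. a call onset) occurred."""
    return int(round((t_audio_s + offset_s) * fps))


offset, spread = sync_offset([2.0, 2.5, 3.1], [90, 105, 123])
print(offset, spread, audio_time_to_frame(10.0, offset))
```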

Nasal area detection and tracking

To locate the animal's nasal area in the videos, the relevant ROI (region of interest) was defined and tracked in each video frame. To automate ROI tracking, the Loopy (loopbio, Austria) pose estimation tool was used. Briefly, a key point detector was trained by manually labelling 1400 exemplars of meerkat nostril centre points, randomly drawn from our video data set. These labels were used as the training set for a deep-learning-based detector, which was then used to process all video files and list the pixel coordinates of each detection with a certainty score above 0.95. Five randomly selected video files were visually inspected for correct ROI detection (Movie 1).
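Downstream steps need, for every frame, at most one left and one right nostril coordinate above the 0.95 certainty threshold. The sketch below assumes detections have been exported as `(frame, x, y, side, score)` tuples; this layout is hypothetical and is not Loopy's documented export format.

```python
from collections import defaultdict


def filter_detections(rows, min_score=0.95):
    """Keep detections scoring above min_score; one per nostril per frame.

    rows: iterable of (frame, x, y, side, score), side in {"L", "R"}.
    If a nostril is detected twice in a frame, the higher-scoring one is kept.
    Returns {frame: {side: (x, y, score)}}.
    """
    per_frame = defaultdict(dict)
    for frame, x, y, side, score in rows:
        if score <= min_score:
            continue
        previous = per_frame[frame].get(side)
        if previous is None or score > previous[2]:
            per_frame[frame][side] = (x, y, score)
    return dict(per_frame)


rows = [
    (0, 120.5, 80.2, "L", 0.99),
    (0, 135.1, 80.9, "R", 0.97),
    (1, 121.0, 80.0, "L", 0.90),  # below threshold: dropped
    (1, 134.8, 81.2, "R", 0.96),
]
print(filter_detections(rows))
```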

Nasal temperature estimation

Raw nostril detection output consisted of the frame number, x–y pixel coordinates and an indication of laterality (left versus right nostril). To extract intensity values corresponding to the raw sensor radiance, a custom-written batch macro was used to import the pixel coordinates into Fiji, define a 10×5 pixel ROI (Movie 1) at each detected coordinate and measure its median intensity. The ROI size was chosen to fall within the pixel dimensions of meerkat nostrils across filming distances (16×9 to 11×6 px). The intensity values were converted to estimated temperature using the standard thermal image conversion algorithms implemented in ThermImageJ (https://github.com/gtatters/ThermImageJ) and described in Minkina and Dudzik (2009). The raw2temp function in ThermImageJ follows FLIR's algorithms, requiring information on emissivity, relative humidity, atmospheric temperature, reflected temperature, object distance and the camera's unique calibration constants. An emissivity of 0.95 was assumed based on other biological studies (Tattersall, 2016), while the other parameters were recorded at the beginning of, and where possible throughout, each daily data collection session (1–2 h). As our analysis focused on changes in temperature, the average atmospheric values recorded on a given day were used for the temperature conversion of all videos captured that day. This approach streamlined the workflow and was made possible by the relatively stable atmospheric conditions, under which the potential error in absolute temperature measurements would be minimal (Tattersall, 2016).
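The ROI measurement can be sketched with NumPy as a median over a 10×5 pixel window placed at each detected nostril coordinate. This assumes the raw frame is a 2-D array indexed `[row, column]` (i.e. `[y, x]`) and that the window is centred on the detection and clipped at image borders. The centring convention is an assumption, since the Fiji macro itself is not described in detail.

```python
import numpy as np


def roi_median(frame, x, y, width=10, height=5):
    """Median raw intensity in a width x height window centred on (x, y).

    frame: 2-D array of raw sensor values indexed [y, x].
    The window is clipped at image borders.
    """
    rows, cols = frame.shape
    x0 = max(0, int(round(x - width / 2)))
    y0 = max(0, int(round(y - height / 2)))
    x1 = min(cols, x0 + width)
    y1 = min(rows, y0 + height)
    return float(np.median(frame[y0:y1, x0:x1]))


# Synthetic 16-bit-like frame whose value encodes position: 100 * row + column.
frame = (100 * np.arange(100)[:, None] + np.arange(100)[None, :]).astype(np.uint16)
print(roi_median(frame, x=20, y=10))  # 1019.5
```

Using the median rather than the mean makes the measurement less sensitive to a few edge pixels from the surrounding snout when the ROI is slightly off-centre.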

The calibration constants were extracted from the original thermal image file using Exiftool. Sample images were tested using FLIR's ResearchIR (FLIR Systems) software and compared with the functions above to ensure the extracted temperature calculations were according to manufacturer recommendations.
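ThermImageJ's raw2temp implements FLIR's full conversion, including atmospheric transmission. For orientation, the sketch below shows only its core under a simplifying assumption: atmospheric and window attenuation are neglected, leaving the camera's Planck-type response (calibration constants R1, R2, B, F, O, which Exiftool reports under tags such as PlanckR1 and PlanckB) plus emissivity and reflected-temperature corrections. The constants in the example are placeholder values for illustration, not those of the FLIR T1030 used here.

```python
import math


def blackbody_raw(temp_c, R1, R2, B, F, O):
    """Raw signal the camera would report for a blackbody at temp_c (deg C)."""
    return R1 / (R2 * (math.exp(B / (temp_c + 273.15)) - F)) - O


def raw_to_temp(raw, emissivity, refl_temp_c, R1, R2, B, F, O):
    """Simplified raw-to-temperature conversion (no atmospheric attenuation).

    The measured raw signal is modelled as emitted + reflected components:
        raw = emissivity * raw_obj + (1 - emissivity) * raw_refl
    raw_obj is then inverted through the camera's Planck-type response.
    """
    raw_refl = blackbody_raw(refl_temp_c, R1, R2, B, F, O)
    raw_obj = (raw - (1.0 - emissivity) * raw_refl) / emissivity
    return B / math.log(R1 / (R2 * (raw_obj + O)) + F) - 273.15


# Placeholder calibration constants (illustrative only, not the T1030's values).
CAL = dict(R1=21106.77, R2=0.012545258, B=1501.0, F=1.0, O=-7340.0)

# Round trip: simulate the raw value of a 33 deg C nostril at emissivity 0.95.
raw = 0.95 * blackbody_raw(33.0, **CAL) + 0.05 * blackbody_raw(20.0, **CAL)
print(round(raw_to_temp(raw, 0.95, 20.0, **CAL), 6))  # 33.0
```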

Data quality control and filtering

The video data were subjected to several quality control and filtering steps. Frame sequences shorter than 150 frames (5 s) or containing ROI detection gaps of more than 5 consecutive frames were omitted. This ensured that analysed segments spanned at least three full respiration cycles, allowing reliable identification of the periodicity associated with breathing. The frame cut-off was based on the mean±s.e.m. meerkat respiration rate of 0.603±0.112 Hz (Worthington et al., 1991) and the 30 frames s−1 rate of the recorded videos (3 respiration cycles×1.6 s cycle−1×30 frames s−1=144 frames, rounded up to 150 frames). Only frame sequences with simultaneous detections of both the left and right nostrils were used, restricting the analysis to front-view head orientations and minimizing potential measurement errors due to varying angles of ROI exposure. Only sequences with clearly identifiable cyclic patterns, assessed by visual inspection of time–temperature line plots, were selected for further analysis (Fig. S2A,B).
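The cut-off arithmetic and gap rule can be expressed compactly. With a mean rate of 0.603 Hz, one cycle lasts about 1/0.603 ≈ 1.66 s, so three cycles at 30 frames s−1 need ≈149 frames, consistent with the 150-frame cut-off. The segmentation function is a sketch that assumes sequences are split wherever more than 5 frames are missing (one reading of the rule above) and takes as input the sorted frame indices in which both nostrils were detected.

```python
import math


def min_frames(n_cycles=3, resp_rate_hz=0.603, fps=30):
    """Frames needed to span n_cycles respiration cycles at the given rate."""
    return math.ceil(n_cycles / resp_rate_hz * fps)


def usable_segments(frames, max_gap=5, min_len=150):
    """Split sorted frame indices (both nostrils detected) into usable segments.

    A new segment starts whenever more than max_gap consecutive frames are
    missing; segments shorter than min_len frames are discarded.
    Returns a list of (first_frame, last_frame) tuples.
    """
    segments = []
    if not frames:
        return segments
    start = prev = frames[0]
    for f in frames[1:]:
        if f - prev - 1 > max_gap:  # number of missing frames in between
            if prev - start + 1 >= min_len:
                segments.append((start, prev))
            start = f
        prev = f
    if prev - start + 1 >= min_len:
        segments.append((start, prev))
    return segments


print(min_frames())  # 150
print(usable_segments(list(range(0, 160)) + list(range(170, 200))))  # [(0, 159)]
```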

Data smoothing and respiration phase detection

A low-pass Butterworth filter (second order, critical frequency=1/5) was applied to smooth high-frequency noise in the temperature values, possibly originating from tracking errors and from muscle contractions affecting ROI shape (Fig. S2B). A custom script was used to locate local minima, indicating maximum cooling of the nostril and corresponding to the end of the inspiration phase. These local minima were used as starting points for locating the beginning of inspiration and the end of expiration. This was achieved by following the local slope of the curve until it approached zero, indicating the plateauing of the curve and the boundaries of the expiratory pause (Fig. S1A, Fig. S2C) – a rest phase in which nostril temperature is expected to remain stable. Identification of the three transition points (end of expiratory pause, maximum inspiration, maximum expiration) defined a single respiration cycle and allowed each respiration phase (inspiration, expiration, expiratory pause) to be quantified in terms of duration (s), amplitude (Δ°C) and slope (amplitude/duration).
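A sketch of the smoothing and phase-detection steps using SciPy. It assumes that the critical frequency of 1/5 is expressed relative to the Nyquist frequency (SciPy's convention for `butter`), i.e. about 3 Hz at 30 frames s−1, and uses zero-phase filtering (`filtfilt`) so that turning points are not shifted in time. Both are assumptions, as are the slope tolerance `slope_tol` used to decide where the curve plateaus and the minimum spacing between minima `min_gap_s`.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

FPS = 30.0  # frames per second of the thermal video


def smooth(temps, order=2, wn=1 / 5):
    """Zero-phase low-pass Butterworth filter (wn relative to Nyquist)."""
    b, a = butter(order, wn, btype="low")
    return filtfilt(b, a, np.asarray(temps, dtype=float))


def inspiration_ends(smoothed, min_gap_s=0.8):
    """Local minima (maximum nostril cooling) = end of inspiration."""
    idx, _ = find_peaks(-smoothed, distance=int(min_gap_s * FPS))
    return idx


def phase_bounds(smoothed, minimum, slope_tol=0.05):
    """Walk out from a minimum until |slope| (deg C per s) falls below slope_tol.

    Returns (inspiration onset, expiration end); the stretch between an
    expiration end and the next inspiration onset is the expiratory pause.
    """
    slope = np.gradient(smoothed) * FPS
    start = minimum
    while start > 0 and slope[start - 1] < -slope_tol:
        start -= 1  # temperature still falling: still inside inspiration
    end = minimum
    while end < len(slope) - 1 and slope[end + 1] > slope_tol:
        end += 1  # temperature still rising: still inside expiration
    return start, end


# Synthetic 10 s trace: 0.6 Hz breathing, 1 deg C swing around 34 deg C.
t = np.arange(300) / FPS
trace = smooth(34 + np.cos(2 * np.pi * 0.6 * t))
minima = inspiration_ends(trace)
print(minima, phase_bounds(trace, minima[0]))
```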

Marking vocalization times and defining breath types

Video frames corresponding to focal and non-focal (NF) call times were marked according to the audio–video synchronization table, and breathing cycles were defined as: (1) Call: cycles during which focal calls were produced; (2) Pre-call: cycles immediately preceding a Call cycle; (3) Post-call: cycles immediately following a Call cycle; (4) Quiet: cycles that were not associated with vocal production (Fig. 1B, Fig. 2). To ensure correct identification of respiration cycles at the edges of detected breathing sequences, audio recordings were examined for focal calls occurring within 2 s before and after usable video sequences. If a focal call was recorded within one of these windows, the adjacent edge cycle was designated Post-call (call before the sequence) or Pre-call (call after the sequence).
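The cycle classification, including the 2 s edge windows, can be sketched as a small labelling function. Cycles are assumed to be `(start_s, end_s)` tuples in video time, and focal call onsets to have been mapped onto the same timeline. How a cycle lying between two Call cycles was labelled is not stated; here Pre-call takes precedence over Post-call, which is a hypothetical choice.

```python
def label_cycles(cycles, call_times, edge_window_s=2.0):
    """Label respiration cycles as Call, Pre-call, Post-call or Quiet.

    cycles: list of (start_s, end_s) in temporal order.
    call_times: focal call onsets (s) on the same timeline.
    Calls within edge_window_s before the first or after the last cycle
    mark the edge cycle as Post-call or Pre-call, respectively.
    """
    n = len(cycles)
    if n == 0:
        return []
    is_call = [any(s <= t < e for t in call_times) for s, e in cycles]
    first_start, last_end = cycles[0][0], cycles[-1][1]
    call_before = any(first_start - edge_window_s <= t < first_start for t in call_times)
    call_after = any(last_end <= t <= last_end + edge_window_s for t in call_times)

    labels = []
    for i in range(n):
        next_is_call = is_call[i + 1] if i + 1 < n else call_after
        prev_is_call = is_call[i - 1] if i > 0 else call_before
        if is_call[i]:
            labels.append("Call")
        elif next_is_call:
            labels.append("Pre-call")  # hypothetical precedence over Post-call
        elif prev_is_call:
            labels.append("Post-call")
        else:
            labels.append("Quiet")
    return labels


cycles = [(0.0, 1.5), (1.5, 3.0), (3.0, 4.5), (4.5, 6.0), (6.0, 7.5)]
print(label_cycles(cycles, [3.2]))
# ['Quiet', 'Pre-call', 'Call', 'Post-call', 'Quiet']
```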

Additionally, video frames corresponding to breathing phases (inspiration, expiration, expiratory pause) were classified according to the timing of NF calls: ‘NF heard’, phases during which, or immediately before which, an NF call was produced; ‘No calls’, breathing phases not associated with NF calls.
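The ‘NF heard’ designation depends on how ‘immediately before’ is operationalized. One possible reading, assumed in the sketch below, is that a phase counts as ‘NF heard’ if an NF call starts within it or within the directly preceding phase; phases are `(start_s, end_s)` tuples on the video timeline.

```python
def label_nf_phases(phases, nf_call_times):
    """Label breathing phases as 'NF heard' or 'No calls'.

    A phase is 'NF heard' if an NF call onset falls inside it or inside the
    phase immediately before it (an assumed operationalization).
    """
    contains = [any(s <= t < e for t in nf_call_times) for s, e in phases]
    return [
        "NF heard" if contains[i] or (i > 0 and contains[i - 1]) else "No calls"
        for i in range(len(phases))
    ]


phases = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
print(label_nf_phases(phases, [1.5]))
# ['No calls', 'NF heard', 'NF heard', 'No calls']
```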

Data structure summary and statistical analysis

The collected raw data set included ∼3 h of AV material from 33 adult individuals in 3 different social groups. Approximately 25% of the raw material was omitted at the quality control phase owing to acoustic interference and an inability to reliably identify focal calls. Individuals that did not produce any calls were excluded from the dataset to allow a within-individual permutation procedure. Each traced respiration cycle was divided into three breathing phases (Inspiration, Expiration, Expiratory pause), which were tested separately. Following the processing and filtering stages, the final dataset consisted of 800 respiration phases (inspiration=256, expiration=283, expiratory pause=261), classified by their association with vocalization (Pre-call=78, Call=86, Post-call=88, Quiet=548). The per-individual call count (mean±s.d.) in the analysed data set was 2.9±2.4 calls from 10 individuals (4 males and 6 females). The test statistic was the difference in means (Δmean) of the measured parameters (e.g. mean duration of inspiration phases in Call cycles − mean duration of inspiration phases in Quiet cycles).

A permutation procedure was designed in which respiration cycle type identifiers (e.g. Call and Quiet) were randomly reshuffled against the measured values, creating a new null data set. To control for potential biases related to variation in respiration and call rates between individual animals, permutations were restricted to within individuals. Permutations were repeated 10,000 times, with Δperm_mean calculated for each iteration. Two-tailed pseudo P-values were estimated as the proportion of absolute Δperm_mean values that were equal to or greater than the absolute Δmean calculated from the data.
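The within-individual permutation test can be sketched in Python with NumPy (the analysis itself was done in R). The sketch assumes one measured value per respiration phase, with a cycle-type label and an individual identifier for each. Only the two compared types are kept, labels are shuffled within each individual, and the two-tailed pseudo P-value is the proportion of permuted |Δ| values at least as large as the observed |Δ|. Names, the synthetic data and the seeds are illustrative.

```python
import numpy as np


def within_individual_perm_test(values, labels, ids, a="Call", b="Quiet",
                                n_perm=10_000, seed=0):
    """Two-tailed permutation test of mean(a) - mean(b), shuffling within individuals."""
    values, labels, ids = np.asarray(values, float), np.asarray(labels), np.asarray(ids)
    keep = np.isin(labels, [a, b])  # compare only the two cycle types of interest
    values, labels, ids = values[keep], labels[keep], ids[keep]

    def delta(lab):
        return values[lab == a].mean() - values[lab == b].mean()

    observed = delta(labels)
    rng = np.random.default_rng(seed)
    groups = [np.flatnonzero(ids == i) for i in np.unique(ids)]
    shuffled = labels.copy()
    perm = np.empty(n_perm)
    for k in range(n_perm):
        for g in groups:  # reshuffle labels only among each animal's own phases
            shuffled[g] = rng.permutation(labels[g])
        perm[k] = delta(shuffled)
    p = np.mean(np.abs(perm) >= np.abs(observed))
    return observed, p


# Synthetic example: 5 animals, 10 Quiet and 3 Call phases each.
rng = np.random.default_rng(1)
ids = np.repeat(np.arange(5), 13)
labels = np.tile(["Quiet"] * 10 + ["Call"] * 3, 5)
values = np.where(labels == "Call", 1.2, 0.8) + rng.normal(0, 0.1, labels.size)
obs, p = within_individual_perm_test(values, labels, ids, n_perm=2000)
print(round(obs, 2), p)
```

A common variant adds one to both the numerator and the denominator, so that the pseudo P-value is never exactly zero.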

To estimate the effect of neighbour calls on individual respiration, only ‘Quiet’ respiration cycles were retained, as an individual’s own call emission could mask the likely subtle effects of auditory perception. Based on the timing of NF calls, breathing phases during which, or immediately before which, NF calls were detected were designated ‘NF heard’. The analysed dataset consisted of 714 respiration phases (NF heard=142; No calls=572) and was analysed using a similar permutation procedure, reshuffling the ‘NF heard’ and ‘No calls’ respiration phase designations.

All quality control, filtering procedures, permutations and statistical tests were performed in R 3.6.3 (https://www.r-project.org/). The data collection, processing and analysis workflow is summarized graphically in Fig. 3.

The nasal region of recorded subjects, tracked in the collected thermal video material, showed a clear variation in median temperature corresponding to the respiration trace (Movie 1). The respiration cycle ratio (inspiration time/expiration time) had a nearly symmetrical value of 0.987, indicating roughly balanced airflow during active respiration (Fahlman and Madigan, 2016) and supporting the consistency of our respiration tracing. As expected, vocalization was associated with detectable respiratory exertion (Fig. 1C, Fig. 2). The amplitude (Δtemperature) of the expiration phase during Call cycles (median=1.14, IQR=0.73) was significantly higher than in Quiet cycles (P<0.01, median=0.78, IQR=0.40, Fig. 4A, middle panel, Fig. S3A). The slope (temperature–time) of expiration was also significantly steeper (median=2.87, IQR=1.52) than in the control condition of Quiet cycles (P=0.02, median=2.33, IQR=1.56, Fig. 4C, middle panel, Fig. S3C). These results are consistent with increased respiratory effort during call emission and support the sensitivity of the proposed methodology. This justified testing its ability to detect respiration patterns associated with preparation to call, and the effects of conspecific vocal signals on focal respiration.

Median inspiration amplitude was larger in Call cycles (median=1.21, IQR=0.83) than in Quiet cycles (P=0.02, median=0.87, IQR=0.48), suggesting that calling requires deeper inspiration (Fig. 4A, middle panel, Fig. S3A). Comparison between Pre-call and Quiet respiration cycles (Fig. 1B) showed a difference in expiratory pause amplitude between Pre-call (median=0.22, IQR=0.33) and Quiet cycles (P=0.04, median=0.16, IQR=0.26, Fig. 4A, left panel, Fig. S4A). Taken together, this supports the prediction that respiration patterns are detectably altered before call emission, with the most noticeable changes occurring during the inspiration phase of the Call respiration cycle, although Pre-call cycles are also potentially affected. Recovery from vocalization is evident in the duration of the expiratory pause of Call cycles (median=0.65, IQR=0.53), which shows a shorter rest period following call emission compared with Quiet respiration cycles (P<0.01, median=0.9, IQR=0.65, Fig. 4B, middle panel, Fig. S3B). Post-call cycles also show traces of recovery, with trends towards higher amplitude during inspiration and expiratory pause compared with Quiet respiration cycles (Fig. 4, Fig. S5, Table 1).

To test whether the magnitude of vocalization-associated respiration changes is correlated with the duration of the produced call, the difference between each Call-cycle parameter (amplitude, duration, slope) and the corresponding median value of Quiet cycles was calculated. A Spearman correlation test between call duration and this difference from the median was performed for each of the three respiration phases. No significant correlation with call duration was detected (Table 2).
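The correlation step can be sketched with `scipy.stats.spearmanr`. Whether Quiet medians were pooled or computed per individual is not stated; the sketch assumes per-individual medians, so each calling individual must also have Quiet cycles. Subtracting a single pooled median would be a constant shift and would leave Spearman's rank correlation unchanged, so this step only affects the result when the reference median differs between individuals.

```python
import numpy as np
from scipy.stats import spearmanr


def deviation_duration_corr(call_values, call_ids, call_durations, quiet_values, quiet_ids):
    """Spearman correlation between call duration and the deviation of each
    Call-cycle parameter from the same individual's median Quiet value."""
    call_values = np.asarray(call_values, float)
    call_ids = np.asarray(call_ids)
    quiet_values = np.asarray(quiet_values, float)
    quiet_ids = np.asarray(quiet_ids)
    quiet_median = {i: np.median(quiet_values[quiet_ids == i]) for i in np.unique(call_ids)}
    deviation = np.array([v - quiet_median[i] for v, i in zip(call_values, call_ids)])
    rho, p = spearmanr(call_durations, deviation)
    return float(rho), float(p)


rho, p = deviation_duration_corr(
    call_values=[1.0, 1.2, 2.0, 2.4], call_ids=["A", "A", "B", "B"],
    call_durations=[0.05, 0.06, 0.04, 0.07],
    quiet_values=[0.7, 0.8, 0.9, 1.8, 1.9, 2.0], quiet_ids=["A"] * 3 + ["B"] * 3,
)
print(rho)
```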

It has previously been hypothesized that conspecific calls can stimulate, but also suppress, focal vocalizations (Demartsev et al., 2018). We therefore tested whether focal respiration patterns were affected by incoming non-focal calls, as a potential indication of changes in motivation towards vocal production. In a comparison between ‘NF heard’ respiration phases (during or immediately before which NF calls were recorded) and ‘No calls’ phases (no NF calls recorded), the durations of expiration and expiratory pause were significantly longer in the former (P≈0.01), and the inspiration phase showed a similar trend (P=0.08, Fig. 5B, Fig. S6B). This suggests a transient decrease in breathing rate when conspecific calls are heard by the focal individual. Additionally, the inspiration and expiration slopes of NF heard respiration phases appeared steeper, although the difference did not reach statistical significance (P≈0.06, Fig. 5C, Fig. S6C).

This work describes a methodological framework for thermography-based remote monitoring of breathing in a free-ranging mammal and demonstrates the sensitivity of this method for detecting vocalization-correlated respiratory movements (VCRMs). Additionally, it demonstrates the capability to measure the effects of conspecific calls on the respiration of a focal subject as a potential indication of attention and signal perception.

As expected, calling had an effect on respiration (Fig. 1C): despite the relatively low amplitude and short duration (50 ms, Fig. S1) of the calls studied, they could not be accommodated by regular expiration. Both the amplitude (Fig. 4A) and the slope (Fig. 4C) measurements during calling exceeded those of quiet expirations, probably because of the increase in subglottal pressure needed for call production. The duration of an expiratory pause (in which lung volume remains unchanged, Fig. 1A) depends on the accumulation of the respiratory chemical drive (low oxygen and high carbon dioxide levels), which leads to the activation of respiratory muscles and active inspiration (Rafferty et al., 1995). Additionally, mechanoreceptors located in the lungs and airways react to lung inflation by adjusting the rate and volume of respiration. Vocalizing increases the gas flow during the expiration phase, resulting in a lower lung volume than during quiet respiration. This is expected to increase the respiratory drive and to shorten the expiratory pause. Our results fit this expectation, as the expiratory pause of call cycles is significantly shorter than that of quiet respiration cycles (Fig. 4B). Taken together, the increased flow during vocalizing and the shortening of the subsequent expiratory pause support the notion that the temperature measurements extracted from the subject's nasal area are representative of its respiration curve. The resolution and consistency of this representation were adequate for detecting the magnitude of changes related to vocal emission. The proposed methodology may, however, be limited in its ability to detect mini-breaths, such as those associated with separate notes in trill vocalizations, although the effect of a whole calling bout on the full tidal volume trace is likely to be detectable.
This procedure can be extended to other mammalian species, provided that individual audio recording is possible and a steady horizontal view of the nasal region can be maintained for at least three full respiration cycles. Special attention should be given to environmental conditions during thermal data collection: when ambient temperature approaches or exceeds the subjects’ body temperature, the thermal contrast between inspired and expired air will be limited.

The changes in respiration preceding vocal production are of interest as they indicate preparation for vocalizing, either as part of an automated process (Riede et al., 2020) or perhaps with an intentional component. Mammalian vocal production is usually considered to be flexible in terms of both call production and usage (Maciej et al., 2013; Sugiura and Masataka, 1995). Recent findings suggest that there is no point of no return for calling, demonstrating that some species can suppress calls shortly before and even during call emission (Demartsev et al., 2018; Pomberger et al., 2018). In social settings, there could be instances in which an individual stimulated to vocalize is interrupted immediately before sound emission. The respiration patterns preceding calling in the inspiration phase of call cycles (Fig. 4A) and the trends visible in the expiration and expiratory pause of pre-call cycles (Fig. 2A,C) suggest that the window of preparation to call is detectable. The minimum physiological duration of this window is likely to equal the duration of inspiration, corresponding to the interval between the neural activation of vocalization and the build-up of pressure sufficient for vocal emission. This theoretical minimum, however, is likely to be affected by call length and by the intensity of the vocal exchange: emitting long calls or call sequences within a single respiration cycle requires deeper inspiration (Riede et al., 2020). The current data did not reveal a correlation between the duration of the produced call and the magnitude of the detected respiration changes. The described methodology should, however, be able to detect such effects, for example when comparing call types of variable duration or examining continuous vocal bouts. Similarly, louder vocalizations requiring higher subglottal pressure could result in stronger VCRM deviations from quiet respiration.
To what extent respiratory preparations for longer or louder calls are distinguishable and whether call types or calling properties can be independently predicted from preceding respiration patterns is yet to be determined.

Precise call timing, needed to achieve caller coordination for synchrony or anti-synchrony, might result in multiple call initiation attempts and suppression events, which could be reflected in irregular respiration patterns. By tracing focal respiration during exposure to conspecific calls, we detected social effects on quiet breathing cycles, specifically longer expirations and expiratory pauses (Fig. 5). Although experimental work is needed to test the causal link between conspecific calls and focal respiration, we propose three potential explanations for the elongation of respiration phases. First, ‘sunning’ call exchanges in meerkats show strong turn-taking patterns (Demartsev et al., 2018), and individuals time their calls between the calls of multiple conspecifics, perhaps to avoid overlap. This could result in failed initiations and transient irregularities in respiration, as each suppressed call initiation could be represented by a partial respiratory preparation for calling. Second, the elongation of breathing phases could be a general indication of attention. It has been suggested that attention and respiration are coupled (Maric et al., 2020; Melnychuk et al., 2018) and that respiratory inhibition is related to increased auditory sensitivity (Stekelenburg and Van Boxtel, 2001). Tracing respiration during social vocal interactions could thus help identify momentary changes in attention that do not manifest as behavioural responses, perhaps in combination with heart rate logging and detection of muscular micro-movements. Third, slower breathing could be related to the proposed function of meerkat sunning call exchanges as an acoustic grooming behaviour. Like physical grooming, acoustic grooming is expected to have an appeasing effect on the participants (Kulahci et al., 2015), which could result in physiological responses such as decreased heart rate and slower breathing.
Further work is required to distinguish between the ‘vocalization attempt’, ‘auditory perception’ and ‘general appeasement’ explanations for the observed slowing of breathing.
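To make the reported effect concrete, the sketch below shows one way breathing cycles could be segmented from a thermal trace and compared between cycles that do and do not overlap conspecific calls. It is a minimal illustration under stated assumptions, not the processing pipeline used here. It assumes a smoothed nostril-temperature trace in which temperature rises during expiration (warm exhaled air) and falls otherwise, treats each trough-to-trough interval as one cycle, and represents conspecific calls as (onset, offset) intervals in seconds. The function names are hypothetical. The rising phase is only a proxy for expiration, and the falling phase lumps any expiratory pause together with inspiration, so separating pauses would need an additional plateau criterion.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Cycle:
    start: float  # trough: assumed onset of expiration (s)
    peak: float   # peak: assumed end of the warming (expiratory) phase (s)
    end: float    # next trough: end of the cycle (s)

    @property
    def rise(self) -> float:
        """Rising phase, used here as a proxy for expiration."""
        return self.peak - self.start

    @property
    def fall(self) -> float:
        """Falling phase: lumps any expiratory pause together with inspiration."""
        return self.end - self.peak


def local_extrema(times, temps):
    """Times of simple local maxima and minima in an already smoothed trace."""
    peaks, troughs = [], []
    for i in range(1, len(temps) - 1):
        if temps[i - 1] < temps[i] >= temps[i + 1]:
            peaks.append(times[i])
        elif temps[i - 1] > temps[i] <= temps[i + 1]:
            troughs.append(times[i])
    return peaks, troughs


def segment_cycles(times, temps):
    """Split the trace into trough-to-trough cycles, taking the first peak in each."""
    peaks, troughs = local_extrema(times, temps)
    cycles = []
    for t0, t1 in zip(troughs, troughs[1:]):
        inside = [p for p in peaks if t0 < p < t1]
        if inside:
            cycles.append(Cycle(t0, inside[0], t1))
    return cycles


def overlaps(cycle, calls):
    """True if any conspecific call (onset, offset) overlaps the cycle."""
    return any(on < cycle.end and off > cycle.start for on, off in calls)


def mean_or_none(values):
    return mean(values) if values else None


def compare_phases(cycles, calls):
    """Mean phase durations (s) for cycles with vs. without an overlapping call."""
    groups = {True: [], False: []}
    for c in cycles:
        groups[overlaps(c, calls)].append(c)
    return {
        label: {
            "rise": mean_or_none([c.rise for c in group]),
            "fall": mean_or_none([c.fall for c in group]),
            "cycle": mean_or_none([c.end - c.start for c in group]),
        }
        for label, group in (("with_call", groups[True]),
                             ("without_call", groups[False]))
    }


# Toy trace sampled every 0.5 s; the second breath is slower and overlaps a call.
times = [i * 0.5 - 0.5 for i in range(17)]
temps = [34.5, 34.0, 34.5, 35.0, 34.5, 34.0, 34.25, 34.5, 34.75,
         35.0, 34.5, 34.0, 34.5, 35.0, 34.5, 34.0, 34.5]
cycles = segment_cycles(times, temps)
print(compare_phases(cycles, calls=[(2.5, 3.2)]))
```

The naive extremum test is only adequate for a trace that has already been smoothed; on raw thermal data, a prominence- or distance-based peak detector would be needed to avoid spurious cycles.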

Examining an animal's VCRMs while it produces long, loud, high-pulse-rate or highly modulated calls could shed light on production similarities with the sequential assembly of vocal utterances in human speech (Hernandez et al., 2017). The capacity for complex respiration control has been suggested to be a uniquely human trait, likely important for the syntax and combinatorial structure of human speech (MacLarnon and Hewitt, 1999). These claims, however, were based mostly on indirect data, as direct respiration monitoring during vocal production in animals is scarce. Existing evidence indicates that the relationship between breathing movements and vocal production is often more complex than the call-per-breath paradigm suggests (Hage et al., 2013; Häusler, 2000), and there is little support for the claim that animals have a limited ability to control pitch, modulate amplitude and produce rapidly changing sound sequences on a ‘single breath’ (Fitch, 2018; MacLarnon and Hewitt, 2004). Numerous mammalian species are capable of long, variable and high-rate vocal performance (Haimoff, 1984; Passilongo et al., 2010), which could benefit from efficient respiration control beyond the call-per-breath paradigm. Additionally, a growing body of evidence demonstrates precise motor control by animals over both the initiation (Hage and Nieder, 2013) and the termination (Pomberger et al., 2018) of their vocal signals. Further focused studies of animal respiration during vocal production could confirm or refute the claim that complex respiration control is an adaptation to the complexity of human vocal production.
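The call-per-breath paradigm becomes testable once respiration and call timing are recorded together. The sketch below is a hypothetical illustration, not an analysis from this study. It assumes that non-overlapping expiration intervals have already been extracted as (start, end) times in seconds and that each call is represented by its onset time. It counts call onsets within each expiration and reports how many call-bearing expirations contain more than one call and how many calls fall outside any expiration. Under a strict call-per-breath pattern, every call-bearing expiration would contain exactly one call. The function names are hypothetical.

```python
def calls_per_expiration(expirations, call_onsets):
    """Number of call onsets within each expiration interval [start, end)."""
    return [sum(start <= t < end for t in call_onsets) for start, end in expirations]


def call_per_breath_summary(expirations, call_onsets):
    """Summarize how call onsets are distributed over (non-overlapping) expirations."""
    counts = calls_per_expiration(expirations, call_onsets)
    calling = [c for c in counts if c > 0]
    return {
        "calling_expirations": len(calling),
        # More than one call per expiration departs from call-per-breath.
        "multi_call_expirations": sum(c > 1 for c in calling),
        # Calls not assigned to any expiration (e.g. at phase boundaries).
        "calls_outside_expirations": len(call_onsets) - sum(counts),
    }


# Toy example: the second expiration carries a rapid three-call sequence.
expirations = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.5)]
call_onsets = [0.2, 2.1, 2.5, 2.8, 4.3, 6.0]
print(call_per_breath_summary(expirations, call_onsets))
```

Counting onsets ignores calls that straddle phase boundaries, and depending on the camera frame rate, a thermal trace may not resolve very brief respiratory movements within a rapid call sequence, so this is a first-pass screen rather than a test of respiratory control.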

This work provides a methodological framework for remote respiration monitoring in free-ranging animals and demonstrates that it is sensitive enough to detect respiration cues indicating vocal preparation. Further exploration of the association between respiration patterns and vocalization will provide a perspective on the intentionality and planning of animal vocal signalling. Additionally, respiration cues could function as a mechanism for regulating interactions, with individuals identifying their neighbours’ preparations to vocalize and adjusting their own behaviour accordingly. Addressing this aspect could provide insights into the level of social awareness during communicative interactions and the ability to perceive conspecific intentions.

We are grateful to A. Strandburg-Peshkin for the peak detection script and for stimulating discussions, and to B. Averly for help in the field. We are obliged to Toronto Zoo, ON, Canada and to The Zoological Centre Tel Aviv-Ramat Gan, Israel for allowing pilot work for this project. We thank O. Eitan and Y. Yovel for providing equipment at the exploratory stages of this work, M. Krasnovsky for assistance with early video tracking attempts, A. Ashbury for comments on the readability of the manuscript and M. Weinstock for editorial services. We thank the Kalahari Research Trust, in particular Tim Clutton-Brock, and the Northern Cape Conservation Authority for research permission (FAUNA 1020/2016). We also thank T. Vink and W. Jubber for organizing the field site, as well as the managers and volunteers of the Kalahari Meerkat Project (KMP) for maintaining the habituation of the meerkats and the long-term data collection. This article has relied on records of individual identities and/or life histories maintained by the KMP, which has been supported financially by the European Research Council (742808 to Tim Clutton-Brock, University of Cambridge since 1 July 2018) and the University of Zurich, as well as logistically by the Mammal Research Institute of the University of Pretoria.

Funding

This work was done while V.D. was funded by Minerva Stiftung and Alexander von Humboldt-Stiftung postdoctoral fellowships. Additional funding included Internationalization Initiative Start Up funding from the University of Konstanz, an Aharon and Ephraim Katzir study grant from the Israel Academy of Sciences and Humanities, and funding from the Natural Sciences and Engineering Research Council of Canada (RGPIN-05814 to G.J.T.). M.B.M. was funded by the University of Zurich. Open Access funding was provided by the Max Planck Institute. Deposited in PMC for immediate release.

Author contributions

Conceptualization: V.D., M.B.M.; Methodology: V.D., G.J.T.; Formal analysis: V.D., G.J.T.; Investigation: V.D., G.J.T.; Resources: M.B.M., G.J.T.; Writing - original draft: V.D.; Writing - review & editing: V.D., M.B.M., G.J.T.; Visualization: V.D.; Funding acquisition: V.D., M.B.M., G.J.T.

Data Availability

The processing scripts and the final dataset used for analysis are available from Mendeley Data: doi:10.17632/mcvdvvs43x.1.

References

Aubie, B., Sayegh, R., Fremouw, T., Covey, E. and Faure, P. A. (2014). Decoding stimulus duration from neural responses in the auditory midbrain. J. Neurophysiol. 112, 2432-2445.
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P. and Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature 403, 309-312.
Blumstein, D. T. and Armitage, K. B. (1997). Does sociality drive the evolution of communicative complexity? A comparative test with ground-dwelling sciurid alarm calls. Am. Nat. 150, 179-200.
Chen, H.-C., Kaplan, G. and Rogers, L. J. (2009). Contact calls of common marmosets (Callithrix jacchus): influence of age of caller on antiphonal calling and other vocal responses. Am. J. Primatol. 71, 165-170.
Chu, M., Nguyen, T., Pandey, V., Zhou, Y., Pham, H. N., Bar-Yoseph, R., Radom-Aizik, S., Jain, R., Cooper, D. M. and Khine, M. (2019). Respiration rate and volume measurements using wearable strain sensors. NPJ Digital Medicine 2, 8.
Clutton-Brock, T. and Manser, M. (2016). Meerkats: cooperative breeding in the Kalahari. In Cooperative Breeding in Vertebrates (ed. W. D. Koenig and J. L. Dickinson), pp. 294-317. Cambridge University Press.
Demartsev, V., Strandburg-Peshkin, A., Ruffner, M. and Manser, M. (2018). Vocal turn-taking in meerkat group calling sessions. Curr. Biol. 28, 3661-3666.e3.
Elliott, K. H. (2016). Measurement of flying and diving metabolic rate in wild animals: review and recommendations. Comp. Biochem. Physiol. A Mol. Integr. Physiol. 202, 63-77.
Fahlman, A. and Madigan, J. (2016). Respiratory function in voluntary participating Patagonia sea lions (Otaria flavescens) in sternal recumbency. Front. Physiol. 7, 528.
Fitch, W. T. S. (2006). Production of vocalizations in mammals. In Encyclopedia of Language and Linguistics (ed. K. Brown), pp. 115-121. Elsevier (incl. Pergamon).
Fitch, W. T. (2018). The biology and evolution of speech: a comparative analysis. In Annual Review of Linguistics, Vol. 4 (ed. M. Liberman and B. H. Partee), pp. 255-279. Palo Alto: Annual Reviews.
Gall, G. E. C. and Manser, M. B. (2017). Group cohesion in foraging meerkats: follow the moving ‘vocal hot spot’. R. Soc. Open Sci. 4, 170004.
Hage, S. R. and Nieder, A. (2013). Single neurons in monkey prefrontal cortex encode volitional initiation of vocalizations. Nat. Commun. 4, 11.
Hage, S. R., Gavrilov, N., Salomon, F. and Stein, A. M. (2013). Temporal vocal features suggest different call-pattern generating mechanisms in mice and bats. BMC Neurosci. 14, 99.
Haimoff, E. H. (1984). The organization of song in the Hainan black gibbon (Hylobates concolor hainanus). Primates 25, 225-235.
Häusler, U. (2000). Vocalization-correlated respiratory movements in the squirrel monkey. J. Acoust. Soc. Am. 108, 1443-1450.
Herbinger, I., Papworth, S., Boesch, C. and Zuberbühler, K. (2009). Vocal, gestural and locomotor responses of wild chimpanzees to familiar and unfamiliar intruders: a playback study. Anim. Behav. 78, 1389-1396.
Herbst, C. T. (2016). Biophysics of vocal production in mammals. In Vertebrate Sound Production and Acoustic Communication, Vol. 53 (ed. R. A. Suthers, W. T. Fitch, R. R. Fay and A. N. Popper), pp. 159-189. New York: Springer.
Hernandez, C., Sabin, M. and Riede, T. (2017). Rats concatenate 22 kHz and 50 kHz calls into a single utterance. J. Exp. Biol. 220, 814-821.
Kulahci, I. G., Rubenstein, D. I. and Ghazanfar, A. A. (2015). Lemurs groom-at-a-distance through vocal networks. Anim. Behav. 110, 179-186.
Laplagne, D. A. (2018). Interplay between mammalian ultrasonic vocalizations and respiration. In Handbook of Behavioral Neuroscience, Vol. 25 (ed. S. M. Brudzynski), pp. 61-70. Elsevier.
Maciej, P., Ndao, I., Hammerschmidt, K. and Fischer, J. (2013). Vocal communication in a complex multi-level society: constrained acoustic structure and flexible call usage in Guinea baboons. Front. Zool. 10, 58.
MacLarnon, A. M. and Hewitt, G. P. (1999). The evolution of human speech: the role of enhanced breathing control. Am. J. Phys. Anthropol. 109, 341-363.
MacLarnon, A. and Hewitt, G. (2004). Increased breathing control: another factor in the evolution of human language. Evol. Anthropol. 13, 181-197.
Manser, M. B. (1999). Response of foraging group members to sentinel calls in suricates, Suricata suricatta. Proc. R. Soc. Lond. B Biol. Sci. 266, 1013-1019.
Manser, M. B., Jansen, D., Graw, B., Hollén, L. I., Bousquet, C. A. H., Furrer, R. D. and le Roux, A. (2014). Vocal complexity in meerkats and other mongoose species. In Advances in the Study of Behavior, Vol. 46 (ed. M. Naguib, L. Barrett, H. J. Brockmann, S. Healy, J. C. Mitani, T. J. Roper and L. W. Simmons), pp. 281-310. Academic Press.
Maric, V., Ramanathan, D. and Mishra, J. (2020). Respiratory regulation & interactions with neuro-cognitive circuitry. Neurosci. Biobehav. Rev. 112, 95-106.
McFarland, D. H. (2001). Respiratory markers of conversational interaction. J. Speech Lang. Hear. Res. 44, 128-143.
Melnychuk, M. C., Dockree, P. M., O'Connell, R. G., Murphy, P. R., Balsters, J. H. and Robertson, I. H. (2018). Coupling of respiration and attention via the locus coeruleus: effects of meditation and pranayama. Psychophysiology 55, e13091.
Minkina, W. and Dudzik, S. (2009). Infrared Thermography: Errors and Uncertainties. Wiley.
Passilongo, D., Buccianti, A., Dessi-Fulgheri, F., Gazzola, A., Zaccaroni, M. and Apollonio, M. (2010). The acoustic structure of wolf howls in some eastern Tuscany (central Italy) free ranging packs. Bioacoustics 19, 159-175.
Pereira, C. B., Yu, X., Czaplik, M., Blazek, V., Venema, B. and Leonhardt, S. (2016). Estimation of breathing rate in thermal imaging videos: a pilot study on healthy human subjects. J. Clin. Monit. Comput. 31, 1241-1254.
Pereira, C. B., Kunczik, J., Bleich, A., Haeger, C., Kiessling, F., Thum, T., Tolba, R., Lindauer, U., Treue, S. and Czaplik, M. (2019). Perspective review of optical imaging in welfare assessment in animal-based research. J. Biomed. Opt. 24, 1.
Pomberger, T., Risueno-Segovia, C., Loschner, J. and Hage, S. R. (2018). Precise motor control enables rapid flexibility in vocal behavior of marmoset monkeys. Curr. Biol. 28, 788.
Rafferty, G. F., Evans, J. and Gardner, W. N. (1995). Control of expiratory time in conscious humans. J. Appl. Physiol. 78, 1910-1920.
Riede, T. (2011). Subglottal pressure, tracheal airflow, and intrinsic laryngeal muscle activity during rat ultrasound vocalization. J. Neurophysiol. 106, 2580-2592.
Riede, T., Schaefer, C. and Stein, A. (2020). Role of deep breaths in ultrasonic vocal production of Sprague-Dawley rats. J. Neurophysiol. 123, 966-979.
Rochet-Capellan, A. and Fuchs, S. (2014). Take a breath and take the turn: how breathing meets turns in spontaneous dialogue. Philos. Trans. R. Soc. B Biol. Sci. 369, 10.
Schulz, T. M., Whitehead, H., Gero, S. and Rendell, L. (2008). Overlapping and matching of codas in vocal interactions between sperm whales: insights into communication function. Anim. Behav. 76, 1977-1988.
Seyfarth, R. M. and Cheney, D. L. (2003). Signalers and receivers in animal communication. Annu. Rev. Psychol. 54, 145-173.
Smotherman, M., Kobayasi, K., Ma, J., Zhang, S. and Metzner, W. (2006). A mechanism for vocal-respiratory coupling in the mammalian parabrachial nucleus. J. Neurosci. 26, 4860-4869.
Stekelenburg, J. J. and Van Boxtel, A. (2001). Inhibition of pericranial muscle activity, respiration, and heart rate enhances auditory sensitivity. Psychophysiology 38, 629-641.
Sugiura, H. (1998). Matching of acoustic features during the vocal exchange of coo calls by Japanese macaques. Anim. Behav. 55, 673-687.
Sugiura, H. and Masataka, N. (1995). Temporal and acoustic flexibility in vocal exchanges of coo calls in Japanese macaques (Macaca fuscata). In Current Topics in Primate Vocal Communication (ed. E. Zimmermann, J. D. Newman and U. Jürgens), pp. 121-140. Springer.
Tattersall, G. J. (2016). Infrared thermography: a non-invasive window into thermal physiology. Comp. Biochem. Physiol. A Mol. Integr. Physiol. 202, 78-98.
Tattersall, G. J., Danner, R. M., Chaves, J. A. and Levesque, D. L. (2020). Activity analysis of thermal imaging videos using a difference imaging approach. J. Therm. Biol. 91, 102611.
Townsend, S. W., Hollén, L. I. and Manser, M. B. (2010). Meerkat close calls encode group-specific signatures, but receivers fail to discriminate. Anim. Behav. 80, 133-138.
Vainer, B. G. (2018). A novel high-resolution method for the respiration rate and breathing waveforms remote monitoring. Ann. Biomed. Eng. 46, 960-971.
West, R. and Larson, C. R. (1993). Laryngeal and respiratory activity during vocalization in macaque monkeys. J. Voice 7, 54-68.
Wild, J. M. (1997). Neural pathways for the control of birdsong production. J. Neurobiol. 33, 653-670.
Winkworth, A. L., Davis, P. J., Adams, R. D. and Ellis, E. (1995). Breathing patterns during spontaneous speech. J. Speech Hear. Res. 38, 124-144.
Worthington, J., Young, I. S. and Altringham, J. D. (1991). The relationship between body mass and ventilation rate in mammals. J. Exp. Biol. 161, 533-536.

Competing interests

The authors declare no competing or financial interests.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.

Supplementary information