Previous research has suggested that the peaks in the first derivative (dEGG) of the electroglottographic (EGG) signal are good approximate indicators of the events of glottal opening and closing. These findings were based on high-speed video (HSV) recordings with frame rates 10 times lower than the sampling frequencies of the corresponding EGG data. The present study attempts to corroborate these previous findings, utilizing super-HSV recordings. The HSV and EGG recordings (sampled at 27 and 44 kHz, respectively) of an excised canine larynx phonation were synchronized by an external TTL signal to within 0.037 ms. Data were analyzed by means of glottovibrograms, digital kymograms, the glottal area waveform and the vocal fold contact length (VFCL), a new parameter representing the time-varying degree of ‘zippering’ closure along the anterior–posterior (A–P) glottal axis. The temporal offsets between glottal events (depicted in the HSV recordings) and dEGG peaks in the opening and closing phase of glottal vibration ranged from 0.02 to 0.61 ms, amounting to 0.24–10.88% of the respective glottal cycle durations. All dEGG double peaks coincided with vibratory A–P phase differences. In two out of the three analyzed video sequences, peaks in the first derivative of the VFCL coincided with dEGG peaks, again co-occurring with A–P phase differences. The findings suggest that dEGG peaks do not always coincide with the events of glottal closure and initial opening. Vocal fold contacting and de-contacting do not occur at infinitesimally small instants of time, but extend over a certain interval, particularly under the influence of A–P phase differences.
INTRODUCTION
Monitoring and assessment of vocal fold vibration in voice production is crucial for understanding normal and disordered voice, treating voice disorders and training professional voice users. Observation of the larynx during phonation is performed with either direct or indirect methods. The direct methods, such as videostrobolaryngoscopy (Bless et al., 1987), videokymography (Švec and Schutte, 1996) or high-speed videoendoscopy (Rubin and LeCover, 1960; Moore et al., 1962; Hertegård, 2005; Deliyski et al., 2008; Deliyski and Hillman, 2010), provide insights into the spatiotemporal oscillatory behaviour of laryngeal tissue. However, they are semi-invasive (not well tolerated by some subjects) and costly, and they are usually only performed by trained personnel in dedicated premises such as voice clinics.
As a low-cost, non-invasive alternative, vocal fold vibration can be monitored indirectly by electroglottography (EGG) (Fabre, 1957). A low-amperage, high-frequency current is passed between two electrodes placed on each side of the thyroid cartilage at vocal fold level. The time-varying change of vocal fold contact during the flow-induced oscillation of laryngeal tissue induces variations in the electrical impedance across the larynx, resulting in variation in the current between the two electrodes (Fourcin and Abberton, 1971; Baken, 1992; Baken and Orlikoff, 2000). These admittance variations are proportional to the relative vocal fold contact area during phonation (Scherer et al., 1988).
Experimental research has suggested that landmarks in the EGG waveform are related to the relative movement and position of the vocal folds during phonation (Rothenberg, 1979; Baer et al., 1983; Childers et al., 1983; Hess and Ludwigs, 2000). The physiologic relevance of the EGG signal has been examined theoretically by Childers et al. (Childers et al., 1986) and Titze (Titze, 1989; Titze, 1990), the latter discussing the effects of (1) increased glottal adduction, (2) glottal convergence (with vertical phasing), (3) medial vocal fold surface bulging and (4) increased vertical phasing in vocal fold vibration.
The moments of glottal opening and glottal closure are of particular interest for quantitative analysis of the voice source. The timing of these events can be used to determine the relative proportion of glottal closure within a glottal vibratory period (Rothenberg and Mahshie, 1988), known as the ‘larynx closed quotient’ (Howard, 1995) or ‘contact quotient’ [CQEGG (Orlikoff, 1991)]. This quotient has been found useful in clinical as well as in basic voice research (e.g. Schutte and Miller, 2001; Henrich et al., 2005; Švec et al., 2008). However, the calculation of the CQEGG is influenced by the choice of algorithm used to determine the contacting and de-contacting instants, and must therefore be used with caution (Sapienza et al., 1998; Higgins and Schulte, 2002; Henrich et al., 2004; Kania et al., 2004; Herbst and Ternström, 2006; La and Sundberg, 2012).
For the purpose of calculating the CQEGG, estimation of the instants of glottal closure and opening is performed by either (1) applying a threshold criterion to the locally normalized EGG signal (Rothenberg and Mahshie, 1988), or (2) finding positive and negative maxima in the first mathematical derivative of the EGG signal (dEGG), reflecting the maximum rate of change of the EGG signal with time (Teaney and Fourcin, 1980; Childers and Krishnamurthy, 1985; Henrich et al., 2004). Because the latter approach does not rely on arbitrary user input (i.e. the arbitrary choice of a threshold) but is rather based on intrinsic properties of the EGG signal, it may be better suited for detection of the glottal opening and closure instants. The relation of positive and negative peaks in the dEGG signal to the events of glottal closure and opening, as seen in laryngeal imaging performed with a wide range of imaging frame rates, has been evaluated in several studies (see Table 1). These studies highlight the potential of the one-dimensional EGG signal (and in particular its first derivative, dEGG) to reveal information about the complex three-dimensional contacting motion of the vocal folds in a non-invasive fashion.
List of abbreviations
- A–P: anterior–posterior
- CQEGG: electroglottographic contact quotient
- dEGG: first derivative of the electroglottographic signal
- dGAW: first derivative of the glottal area waveform
- dVFCL: first derivative of the vocal fold contact length signal
- DKG: digital kymography or digital kymogram
- EGG: electroglottography or electroglottographic
- GAW: glottal area waveform
- GVG: glottovibrogram
- HSV: high-speed video
- MCQ: membranous contact quotient
- TTL: transistor-transistor logic
- VFCL: vocal fold contact length
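The two CQEGG estimation approaches described in this section (a threshold criterion versus dEGG extrema) can be contrasted in a few lines of Python. This is a minimal sketch, not the procedure of any cited study: it assumes that `egg` holds exactly one glottal cycle sampled at a constant rate, with larger values meaning more vocal fold contact; the function names, the synthetic waveform and the threshold levels are hypothetical.

```python
import numpy as np

def cq_from_degg(egg):
    """Contact quotient of one EGG cycle from dEGG extrema.

    The largest positive dEGG value marks contacting (closing) and the
    largest negative value marks de-contacting (opening); no user-chosen
    threshold is involved.
    """
    egg = np.asarray(egg, dtype=float)
    degg = np.diff(egg)                 # first difference approximates dEGG
    i_close = int(np.argmax(degg))      # steepest increase of contact
    i_open = int(np.argmin(degg))       # steepest decrease of contact
    contact_samples = (i_open - i_close) % len(egg)  # wrap around cycle end
    return contact_samples / len(egg)

def cq_from_threshold(egg, level):
    """Contact quotient from a threshold on the cycle-normalized EGG."""
    egg = np.asarray(egg, dtype=float)
    norm = (egg - egg.min()) / (egg.max() - egg.min())
    return float(np.mean(norm > level))

# Synthetic one-cycle EGG (200 samples): fast contacting, slower de-contacting.
egg = np.zeros(200)
egg[80], egg[81], egg[82:150] = 0.2, 0.8, 1.0
egg[150], egg[151] = 0.8, 0.3

print(cq_from_degg(egg))             # 0.35
print(cq_from_threshold(egg, 0.5))   # 0.35
print(cq_from_threshold(egg, 0.25))  # 0.355
```

The dEGG-based estimate depends only on the waveform, whereas the threshold-based estimate shifts with the chosen level, which is the arbitrariness referred to above. Measured dEGG signals are noisy, so a practical implementation would likely need to smooth the derivative before locating its extrema.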
The vocal folds do not vibrate as a uniform mass [e.g. as is seen in a one-mass model of the vocal folds (see Flanagan and Landgraf, 1968)]. Rather, their vibration is characterized by phase differences along both the inferior–superior (Baer, 1981; Titze et al., 1993) and anterior–posterior (A–P) dimensions (Tanabe et al., 1975; Krenmayr et al., 2012; Orlikoff et al., 2012; Yamauchi et al., 2013). These phase differences cause time-delayed contacting and de-contacting of the vocal folds along the respective axes. There is thus no specific instant of glottal closing and opening, but rather an interval during which the closing and opening, respectively, occur. This is reflected in some of the findings summarized in Table 1. Furthermore, phase differences along the A–P dimension, i.e. the so-called ‘zipper-like’ opening and closure (Childers et al., 1986), may introduce multiple peaks into the dEGG waveform (Hess and Ludwigs, 2000; Henrich et al., 2004). The data reported in a recent study (Herbst et al., 2010) give reason to assume that multiple dEGG peaks represent systematic physiological phenomena rather than artifacts.
In most of the studies shown in Table 1, the frame rate of the high-speed video (HSV) recordings was approximately an order of magnitude below the commonly used sampling frequency for recording EGG data (i.e. 44,100 Hz). The timing accuracy of assessments of glottal closing and opening instants or intervals is thus limited by the lower video frame rate. In particular, what appears to be a closing instant in HSV with a low frame rate might actually turn out to be a closing interval in HSV with a higher video frame rate, particularly if the glottal opening or closing exhibits a phase delay along the A–P glottal axis. In order to investigate this issue, super-HSV imaging with a video frame rate of more than half the EGG signal sampling rate is used here to relate the landmarks of vocal fold vibration to those found in the dEGG signal.
RESULTS
The process used to extract the three analyzed sequences and their respective subglottal pressures is illustrated in Fig. 1.
Sequence 1: double dEGG peaks in de-contacting phase
The electroglottographic (EGG) and vibratory data for sequence 1 are illustrated in Fig. 2. In the opening phase, a pronounced A–P phase difference (‘zippering’) was seen following full glottal closure from ~5.2 ms to ~8.0 ms (see arrow in Fig. 2D, supplementary material Movie 1), suggesting the presence of an x-20 or x-21 vocal fold vibratory mode (Berry et al., 1994) (see Fig. 3 and supplementary material Movie 8). The EGG signal amplitude started to decrease ~2 ms before the moment of initial glottal opening (marker T1), which suggests the presence of a strong phase difference of the vocal fold vibration along the inferior–superior dimension. The inferior vocal fold edges – not seen in the HSV – presumably started to separate around t≈3 ms, just after the previous moment of complete glottal closure.
Two negative maxima of similar amplitude were found in the dEGG (see dashed vertical markers T1 and T2 in Fig. 2A). The first of these negative peaks was synchronized with the moment of initial glottal opening, preceding it by only 0.02 ms (see supplementary material Movie 2). This event was also reflected by a local maximum in the derivative of the glottal area waveform (dGAW) and a pronounced local minimum in the derivative of the vocal fold contact length waveform (dVFCL; see Fig. 2B,C). The second negative dEGG peak (marker T2) occurred when the posterior glottis was still partially closed, and it did not coincide with a peak in either the dGAW or dVFCL waveform.
The closing phase was characterized by an A–P phase difference (‘zippering’), suggesting again that the x-20 or x-21 mode contributed to the vibratory pattern. One distinct dEGG maximum was observed (Fig. 2A, marker T3), which preceded the moment of complete glottal closure (HSV, marker T4) by 0.61 ms.
The kymograms in Fig. 2E–H depict the time-varying glottal opening at 80, 60, 40 and 20% of the entire glottal length, respectively. The quantitative glottal width data, extracted from the glottovibrogram (GVG) data in Fig. 2D, were superimposed upon one complete glottal cycle (see the light blue shapes in Fig. 2E–H). The A–P phase difference in both the opening and closing phases was reflected in the kymograms. Marker T1 coincided with the moment of initial glottal opening at a position of 20% of the glottal length, i.e. digital kymogram (DKG) 0.2 in Fig. 2H. With increasing posterior position along the glottal axis, the duration of the glottal open phase decreased. Consequently, the DKG 0.8 (Fig. 2E) location, extracted at a position closest to the posterior boundary of the glottis, had the longest closure duration.
Sequence 2: single dEGG peaks
The EGG and vibratory data for sequence 2 are shown in Fig. 4. Because of the period doubling observed in this sequence, each period contained two glottal cycles, each consisting of one de-contacting phase and one contacting phase. These two cycles are identified in Fig. 4 as ‘cycle 1’ (from marker T0 to marker T3) and ‘cycle 2’ (from marker T3 to marker T6).
The opening phase of cycle 1 was characterized by a slight GVG ‘hourglass’ pattern (see arrows in Fig. 4D, supplementary material Movie 3) occurring over a period of ~0.2 ms, suggesting the presence of an x-30 or x-31 vibratory mode (see Fig. 3, supplementary material Movie 9). The decrease of the EGG signal amplitude occurred over a duration of ~1 ms just before the moment of initial glottal opening (Fig. 4A, vertical marker T1). As in the previous sequence, this may indicate the presence of a phase difference of the vocal fold vibration along the inferior–superior dimension. The inferior vocal fold edges – not seen in the HSV – presumably started to separate around t≈3 ms (or even slightly earlier, if the decrease of vocal fold contact area at the inferior vocal fold margin was counteracted by an increase of vocal fold contact area along the superior vocal fold margin, resulting in a ‘flat-top’ EGG waveform between t≈2 ms and t≈3 ms). One distinct negative dEGG peak (Fig. 4A) was found in the de-contacting phase (dashed vertical marker T2 in Fig. 4), which was delayed by 0.13 ms from the moment of initial glottal opening in the HSV data (see supplementary material Movie 4). This dEGG peak was temporally aligned with a local maximum of the dGAW waveform (Fig. 4B) and a local minimum of the dVFCL waveform (Fig. 4C). These peaks occurred at the moment when the central portion of the glottis opened, i.e. when the vocal fold edges lost their contact along the entire glottal axis (marker T2).
In cycle 1, the glottis closed with a slight ‘anti-hourglass’ zippering motion towards the center of the glottal axis (see supplementary material Movie 3), again suggesting the presence of an x-31 vibratory mode. One pronounced positive dEGG peak was found (Fig. 4A, vertical marker T3). This peak was temporally aligned with the moment of complete glottal closure (as determined from the GVG, Fig. 4D) and a positive peak in the dVFCL waveform (Fig. 4C).
The opening phase of cycle 2 was also characterized by a slight ‘hourglass’ pattern (see supplementary material Movie 3), suggesting the presence of an x-30 or x-31 vibratory mode. The moment of initial glottal opening was reflected by a negative peak in the dVFCL waveform (Fig. 4C, vertical marker T4). One pronounced negative peak was found in the dEGG signal (Fig. 4A, vertical marker T5), which was delayed by 0.45 ms as compared with the moment of initial glottal opening (marker T4). The negative dEGG peak coincided with a positive peak in the dGAW waveform and a negative peak in the dVFCL waveform (Fig. 4B,C, marker T5).
In cycle 2, the vocal folds closed with an ‘anti-hourglass’ zippering motion towards the center of the A–P glottal axis (see supplementary material Movie 3), again suggesting the continuous presence of an x-30 or x-31 vibratory mode. One pronounced positive peak was found in the dEGG waveform (Fig. 4A), which was synchronized with a positive peak of the dVFCL waveform (Fig. 4C) and preceded the moment of glottal closure as determined from the GVG (Fig. 4D, marker T6) by 0.12 ms.
The kymograms shown in Fig. 4E–H were derived from the HSV and GVG data in a similar fashion as those in Fig. 2. For cycle 1, the moments of initial glottal opening and complete glottal closure (markers T1 and T3) were reflected in the DKG 0.2 (Fig. 4H) and DKG 0.6 (Fig. 4F), respectively. The moment of initial glottal opening in cycle 2 (marker T4) coincided with that in the DKG 0.2, and the moment of complete glottal closure in that cycle (marker T6) was reflected by DKG 0.4 (Fig. 4G).
Sequence 3: double dEGG peaks in contacting phase
The EGG and vibratory analysis data for sequence 3 are displayed in Fig. 5 (see also supplementary material Movie 5). The vocal fold vibration was characterized by a short closed phase of ~12% of the glottal cycle duration, as determined from the VFCL (Fig. 5C) and GVG data (Fig. 5D). The initial glottal opening occurred in the anterior portion of the glottal axis, preceding the initial opening of the posterior glottis by more than 0.5 ms (see markers T1 and T3 in Fig. 5D). One strong minimum was found in the dEGG signal (Fig. 5A, dashed vertical marker T2), which lagged the moment of initial glottal opening (marker T1) by 0.49 ms. This negative dEGG peak, which occurred when the central portion of the glottis opened, did not coincide with any landmark in either the dGAW or dVFCL signals. The dGAW and dVFCL signals had one synchronized peak (Fig. 5B,C, marker T3) at the moment when the posterior portions of the vocal folds separated.
The closing phase of this sequence was characterized by an ‘anti-hourglass’ zippering motion (see supplementary material Movie 5) towards the center of the A–P glottal axis, suggesting that in this example, too, an x-30 or x-31 vibratory mode participated in the vocal fold vibration (compare supplementary material Movie 9). Two distinct positive maxima were found in the dEGG signal (Fig. 5A, markers T4 and T6), neither of which coincided with any other glottal landmark (see supplementary material Movie 6). The rise in the EGG waveform at marker T4 appears to reflect a marked increase in tissue contact in the vertical plane, which is not captured by the gradual GAW decrease (Fig. 5B) and VFCL increase (Fig. 5C) at that time.
The kymograms in Fig. 5E–H were derived from the HSV and GVG data in a similar fashion as those in Fig. 2. Because the moment of initial glottal opening (marker T1) occurred at a position of ca. 30% of the entire glottal length, it is not reflected in any of the displayed kymograms. The moment of complete glottal closure (marker T6) occurred at an offset of 40% along the glottal axis and was thus indicated in the DKG 0.4 (Fig. 5G).
The temporal offsets between glottal closing/opening events (as determined from HSV data) and the respective dEGG peaks for all analyzed sequences are summarized in Table 2.
DISCUSSION
In this study, we present high-speed data of vocal fold vibration recorded at a video frame rate of 27,000 frames s−1. This is, to the best of our knowledge, the highest video frame rate reported in the literature to date for glottal observations, almost seven times higher than the commonly available rate of 4000 frames s−1. The increased temporal resolution was needed to gain better insight into the temporal alignment between glottal closing and opening events (as determined from HSV data) and the positive and negative peaks in the dEGG signal.
The hypothesis that dEGG waveform peaks provide clear indicators of glottal closing and opening instants was not supported in the three reported vibratory conditions. In only two out of eight cases was a good temporal agreement between dEGG peak and glottal event found: in the opening phase of sequence 1 (see Fig. 2) and the closing phase of cycle 1 in sequence 2 (see Fig. 4), with a measured delay of 0.02 ms, which was below the maximum synchronization error. In three cases, the closing or opening event, respectively, occurred ~0.5 ms before or after the occurrence of the respective dEGG peak (i.e. at an offset of 7–10% of the glottal cycle duration; see Table 2), suggesting that the dEGG peak did not coincide with the moment of glottal closing or opening, respectively. In the remaining three cases, the offset between dEGG peak and the respective glottal event was in the range of 0.1 to 0.15 ms.
In sampled data, the finest resolvable time interval is determined by the sampling frequency at which the analyzed data were acquired (and consequently, for video recordings, by the achievable exposure time). As a rule of thumb, a maximum timing error of plus/minus half the time difference between two consecutive samples (i.e. half the synchronization error of 0.037 ms in this study) should always be considered when estimating how accurately the occurrence of an instant in time can be determined, assuming that the observed vibratory phenomenon is not band limited [i.e. that it contains frequency components higher than half the sampling frequency (see McClellan et al., 1998; Roads and Strawn, 1998)].
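The arithmetic behind this rule of thumb is short enough to state explicitly. The sketch below assumes an EGG sampling rate of 44,100 Hz (the text gives "44 kHz"); the constant names are illustrative only.

```python
# Timing resolution implied by the sampling rates discussed in the text.
HSV_FRAME_RATE = 27_000   # frames per second (this study)
EGG_SAMPLE_RATE = 44_100  # samples per second (assumed; "44 kHz" in the text)
COMMON_HSV_RATE = 4_000   # frames per second (commonly available systems)

def sample_period_ms(rate_hz):
    """Time between two consecutive samples, in milliseconds."""
    return 1000.0 / rate_hz

frame_period_ms = sample_period_ms(HSV_FRAME_RATE)   # ~0.0370 ms
egg_period_ms = sample_period_ms(EGG_SAMPLE_RATE)    # ~0.0227 ms

# Synchronization is limited by the coarser of the two sampling grids (video).
sync_error_ms = max(frame_period_ms, egg_period_ms)  # ~0.0370 ms
timing_error_ms = sync_error_ms / 2                  # ~0.0185 ms (plus/minus)

print(f"video frame period: {frame_period_ms:.4f} ms")
print(f"EGG sample period:  {egg_period_ms:.4f} ms")
print(f"timing error:       +/-{timing_error_ms:.4f} ms")
print(f"HSV/EGG rate ratio: {HSV_FRAME_RATE / EGG_SAMPLE_RATE:.2f}")  # above 0.5
print(f"speed-up vs 4 kHz:  {HSV_FRAME_RATE / COMMON_HSV_RATE:.2f}")  # 6.75
```

On this basis, an offset such as 0.02 ms lies below the 0.037 ms synchronization error and cannot be distinguished from perfect alignment.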
The data presented in this study show that increased video frame rates provide a means to better understand the relationship between EGG and HSV data (Golla et al., 2009), suggesting that it is not as simple as previously thought: vocal fold vibration may include phase delays along both the inferior–superior and the A–P glottal axis in both the opening and closing phase (Lohscheller et al., 2013). The EGG signal is a time-varying one-dimensional representation of relative vocal fold contact induced by the complex three-dimensional motion of the vocal folds. Glottal contacting and de-contacting are, strictly speaking, not events that happen at an instant in time (having, theoretically, a duration of 0 s). Rather, they represent phenomena that occur over an interval of time.
The concept of a closing or opening ‘event’ or ‘instant’ thus deserves a more rigorous definition. To avoid confusion, we suggest a distinction between (1) opening/closing instants when considering the glottal airflow and the acoustic excitation in relation to vocal fold vibration; and (2) contacting/de-contacting intervals when describing vocal fold vibratory features and EGG recordings and analyses. When the presumed onset and offset of glottal airflow are being discussed in endoscopic (HSV) data with complete glottal closure, the terms ‘instant of initial glottal opening’ and ‘instant of complete glottal closure’ might be used to indicate those moments in time when the glottal airflow just leaves or reaches the baseline value that is seen during the closed phase (zero in the case of complete glottal closure). With ever-improving technology and increasing HSV frame rates, future research may address the question of how vertical and A–P phase differences influence the abruptness of glottal airflow cessation during the contacting of the vocal folds, thus influencing the spectral slope of the sound source. In this context it is conceivable that the absence of A–P phase differences (i.e. a more abrupt contacting of the vocal folds) is a prerequisite for optimizing sound generation, e.g. as is needed in un-amplified professional singing (Herbst et al., in press).
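Following the suggested definition, the two instants can be read off a glottal area waveform (GAW) as the frames at which the signal leaves or returns to its closed-phase baseline. The sketch below uses the GAW as a stand-in for glottal airflow, which is an assumption; `glottal_instants` and its `baseline` parameter are hypothetical names, and real data would need a small noise margin above the baseline.

```python
import numpy as np

def glottal_instants(gaw, frame_rate, baseline=0.0):
    """Return (opening_times, closure_times) in seconds.

    Initial glottal opening: first frame whose area exceeds the baseline
    after a closed frame. Complete glottal closure: first frame whose area
    is back at (or below) the baseline after an open frame.
    """
    is_open = np.asarray(gaw, dtype=float) > baseline
    change = np.diff(is_open.astype(int))   # +1: closed->open, -1: open->closed
    opening = (np.flatnonzero(change == 1) + 1) / frame_rate
    closure = (np.flatnonzero(change == -1) + 1) / frame_rate
    return opening, closure

# Two toy cycles sampled at the frame rate used in this study.
gaw = [0, 0, 1, 3, 2, 0, 0, 0, 2, 1, 0]
opening, closure = glottal_instants(gaw, frame_rate=27_000)
print(opening * 1000, closure * 1000)  # instants in milliseconds
```

In this toy example the glottis opens at frames 2 and 8 and closes completely at frames 5 and 10; with incomplete closure, `baseline` would be set to the minimal area seen during the closed phase.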
The existence of a contacting or de-contacting interval is even more evident if the dEGG signal contains multiple positive or negative peaks. Several previous authors have suggested that these double or multiple dEGG peaks are induced by phase differences along the A–P glottal axis (Hess and Ludwigs, 2000; Henrich et al., 2004; Orlikoff et al., 2012). Analysis of the data gathered in the present study supports these findings: all occurrences of multiple dEGG peaks in either the contacting or de-contacting phase coincided with A–P or hourglass vibratory vocal fold patterns, presumably induced by x-2n or x-3n vibratory modes. However, not all occurrences of hourglass vibratory patterns (see sequence 2) coincided with clear double peaks in the dEGG signal. Further research is necessary to clarify this issue.
In six out of the eight events during glottal opening and closure in this study, a dEGG peak coincided with a dVFCL peak. By its nature, the VFCL is only sensitive to changes in vocal fold contact along the A–P glottal axis, because the video data are acquired from a superior viewpoint. Thus, vertical phase differences (along the inferior–superior dimension) are not reflected in quantitative data gained from the detection of the time-varying glottal edges. In contrast, the EGG signal – measuring the time-varying relative vocal fold contact area – is influenced by phase differences of vocal fold vibration along both the visible A–P and the mostly hidden inferior–superior dimension. Therefore, it is likely that any dEGG peak that coincides with a dVFCL peak is caused by an A–P phase difference. By inversion of this argument, we further hypothesize that any dEGG peak that did not coincide with a dVFCL peak may have been caused by either (1) a vertical phase delay of vocal fold vibration or (2) an inhomogeneity of the vocal fold structure (or thickness) along the glottal axis, e.g. when the vocal processes are involved in the vibration.
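Deciding whether a dEGG peak "coincides" with a dVFCL peak amounts to pairing peak times within a tolerance. A minimal sketch, assuming the peak times have already been extracted and using the maximum synchronization error of 0.037 ms as the tolerance (an assumption, not necessarily the criterion applied in this study); `match_peaks` is a hypothetical helper.

```python
import numpy as np

def match_peaks(degg_times_ms, dvfcl_times_ms, tolerance_ms=0.037):
    """Pair each dEGG peak with the nearest dVFCL peak within a tolerance.

    Returns a list of (dEGG time, matched dVFCL time or None). Following the
    reasoning in the text, an unmatched dEGG peak is a candidate for a
    vertical phase delay or an A-P inhomogeneity of the vocal folds rather
    than an A-P phase difference.
    """
    dvfcl = np.asarray(dvfcl_times_ms, dtype=float)
    pairs = []
    for t in degg_times_ms:
        if dvfcl.size == 0:
            pairs.append((t, None))
            continue
        nearest = float(dvfcl[np.argmin(np.abs(dvfcl - t))])
        pairs.append((t, nearest if abs(nearest - t) <= tolerance_ms else None))
    return pairs

print(match_peaks([1.00, 2.00], [1.02, 3.00]))  # [(1.0, 1.02), (2.0, None)]
```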
Conclusions
In conclusion, the evidence presented in this manuscript does not support the common assumption that the maxima found in the dEGG signal always coincide with the moments of glottal closure and opening. Contacting and de-contacting of the vocal folds do not occur at an infinitesimally small instant of time, but extend over a certain interval. The durations of the vocal fold contacting and de-contacting intervals are governed by vibratory phase differences along the A–P glottal axis, which have been observed to cause dEGG double peaks. The VFCL was introduced as a promising new parameter for assessing features of vocal fold vibration from HSV data. Further research (employing HSV recordings with maximally achievable frame rates) is needed in order to examine the exact relationship of these contacting and de-contacting intervals to both A–P and inferior–superior phase differences of vocal fold vibration, possibly also analyzing their relationship to glottal airflow and the acoustic output.
MATERIALS AND METHODS
The excised larynx of a female golden retriever (~6 yr and 30 kg body mass), which died of natural causes, was phonated in an excised larynx setup, as described in a previous publication [see supplementary material in Herbst et al. (Herbst et al., 2012)]. The excised larynx was mounted on a vertical air supply tube. The upper 4 cm of the specimen's trachea formed an airtight seal with that tube. The vocal folds were adducted with three-pronged devices as described in Titze (Titze, 2006). No longitudinal tension was applied to the vocal folds. The larynx was phonated by blowing warmed and humidified air through the adducted glottis. Subglottal air pressure was controlled manually with a Tescom Regulus 3 D50708 pressure valve (McKinney, TX, USA), and was varied between 0 and 20.5 cm H2O (see Fig. 1), as measured by a Keller PR-41X pressure sensor (Winterthur, Switzerland) positioned 32 cm upstream from the vocal folds.
As regards the appearance of peaks in the dEGG signal, three stereotypical dEGG waveforms can be conceptualized: (A) clear single peaks in both the contacting and the de-contacting phase, (B) a single peak in the de-contacting phase and a pronounced double peak in the contacting phase or (C) a pronounced double peak in the de-contacting phase (and a single peak in the contacting phase). Based on previous research (Hess and Ludwigs, 2000; Henrich et al., 2004), it is hypothesized that scenarios B and C are influenced by A–P phase differences of vocal fold vibration, and scenario A is not.
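The three stereotypical waveforms can be operationalized by counting prominent positive and negative dEGG peaks within one cycle. The sketch below uses SciPy's `scipy.signal.find_peaks`; the relative height/prominence threshold of 30% and the function name are assumptions for illustration only, since the sequences in this study were selected by visual inspection.

```python
import numpy as np
from scipy.signal import find_peaks

def classify_degg_cycle(degg, rel_threshold=0.3):
    """Label one dEGG cycle as scenario 'A', 'B' or 'C' (otherwise 'other').

    A: one contacting (positive) and one de-contacting (negative) peak
    B: double or multiple contacting peak, single de-contacting peak
    C: double or multiple de-contacting peak, single contacting peak
    """
    degg = np.asarray(degg, dtype=float)
    thr = rel_threshold * np.max(np.abs(degg))
    # Requiring a minimum height discards flat zero-valued plateaus.
    pos, _ = find_peaks(degg, height=thr, prominence=thr)
    neg, _ = find_peaks(-degg, height=thr, prominence=thr)
    n_pos, n_neg = len(pos), len(neg)
    if n_pos == 1 and n_neg == 1:
        return "A"
    if n_pos >= 2 and n_neg == 1:
        return "B"
    if n_pos == 1 and n_neg >= 2:
        return "C"
    return "other"

cycle = np.zeros(100)
cycle[20] = 1.0                      # single contacting peak
cycle[60], cycle[70] = -0.8, -0.7    # double de-contacting peak
print(classify_degg_cycle(cycle))    # C
```

The threshold trades sensitivity against robustness: a lower value would count minor ripples as additional peaks.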
Following this model, three sequences were selected from a pressure sweep (subglottal pressure ranging from 0 to 20.5 cm H2O, phonation threshold pressure at ~4 cm H2O), each representing one of the three stereotypical dEGG waveforms described above. The individual sequences were selected based on visual inspection of the dEGG signal without any prior knowledge of the HSV data, thus precluding human bias in the selection process. Each sequence had to stem from a locally stable region within the signal, having a minimum of 20 similar periods of oscillation.
HSV recordings were made with a Photron FASTCAM 1024 PCI camera (Photron Limited, Tokyo, Japan) at a frame rate of 27,000 images s−1. To provide sufficient illumination for such a high frame rate, two light sources were used simultaneously: a dedocool system (Dedo Weigert Film GmbH, Munich, Germany) and a custom-built array of twelve 5 W MR16 LED bulbs (SLV Elektronik GmbH, Übach-Palenberg, Germany), powered by a 12 V car battery. The temperature resulting from the long-term heat emission of both systems peaked at 31°C, as measured with a Voltcraft IR 260-8S infrared thermometer (Voltcraft, Hirschau, Germany).
Synchronization between the HSV and the EGG/acoustic recordings was achieved with a rectangular transistor-transistor logic (TTL) signal (irregular but known pulse durations of ~20 ms, encoding the recording time) generated by a LabJack U6 data acquisition card (LabJack Corporation, Lakewood, CO, USA) that was routed through an IC555 circuit with a rise time of 15 ns. This TTL signal was recorded both as a time-varying voltage by the Fireface sound interface in a separate channel, and as a blinking LED light by the HSV system (see supplementary material Fig. S1). The time-varying intensity values of the pixels in the HSV representing the blinking LED were averaged in each video frame (see supplementary material Fig. S2), and the resulting signal was compared with the TTL signal as captured by the sound card. In cases where the LED took more than one video frame to reach its maximum brightness, the first video frame where the color intensity was greater than the baseline (LED not lit) was chosen as the onset of the TTL signal. The TTL signal consisted of a steady train of TTL pulses (22 pulses s−1) over the entire duration of the recording, thus ruling out the possibility of a time drift between video system and sound card. The TTL sync signals for the EGG and the HSV data were aligned with each other by a supervised semi-automatic procedure that is outlined in supplementary material Fig. S3. The synchronization accuracy was dependent on the video frame rate (i.e. the lower of the two sampling frequencies involved). The maximum synchronization error was calculated to be 0.037 ms, i.e. the time delay between two consecutive video frames at a video frame rate of 27,000 frames s−1.
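The onset-detection and alignment steps can be sketched as follows. This is an illustration under simplifying assumptions, not the supervised semi-automatic procedure of supplementary material Fig. S3: it assumes that the n-th detected pulse in the audio channel corresponds to the n-th detected LED pulse in the video (the study instead relied on the irregular pulse durations to identify pulses), and all function names, thresholds and toy signals are hypothetical.

```python
import numpy as np

def rising_edges(signal, threshold):
    """Indices of the first sample of each run where signal > threshold."""
    high = np.asarray(signal, dtype=float) > threshold
    return np.flatnonzero(~high[:-1] & high[1:]) + 1

def ttl_onsets_audio(ttl_voltage, sample_rate, threshold=0.5):
    """Pulse onset times (s) in the TTL channel recorded by the sound card."""
    return rising_edges(ttl_voltage, threshold) / sample_rate

def ttl_onsets_video(led_brightness, frame_rate, baseline):
    """Pulse onset times (s) from the mean LED pixel brightness per frame.

    As in the text, the onset is the first frame brighter than the baseline
    (LED not lit), even if full brightness is only reached later.
    """
    return rising_edges(led_brightness, baseline) / frame_rate

def video_offset(audio_onsets, video_onsets):
    """Median (audio - video) onset difference over paired pulses (s)."""
    n = min(len(audio_onsets), len(video_onsets))
    diffs = np.asarray(audio_onsets[:n]) - np.asarray(video_onsets[:n])
    return float(np.median(diffs))

# Toy example: sound card at 1 kHz, camera at 100 frames/s.
ttl = np.zeros(100)
ttl[10:20] = 1.0
ttl[50:60] = 1.0
led = np.array([3, 3, 9, 12, 3, 3, 9, 12, 3, 3], dtype=float)
offset = video_offset(ttl_onsets_audio(ttl, 1000), ttl_onsets_video(led, 100, baseline=3))
print(offset)  # negative: the audio onsets precede the video onsets
```

The precision of such an alignment cannot exceed one video frame period, consistent with the 0.037 ms maximum synchronization error stated above.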
Analysis
In digital kymography (DKG) (Wittenberg et al., 2000), the principles of videokymography are applied to HSV sequences. In order to create a DKG, a line perpendicular to the vocal fold axis is selected within an HSV sequence, and the corresponding video pixels on that line are successively extracted for each video frame in the analyzed sequence. The extracted lines are concatenated in time (separated by the frame period, i.e. the inverse of the frame rate) to form the final graph (Švec and Schutte, 2012). The DKGs created for this manuscript were generated using a custom-written Python script (Herbst, 2012), which was run as a plug-in within the FIJI image analysis software package (Schindelin et al., 2012). DKGs were extracted at four equidistantly spaced positions along the glottal axis (Orlikoff et al., 2012) in order to visualize the different vibratory patterns along the glottal axis (Orlikoff et al., 2012; Lohscheller et al., 2013).
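Extracting a DKG amounts to taking the same pixel line from every frame and stacking the lines in time. A minimal NumPy sketch, not the FIJI plug-in used in this study: it assumes a grayscale video array of shape (frames, height, width) oriented so that the glottal axis runs vertically (an image row is then perpendicular to it), and that the anterior and posterior ends of the glottis are known row indices. The function names and the random stand-in video are hypothetical.

```python
import numpy as np

def digital_kymogram(video, row):
    """Stack image row `row` of every frame into an (n_frames, width) array.

    Consecutive rows of the result are one frame period (1 / frame rate)
    apart, so time runs from top to bottom.
    """
    return np.asarray(video)[:, row, :]

def dkg_rows(anterior_row, posterior_row, fractions=(0.2, 0.4, 0.6, 0.8)):
    """Row indices at fractions of glottal length, measured from the anterior end."""
    return [int(round(anterior_row + f * (posterior_row - anterior_row)))
            for f in fractions]

video = np.random.default_rng(0).integers(0, 256, size=(50, 120, 80))  # stand-in data
rows = dkg_rows(anterior_row=10, posterior_row=110)
kymograms = {f: digital_kymogram(video, r) for f, r in zip((0.2, 0.4, 0.6, 0.8), rows)}
print(rows, kymograms[0.2].shape)  # [30, 50, 70, 90] (50, 80)
```

The fractions follow the DKG 0.2 to DKG 0.8 labels used in the figures, with DKG 0.8 closest to the posterior end of the glottis.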
To enable quantitative analysis of the vibrating patterns along the entire length of the vocal folds, a clinically evaluated image processing procedure was applied, which is described in detail elsewhere (Lohscheller et al., 2007). With this algorithm, the medial edges of both vocal folds were extracted within each frame of the HSV recording. The segmentation results were superimposed upon the DKGs shown in Figs 2, 4 and 5.
A new parameter, the VFCL, was defined as a measure of the relative degree of vocal fold contact along the A–P glottal axis (Herbst et al., 2013). The VFCL was calculated based on the previously extracted time-varying glottal edge data. This parameter is sensitive to A–P phase differences of vocal fold vibration, but not to vertical phase differences. The VFCL parameter is in essence similar to the membranous contact quotient (MCQ) introduced by Scherer et al. (Scherer et al., 1997). The VFCL differs from the MCQ in that it is calculated for every frame in the HSV data (i.e. the VFCL is a dynamic parameter), whereas the MCQ is determined on a cycle-to-cycle basis, considering the maximum closure along the A–P glottal axis.
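The exact VFCL definition is given in Herbst et al. (2013); the sketch below assumes one plausible reading, namely the fraction of A–P positions at which the detected left and right medial edges touch (edge distance at or below a tolerance) in each frame. The edge arrays, the `contact_tol` parameter and the function names are hypothetical. An MCQ-like per-cycle value is then taken as the maximum VFCL within each cycle, which is likewise an assumption.

```python
import numpy as np

def vfcl(left_edge, right_edge, contact_tol=0.0):
    """Per-frame fraction of A-P positions in contact (0 = open, 1 = fully closed).

    left_edge, right_edge: arrays of shape (n_frames, n_ap_positions) holding the
    lateral pixel coordinate of each medial vocal fold edge at each A-P position.
    """
    gap = np.asarray(right_edge, dtype=float) - np.asarray(left_edge, dtype=float)
    return np.mean(gap <= contact_tol, axis=1)

def dvfcl(vfcl_values, frame_rate):
    """First derivative of the VFCL (change in contact fraction per second)."""
    return np.diff(vfcl_values) * frame_rate

def mcq_like(vfcl_values, cycle_bounds):
    """Maximum VFCL within each cycle, given (start, stop) frame indices."""
    return [float(np.max(vfcl_values[a:b])) for a, b in cycle_bounds]

left = np.zeros((3, 4))
right = np.array([[0, 0, 0, 0],    # fully closed
                  [0, 2, 2, 0],    # in contact only at both ends
                  [3, 4, 4, 3]])   # fully open
v = vfcl(left, right)
print(v, dvfcl(v, frame_rate=27_000))
```

Because the edges are detected from above, this measure responds to A–P zippering but is blind to inferior–superior phase differences, as noted in the text.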
The extracted time-varying glottal edges were further used to create GVGs (see Karakozoglou et al., 2012), a visualization technique that transfers information on the time-varying glottal width (as color information) along the A–P dimension into a single graph (Lohscheller et al., 2008). In a GVG, time is displayed on the x-axis, the A–P glottal axis is shown on the y-axis, and the normalized distance between the left and right glottal edges (originally measured in pixels) is depicted as color information on the z-axis. The GVG can be used to objectively describe the two-dimensional vibration type of glottal opening and closure. For creating the GVG plots shown in this manuscript, no interpolation was used by the plotting software to map the GVG data to the individual pixel values in the graphs. To increase the visibility of smaller vocal fold edge distances within the generated GVGs, the normalized GVG z-axis values were transformed to logarithmic values using the formula ζ[x,y] = log10(9·z[x,y] + 1), which leaves the end points 0 and 1 unchanged while expanding small values.
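The GVG construction and its logarithmic transform can be sketched as follows. This minimal example assumes that the edge arrays have shape (n_frames, n_ap_positions) and that distances are normalized by their maximum over the whole sequence; both are assumptions, and the function name is hypothetical.

```python
import numpy as np

def glottovibrogram(left_edge, right_edge, log_scale=True):
    """GVG image of shape (n_ap_positions, n_frames): A-P on y, time on x.

    Edge distances are normalized by their maximum over the sequence and
    optionally mapped by zeta = log10(9 z + 1), which keeps 0 -> 0 and
    1 -> 1 while expanding small distances.
    """
    gap = np.clip(np.asarray(right_edge, float) - np.asarray(left_edge, float), 0, None)
    peak = gap.max()
    z = gap / peak if peak > 0 else gap
    gvg = z.T                      # rows: A-P positions, columns: frames
    return np.log10(9 * gvg + 1) if log_scale else gvg

left = np.zeros((2, 3))
right = np.array([[0, 1, 2], [2, 0, 0]])
gvg = glottovibrogram(left, right)
print(gvg.shape)  # (3, 2)
```

To display the result without interpolation, as described above, one option is Matplotlib's `imshow(gvg, aspect="auto", interpolation="none")`.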
Acknowledgements
We kindly thank R. Hofer for contributing to the setup of the excised larynx experiment. We are very thankful to the reviewers for their time and expertise, and for their insightful comments.
FOOTNOTES
Funding
This research was supported by European Research Council Advanced Grant ‘SOMACCA’ and a start-up grant from the University of Vienna (to C.T.H. and W.T.F.); the European Social Fund and the state budget of the Czech Republic, project nos CZ.1.07/2.3.00/30.0004 ‘POST-UP’ (to C.T.H. and J.G.Š.) and OPVK CZ.1.07/2.3.00/20.0057 (to J.G.Š.); and grant no. LO1413/2-2 by the Deutsche Forschungsgemeinschaft (to J.L.).
References
Competing interests
The authors declare no competing financial interests.