SUMMARY
The fruit fly Drosophila melanogaster is a widely used model organism in studies of genetics, developmental biology and biomechanics. One limitation to exploiting Drosophila as a model system for behavioral neurobiology is that measuring body kinematics during behavior is labor intensive and subjective. In order to quantify flight kinematics during different types of maneuvers, we have developed a visual tracking system that estimates the posture of the fly from multiple calibrated cameras. An accurate geometric fly model, designed using unit quaternions to capture complex body and wing rotations, is automatically fitted to the images in each time frame. Our approach works across a range of flight behaviors, while also being robust to common environmental clutter. The tracking system is used in this paper to compare wing and body motion during both voluntary and escape take-offs. Using our automated algorithms, we are able to measure stroke amplitude, geometric angle of attack and other parameters important to a mechanistic understanding of flapping flight. When compared with manual tracking methods, the algorithm estimates body position within 4.4±1.3% of the body length, while body orientation is measured within 6.5±1.9 deg. (roll), 3.2±1.3 deg. (pitch) and 3.4±1.6 deg. (yaw) on average across six videos. Similarly, stroke amplitude and deviation are estimated within 3.3 deg. and 2.1 deg., while angle of attack is typically measured within 8.8 deg. of a human digitizer's estimates. Using our automated tracker, we analyzed a total of eight voluntary and two escape take-offs. These sequences show that Drosophila melanogaster do not utilize clap and fling during take-off and are able to modify their wing kinematics from one wingstroke to the next. Our approach should enable biomechanists and ethologists to process much larger datasets than is possible at present and, therefore, accelerate insight into the mechanisms of free-flight maneuvers of flying insects.
INTRODUCTION
The fruit fly Drosophila melanogaster has emerged as one of the most important genetic model organisms used in modern biology. Although their nervous systems contain only about 300,000 neurons, Drosophila exhibit an array of complex behaviors and offer an ever-increasing and widely accessible set of methods for altering genes and controlling their temporal and spatial expression. One limitation in the use of Drosophila as a model system linking genes to behavior is the difficulty in rapidly quantifying body kinematics. Flight behaviors, in particular, pose many problems for any attempt at automated digitization of movement and behavior. First, wing motion during flight is very rapid, which necessitates the use of high-speed imaging and thus large data streams. Second, flight is intrinsically three-dimensional, which necessitates the coordination of data from two or more cameras. Previous studies of Drosophila flight maneuvers have required laborious manual methods to capture the body and wing kinematics (Altshuler et al., 2005; Card and Dickinson, 2008; Fry et al., 2003; Fry et al., 2005). The time-consuming nature of this approach limits a researcher's ability to thoroughly characterize individual components of flight behavior such as take-off and landing. Thus, development of an automated tracking technique that estimates the complete wing and body posture during flight would greatly facilitate future research on flight control in flies and other organisms.
To address the concerns above, we developed an automated model-based tracking technique that can capture the 3D body and wing motion of Drosophila from a high-speed multi-camera video sequence. Previously, many studies of Drosophila flight control measured relative wing motion during tethered flight by shining an infrared light upon the fly and measuring the resulting shadow with a photodiode receptor (Dickinson et al., 1993; Götz, 1987). In that approach, 3D wing motion is reduced to a 1D photodiode voltage signal. Recently, Graetzel and colleagues developed a real-time computer vision system to measure the wing motion of a tethered fly (Graetzel et al., 2006). A single camera view was used to track the angular position of the wing's leading and trailing edges in the projected camera view. Zanker and colleagues measured the full 3D wing motion of flies during tethered flight using stroboscopic video and mirrors to capture multiple views (Zanker, 1990). However, the 3D reconstruction of the fly's geometry relied on manual digitization of six key points on the wing in each camera view. More recently, Fry and colleagues developed customized software to simplify the manual fitting of 3D wing models to free-flying Drosophila in multiple camera views (Fry et al., 2003). This technique was expanded to analyze hovering and take-off behaviors in fruit flies and honey bees (Altshuler et al., 2005; Card and Dickinson, 2008; Fry et al., 2005). The algorithm proposed in this paper extends the work of Fry and colleagues (Fry et al., 2003) by developing visual tracking techniques that automatically fit a 3D fly model to images captured from multiple calibrated camera views.
Our approach is motivated by and builds upon computer vision techniques that estimate the 3D rigid motion of a human from multiple calibrated camera views (Moeslund et al., 2006). In model-based approaches, a 3D human model containing kinematic chains is given, and the goal is to estimate the body posture and joint angles using image measurements (e.g. silhouettes, appearance textures, optical flow). Another complementary approach involves the direct reconstruction of the model shape and motion without use of a prior model (Ristroph et al., 2009). The best choice of approach will be dictated by the governing experimental conditions and aims. Model-based approaches facilitate accurate tracking in videos containing occlusions and environmental clutter. Model-based techniques are also better suited to image sequences with poor contrast and low lighting. Conversely, reconstruction-based approaches may not accurately estimate an organism's shape and motion unless the voxels (i.e. 3D pixels) are labeled correctly, which can prove difficult when images have low contrast or contain occlusions and clutter. Model-based approaches also allow the experimentalist to include those degrees of freedom in the model that are most relevant to experimental goals so that the state estimation process performs inference on relevant physical quantities. However, model-based approaches require an extra step (either manual or automated) to initialize the model on the first frame of the image sequence.
While we adopt the model-based idea from the human motion-tracking literature, there are several challenges peculiar to the problem of Drosophila tracking that require special attention. The large body and wing rotations exhibited by Drosophila during take-off require the use of unit quaternions to continuously parameterize the rotations. We present the first velocity-invariant motion prediction model that uses a quaternion parameterization [extending the approach of Rosenhahn and colleagues (Rosenhahn et al., 2007a)]. The near-cylindrical shape of the Drosophila body makes it difficult to estimate the roll angle about the body's highly symmetric central axis. The human tracking literature has considered such unobservable states (e.g. due to depth ambiguities and rotations about axes of symmetry in limbs) primarily within the context of monocular video (Sminchisescu and Triggs, 2003a; Sminchisescu and Triggs, 2003b), and these techniques are not applicable to our multi-camera setup. Recently, Kyrki and Kragic demonstrated tracking of the rotations of spherical objects and solids of revolution by integrating texture features into their CAD model (Kyrki and Kragic, 2006). Unfortunately, due to the low luminance associated with the high frame rate (6000 frames s–1) that is needed to capture Drosophila wing kinematics, our video is devoid of any robust surface texture features except the silhouette (e.g. see Fig. 1). Instead, we rely upon the gross symmetrical motion of the wingbeats to provide cues to the location of the body's dorsal edge.
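To make the quaternion machinery concrete, the sketch below (Python/NumPy; our own illustrative code, not the authors' MATLAB implementation) shows how unit quaternions compose along the model's kinematic chain, mapping a point on the left wing through the wing-to-body transformation (Qlw, Tbw) and the body-to-world transformation (Qb, T) named in the list of abbreviations.

```python
import numpy as np

def quat_multiply(q1, q2):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q via q * (0, v) * q^-1."""
    qv = np.concatenate(([0.0], v))
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_multiply(quat_multiply(q, qv), q_conj)[1:]

def wing_point_to_world(X_wing, Q_lw, T_bw, Q_b, T):
    """Map a point on the left wing into the world-fixed frame by
    composing the wing-to-body and body-to-world transformations."""
    X_body = quat_rotate(Q_lw, X_wing) + T_bw   # wing frame -> body frame
    return quat_rotate(Q_b, X_body) + T         # body frame -> world frame
```

Because unit quaternions provide a smooth, singularity-free parameterization of rotations, the large, rapid wing rotations can be represented and interpolated without the gimbal-lock problems of Euler-angle representations.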
We apply our method to flight initiation, a behavior that offers particular challenges for an automated tracking algorithm. During take-off, wingbeat frequency is high (Lehmann and Dickinson, 1998), the body may undergo large rotational movements (Card and Dickinson, 2008), and images of the body may be occluded by the substratum from which the fly launches itself. Flies are known to initiate flight in at least two ways: voluntary take-offs, which are elicited by attractive odors or unknown internal triggers, and escape take-offs, which are elicited by looming visual stimuli (Trimarchi and Schneiderman, 1995; Hammond and O'Shea, 2007; Card and Dickinson, 2008). Analysis of body motion indicates that during voluntary take-offs animals jump slowly, but stably, into the air. In contrast, escape take-offs are characterized by both high translational and high rotational velocity (Card and Dickinson, 2008). In particular, escaping flies often begin flight with an elevated rotational velocity around the roll axis from which they must quickly recover or else crash into the ground. Presumably, flies are able to recover through feedback mediated by their eyes and ocelli, as well as by mechanoreceptors on their wings and halteres. Although prior studies show that flies are indeed able to rapidly recover from extreme perturbations at the onset of flight, the means by which they modify wing kinematics to reach a stable equilibrium are unknown. Prior analyses using human-based digitization methods have provided a picture of Drosophila wing and body motion during stable hovering (Fry et al., 2005). Using an automated model-based algorithm for tracking wing and body motion, we are able to provide the first picture of how flies transition from the highly unstable conditions at flight initiation to a stable pattern of motion.
The methods section first summarizes the model-based visual tracking approach. Next, a detailed Drosophila model is developed, including a velocity-invariant motion prediction model. Thereafter, we describe the technique for fitting the geometric model to the images while incorporating biomechanical constraints. We also test the results of our method against manually tracked and simulated data. The final section presents results obtained by applying this method to voluntary and escape take-offs.
MATERIALS AND METHODS
The video subjects were 3-day-old female Drosophila melanogaster (Meigen) filmed at 6000 frames s–1 with a shutter speed of 50 μs. The video sequences were filmed by Card and Dickinson (Card and Dickinson, 2008) in a previous study that analyzed flight initiation using high-speed cameras (Photron Ultima APX, San Diego, CA, USA) to capture orthogonal views at a resolution of 512 pixels×512 pixels. Flies entered the recording volume by crawling up a glass pipette, and the cameras were focused on the pipette tip to maximize the resolution at take-off. After ascending a few body lengths, the airborne flies typically moved out of focus in one or more camera views (Fig. 1).
Foreground segmentation
Nearly all conceivable approaches to automated tracking rely upon accurate foreground segmentation, the process whereby the image pixels belonging to the organism and those belonging to the background are differentiated. Typical laboratory environments provide nearly constant background illumination during flight maneuvers. Hence, background subtraction is used to segment the pixels belonging to the fly. In addition, the appearance of Drosophila is very consistent during the video sequences. Fig. 2D shows a histogram of fly pixel intensity values over 200 frames from three different camera views. The characteristic bimodal shape is due to the opaque nature of the body cuticle, which consistently appears darker than other body parts (i.e. wings and legs) when back-lit. We exploit this appearance consistency to further segment fly pixels into body and appendage groups. At each frame and for each camera, we fit a two-component 1D Gaussian mixture model to the segmented fly pixels using the expectation-maximization (EM) algorithm.
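The sketch below (Python; the function name, the foreground difference threshold and the use of scikit-learn's EM implementation are our illustrative assumptions) outlines this two-stage segmentation: background subtraction to isolate fly pixels, followed by a two-component Gaussian mixture whose density minimum between the two means serves as the body/appendage threshold, as in Fig. 2C.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_fly(frame, background, fg_thresh=30):
    """Split a back-lit frame into body and appendage pixel masks.
    `fg_thresh` (gray levels) is an illustrative value."""
    # background subtraction: pixels differing from the static
    # background belong to the fly
    diff = np.abs(frame.astype(int) - background.astype(int))
    fly_mask = diff > fg_thresh

    # EM fit of a two-component 1D Gaussian mixture to fly intensities
    intensities = frame[fly_mask].astype(float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2).fit(intensities)

    # threshold at the local minimum of the fitted density between
    # the two means (cf. Fig. 2C)
    lo, hi = np.sort(gmm.means_.ravel())
    grid = np.linspace(lo, hi, 256).reshape(-1, 1)
    threshold = grid[np.argmin(gmm.score_samples(grid)), 0]

    body = fly_mask & (frame < threshold)         # opaque cuticle (darker)
    appendages = fly_mask & (frame >= threshold)  # wings and legs (lighter)
    return body, appendages
```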
Model-based image tracking and non-linear estimation
Experimental set-up for capturing 3D high-speed sequences of take-off. (A) Arrangement of high-speed cameras and LED panels for back lighting. Individual flies emerged from inside a Pasteur pipette. To elicit escape responses, a stop was removed that released a black disk which fell toward the fly along a brass rod. (B) Images of Drosophila synchronously captured from three camera views. The high-speed video offers no strong visual features except the silhouette. Even with three camera views, the complex wingbeat motion is difficult to capture due to low observability of the wings at certain postures and motion out of the camera's depth of field.
Geometric model
The exoskeleton and wings of fruit flies exhibit various deformations during flight maneuvers. However, because the flies are filmed at low magnification in order to capture the gross body and wing motion, we can reasonably assume that the fly's body and wing parts undergo rigid body motion. Under this rigid body assumption, Dickson and colleagues constructed a polygonal model of the fruit fly from multiple calibrated images of the body and wing (Dickson et al., 2006). This triangular mesh model (Fig. 3A) is integrated into a high-performance software library to simulate the Newtonian dynamics of flapping flight (see http://www.ode.org). We used this polygonal model to construct a parameterized generative model of the fly that contains three primitive shapes: the body, head and wings (Fig. 3B). The primitive shapes are assembled into an articulated model in which each wing joint is modeled as a spherical joint (permitting arbitrary rotations about all three coordinate axes). In this paper, the head is rigidly attached to the thorax, though extra degrees of freedom could easily be added. The shapes are constructed by applying continuous transformations to a B-spline curve. For instance, the fly's body (thorax and abdomen) and head are constructed by revolving a profile curve, R(u), around a known centerline, C(u), while the wings are constructed by transforming and scaling a closed curve (see Fig. 3A–C). This generative model offers a more compact representation of the fly's shape than the triangle mesh (i.e. ∼80 spline control points versus ∼10^4 mesh points).
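As a sketch of how such a shape primitive can be generated, the code below (Python/SciPy; the spline control values are invented for illustration, and the curved x–z centerline of the actual model is simplified to a straight axis) revolves a clamped cubic B-spline width profile with an elliptical cross-section whose lateral direction is 20% wider than the dorsal–ventral direction, as specified in the Fig. 3 caption.

```python
import numpy as np
from scipy.interpolate import BSpline

# Clamped cubic B-spline width profile R(u), u in [0, 1]; the control
# values below are invented for illustration (the paper fits them to
# segmented images of each fly).
knots = np.array([0, 0, 0, 0, 0.5, 1, 1, 1, 1], float)
R = BSpline(knots, np.array([0.0, 0.12, 0.20, 0.16, 0.0]), 3)

def body_surface(n_u=60, n_v=36, lateral_scale=1.2):
    """Revolve R(u) about a straight x-axis centerline with an elliptical
    cross-section (lateral direction 20% wider than dorsal-ventral).
    The actual model uses a curved centerline C(u) in the x-z plane."""
    u = np.linspace(0.0, 1.0, n_u)
    v = np.linspace(0.0, 2.0 * np.pi, n_v)
    U, V = np.meshgrid(u, v, indexing='ij')
    r = R(U)
    x = U                                 # position along the centerline
    y = lateral_scale * r * np.sin(V)     # lateral direction
    z = r * np.cos(V)                     # dorsal-ventral direction
    return np.stack([x, y, z], axis=-1)   # (n_u, n_v, 3) surface points
```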
Segmentation procedure for Drosophila images. (A) Typical image of Drosophila during flight initiation. (B) Image segmented into body (green) and appendage (yellow) pixels. (C) Histogram of pixel intensities (0–255) from A fitted with the sum of two Gaussians. The local minimum of the Gaussian sum is chosen as the threshold to classify body and appendage pixels. (D) Histogram of fly pixels calculated from background subtraction in over 200 frames across three different camera views. The characteristic bimodal shape is due to the opaque nature of the fly's body cuticle (lower intensity peak) versus the more translucent appendages (higher intensity peak).
One limitation of the current version of our automated tracker is its inability to detect and quantify deformations of the wing and body that violate the rigid body assumption. Wing and body deformations can be quite large in insects (Combes and Daniel, 2003a), and may play an important role in stability and maneuverability (Combes and Daniel, 2003b). Nothing in our general methods would preclude altering the geometric model of the insect to include deformation. However, such an effort would be worthwhile only if the deformations were large enough, and the spatial resolution sufficient, to capture them. Our imaging system was optimized to capture as large a portion of a fly's take-off behavior as possible, sacrificing spatial resolution for spatial extent. It should be possible for other researchers to modify our imaging arrangement and tracking algorithm in order to detect deformations of the body and wings, especially in larger insects in which such distortions are more pronounced.
Coordinate transformations
A preliminary step initializes the fly model on the first frame of the video sequence, adjusting the model's dimensional parameters to the given organism and approximating the animal's initial pose. This initialization is enabled by a customized software package (Card and Dickinson, 2008). A user clicks on six locations (head, tail and joint/tip locations of both wings) on the fly's body in two out of three camera views to localize its 3D position. A wire-frame model is next adjusted about its rotational axis until an approximate visual match is realized (see Fig. 5A). Thereafter, the body transformation found by the manual initialization is refined using the registration procedure described in `Model registration' (below) applied to segmented images of the body shown in Fig. 2B. Finally, the body's shape profile is adjusted, while fixing the pose, to match the body-only segmented images. The entire width profile is modeled as the combination of two B-spline curves representing the head and the body. The B-spline control points are adjusted to best match the images, similar to a procedure recently published for zebrafish (Fontaine et al., 2008). Fig. 5B illustrates the results of such a shape refinement process for a particular Drosophila.
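The 3D localization from points clicked in two calibrated views can be computed by standard linear triangulation. In the sketch below (Python; the interface is ours), each 3x4 projection matrix would be assembled from the calibration quantities in the list of abbreviations, e.g. Pi = Ki[Ri | -RiCi].

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Direct linear transformation (DLT) triangulation of one 3D point
    from pixel coordinates x1, x2 = (u, v) in two calibrated views with
    3x4 projection matrices P1, P2."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)           # least-squares null vector of A
    X = Vt[-1]
    return X[:3] / X[3]                   # dehomogenize to 3D coordinates

# e.g. triangulating the clicked head, tail and wing landmarks gives the
# 3D anchor points used to place and scale the wire-frame model.
```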
Motion prediction via scaled motion dynamics
Practically speaking, the dynamic motion model (Eqn 1) predicts the fly's pose in the frame currently being processed, based on the previously estimated poses. This prediction better initializes the fly parameter estimates before they are updated using the image measurement, restricting the search space of possible pose configurations. As a result, the tracking algorithm is able to cope with self-occlusion (e.g. the wing covers the body or vice versa) and other misleading visual cues (e.g. the fly partially covered by the pipette or other obstacles). While this approach assumes a priori knowledge of the animal's dynamics, a first-principles construction of such a model is a daunting task, and the resulting model may be quite complicated. Techniques to craft empirical low-dimensional dynamical models have been developed within the human motion-tracking literature, where the motion model is calculated using principal component analysis (PCA) (Sidenbladh et al., 2002) and Gaussian processes (Urtasun et al., 2006; Sminchisescu and Jepson, 2004). However, these techniques are not invariant with respect to velocity (i.e. to animals that have the same motion pattern but travel at different speeds, or animals traveling at the same speed but filmed at different frame rates). Thus, low-dimensional models cannot provide accurate predictions when the training images (i.e. those sequences used to learn the motion pattern) are captured at a different frame rate from the experimental images. In addition, dimensionality reduction works well for a single motion pattern, but a difficult-to-estimate mixture of regressors is needed in the case of multiple different motion patterns (Jacobs et al., 1991). We build upon the work of Rosenhahn and colleagues, who model the motion patterns in the original high-dimensional space (Rosenhahn et al., 2007a). This approach can incorporate training data that are rescaled to different velocities, and that consist of completely different motion patterns.
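The following sketch (Python; a deliberate simplification of the scaled-motion approach) illustrates the exemplar-matching idea: the last m estimated poses are matched against every length-m window of the training database, and the matched exemplar's next relative motion is applied to the current pose. In the actual method the pose contains unit quaternions, whose increments must be composed multiplicatively rather than added, and velocity invariance is obtained by also storing copies of the training sequences rescaled to different speeds.

```python
import numpy as np

def predict_next_pose(history, database):
    """Exemplar-based prediction of the pose at the next frame.

    history  : (m, d) array of the last m estimated pose vectors
               (m = 5 in Fig. 6B)
    database : (N, d) array of training poses; velocity invariance is
               obtained by also storing copies rescaled to other speeds
    """
    m, _ = history.shape
    N, _ = database.shape
    # match the recent history against every length-m database window
    errs = [np.sum((database[j:j + m] - history) ** 2)
            for j in range(N - m)]
    j = int(np.argmin(errs))
    # apply the exemplar's next relative motion; quaternion components
    # should properly be composed multiplicatively, so this additive
    # update is an illustrative simplification
    delta = database[j + m] - database[j + m - 1]
    return history[-1] + delta
```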
Generative model of fly body. (A) Triangle mesh of Drosophila calculated from multiple calibrated images [courtesy W. Dickson (Dickson et al., 2006)]. (B) Complete generative model constructed from the data points shown in A. The model consists of three shape primitives: the body, head and wing. The generative modeling approach offers a more compact representation of the shape and motion of the fly than its triangle mesh counterpart. (C–E) Method for constructing components of the body shape primitive. (C) The centerline C(u) is a 3D B-spline curve with five control points (only three of them are visible in the axes). The curve of the centerline lies completely in the x–z plane. The width profile, Rb(u), is revolved around C(u) using an elliptical cross-section where the lateral direction is 20% wider than the dorsal–ventral direction. (D) Complete head model of the fly constructed identically to the method described for C using a different profile curve and the x-axis as the centerline. (E) Outline profile of the fly wing model constructed from a closed planar B-spline curve with 20 control points. For other definitions see `Table of abbreviations' in text.
Model registration
Geometric generative model of Drosophila. Following aeronautical convention, the rotations about the x-, y- and z-axes are defined as roll, pitch and yaw, respectively. The downward pointing z-axis is chosen so that positive pitch angles correspond to pitching upwards. The model's kinematic chain includes a coordinate transformation from the left wing frame to the body frame [given by (Qlw, Tbw)] and a transformation from the body-fixed to the world-fixed frame F, denoted by (Qb, T). Analogous transformations exist for the right wing.
Procedure for initializing the automated tracker. (A) We used customized software for manual digitization of Drosophila body kinematics from Card and Dickinson (Card and Dickinson, 2008). Points were clicked at the head, tail, wing joint and wing tip in multiple camera views to manually fit a geometric model to the images. The manually estimated body pose was then used as an initial guess for the automated algorithm. (B) At the initial frame, the profile of the body was refined, while holding the pose parameters fixed, to more closely match the actual shape of the fly by minimizing the error described in `Model registration' (Materials and methods).
Drosophila constraints
Quantifying tracker performance
To assess the accuracy of the proposed method, we compared body pose estimates in six video sequences with those reported by Card and Dickinson (Card and Dickinson, 2008), where the body pose was captured manually using the customized software described in `Coordinate transformations' (above). In that study, a reduced body model was fitted to the images typically every five frames and a spline was used to smoothly interpolate the location of the body model at intermediate frames (wing joint angles were not estimated). This approach decreased the labor-intensive nature of manual digitization, and it removed some of the variance attributed to human fitting by providing temporal smoothness. Fig. 9 demonstrates quantitatively that our automated method can realize estimates that are comparable with human visual inspection. In reality, there are no `ground truth' estimates for these data sets. Whereas the performance of human motion-tracking systems is typically measured against a marker-based system, the tiny size of Drosophila only permits visual measurements subject to human interpretation. In this section all errors are reported as root mean square (r.m.s.) values. Wing errors represent an average r.m.s. value between the left and right wings.
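For reference, the r.m.s. deviation used throughout this section can be computed as follows (Python sketch; the wrapping of angular differences into [-180, 180) deg. is our addition, needed so that orientation errors near the ±180 deg. boundary are not inflated).

```python
import numpy as np

def rms_deviation(est, ref, angular=False):
    """r.m.s. deviation between an estimated and a reference trace;
    angular differences are wrapped into [-180, 180) deg. first."""
    d = np.asarray(est, float) - np.asarray(ref, float)
    if angular:
        d = (d + 180.0) % 360.0 - 180.0
    return np.sqrt(np.mean(d ** 2))
```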
Predictive component of tracker. (A) Motion model used to predict the location of the fly in the next frame, pk+1, given estimates from the previous frame, pk. Here, the displayed motion during the upstroke of the wingbeat is exaggerated to illustrate the concept. (B) Rotational motion of the Drosophila left wing during take-off (120 out of 380 samples shown). Motion is parameterized by the four components of a quaternion, which vary smoothly with time. The query of m=5 previously calculated poses is matched with position 106 of the prior database. The relative motion to position 107 is used to calculate the prediction.
(A) Correspondence between projected model points, xij, and measured edge locations, eij.
Implementation of roll constraint. Because the roll angle of the body is unobservable from silhouette data in the images, a symmetry constraint within the transverse plane of the body must be incorporated. (A) Unconstrained estimate of the fly's pose: (top) projection of the model vectors into the transverse plane; (bottom) 3D pose with the transverse plane illustrated in gray. This body configuration is highly unlikely given the biomechanics of Drosophila. (B) Constrained estimate after rotating the body by the correction angle and updating the joint angle vectors, Q.
Performance metrics of tracker compared with human digitizer. Only body kinematics are compared because human-tracked wing motion is unavailable. (A,B) Two frames where a large discrepancy in roll angle was observed between the human estimate and the algorithm. From visual inspection, the human estimate in A appears more accurate than the algorithm's estimate, while in B the algorithm appears to provide a better estimate and more accurate roll angle. (C) Time trace of entire video sequence with frames A and B indicated. Tracker values are solid lines; data from the human digitizer are shown as open circles. (D) Root mean square (r.m.s.) deviations between the human estimates and our tracker for body orientation and translation. Each bar represents a separate video sequence. The roll angle shows the greatest deviation, as expected due to the symmetrical nature of the fly's body. The video sequence from C has the largest deviation amongst all videos.
Fig. 9A,B illustrates two configurations where large differences between human and automated roll angle estimates were observed. Based on visual inspection, the human estimate appears more accurate in Fig. 9A, while the automated estimate is slightly better in Fig. 9B. Both panels display the reduced body-frame model used for manual fitting (the long axis indicates the head and tail locations, and the raised `T' junction indicates the approximate wing joint locations, which are the visual cues used to determine roll angle). Fig. 9C is the time trace of the body orientation and translation with frames A and B marked. Fig. 9D summarizes the results for all six video sequences. The algorithm typically estimates the body's center of mass location within 5% of the body length, an absolute distance of the order of 0.1 mm. Body orientation is also estimated well. As expected, the roll angle exhibits the largest deviations, 6.5±1.9 deg. on average, due to the greater uncertainty associated with rotations about a highly symmetrical axis. The video sequence associated with Fig. 9A–C has the largest error (8.6 deg.) between the human and automated roll angle estimates.
Fig. 10 compares wing angle performance against a human digitizer for a representative voluntary take-off sequence. The results are nearly identical for stroke amplitude (θ; 3.3 deg. error) and stroke deviation (ϕ; 2.1 deg. error), although the two methods do differ for angle of attack (α; 8.8 deg. error), especially during stroke reversals. Such differences are expected, as the subjective choices required of a human digitizer are most difficult during stroke reversal, when the wing is rapidly flipping and changing direction. This does not imply that a human operator is necessarily less precise, and because there is no absolute ground truth for a captured behavioral sequence it is impossible to determine which method yields more accurate kinematic data. Automatic tracking is, however, more objective and repeatable. Thus, the algorithm will be useful in practical applications because it achieves estimates comparable to human interpretation while significantly decreasing the labor involved in measuring such data.
Comparison of manual and automated tracking of wing kinematics. Each pair of traces (for θ, ϕ and α) plots the kinematic angles for the right (red) and left (blue) wing. Automatically tracked data are shown in color; manually digitized data are shown in black. The r.m.s. errors are 3.3 deg. (θ), 2.1 deg. (ϕ) and 8.8 deg. (α).
A second performance assessment compared the tracker estimates with an actual ground truth. We used the geometric model (Fig. 11A) and the known experimental camera calibration to construct a set of synthetic images (Fig. 11B) of the fly along a realistic trajectory involving a stable voluntary take-off. The algorithm was used to track these synthetic images, and the results are displayed in Fig. 11C. The difference between the estimate and the ground truth at each time step is displayed as a histogram of residuals. Body position and orientation accuracy are similar to those achieved when comparing with manual tracking (Fig. 9D). Estimates of the wing angles, however, exhibit a broader distribution of errors. Because our model is connected in a kinematic chain, errors in the wing angles are coupled to errors in the body position and orientation. Stroke amplitude (θ) and deviation (ϕ) are estimated accurately, with errors of 3.3 deg. and 4.8 deg., respectively. Geometric angle of attack (α) is estimated with a larger error of 17.2 deg. Higher errors in angle of attack are expected due to decreased camera resolution about this degree of freedom. In some instances, the synthetic image contained very few wing pixels due to our model's infinitesimal wing thickness. Also, when the wing speed is small, the direction of motion is poorly defined, which makes the angle of attack measurements noisier. For this reason, we do not include the measurements before the initial downstroke in our error analysis.
The tracking algorithm exhibits two failure modes, both seen primarily during escape behaviors. These fast maneuvers can cause significant wing deformations that are not captured by our current rigid model (see Fig. 12Aii). However, given the multiple camera views and scaled motion priors, the algorithm is able to continue tracking and provides good estimates once the wing assumes a less deformed configuration (Fig. 12B). The second failure mode is shown in Fig. 12Ci, where the right wing of the fly is flipped. The frame is taken from an upstroke of the wing path, so the right wing should be undergoing supination like the left wing. Instead, the leading edge is facing downwards as if during a downstroke. This failure primarily occurs during escape maneuvers when complicated body motions self-occlude one of the wings in one or more camera views. Despite the strong motion prior, this causes the state posterior distribution to violate the assumption of normality. In addition, the incorrect orientation of the right wing could be caused by inaccuracies in the body orientation estimate, which are primarily due to inaccuracies in the body shape (length, width, deforming centerline axis, etc.).
Performance metrics of tracker compared with synthetic images. (A) The generative model of the fly and the known camera calibration are used to construct (B) synthetic images of a realistic trajectory of a stable voluntary take-off. (C) The difference between the estimate (colored line) and the ground truth (black line) at each time step is displayed as a histogram of residuals. Body position and orientation accuracy are similar to those achieved when comparing with manual tracking (Fig. 9D). Stroke amplitude (θ) and deviation (ϕ) have errors of 3.3 deg. and 4.8 deg., respectively. The angle of attack (α) error of 17.2 deg. is driven by larger errors when the wing speed is small, because the wing velocity direction is then a noisier signal. For this reason, we do not plot angle of attack before the initial downstroke.
Examples of gross errors in tracking algorithm. (A) During some escape maneuvers, the fly's wing can undergo large deformations (shown in Aii) that are not captured by our current rigid body model. In other camera views (Ai), this deformation is not apparent. (B) Despite this large error, the algorithm does not lose track and is able to continue successful estimation. (C) Another failure mode of the tracking algorithm. The fly as seen in Ci is facing towards the camera during an upstroke. The left wing (top) is in the proper configuration, but the right wing (bottom) is flipped into the wrong orientation (pronation instead of supination).
Example of voluntary take-off. (A) 3D trajectory of fly during take-off sequence. Wing kinematics for stroke cycles at the beginning, middle and end of the sequence are shown to the right. The right wing is indicated in red, the left in blue. (B) Time history of angles describing wing and body kinematics throughout the take-off sequence. The wing angles were defined relative to a plane through the wing hinges that is inclined 62 deg. from the body axis (see Ai), which is the position of the mean stroke plane in hovering flies. Sequence is shown in supplementary material Movie 1. See text for details.
Example of voluntary take-off. Plotting convention same as Fig. 13. Sequence is shown in supplementary material Movie 2. See text for details.
RESULTS
The algorithm is written in MATLAB™ and has an average computation time of 45±3 s frame–1 on a 3.0 GHz Intel® Xeon processor. The source code will be made freely available upon request to the authors. To illustrate our algorithm's capabilities, we analyzed video of two different types of behavior: voluntary take-offs (flies remained on a pipette undisturbed until they flew away) and escape take-offs (flies were startled by the approach of a falling disk). For each video sequence, the geometric model was manually initialized to the first frame according to `Coordinate transformations' (Materials and methods). The database of training samples (`Motion prediction', Materials and methods) consisted of 380 samples from a voluntary take-off and 111 samples from an escape take-off. Initially the training samples were captured manually, and then they were gradually replaced with the estimated values from the tracking algorithm. Wing angles are measured in a body-centered coordinate frame with the longitudinal axis pitched at 62 deg. from the horizontal (Fig. 13Ai), which is consistent with hovering flight (David, 1978). The orientation measurements are smoothed with a zero-phase-lag, fourth-order Butterworth filter with cut-off frequencies of 1000 Hz and 250 Hz for the wings and body, respectively.
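A zero-phase-lag Butterworth filter of this kind is commonly implemented by forward-backward filtering; the sketch below (Python/SciPy, not the authors' MATLAB code) shows one way to reproduce this smoothing step, with the camera frame rate and the two cut-off frequencies taken from the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 6000.0  # camera frame rate (frames per second)

def smooth(trace, cutoff_hz, fs=FS, order=4):
    """Zero-phase low-pass filtering by running a Butterworth filter
    forwards and backwards (filtfilt). Note that filtfilt doubles the
    effective order of the one-pass design; whether the stated fourth
    order refers to the one-pass or combined response is a design choice."""
    b, a = butter(order, cutoff_hz, btype='low', fs=fs)
    return filtfilt(b, a, trace)

# per the text: 1000 Hz cut-off for wing angles, 250 Hz for body angles
# wing_smooth = smooth(wing_angle, 1000.0)
# body_smooth = smooth(body_angle, 250.0)
```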
Using our tracking algorithms, we analyzed a total of nine take-off sequences. Of these, we describe in detail four sequences (two voluntary and two escape) that illustrate the range of changes in wing and body kinematics that occur at the onset of flight. Fig. 13 (and supplementary material Movie 1) shows a voluntary take-off in which the onset of flight was particularly smooth and stable. By the third downstroke, the fly reaches a consistent pattern of wing motion that is maintained with little change for the rest of the sequence. This basic pattern, in which the wings follow gentle `U-shaped' trajectories, is quite similar to that previously described for stably hovering fruit flies using a manual method of digitization (Fry et al., 2005). The main change in stroke kinematics throughout the entire sequence is a slight gradual decrease in stroke amplitude (θ), which is accomplished primarily through a drop in the ventral extent of the wing stroke (compare kinematics at i, ii and iii). Once wing motion stabilizes, the fly maintains a constant body pitch of approximately 45 deg., a slight roll of about 20 deg. and a constant heading. The only major break in left–right symmetry occurs during the seventh upstroke, in which the right wing shows a very high angle of attack. The fact that this change is maintained for just one wingstroke indicates that the fly has the ability to modulate wing kinematics on a stroke-by-stroke basis. There is, however, no obvious change in body orientation as a consequence of this one stroke. The stroke frequency averaged across the sequence is 268 Hz, which is somewhat higher than that measured during stable hovering flight, but consistent with studies of flight initiation using tethered flies (Lehmann and Dickinson, 1998).
Example of escape take-off. Plotting convention same as Fig. 13. Sequence is shown in supplementary material Movie 3. See text for details.
The kinematics over the first two strokes in Fig. 13 indicate how rapidly the pattern of wing motion can reach steady conditions following the jump that initiates flight. The flight sequence begins with the wings held stationary in a dorsal location. The wing motion parallel to the stroke plane (θ) reaches the steady-state pattern almost instantly, as does the pattern of wing axial rotation (α). The main difference in wing motion during the first two strokes is that stroke deviation (ϕ) exhibits a sawtooth-like pattern of constant downward motion during the downstroke and constant upward motion during the upstroke. The result is that the wing follows a more ventral trajectory during the downstroke than during the upstroke (Fig. 13Ai), opposite to the pattern exhibited during steady flight (Fig. 13Aiii). Note that during the first downstroke the wing is oriented nearly parallel to the body axis, which is itself roughly parallel to the horizontal plane. Such an arrangement would create very high vertical forces just as the animal takes off. As found previously for free flight (Fry et al., 2005), the fly in Fig. 13 did not exhibit clap and fling (Weis-Fogh, 1973), even during the initial strokes of take-off.
Fig. 14 (and supplementary material Movie 2) shows another voluntary take-off. The first two strokes of the flight sequence are virtually identical to those shown in Fig. 13, suggesting that voluntary take-offs begin with a stereotyped pattern of wing motion. The final stroke of the sequence again resembles the `U-shaped' pattern indicative of stable flight. In this sequence, however, the fly generates a brief, but extreme, maneuver starting with the fourth stroke (Fig. 14Aii). At this time, the left wing undergoes a shorter stroke (θ) and a large negative deviation (ϕ), while the right wing maintains the same stroke length but undergoes a positive deviation. The net result is a large left–right asymmetry in wing motion. This asymmetry continues, slightly attenuated, during the fifth stroke, but then reverses during the sixth, such that the left wing undergoes a more positive deviation while the right wing undergoes a more negative deviation (ϕ trace, Fig. 14B). Starting with the fourth stroke, the animal begins to roll at a rate of roughly 5500 deg. s–1. Rotation this fast is likely to activate the campaniform sensilla at the base of the halteres (Sherman and Dickinson, 2003), which might be responsible for initiating the compensatory reaction observed during the sixth stroke, in which the pattern of wing motion exhibited in the fourth and fifth strokes reverses. Throughout this maneuver, the animal maintains a constant pitch and heading and a wingbeat frequency of 241 Hz. As with the other voluntary sequence, the fly does not exhibit clap and fling.
The sequence shown in Fig. 15 (and supplementary material Movie 3) is an example of an escape response, elicited by a looming visual stimulus, that nevertheless resulted in a relatively stable take-off. By the fourth and fifth strokes, the animal has achieved the `U-shaped' pattern typical of stable hovering. During the second and third strokes of the sequence, the fly exhibits a switch in the pattern of stroke amplitude and deviation (ϕ trace, Fig. 15B) that is reminiscent of that seen in Fig. 14. The sequence differs from those shown in Figs 13 and 14 most notably at the start of wing motion. Inspection of the video sequence indicates that the thoracic flight motor begins oscillating before the animal has raised its wings, and as a consequence the wings stutter during the first stroke. The initial downstroke is not coordinated with a large negative deviation as it is in the voluntary take-off sequences. The initial wingbeat frequency is 300 Hz, substantially higher than that measured at the start of the voluntary take-offs or in stable hovering flight (Fry et al., 2005). The animal starts the sequence with its body parallel to the ground, but pitches upward over the first five strokes to reach a posture typical of low stable flight. Again, the fly did not exhibit clap and fling.
Example of escape take-off. Plotting convention same as Fig. 13. Sequence is shown in supplementary material Movie 4. See text for details.
The sequence shown in Fig. 16 (and supplementary material Movie 4) is an escape take-off that was chosen because it represents an extreme case in wing and body kinematics. At the start of flight, the jumping legs generate a large torque that pitches the animal backwards so that it is upside down for most of the sequence. As in the sequence in Fig. 15, the wings begin oscillating before they have been elevated to a proper start position, and as a consequence they appear to `unfurl' during the first stroke (Fig. 16Ai). By the end of the sequence, the animal has partially recovered, not by pitching down, but rather by rolling toward an upright position (see the rising roll trace at the bottom of Fig. 16). This strong rolling moment is correlated with an extreme asymmetry in wing motion in which the tip of the right wing follows a broad open loop (Fig. 16Aii). The angle of attack is very high during the upstroke, whereas it is much lower during the downstroke. It is presumably this large asymmetry between the upstrokes and downstrokes of the right wing that creates the moment that starts to roll the animal from its upside-down position. Although the fly is not fully recovered, the kinematics in the last stroke of the sequence begin to resemble the stable flight pattern seen in the other sequences. This sequence illustrates how quickly an animal can recover from an enormous perturbation at the onset of flight. As with the escape sequence shown in Fig. 15, the fly in Fig. 16 begins flight with a rather high wingbeat frequency of 285 Hz. Consistent with the other three sequences, the fly did not exhibit clap and fling during take-off.
A more extensive comparison of voluntary take-offs is shown in Fig. 17, which plots the wing kinematics from eight automatically tracked sequences. To align the sequences from eight different flies, the data were normalized by either stretching or contracting the time axis so that the first three stroke periods were equivalent. The sequences are remarkably similar, indicating that voluntary take-offs are quite stereotyped, in contrast to escape responses. None of the flies exhibited clap and fling, and the take-off kinematics closely resemble those of hovering flies (Fry et al., 2005). A detailed analysis of escape take-offs, which are much more variable, will be the subject of a future study.
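The time normalization used to align the eight sequences can be sketched as follows (Python; the function and the choice of stroke boundaries, e.g. dorsal stroke reversals, are our illustrative assumptions): each trace is mapped onto a common stroke-normalized axis so that the first three stroke periods occupy equal intervals before averaging.

```python
import numpy as np

def stroke_normalize(t, angle, stroke_times, n_samples=300):
    """Resample a kinematic trace onto a stroke-normalized time axis.

    t            : sample times (s)
    angle        : kinematic angle at each sample (deg.)
    stroke_times : boundaries of the first three strokes (four values:
                   the start of strokes 1-3 plus the end of stroke 3)
    Returns a grid on [0, 3] (in stroke units) and the resampled trace,
    so that each of the first three stroke periods occupies one unit.
    """
    keep = (t >= stroke_times[0]) & (t <= stroke_times[-1])
    # piecewise-linear map from real time to stroke-normalized time
    t_norm = np.interp(t[keep], stroke_times, np.arange(4.0))
    grid = np.linspace(0.0, 3.0, n_samples)
    return grid, np.interp(grid, t_norm, angle[keep])
```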
DISCUSSION
We have presented a practical model-based visual tracking algorithm that estimates the 3D motion of free-flying Drosophila from multiple camera views. The algorithm uses a dynamic state estimation framework to provide robustness to self-occlusions and static background clutter, such as that due to the glass pipette from which the animals launch. By simply matching motion patterns in a training set, the approach provided accurate predictions in video sequences containing multiple types of behavior (voluntary versus escape take-offs). Our examination of take-off behaviors showed that the algorithm can successfully track a fly through complex tumbling motions. Our method has accuracy comparable to manual human digitization, but offers the promise of allowing biologists to analyze much larger image-based databases of kinematics.
Wing kinematics during voluntary take-offs. Each set of traces shows data from eight sequences. Raw data are shown in black; averages are shown in color for the right (red) and left (blue) wings. The standard deviation envelopes are plotted as light red and blue areas in each trace. To align the different flight sequences, the time axis of each trace was normalized so that the first three strokes took the same amount of time. Averages and standard deviation envelopes are shown only over this three-stroke interval.
The first application of our method has already provided new insight into the complex dynamics of flight initiation. Our results suggest that voluntary take-offs begin with a stereotyped, feed-forward pattern of motion in which the wing creates large vertical forces during the first downstroke, when the longitudinal body axis is parallel to the substratum (Fig. 13Ai; Fig. 14Ai). The pattern of wing motion then approaches that of stable flight within two or three strokes (Fig. 13Aii; Fig. 14Aii). In cases in which the fly must recover from instabilities introduced by the jump, the sequences reveal how quickly the sensorimotor system can respond to bring the animal towards a stable flight posture, even when the animal is initially flipped upside down (Fig. 16). The sequences also suggest that these animals do not rely on clap and fling kinematics to generate elevated lift, even at the onset of flight. This supports the notion that the clap and fling in Drosophila may be in large part an artifact of tethering (Fry et al., 2005). The results confirm, however, that flies do rely on very high wingbeat frequencies at the onset of flight (Lehmann and Dickinson, 1998). All of the sequences show evidence that changes in wing kinematics may last for only one or a few wingbeat cycles, which suggests that the underlying neuromuscular circuits can operate on a stroke-by-stroke basis to alter aerodynamic forces and moments. Such rapidity has been suggested by studies of tethered flight (Heide and Götz, 1996; Balint and Dickinson, 2004), but is now supported by free-flight kinematics. In the future it will be possible to gain a richer insight into take-offs and other aspects of flight control through the application of model-based automated tracking.
APPENDIX
A constraint on the body's roll angle, which is deployed after the image registration process, can be developed as follows. Conceptually, we assume that the dynamic prediction step produces a roughly correct candidate orientation of the model. The constraint rotates the body about its x-axis so that the z-axis bisects the angle formed by the wing vectors in the body's transverse plane. The equations are developed for the left wing only; an analogous calculation is carried out for the right wing. Let R(Qb) = [Xb Yb Zb] and R(Qb)R(Qlw) = [Xlw Ylw Zlw] denote the coordinate axes of the body and left wing relative to the fixed frame at the current time step (the subscript k is omitted for brevity). Let VL = Ylw and VR = –Yrw be the wing vectors, which point from the wing tip towards the wing joint.
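A minimal numerical sketch of this constraint (Python; the variable names and the sign convention are ours) computes the correction by projecting the wing vectors into the body's transverse plane and measuring the angle between their bisector and the body z-axis; the body quaternion would then be rotated about the body x-axis by this angle.

```python
import numpy as np

def roll_correction(V_L, V_R, Y_b, Z_b):
    """Angle (rad) to rotate the body about its x-axis so the body
    z-axis bisects the wing vectors in the transverse plane.
    V_L = Ylw and V_R = -Yrw are the wing vectors of the Appendix;
    Y_b, Z_b are the body y- and z-axes in the fixed frame."""
    def unit_in_plane(v):
        # coordinates of v projected into the transverse (y-z) plane
        p = np.array([np.dot(v, Y_b), np.dot(v, Z_b)])
        return p / np.linalg.norm(p)

    bisector = unit_in_plane(V_L) + unit_in_plane(V_R)
    # signed angle between the bisector and the body z-axis; the sign
    # convention here is our assumption
    return np.arctan2(bisector[0], bisector[1])
```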
LIST OF ABBREVIATIONS
- B(u): bi-normal vector to geometric model's centerline curve
- Ci: 3D coordinates of ith camera center
- C(u): centerline of generative fly body model
- E[·]: expectation
- eij: measured edge location corresponding with xij
- F: world-fixed frame
- f(·, ·): motion model
- h(·, ·): measurement model
- Ki: intrinsic parameters of ith camera center
- Lij: Plücker coordinates of projection ray connecting eij and Ci
- M: rigid body coordinate transformation
- nij: local normal vector corresponding to xij
- N(u): normal vector to geometric model's centerline curve
- p̃: fly parameters from training database
- pk: fly parameters at kth time step
- q: unit quaternion parameterizing rotation in 3D
- Q: unit quaternion
- Qb: quaternion rotation from body to fixed frame
- Qlw: quaternion rotation from left wing joint frame to body frame
- Qrw: quaternion rotation from right wing joint frame to body frame
- Rh: profile curve of head model
- Ri: rotation matrix from fixed frame to camera-centered reference frame for ith camera center
- R(u): profile curve of fly body model
- T: translation from body-fixed to world-fixed frame
- T(u): tangent vector to geometric model's centerline curve
- Tbw: translation from body-centered coordinate frame to wing joint frame
- tw: distance from wing center to joint attachment location on geometric model
- u: centerline parameterization
- VL: left wing vector
- VR: right wing vector
- w(·): projection function to incorporate non-linear constraints
- Xj: 3D coordinates of jth model point
- xij: jth boundary point of generative model projected into ith camera image plane
- zk: image measurement at kth time step
- α: geometric angle of attack
- θ: stroke amplitude
- ϕ: stroke deviation
FOOTNOTES
The authors would like to thank Gwyneth Card, who recorded all video illustrated in Results and made her customized manual tracking software and body kinematic data (Card and Dickinson, 2008) readily available. Will Dickson provided the digitized surface points of a Drosophila body and wings that were used to construct the generative model. We also thank the Beckman Institute of Caltech for providing financial support for this project. This work was also supported by an NSF FIBR grant (0623527) to M.H.D.