ABSTRACT
Accelerometers are a valuable tool for studying animal behaviour and physiology where direct observation is unfeasible. However, giving biological meaning to multivariate acceleration data is challenging. Here, we describe a method that reliably classifies a large number of behaviours using tri-axial accelerometer data collected at the low sampling frequency of 1 Hz, using the dingo (Canis dingo) as an example. We used out-of-sample validation to compare the predictive performance of four commonly used classification models (random forest, k-nearest neighbour, support vector machine, and naïve Bayes). We tested the importance of predictor variable selection and moving window size for the classification of each behaviour and overall model performance. Random forests produced the highest out-of-sample classification accuracy, with our best-performing model predicting 14 behaviours with a mean accuracy of 87%. We also investigated the relationship between overall dynamic body acceleration (ODBA) and the activity level of each behaviour, given the increasing use of ODBA in ecophysiology as a proxy for energy expenditure. ODBA values for our four ‘high activity’ behaviours were significantly greater than all other behaviours, with an overall positive trend between ODBA and intensity of movement. We show that a random forest model of relatively low complexity can mitigate some major challenges associated with establishing meaningful ecological conclusions from acceleration data. Our approach has broad applicability to free-ranging terrestrial quadrupeds of comparable size. Our use of a low sampling frequency shows potential for deploying accelerometers over extended time periods, enabling the capture of invaluable behavioural and physiological data across different ontogenies.
INTRODUCTION
The foundation of animal ecology is understanding how individuals interact with their abiotic and biotic environment. These interactions are increasingly being measured with bio-logging techniques, where biological data are recorded remotely from devices attached to animals. This approach has allowed researchers to answer questions on everything from hunting tactics of puma (Williams et al., 2014) to energy expenditure in cormorants (Gómez Laich et al., 2011) and diving behaviour in whales (Ishii et al., 2017). Consequently, the ability to continuously ‘observe’ free-ranging animals has facilitated the development and exploration of entirely new theories (Wilmers et al., 2015).
Accelerometers are a valuable tool in bio-logging research as they provide quantitative measurements of animal behaviour and physiology where direct observation is not possible or logistically feasible. The use of accelerometers mitigates some of the major challenges associated with studying the behaviour of wild animals, such as extensive time investment, animal disturbance and observer bias. Accelerometers measure acceleration (gravitational and inertial) caused by animal movement in different planes, allowing the development of classification models calibrated to predict behavioural states such as resting, walking, swimming and eating (e.g. Pagano et al., 2017). Further, there is a strong linear relationship between body acceleration and energy expenditure in many taxa, which is of particular interest to ecophysiologists (Halsey and White, 2010; Wilson et al., 2006; Halsey et al., 2009). Although accelerometry has been used to study animal movement and behaviour for almost two decades (Yoda et al., 1999), recent methodological advancements have increased its accessibility and appeal to a broader scientific community.
Classifying animal behaviours from high-frequency acceleration data presents a suite of new and complex challenges. One approach is unsupervised machine learning, in which pattern-recognition algorithms identify different states directly from the accelerometer signatures. Unsupervised learning is intrinsically challenging, so supervised algorithms are frequently used to ‘learn’ the relationship between acceleration data and behaviour using a model-training dataset that is acquired from direct observation. The ability of the algorithm to interpret this relationship depends largely on the variables used to characterise the raw acceleration data. Several attempts to simplify or streamline this approach have been made, with varying success. Ladds et al. (2017) introduced a super-machine-learning method that identified six behaviours in four species of pinniped with approximately 73% accuracy. They used a high sampling frequency (25 Hz), large training dataset (∼90,000 individual data points) and a very large set of input variables (n=147). In contrast, when using fewer input variables and a relatively simple approach (k-nearest neighbour), McClune et al. (2014) classified four behaviours in Eurasian badgers (Meles meles) with an overall classification accuracy of 89%. In general, it is expected that the classification accuracy of a model will increase when using: (a) higher sampling frequencies; (b) more training data; and (c) broader behaviour categories (i.e. fewer behaviours to be classified). The consequence of following these criteria is not only increased computational time and difficulty, but loss of behavioural diversity and decreased deployment time on free-ranging animals due to memory constraints, i.e. the exact opposite of what researchers are aiming for. Reducing the sampling frequency would greatly increase deployment time (e.g. from days to months) whilst also decreasing computational effort. However, it is challenging to accurately classify a broad range of behaviours using very low sampling rates. If we can create a simple model that overcomes the aforementioned hurdles, we will greatly improve integration with other fields, such as movement ecology and physiology.
One major weakness in applying machine-learning algorithms to acceleration data is that, for accurate and reliable identification of different behaviours, a period of observation is required to ‘train’ the algorithm. Therefore, it has only been possible to use this approach on species that can be observed whilst simultaneously recording their acceleration. Campbell et al. (2013) made an important step in overcoming this problem by demonstrating the potential of ‘surrogacy’, whereby a classification model was trained with behavioural observations from one species, and accurately predicted these behaviours in other species that possessed similar morphometrics.
In this study, we describe an approach to the classification of behaviours using accelerometer data collected at the very low sampling frequency of 1 Hz. We used the dingo (Canis dingo Meyer 1793), a medium-sized prototypical quadruped, as an example because it readily exhibits behaviours akin to its wild conspecifics. We used out-of-sample validation to compare the predictive performance of four commonly used classification models (random forest, k-nearest neighbour, support vector machine and naïve Bayes). We then tested the importance of predictor variables for the classification of each behaviour as well as overall model performance. We expected behaviours that were functionally similar, such as lateral and sternal recumbency, would produce similar acceleration signatures and thus be more difficult to classify accurately. Given the increasing use of overall dynamic body acceleration (ODBA) as a measure of activity and as a proxy for energy expenditure (Wilmers et al., 2015), we anticipated that ODBA would show a strong, positive relationship with intensity of movement.
MATERIALS AND METHODS
Data collection
Captive observations were conducted at Cleland Wildlife Park, Adelaide (34.9667°S, 138.6968°E) from August 2016–March 2017 under a University of Adelaide Animal Ethics permit (S-2015-177a). We used three captive-born adult male dingoes (c. 19 kg) that were kept on permanent display in a 2500 m² outdoor enclosure. We fitted each dingo with a tri-axial accelerometer (LIS2DH, STMicroelectronics, USA) built into a custom-made GPS collar (Telemetry Solutions, Concord, CA, USA). The tri-axial accelerometer was programmed to sample changes in acceleration at 1 Hz (one sample per second) and orientated so that the x-, y- and z-axes recorded acceleration along the sway, heave and surge planes, respectively (Fig. 1). Dingo movement was recorded continuously with the accelerometer and visually with a camcorder at 30 frames s−1 (Sanyo Dual Camera Xacti CG10 HD) for eight sessions of ca. 30 min each. Behaviours directly observed from the video footage were manually annotated into a Microsoft Excel spreadsheet by one observer (J.T.) and matched to the corresponding accelerometer data via concurrent timestamps. We synchronised the accelerometer and camera clocks by setting them using the same laptop computer (internet time server) on the morning of each session. Prior to manual annotation, we consulted the timestamp from auxiliary footage (iPhone 8 also set to the internet time server, 30 frames s−1) to confirm the syncing of our devices. Although a handling keeper was present at all times, the focal animal (only ever one dingo per session) was unrestricted and conducted behaviours largely ad libitum. Prior exposure to commercial dog collars ensured that the dingoes did not act atypically during the sampling sessions (Hayley Wells, Cleland Wildlife Park, personal communication).
Determining behaviours from acceleration values requires a sampling frequency that is at least twice as fast as the observed behaviour (Nyquist sampling theorem). Thus, our core criterion for what constituted a behaviour was a repeated movement that consistently lasted two or more seconds. We observed 14 such behaviours and annotated them to 9360 accelerometer data points (equivalent to 156 min) on a per-second basis (Table S1). Quick transitory movements between recognised behavioural states were assigned to the behaviour (pre- or post-transition) that was most common across the 30 video frames recorded in that second. We excluded any behaviours that had sample sizes <20 data points or were clearly observed to be influenced by physical interaction with the keeper. Based on direct observation of dingo movement, each behaviour was broadly assigned to an activity level: low, medium or high (Table 1).
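The Nyquist criterion can be written out for a 1 Hz logger as a short worked calculation. The symbols f_s (sampling frequency), f_max (highest resolvable movement frequency) and T_min (shortest resolvable movement period) are introduced here for illustration only.

```latex
f_s \ge 2 f_{\max}
\;\Rightarrow\;
f_{\max} \le \frac{f_s}{2} = \frac{1\ \mathrm{Hz}}{2} = 0.5\ \mathrm{Hz},
\qquad
T_{\min} = \frac{1}{f_{\max}} = 2\ \mathrm{s}
```

This is why a movement needed to last two or more seconds to be treated as a behaviour.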
Variable derivation
The ability of classification models to distinguish between behavioural states depends partly on the predictor variables used to characterise the raw acceleration signals. We adopted a comprehensive approach to selecting predictor variables by calculating an extensive list of derived variables (n=66) from the x-, y- and z-axes. These ranged from simple metrics such as the mean and standard deviation of an axis, to more complex, derived variables such as waveform length and signal magnitude area (Table 2). All predictor variables were calculated using a moving window centred on each data point (see detailed description given in ‘Model evaluation’).
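A minimal sketch of moving-window variable derivation is given below, in Python rather than the R used for the analyses. The function names, the truncation of windows at the ends of the series, and the exact formulas for waveform length and signal magnitude area are assumptions made for illustration; the definitions actually used are those listed in Table 2.

```python
import statistics

def centred_window(values, index, width):
    """Return up to `width` samples centred on `index` (truncated at the edges)."""
    half = width // 2
    start = max(0, index - half)
    stop = min(len(values), index + half)
    return values[start:stop]

def waveform_length(window):
    """Sum of absolute differences between consecutive samples (assumed definition)."""
    return sum(abs(b - a) for a, b in zip(window, window[1:]))

def signal_magnitude_area(xs, ys, zs):
    """Mean over the window of |x| + |y| + |z| (assumed definition)."""
    return sum(abs(x) + abs(y) + abs(z) for x, y, z in zip(xs, ys, zs)) / len(xs)

def window_features(x, y, z, index, width):
    """Derive a few example predictors for one data point."""
    wx, wy, wz = (centred_window(axis, index, width) for axis in (x, y, z))
    feats = {}
    for name, w in (("x", wx), ("y", wy), ("z", wz)):
        feats[f"mean_{name}"] = statistics.fmean(w)
        feats[f"sd_{name}"] = statistics.pstdev(w)
        feats[f"wl_{name}"] = waveform_length(w)
    feats["sma"] = signal_magnitude_area(wx, wy, wz)
    return feats

# Hypothetical 1 Hz traces (arbitrary units), eight seconds long.
x = [0, 1, 0, 1, 0, 1, 0, 1]
y = [0] * 8
z = [1] * 8
features = window_features(x, y, z, index=4, width=4)
```

Calling `window_features` for every data point and each candidate window size would yield one row of predictors per second of data.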
Classification modelling
We used supervised machine-learning techniques to fit classification models that used different combinations of the predictor variables. In supervised learning, an algorithm is employed to learn the relationship between a given set of input and output variables (our predictor variables and manually assigned behaviours, respectively) so that, when provided with a new set of input variables, it can predict what the output variables will be. With the goal of finding a reliable method that would be straightforward to implement, we compared four supervised machine-learning algorithms using the R software environment for statistical and graphical computing (R Core Team 2016: http://www.R-project.org/). The k-nearest neighbour (k-NN; R library ‘class’; Venables and Ripley, 2002) is a simple algorithm that employs a number of nearest neighbours (defined by the parameter k) to contribute to the classification of a sample. The majority of behaviours within k observations surrounding the data point being classified determines the behaviour of that data point (Coomans and Massart, 1982). Naïve Bayes (R library ‘e1071’, https://rdrr.io/rforge/e1071/) is a probabilistic classifier that computes the conditional probabilities of a categorical class variable given independent predictor variables using Bayes' rule. A support vector machine (SVM: R library ‘e1071’) constructs an optimal hyperplane to separate patterns, or classes, in the data (Vapnik, 1999). Non-linear classification is achieved using kernel functions (chosen a priori), which nonlinearly map the input vectors into a very high-dimensional feature space. A random forest (RF; R library ‘randomForest’, Liaw and Wiener, 2002) is an ensemble method for classification in which a set of decision trees is constructed and then used to classify a new instance according to the majority vote (Breiman, 2001). The number of decision trees needed generally increases with the number of predictor variables used. Each of these modelling approaches is widely used, is computationally inexpensive, and represents a different degree of complexity in pattern recognition and classification.
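The majority-vote principle behind k-NN can be sketched in a few lines of Python. This is an illustration only, not the R ‘class’ implementation used in the study; the function name, the Euclidean distance metric and the toy data are assumptions.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` from the majority label of its k nearest neighbours."""
    # Rank every training point by Euclidean distance to the query.
    ranked = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-variable training set with two behaviours.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["rest", "rest", "rest", "run", "run", "run"]
prediction = knn_predict(train_X, train_y, query=(0.2, 0.1), k=3)
```

A random forest applies the same voting idea, except that the votes come from individual decision trees rather than from neighbouring data points.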
Model evaluation
For each machine-learning algorithm, we evaluated a candidate set of models that ranged in complexity from a ‘null’ model containing just the x-, y- and z-axes (n=3 variables) to the most complex model (n=69 variables; Table S2). We tested six different moving windows for variable derivation (4, 8, 16, 32, 64 and 128 s). We also explored how the predictive ability of our four models was affected when we employed a different number of nearest neighbours (1, 3, 5, 7 and 9; k-NN), kernels (linear, radial and polynomial; SVM) and number of classification trees (500, 1000, 2500, 5000, 7500 and 10,000; RF). To measure the predictive performance of each model, we averaged the out-of-sample accuracy (see below) achieved across 10 repeated training-test splits, in each case using a random 90% of the data from each behaviour to train our model and the remaining 10% for testing.
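The repeated 90/10 split, drawn separately within each behaviour, might be sketched as follows. The function names and the `fit_and_score` callback are hypothetical placeholders; any classifier that is trained on the training indices and returns an out-of-sample score on the test indices could be plugged in.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.9, rng=None):
    """Split indices so each behaviour contributes `train_fraction` of its points to training."""
    if rng is None:
        rng = random.Random()
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    train, test = [], []
    for indices in by_label.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test

def repeated_score(labels, fit_and_score, repeats=10, seed=0):
    """Average the out-of-sample score over `repeats` random stratified splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = stratified_split(labels, 0.9, rng)
        scores.append(fit_and_score(train, test))
    return sum(scores) / len(scores)
```

Splitting within each behaviour keeps rare behaviours represented in both the training and test sets of every repeat.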
Classification models produce a probability for each behavioural prediction, to which we applied a threshold criterion (0.1–0.9) that determines the rates of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), from which sensitivity, specificity and the true skill statistic (TSS = sensitivity + specificity − 1) are calculated. The threshold is usually used to fine-tune model properties such as sensitivity and specificity. Therefore, it is important to choose a threshold based on the research questions and consequences associated with practical application of the model. Given that the intended practical application of this research is to predict behaviours of free-ranging animals, we chose a threshold that would maximise TSS whilst minimising the number of unclassified data points.
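A sketch of how a probability threshold and the resulting scores interact is given below. The function names are hypothetical, and the choice to leave unclassified points out of the TP/TN/FP/FN counts is an assumption made for this example, as the handling of those points is not specified here. The true skill statistic (TSS) is computed as sensitivity + specificity − 1.

```python
def classify_with_threshold(class_probs, threshold):
    """Return the most probable behaviour, or None if its probability is below the threshold."""
    label, prob = max(class_probs.items(), key=lambda item: item[1])
    return label if prob >= threshold else None

def confusion_counts(truth, predicted, behaviour):
    """Count TP, TN, FP and FN for one behaviour, skipping unclassified (None) points."""
    tp = tn = fp = fn = 0
    for actual, pred in zip(truth, predicted):
        if pred is None:
            continue  # assumption: unclassified points are tallied separately
        if pred == behaviour:
            if actual == behaviour:
                tp += 1
            else:
                fp += 1
        elif actual == behaviour:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def true_skill_statistic(tp, tn, fp, fn):
    """TSS = sensitivity + specificity - 1 (assumes both denominators are non-zero)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1
```

Raising the threshold leaves more points unclassified but makes the remaining predictions more reliable, which is the trade-off the threshold choice has to balance.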
Overall dynamic body acceleration
Dynamic body acceleration (DBA) was calculated by subtracting a running mean from each acceleration axis to give acceleration values occurring from inertia (i.e. movement). We chose a running mean of 4 s (i.e. four data points) because it was roughly half the length of our most active behaviour (running) and, thus, minimised any loss of resolution for each behaviour. The absolute DBA values of the three axes were summed to give ODBA, an overall value for dynamic acceleration. To determine whether ODBA differed between the observed behaviours, and whether there was a positive relationship between ODBA and activity level, we conducted an ANOVA and Tukey's test for pairwise comparisons. All analyses were conducted in the R software environment for statistical and graphical computing (http://www.R-project.org/).
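The ODBA calculation can be sketched as below. The function names are hypothetical, and the use of a centred running mean that is truncated at the ends of the series is an assumption; a trailing window would be an equally plausible reading of the description.

```python
def running_mean(values, width=4):
    """Centred running mean over `width` samples, truncated at the ends of the series."""
    half = width // 2
    means = []
    for i in range(len(values)):
        window = values[max(0, i - half):min(len(values), i + half)]
        means.append(sum(window) / len(window))
    return means

def odba(x, y, z, width=4):
    """Sum over the three axes of |raw acceleration - running mean| at each sample."""
    dynamic = []
    for axis in (x, y, z):
        static = running_mean(axis, width)
        dynamic.append([abs(v - s) for v, s in zip(axis, static)])
    return [dx + dy + dz for dx, dy, dz in zip(*dynamic)]

# Hypothetical 1 Hz traces: a single spike on the x-axis, no movement on y or z.
x = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
y = [0.0] * 6
z = [0.0] * 6
values = odba(x, y, z, width=4)
```

A motionless animal gives an ODBA of zero regardless of collar orientation, because the running mean removes the static (gravitational) component from each axis.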
RESULTS
Across the four machine-learning algorithms that we tested, the RF classification models produced superior out-of-sample validation scores (Fig. 2). The top 50 classification models were all achieved using the RF algorithm. Despite differing considerably in their complexity (Table S3), the predictive capacity of these models was similar (Δ mean TSS≤0.04). Our ‘best’ model, which ranked third overall, was selected due to its low number of predictor variables (26 of a possible 69) and classification trees (1000), in conjunction with returning the lowest range in TSS scores between the 14 behaviours (Table 3). This selected RF model predicted all 14 dingo behaviours with high accuracy (mean TSS=0.87).
Comparisons of different RF models indicated that the predictor variable set used had the strongest influence on the ability of RFs to predict dingo behaviours (Fig. 3). When models were fitted with just the x-, y- and z-axes (predictor variable set 1), they produced the lowest predictive accuracy. The RF models that were constructed using a moving window of ≥16 s were substantially better at classifying behaviours, whereas varying the number of classification trees had little effect on model accuracy (Fig. 3). The z-axis, which measured surge movement, was highly variable between behaviours and therefore was particularly important for classification (Fig. 4). Further, predictors that described the variation (s.d.) and characterised the distribution of sway movements proved to be valuable for classifying behaviours (Fig. 5). We attempted to refine the model further by excluding the variables that contributed least to the model (mean decrease in accuracy ≤60%; n=8) but found a reduction in the ability of the model to identify dingo behaviours (mean values: TSS=0.85, MCC=0.87, F-measure=0.88, precision=0.91, sensitivity=0.86 and specificity=0.99).
Overall, our selected model performed better at classifying low-intensity, stationary behaviours. Specifically, all behaviours where dingoes were lying down were identified with very high accuracy (TSS>0.90). In contrast, upright and more dynamic movements such as trotting and running were classified less well (TSS=0.46 and 0.62, respectively). We observed high specificity for each of our behaviours (0.92–1.00) and thus our selected model was robust to misclassification. Although the majority of behaviours exhibited a sensitivity above 0.90, low sensitivity for trotting and running indicated that the model had difficulty with positive classification of these behaviours (Table 3).
A threshold of 0.3 produced the model with the optimal balance between sensitivity and specificity (TSS) and the number of unclassified data points (Fig. S1). At this threshold, the overall number of incorrectly classified behaviours was very low, with higher classification errors occurring in the more active behaviours. Misclassifications produced by the model tended to confuse closely related behaviours; for example, ‘trotting’ was most often misclassified as ‘walking’, and ‘running’ as ‘trotting’ (Table S4).
Post hoc comparisons using Tukey's test indicated that the mean ODBA for each of our highly active behaviours was significantly greater than all behaviours except ‘collar discomfort’ (Table S5). When sorted by mean ODBA, all 14 behaviours grouped into their pre-assigned activity level, displaying a positive relationship between ODBA and animal activity (Fig. 6).
DISCUSSION
Accurate classification across a range of behaviours has been a great challenge for the majority of accelerometry studies. Our study is the first to use accelerometry to accurately classify a broad range of behaviours for an apex predator, at the very low sampling frequency of 1 Hz. Employing a comprehensive, yet strategic, approach to fitting and selecting our best model allowed us to address a number of challenges associated with translating raw acceleration data into a meaningful and biologically relevant format.
As expected, the choice of predictor variables influenced classification accuracy. Several other studies provide evidence for the importance of predictor variable selection. For example, Alvarenga et al. (2016) achieved an overall model accuracy of ∼85% (across five behaviours) when using 44 derived predictor variables, whereas Martiskainen et al. (2009) used 28 relatively simple predictor variables and produced a mean accuracy across eight behaviours of ∼94%. Although there was only a minor change in overall model accuracy between the larger predictor variable sets, the model's predictive ability for individual behaviours was influenced considerably by choice of predictor variables. Changes in acceleration across each axis depended largely on the type of movement and, therefore, a single axis (or predictor variable) may better capture the acceleration signature of one behaviour over another. Graf et al. (2015) reported the heave (up and down) axis to be particularly important for classifying different behaviours in Eurasian beavers (Castor fiber), while Alvarenga et al. (2016) found that, in sheep (Ovis aries), the surge (back and forth) axis contributed most to the model. Given that some predictor variables will be better suited to assist the classification model in distinguishing certain behaviours, a priori selection of predictor variables should be used, where decisions are driven by the behaviours of interest.
Acceleration during movement can change over very short time periods and is therefore commonly measured at infrasecond frequencies (between 8 and 100 Hz). Measuring acceleration at high frequencies increases computational effort for fitting classification models, limits the deployment period of accelerometers due to memory constraints, and may be unnecessary for behavioural classification in many instances. We are the first to show that a large number of distinct behaviours (14) can be classified using tri-axial accelerometer data derived from an unconventionally low sampling frequency of 1 Hz. There are few instances in the literature where an equally low sampling frequency was used, which is surprising given that several studies report only minor decreases in classification accuracy when down-sampling, for example, from 64 to 8 Hz (Wang et al., 2015), or 25 to 10 Hz (Alvarenga et al., 2016). However, there is a lower limit to sampling frequency that can be used to classify certain behaviours. In our study, the classification accuracy for our most active behaviours was not high. Given that the more active behaviours, such as running and playing, are swift movements performed over short time periods, our sampling frequency may not have allowed enough of the acceleration signature to be captured in order to adequately train the model. This issue is highlighted in small species, owing to their tendency for rapid movements of short duration (Hammond et al., 2016). Hammond et al. (2016) attached accelerometers to chipmunks (body mass c. 50 g) and found the lowest sampling frequency that resulted in negligible decreases in model accuracy to be 20 Hz. Our study provides evidence that a very low sampling frequency can be used to classify a range of behaviours with high accuracy, in a medium-sized animal.
In our study, the expectation that functionally similar behaviours would most often misclassify as each other was only realised for the highly active behaviours. This is best explained by high intra-behaviour variation and inter-behaviour overlap within the axes. If classifying high-intensity behaviours with high accuracy is crucial, it may be necessary to increase the resolution of the acceleration signature by using a higher sampling frequency, but at a cost to deployment time. Our selected model performed extremely well (low misclassification rate) at classifying low-intensity functionally similar behaviours such as different resting postures. In studies where misclassification is particularly undesirable, it is common to group behaviours into broader classes, such as ‘active’ and ‘inactive’, which has the benefit of increasing classification accuracy but at the cost of behavioural diversity (e.g. Shamoun-Baranes et al., 2012). We chose a model that would identify a range of highly active behaviours, despite relatively low classification accuracy. We then managed our misclassification errors (model sensitivity) by choosing a threshold that would balance the number of unclassified and misclassified samples whilst retaining high overall model accuracy. If we were to apply our selected model to accelerometer data from free-ranging dingoes, we propose increasing the threshold from 0.3 to 0.5 for three reasons. Firstly, at 0.5 our overall model accuracy remains high at over 80%; secondly, our data will not be swamped by errors of omission; and, lastly, we are not overly concerned with a minor increase in misclassifications because the majority of behaviours will be misclassified to a functionally similar movement with comparable ODBA values. Overall, our methodological approach and resulting classification model are robust, and can readily be adapted to answer questions about different study systems.
Since Wilson et al. (2006) first presented evidence to suggest a positive correlation between ODBA and activity in cormorants, the use of accelerometry as a tool to remotely measure physiological traits such as energy expenditure and energy–time budgets has exploded. We also found a positive relationship between animal activity level and ODBA. Highly active behaviours exhibited significantly higher mean ODBA values than all low and medium behaviours. However, our results suggest that some caution should be taken when using ODBA as a proxy for energy expenditure. One of our highest ODBA scores came from the behaviour ‘collar discomfort’, which was typified by low overall body movement but acute movement of the accelerometer device due to quick side-to-side actions of the head. The result was ODBA values indistinguishable from ‘trotting’, an energetically demanding and ecologically important behaviour (Reilly et al., 2007). Energy and time budgets are of paramount importance for our understanding of how animals interact with their environment, especially for apex predators given their critical role in maintaining the structure of ecological communities (Fretwell, 1987). A recent study by Wang et al. (2015) used accelerometry to understand how an apex predator modulated their energy budget by examining foraging strategies (akin to ‘searching’ in our study), and in doing so they highlighted the potential benefits for conservation initiatives and human–wildlife conflict resolution. We extend the potential of future research by showing that classifying ecologically relevant behaviours whilst maintaining their aforementioned relationship with ODBA is possible even with acceleration data sampled at a very low frequency. The implication is that we can deploy accelerometers over much longer time periods to capture invaluable behavioural and physiological data across different life history stages of free-ranging animals.
Accelerometry is an exciting tool that is transforming the study of animal behaviour and physiology. The use of accelerometers to remotely classify behaviours of free-ranging animals has appreciable potential. However, prevailing methods limit our ability to establish meaningful ecological conclusions due to the challenge of classifying a diversity of behaviours over a significant period of time. Our approach addresses these constraints and has applicability to free-ranging terrestrial quadrupeds of comparable size. We propose that our approach using the RF model can be directly applied to accelerometer data from other members of the family Canidae, given their shared body type and consistent style of locomotion (Flynn et al., 1988). Canids are a diverse lineage whose members are ecologically and economically important the world over; for example, red wolves (Canis rufus) are threatened with extinction (Kelly et al., 2008), gray wolves (Canis lupus) are keystone predators (Estes et al., 2011), and red foxes (Vulpes vulpes) are invasive pests that cause millions of dollars of damage each year in Australia alone (McLeod, 2004). Through building a classification model that exhibits high predictive performance at low frequency and across a large number of ecologically relevant behaviours, we increase the accessibility of accelerometer-based behavioural research and support much-needed integration with the fields of animal physiology and movement ecology.
Acknowledgements
We thank Hayley Lewis for overseeing field work with captive dingoes, and Raissa Sepulvida Alves, Casey O'Brien and Hannah Bannister for their contribution to data collection. The use of captive dingoes was authorised by Byron Manning, Head Curator of Cleland Wildlife Park. This project was conceived as part of a larger collaborative project with the Australian Wildlife Conservancy. The specially designed collar equipped with tri-axial accelerometer was provided by Telemetry Solutions (Concord, CA, USA). J.T. acknowledges the support he has received for his research through the provision of an Australian Government Research Training Program Scholarship.
Author contributions
Conceptualization: J.T., P.C.; Methodology: J.T., T.A.A.P.; Formal analysis: J.T., P.C., T.A.A.P.; Resources: J.T., P.C.; Writing - original draft: J.T.; Writing - review & editing: J.T., P.C., T.A.A.P.; Supervision: P.C., T.A.A.P.
Funding
J.T. was funded by a Faculty of Sciences, University of Adelaide postgraduate scholarship, and also by an Australian Government Research Training Program Scholarship.
Competing interests
The authors declare no competing or financial interests.