Comparative phylogenetic studies of adaptation are uncommon in biomechanics and physiology. Such studies require data collection from many species, a challenge when this is experimentally intensive. Moreover, researchers struggle to employ the most biologically appropriate phylogenetic tools for identifying adaptive evolution. Here, we detail an established but greatly underutilized phylogenetic comparative framework – the Ornstein–Uhlenbeck process – that explicitly models long-term adaptation. We discuss challenges in implementing and interpreting the model, and we outline potential solutions. We demonstrate use of the model through studying the evolution of thermal physiology in treefrogs. Frogs of the family Hylidae have twice colonized the temperate zone from the tropics, and such colonization likely involved a fundamental change in physiology due to colder and more seasonal temperatures. However, which traits changed to allow colonization is unclear. We measured cold tolerance and characterized thermal performance curves in jumping for 12 species of treefrogs distributed from the Neotropics to temperate North America. We then conducted phylogenetic comparative analyses to examine how tolerances and performance curves evolved and to test whether that evolution was adaptive. We found that tolerance to low temperatures increased with the transition to the temperate zone. In contrast, jumping well at colder temperatures was unrelated to biogeography and thus did not adapt during dispersal. Overall, our study shows how comparative phylogenetic methods can be leveraged in biomechanics and physiology to test the evolutionary drivers of variation among species.
Experimental studies in comparative physiology and biomechanics often demonstrate the functional significance of phenotypes in a given environment. A close fit between environment and function can be seen as indicating that a phenotypic state is adaptive, with the assumption that the trait and its function increase the organism's fitness in its current environment (Reeve and Sherman, 1993). Such studies, sometimes called the equilibrium approach, can be and frequently are performed on single species (Lauder, 1982, 1996).
Demonstration of adaptation across species is much less common in comparative physiology and biomechanics (comparative physiology hereafter for brevity; Vogel, 2007). In explicitly historical studies, researchers collect data for multiple species and examine character evolution along a phylogeny. Studies analyzing morphological data with complex phylogenetic comparative analyses are widespread (Cooper et al., 2016a). However, similar studies using physiological data are much rarer owing to three key, interrelated challenges (Lauder, 1990, 2003; Garland et al., 2005; Bauer et al., 2020): (1) collecting data from multiple species distributed across broad geographic areas is expensive and logistically difficult; (2) challenging conditions limit the type of data that can be collected in the field; and (3) difficult experimental procedures using finely calibrated equipment may be hard to replicate on multiple species. Despite these challenges, such studies are essential for understanding the evolution of physiological diversity (Lauder, 2003), particularly over deep time scales.
Phylogenetic studies of adaptation have taken two key tracks. First, early practitioners emphasized studying the order of character change along a phylogeny (Lauder, 1982, 1991; Baum and Larson, 1991). Changes in phenotype that occurred along with or subsequent to changes in the environment on a branch indicated adaptation. This practice followed defining an adaptation as a character state that originated in response to natural selection in a novel environment (Lewontin, 1978; Gould and Vrba, 1982; Amundson, 1996). Yet this approach has largely fallen out of practice for many reasons. Ancestral states are hard to accurately estimate, particularly on phylogenies with only extant species (Cunningham et al., 1998; Cunningham, 1999; Losos, 1999). Moreover, in fast-evolving traits or on long branches, multiple changes on a single branch cannot be detected. Alternatively, infrequent evolutionary change in both phenotype and environment may limit robustly testing the fit between the two (Martins, 2000). Lastly, change in both phenotype and environment along the same branch may make it difficult to test which change was the cause of adaptation: did phenotypic change permit colonization of a novel environment, or did environmental change cause phenotypic adaptation (Lauder, 1991)? In the extreme case of change in several traits and environments along a single branch, causal inference may be impossible (Lauder, 1982, 1990; Leroi et al., 1994; Maddison and FitzJohn, 2015; Uyeda et al., 2018).
A second, alternative phylogenetic approach tests the long-term association between phenotypes and environments, the classical comparative method (Harvey and Pagel, 1991; Martins, 2000). Approaches based on this association – including phylogenetic regression and ANOVA – are now increasingly common in comparative physiology and implicitly test for adaptation (Martins, 2000; Olson, 2021). However, the most common approaches primarily aim to avoid statistical problems in hypothesis testing, such as the non-independence of species (Hansen and Orzack, 2005; Hansen, 2014; Pennell, 2015). Such approaches do not directly model the process of adaptation of a phenotype to a particular environment (Hansen, 1997; Butler and King, 2004), which leads to statistical and interpretational problems. Hansen (1997) introduced a method, based on the Ornstein–Uhlenbeck (OU) process, that models evolution as stochastic movement towards an adaptively optimal state. This ‘Hansen model’ (Butler and King, 2004) has its foundations in earlier quantitative-genetic models of stabilizing selection, with long-term fit of a character state to its environment as positive evidence for adaptation (Hansen, 1997; Martins, 2000).
Despite its age, the Hansen model remains greatly underutilized. As work in comparative physiology becomes increasingly phylogenetic, we believe that understanding this approach and its advantages for studying physiological adaptation is timely. Thus, in this study, we first describe Hansen's (1997) OU model of adaptation in more detail. We next demonstrate its use in a case study of physiological evolution in treefrogs. We then discuss commonly misunderstood statistical properties of the model, challenges in implementation, and the effects of studying relatively few species. Finally, we provide data analysis tutorials to help researchers use the Hansen model in their own work.
critical thermal minimum
lower temperature threshold for 80% of peak jumping performance
maximum clade credibility
phylogenetic generalized least squares
thermal performance curve
rate of approach to the primary optimum
primary optimum in OU model
scale of change in models of evolution
rate of stochastic character change in models of evolution
Overview of the Hansen OU model of adaptation
The Hansen model has been described and discussed considerably in the evolutionary literature (see reviews in Hansen, 2014; O'Meara and Beaulieu, 2014). Yet proper interpretation of the model, as well as best practices for implementation, remain poorly understood (Hansen, 2014; Cooper et al., 2016a,b). We thus summarize the model here and distinguish it from more frequently used approaches. Throughout this discussion we use ‘environment’ to indicate a discrete selective factor (e.g. diet, habitat, biogeographic region); such factors are also commonly called selective regimes. We use ‘phenotype’ to indicate the value of a continuous trait that responds to selection in the environment.
When introduced for comparative biology, the OU process was originally described as a model of stabilizing selection in which α is the strength of selection and the BM diffusion represents genetic drift (Martins, 1994; Hansen and Martins, 1996). Although this population-genetic interpretation has intuitive appeal, macroevolution proceeds far too slowly for the OU parameters to be interpreted this way (Lynch, 1990; Hansen, 1997, 2012; Harmon et al., 2010; Uyeda et al., 2011). Thus, Hansen (1997) described θ as the primary optimum for an environment: the phenotype species in that environment would have if no other factor affected the focal trait's evolution. In contrast, species can be considered to have (and already be at) a species-specific optimum value for the focal trait. These latter optima differ from species to species because the focal trait is likely also under selection for other species-specific functions (Hansen, 1997), as implied by widespread trade-offs (Roff and Fairbairn, 2007). Moreover, other factors may constrain evolution toward the primary optimum, including correlation with other traits or genetic and developmental constraints (Hansen, 1997). Thus, even though all species are pulled toward the primary optimum for their current environment, their phenotypes may not take that optimal value. Importantly, many sources of deviation from the primary optimum are likely shared by closely related species. Thus, even under the OU process, closely related species are expected to deviate from the primary optimum similarly. This is the role played by the BM diffusion on the right-hand side of Eqn 2. Overall, another way to think about the OU process is that each lineage in the same environment is a somewhat independent evolutionary replicate adapting toward the same primary optimum (which we refer to as simply ‘optimum’ hereafter for brevity). All of these lineages deviate in their own idiosyncratic way, associated with other aspects of their biology that are shared with other species to varying degrees (Hansen, 1997).
The heart of the OU process as a model of adaptation is the idea that the current phenotype of a species results from an evolutionary history of adaptation to different environments. A consequence is that past adaptation may have residual effects on current phenotypes. Adaptation is not expected to happen instantaneously (Labra et al., 2009; Hansen, 2012), so when a species changes environment, it may take time to adapt toward the new primary optimum (Moen et al., 2016; Toljagić et al., 2018). The parameter α represents the rate of movement toward the optimum. Moreover, a more intuitive understanding of α can result from reparameterizing it as log(2)/α=t1/2, the phylogenetic half-life (Hansen, 1997; Hansen et al., 2008). This half-life indicates the expected time a species would take to adapt halfway from its current phenotype toward the optimum. If α is large, the pull of the optimum is strong, and species adapt quickly. The half-life has units of time, so whether a half-life is considered small or large partly depends on the length of the phylogeny studied. For example, a half-life of 10 million years (myr) means slow adaptation on a phylogeny 1 myr long but rapid adaptation on a 100-myr phylogeny.
As a concrete example, imagine studying evaporative water loss (EWL) rate in a lizard species that lives in a tropical dry forest. Phylogenetic analysis suggests that the species occurred in tropical rainforests earlier in its evolutionary history. Moreover, its current EWL rate is approximately 75% of the distance from a typical rainforest EWL rate (i.e. the rainforest optimum) towards a typical dry-forest EWL rate (i.e. the dry-forest optimum). This result could stem from an early shift to the dry forest coupled with a weak pull toward the dry-forest optimum (Fig. 1A). Alternatively, the species may have colonized the dry forest more recently with a strong pull toward the dry-forest optimum (Fig. 1B). Regardless, the history of adaptation of a species may result in a phenotype that is not close to the primary optimum of its current environment (Moen et al., 2016; Toljagić et al., 2018). Interestingly, in this way the OU process helps explain variation among species adapting to the same environment (Fig. 1). It also accommodates the expectation that evolutionary history matters when interpreting adaptation (Lauder and Liem, 1989; Lauder, 1990).
This explicit modeling of adaptation to differing environments throughout the history of a species is the major difference between the Hansen model and BM. This difference leads to two key interpretational problems when using BM in studies of adaptation. First, even when species change environments throughout their history, they will never be modeled as better adapted to one versus the other. BM only allows correlated changes between a phenotype and its environment, rather than phenotypic tracking to changes in environment (Hansen and Orzack, 2005; Hansen et al., 2008). So if the phenotype of a species is far from the optimum of its current environment and then the species changes environment, its phenotype will be modeled as equally far from the optimum of its new environment. This would occur even if a species were already close to the optimum of its new environment, which paradoxically may have been the cause of changing environment in the first place (i.e. high fitness in an environment may promote colonization of that environment). This property of BM has been called inherited maladaptation (Hansen and Orzack, 2005; Hansen et al., 2008).
Second, common approaches that often use BM to model species non-independence, such as independent contrasts (Felsenstein, 1985) and phylogenetic generalized least squares (PGLS; Martins and Hansen, 1997), only incorporate the phylogeny in terms of residual variation around a fitted relationship, not in estimating the relationship itself (Martins and Hansen, 1997; Hansen and Orzack, 2005). Many researchers do not distinguish this important detail, instead expecting that any kind of similarity among species should be accounted for in a phylogenetic analysis (Revell, 2010; Hansen and Bartoszek, 2012). Yet the phenotypes of closely related species may also be similar because they occur in and are adapting to a similar (or the same) environment as a shared common ancestor (Labra et al., 2009). Such inherited similarity in environment should not be removed by a statistical model (Hansen and Bartoszek, 2012; Taylor and Thomas, 2014). Doing so causes a misfit of the adaptive relationship (Hansen and Orzack, 2005; Hansen et al., 2008; Labra et al., 2009) and high Type-I error rates (Revell, 2010). OU methods explicitly account for this shared adaptation, as does the common approach of using PGLS to simultaneously estimate a regression line and phylogenetic signal (Revell, 2010; Hansen and Orzack, 2005). However, the latter approach still suffers from the problem of inherited maladaptation.
Overall, the Hansen model solves many of the problems of improper modeling of the biological process of adaptation. BM-based models are simpler to implement, but they tend to focus on statistical fixes and do not properly account for the adaptive process. We have described the Hansen model in its simplest form for testing adaptation: environments differ in primary optima (θ) but not in rate of approach to the optima (α) or rate of stochastic evolution (σ2). Likewise, we demonstrate the model's use for studying adaptation in this framework. However, more recent extensions to this model include continuous environments (Hansen et al., 2008), ANCOVA-like designs (Escudero et al., 2012), distinct α and σ2 across environments (Beaulieu et al., 2012), and multivariate applications (Bartoszek et al., 2012). Moreover, these methods are provided in a wide array of R packages that vary in their implementation details and options (O'Meara and Beaulieu, 2014). We return to these extensions, as well as limitations and challenges of the Hansen model, in the Discussion.
Case study: thermal physiology in hylid treefrogs
The latitudinal biodiversity gradient is a pattern of declining species diversity from tropical to temperate latitudes (Willig et al., 2003; Hillebrand, 2004; Mittelbach et al., 2007; Mannion et al., 2014; Pontarp et al., 2019). A key hypothesis for explaining this gradient is that species have rarely dispersed from the tropics to the temperate zone because of ecological niche conservatism (Farrell et al., 1992; Ricklefs and Schluter, 1994; Wiens and Donoghue, 2004), meaning lineages do not adapt to novel environments outside their native range. Species distribution models of geographic range limits have shown that temperature seasonality and cold extremes are the key climatic factors limiting colonization of the temperate zone (Willig et al., 2003; Wiens et al., 2006). However, the physiological constraints that cold temperatures impose are often unclear (Chown et al., 2004; Gaston et al., 2009; Bozinovic et al., 2011; Spicer et al., 2019).
Thermal tolerance (e.g. surviving cold temperatures) and the effects of temperature on functional performance (e.g. in locomotion) are two strong candidates for explaining how physiology may limit colonization of the temperate zone (van Berkum, 1988; Bozinovic et al., 2011). First, tolerance to low temperatures predicts northern range limits in thermally sensitive species (Calosi et al., 2010), and terrestrial ectotherms at higher latitudes and cooler climates tolerate colder temperatures (Snyder and Weathers, 1975; Kimura, 2004; Sunday et al., 2011, 2019; Bennett et al., 2021). Thermal tolerance can be represented by critical thermal minima and maxima, which are the lowest and highest temperatures, respectively, at which an organism loses locomotor ability (Lutterschmidt and Hutchison, 1997). Thermal minima and maxima are often close to absolute (i.e. lethal) limits and have been called the point of ‘ecological death’: if an organism cannot move, then its likelihood of survival drastically decreases (Cowles and Bogert, 1944; Huey and Kingsolver, 1989; Sunday et al., 2011). Lower thermal tolerances have been shown to be more evolutionarily labile than upper tolerances (Araújo et al., 2013; Bennett et al., 2021), and physiological traits in general tend to show low phylogenetic signal (e.g. Blomberg et al., 2003; Hertz et al., 2013; Krause et al., 2014). These patterns may both suggest adaptation in lower tolerances. For example, low phylogenetic signal in tolerances (e.g. von May et al., 2017) may result from adaptation to a selective factor (e.g. elevation) that itself shows relatively low signal. This is a situation where the Hansen model excels (Labra et al., 2009; Kozak and Wiens, 2010).
Second, in ectotherms, locomotor performance must be maintained at high levels despite varying body temperatures (John-Alder et al., 1988; van Berkum, 1988; John-Alder et al., 1989; Bozinovic et al., 2011). For example, high-latitude species have lower minimum field body temperatures (John-Alder et al., 1988), and such species and populations often perform better at cold temperatures than species from lower latitudes (John-Alder et al., 1988; Wilson, 2001; Li et al., 2018). Additionally, the breadth of temperatures at which organisms perform well usually increases with latitude (Navas et al., 2008), consistent with the increase in temperature variation at higher latitudes (Sunday et al., 2011). These results suggest that selection favors broad performance curves in temperate ectotherms in order to function well at cold temperatures (John-Alder et al., 1988; van Berkum, 1988), which may consequently affect biogeographic dispersal.
Treefrogs of North and South America (Amphibia: Anura: Hylidae) are an excellent group for addressing physiological evolution associated with the latitudinal diversity gradient. They occur at both high and low latitudes (temperate and tropical regions), with most species in the tropics (Duellman, 1999). Moreover, five hylid clades have northern range limits in Mexico, yet only one of them – Hylini, also known as the Middle American clade – has colonized the temperate zone (Smith et al., 2005, 2007; Wiens et al., 2006; Moen et al., 2009). By examining how physiology changed in Middle American hylids upon colonizing the temperate zone, we can infer what may have historically limited other clades from colonizing. Such factors may also be applicable to many other organisms.
Previous studies of thermal physiology and biogeography in anurans have shown broader thermal performance curves (TPCs) in cooler climates (Renaud and Stevens, 1983; John-Alder et al., 1988; Wilson, 2001). Moreover, critical thermal minima (CTmin) decrease as latitude increases (Brattstrom, 1968; Layne and Romano, 1985; John-Alder et al., 1988). Recent studies have examined the evolution of thermal tolerances in anurans (von May et al., 2017, 2019). However, the joint evolution of TPCs and CTmin, and particularly their role in historically limiting the transition of tropical anuran lineages into the temperate zone, has not been studied. More broadly, only one study has compared the evolution of both these characteristics as they relate to latitudinal differences in climatic niche, in Drosophila (MacLean et al., 2019).
Here, we examined the evolution of physiology in temperate and tropical hylid frogs. We estimated the CTmin of six temperate and six tropical species in the Middle American clade. We then estimated TPCs in jumping for each species and calculated the lower temperature (L80) at which the performance of each species significantly declined. We used the OU model of adaptation to test whether broad TPCs, higher cold tolerance, both or neither evolved when hylid frogs colonized the temperate zone from the Neotropics.
MATERIALS AND METHODS
We sampled species evenly across the Middle American clade (Hylini) of the frog family Hylidae (Fig. S1; Wiens et al., 2005; Faivovich et al., 2018). This clade is largely endemic to Middle America (Mexico to Panama), but it includes species in temperate North America, Asia and Europe. In the temperate zone, we collected six species from Oklahoma, Texas and Arkansas, USA, which included three early-spring breeders (Pseudacris fouquettei, Pseudacris crucifer and Acris blanchardi) and three late-spring breeders (Hyla cinerea, Hyla avivoca and Hyla arenicolor). These two clades independently colonized the temperate zone (Smith et al., 2005) approximately 57 and 44 million years ago, respectively (Jetz and Pyron, 2018). In the tropics, we sampled six species from Oaxaca, Mexico, in the municipalities of Santiago Comaltepec (Ptychohyla zophodes, Charadrahyla nephila, Exerodonta abdivita and Smilisca cyanosticta) and Pluma Hidalgo (Smilisca baudinii and Tlalocohyla smithii). These species represented six of the eight major clades of the Middle American clade (Fig. S1; Smith et al., 2007; Faivovich et al., 2018). We collected adult male frogs at night during the breeding season (temperate taxa: March–June and August 2017 and 2018; tropical taxa: June 2018; see Supplementary Materials and Methods). We sampled 5–7 individuals for most species (Table 1). These sample sizes are consistent with previous comparative studies of amphibian TPCs (John-Alder et al., 1988; Gvoždík and Van Damme, 2006).
We performed all work with appropriate collecting and animal ethics permits. Collection permits were provided by the Oklahoma Department of Wildlife Conservation (permit 5552719), Arkansas Game and Fish Commission (permit 032820171), Texas Parks and Wildlife Department (SPR-0416-112) and SEMARNAT México (SGPA/DGVS/004473/18). All work was done under Oklahoma State University ACUP AS-17-3.
Thermal performance curves
We examined TPCs in jumping, a frog's primary mode of locomotion (Gans and Parsons, 1966; Jenkins and Shubin, 1998; Mendoza et al., 2020). Different jumping variables such as velocity and acceleration have been shown to be positively correlated (Moen, 2019). Thus, we used peak velocity during take-off as the performance variable for generating TPCs, and we expect results to be similar for acceleration.
Following previous studies (John-Alder et al., 1988; Wilson and Franklin, 2000; Wilson, 2001), we collected jumping performance data at six different temperatures: 8, 14, 20, 26, 32 and 35°C. Though only 3°C separated our two highest temperatures, we selected the highest temperature based on previous anuran work on TPCs (reviewed in Navas et al., 2008). Moreover, we were not able to measure performance for S. baudinii at 8°C, nor for S. cyanosticta at 8 and 14°C, as they refused to jump (see also Navas 1996 for similar behavior seen in other lowland tropical species). Thus, we tested them by increasing their body temperature by 1°C each trial until we found the minimum temperature at which they would jump (S. baudinii >12.5°C; S. cyanosticta >16°C).
We initially housed all animals individually in the laboratory at 20°C, a typical active temperature for all study species (Duellman, 2001). We kept them at 20°C for 1 week to minimize any potential effects of acclimation on performance (Dunlap, 1980; Renaud and Stevens, 1983; John-Alder et al., 1988, 1989; Whitehead et al., 1989; Wilson and Franklin, 2000), though little evidence exists for acclimation-based performance differences in anurans (see Discussion). We fed frogs fruit flies or crickets (based on gape size) every other day until starting data collection, then fed them once in the middle of the week of performance trials.
To change body temperature, we placed the frogs in a water bath at the desired temperature until their body temperature reached the treatment temperature, as determined with an infrared thermometer (see Supplementary Materials and Methods for additional detail, including demonstration that our methods did not induce thermal shock in animals). We randomly assigned experimental temperatures to each frog and tested one or two temperature treatments per day, waiting at least 5 h between trials at different temperatures. Trials generally lasted 1 week. If an individual did not perform well at a given temperature (e.g. it fatigued before we could obtain a video), additional days were added to retest that temperature. Even though performance may gradually decline when individuals are measured over the course of a week (Zug, 1985), we found similar performance at the beginning and end of our experiments (Supplementary Materials and Methods).
We recorded jumping performance using a Fastec TS5 high-speed camera, filmed lateral to the jump at 250 frames s−1 and with an exposure time of 500–1000 µs. We stimulated each frog to jump by gently tapping its back leg or clapping behind it. We recorded 3–4 jumps during a given set of trials, based on the number necessary to obtain peak performance in previous studies (Nauwelaerts et al., 2007; Moen et al., 2013, 2021b; Mendoza et al., 2020). We re-established the test temperature by placing frogs in the water bath between all jumps.
To estimate peak velocity, we digitized videos with ImageJ v.1.52a (Schneider et al., 2012). These digitized coordinates yielded displacement-by-time data, which we smoothed using quintic splines to reduce digitization error (Walker, 1998) in the R package fda v.5.1.4 (Ramsay et al., 2009; https://CRAN.R-project.org/package=fda). We calculated velocity curves as the first derivative of the displacement curve (Walker, 1998; Moen et al., 2013). From the velocity curves we calculated maximum velocity values at each temperature for each individual. We digitized multiple videos for each individual and temperature, and we selected the video with the highest velocity to represent peak performance for a given individual at that temperature. In total, we digitized, smoothed and analyzed videos from 781 jumps across 75 individuals. Finally, for each individual and temperature, we standardized peak values to the peak performance for that individual (across temperatures) to use in further analyses (John-Alder et al., 1988, 1989; Herrel and Bonneaud, 2012). This standardization reduces potential differences in absolute performance within and across species owing to body size (Bulté and Blouin-Demers, 2006). See Supplementary Materials and Methods for further justification.
To characterize TPCs for each species, we compared two regression models. In these models, we regressed relative peak performance on the measured temperature at which frogs achieved the performance. For each species, we compared the statistical support for quadratic and Gaussian regression functions to estimate a single curve across all individuals within a species (Table S1; Angilletta, 2006). Both models allow the typical hump-shaped form of performance curves. However, the Gaussian approach can fit TPCs better than standard quadratic regression (Angilletta, 2006). For each species, we compared the models with AICc and their associated weights (Burnham and Anderson, 2002). We performed all curve-fitting analyses with base functions in R v.4.0.2 (https://www.r-project.org/). Comparing these two models is one of many approaches proposed for characterizing TPCs (Angilletta, 2006; Bulté and Blouin-Demers, 2006; Rezende and Bozinovic, 2019), which we discuss in more detail in the Supplementary Materials and Methods.
We chose the optimal regression model for each species based on the lowest AICc. This model was used to determine the range of temperatures at which each species exhibited high performance. We used the lowest temperature at which velocity dropped to 80% of peak performance (L80; John-Alder et al., 1988) as a measure of the lowest temperature at which a species still has high jumping performance, following previous work (e.g. Huey and Stevenson, 1979; John-Alder et al., 1988, 1989; Navas, 1996; Angilletta et al., 2002; Herrel and Bonneaud, 2012; Logan, et al., 2014; Kellermann et al., 2019). We also tested alternative performance thresholds (L70, L90 and temperature of peak performance) to examine sensitivity of our results to our 80% threshold. Overall, we focused on the lower end of the TPC instead of breadth of the curve, to emphasize colonization of the (cooler) temperate zone. Moreover, previous studies of both thermal tolerances and performance curves in terrestrial ectotherms have shown that upper (i.e. hot) bounds vary little with latitude (Snyder and Weathers, 1975; Addo-Badiako et al., 2000; Sunday et al., 2011, 2019) or when comparing temperate and tropical species (John-Alder et al., 1988; van Berkum, 1988). Nonetheless, we also tested curve breadth, finding nearly identical results in phylogenetic comparative analyses (Supplementary Materials and Methods).
To test whether tolerance to cold temperatures has limited tropical hylids from colonizing the temperate zone, we estimated each species mean of the critical thermal minimum (CTmin): the lowest temperature at which an organism loses the ability to right itself (Lutterschmidt and Hutchison, 1997). When compared with alternative measures of tolerance (e.g. lower-lethal limit), CTmin produces very similar results in interspecific comparative studies (Sunday et al., 2011, 2019).
As CTmin trials occurred after performance trials, frogs had been acclimated to 20°C for 2 weeks before measuring CTmin. We first placed frogs in a water-filled beaker, with the water level and beaker size adjusted to submerge each frog (John-Alder et al., 1988; Muñoz et al., 2014). We then placed the beaker in an ice bath, which reduced each frog's body temperature at a constant rate of approximately 1°C min−1 (John-Alder et al., 1988; von May et al., 2017, 2019). This rate should produce negligible lag between changes in water temperature and the core body temperature of small amphibians (e.g. Near et al., 1990). We also used an infrared thermometer on a subset of individuals to verify concordance of body and water temperature when conducting trials, which was true across body sizes. Once each frog started to reduce movement within the water, we conducted righting-response trials every 1°C. We removed frogs from the water and stimulated them to move by touching their legs with a thin piece of plastic to keep them from remaining motionless (e.g. as a defensive tactic).
To reduce measurement error, we calculated CTmin as the average of two temperatures: (1) the temperature at which a frog lost its ability to right itself within 60 s (Layne and Romano, 1985; John-Alder et al., 1988); and (2) the immediately preceding recorded temperature at which it could right itself. The actual CTmin for a given individual will lie between these two temperatures. We collected CTmin data from all species except for T. smithii, given an unexpected death after performance trials. We used mean±s.e.m. CTmin for each species in phylogenetic comparative analyses.
Phylogenetic comparative analyses
We extracted a tree of our 12 study species from the amphibian phylogeny of Jetz and Pyron (2018). That study represents the most comprehensive estimate of anuran phylogeny available. It also has branch lengths in units of time, which are the most appropriate units for comparative analyses (Butler and King, 2004; O'Meara et al., 2006). Jetz and Pyron (2018) presented a Bayesian posterior distribution of time-calibrated trees, from which we calculated a consensus for our 12 species. We first used the VertLife website (www.vertlife.org/phylosubsets) to download 1000 trees from the posterior distribution, each pruned to include only our 12 taxa. We then used TreeAnnotator (Bouckaert et al., 2019) to calculate a single maximum clade credibility (MCC) tree, with branch lengths estimated as their mean values across the posterior distribution of trees (Fig. 2). All trees in this posterior distribution had identical topologies for our 12 taxa (i.e. posterior probability of 1.0 for all nodes) and showed low variation in branch lengths, suggesting that phylogenetic uncertainty would not influence our results. We thus used the resulting MCC for all analyses (see Appendix S3 in Moen et al., 2021a).
We compared four models of phenotypic evolution to test whether each physiological trait (CTmin, L80) adapted with the colonization of the temperate zone from the tropics. We conducted separate analyses on the two traits to assess them individually. We compared BM and three different OU models. Two of our OU models explicitly modeled change in selective environments, or regimes (Hansen, 1997; Butler and King, 2004), such as changing from the tropics to the temperate zone. A strong fit to a BM model would indicate that similarity among species is best described by the amount of their shared evolutionary history. Support for our second model, a single-optimum OU process (OU1), would indicate that species are adapting to a single selective regime for the whole clade. For our third model, we fit a two-regime OU model (OU2) in which one selective regime was for temperate species and one for tropical. Support for this third model would favor the hypothesis that CTmin or L80 changed the two times that hylid clades colonized the temperate zone (Fig. 2).
Tropical species of the Middle American clade can be further subdivided into species that inhabit highland and lowland tropical climates (Smith et al., 2007). Given distinct seasonal climatic zonation along elevational gradients in the tropics (Janzen, 1967; Polato et al., 2018), one might expect that each climate has a separate optimum phenotype (Navas et al., 2008). Thus, our fourth model (OU3) had three OU regimes: temperate, tropical highlands and tropical lowlands. We consider only one temperate regime because five of our six temperate species occur only at low elevations, and we collected the sixth species (Hyla arenicolor) at low elevations. We categorized each tropical species as highland or lowland based on elevational distributions reported in Duellman (2001) and AmphibiaWeb (2021). Species with an elevational midpoint of 500 m and below were considered as lowland, whereas species with an elevational midpoint of 1000 m and above were considered highland. Importantly, we collected all highland species in this study above 1000 m and all lowland species below 500 m.
To compare models of phenotypic evolution of CTmin and L80, we used the R package OUwie v.2.6 (Beaulieu et al., 2012; https://CRAN.R-project.org/package=OUwie) to calculate likelihood support and parameter values. We then compared models based on AICc and their weights (Burnham and Anderson, 2002). Models 2–4 only differed in OU regimes and thus optima. While optima can be estimated well at small sample sizes (Beaulieu et al., 2012; Ho and Ané, 2013, 2014a; Cressler et al., 2015), estimating α – particularly different α for each regime – is much more challenging (Beaulieu et al., 2012; Ho and Ané, 2013). Therefore, we held α and σ2 constant across regimes. We assumed the stationary distribution for the ancestral phenotypic state in OU models (Ho and Ané, 2014a), given that this assumption produced more reasonable optima values in OU2 and OU3. Models OU2 and OU3 required assigning internal node states for area (tropical and temperate) and elevation (lowland and highland), respectively. We assigned these states based on previous estimates for the entire Middle American clade (Smith et al., 2007; Moen et al., 2009; Pyron, 2014), which we map on the phylogeny in Fig. 2 and describe in detail in the Supplementary Materials and Methods.
For CTmin, we analyzed species means with their standard errors (s.e.m.) to account for intraspecific variation and measurement error (Hansen and Bartoszek, 2012; Silvestro et al., 2015). We calculated standard errors using the small sample size method of Ives et al. (2007), which involves calculating a pooled estimate of sampling variance and allows assigning standard errors to species with only a single individual (see Ives et al., 2007, their Appendix S3). We could not calculate standard errors for L80 because species values were derived from a curve fit through data of all individuals. To test the potential consequences of having no standard errors for L80, we compared model fits with and without standard errors for CTmin (Supplementary Materials and Methods). As a further test of the robustness of our results, we repeated analyses of L80 after excluding data from Tlalocohyla smithii, given that its TPC was based on a single individual and thus more likely to be affected by measurement error.
General patterns in thermal performance curves
We found that the majority of the species-level performance curves were unimodal curves (Table S1), with performance peaking at intermediate to high temperatures (20–35°C) and decreasing towards extreme temperatures (Figs 3 and 4). However, two of the 12 performance curves were best fit by parameters that produced a monotonically rising curve (Table S1, Figs 3 and 4), showing the highest performance at the highest temperatures.
In the temperate zone, the lowest temperatures at which velocity dropped to 80% of peak performance (L80) ranged from 10.23 to 12.28°C for early-spring breeders (Acris blanchardi, Pseudacris crucifer and P. fouquettei). Late-spring breeders (Hyla cinerea, H. avivoca and H. arenicolor) showed higher L80 values, ranging from 14.78 to 16.46°C (Table 1, Fig. 3). L80 values for tropical highland and lowland species overlapped, with species means ranging from 10.67 to 17.59°C for highland species and from 13.74 to 27.45°C for lowland species (Table 1, Fig. 4).
General patterns in thermal tolerance (CTmin)
In the temperate zone, CTmin was lower in early-spring breeders (0.27–0.56°C) than in late-spring breeders (2.86–5.44°C; Table 1). Likewise, CTmin among highland tropical species (5.14–6.42°C) did not overlap with that of lowland species (6.90–8.92°C; Table 1). However, differences in CTmin in tropical lowland and highland species were generally much smaller than between early- and late-breeding temperate species (Table 1). Moreover, nearly all tropical species showed higher CTmin values than temperate species (Table 1).
Evolution of L80 and CTmin
BM was the most highly supported model for L80, with an AICc weight of 0.778 (Table 2). This means that similarity among species was best described by shared evolutionary history, rather than biogeographic region. The only other model with appreciable support was the single-optimum OU1 model (AICc weight=0.179), which means species are adapting to a single phenotypic optimum for the whole clade. We found nearly identical results for L70, L90, temperature of peak performance, and the lower half of thermal performance breadth (Table S2). Furthermore, our results were the same with and without Tlalocohyla smithii (Table S3), the species whose TPC we estimated from just one individual (Fig. 4). Thus, our results were robust to both the method of characterizing jumping performance at low temperatures, as well as to inclusion of a species whose TPC was based on few data.
We found that CTmin decreased when hylids colonized the temperate zone. The model with the most support for CTmin evolution was OU2 (AICc weight=0.729; Table 2), which had one CTmin optimum for temperate species (−1.40°C) and another, much higher, for tropical species (6.50°C; Table 3, Fig. 5). Results were nearly identical when excluding standard errors (Table S4), suggesting that L80 analyses were likewise robust to exclusion of standard errors.
In this study, we tested how thermal tolerances and performance curves evolved with the colonization of the temperate zone from the tropics in six tropical and six temperate hylid treefrog species. We found that CTmin most likely adapted by decreasing upon colonization of temperate climates. However, L80 showed no adaptation associated with biogeographic region. Our results suggest that increasing tolerance to low temperatures was a key to colonizing the temperate zone, whereas improving jumping performance at low temperatures was not. Here, we discuss these results further and address the prospects and challenges of implementing the OU process in studies of physiological adaptation.
Physiology of colonizing the temperate zone
We found that colonization of the temperate zone led to an adaptive reduction in CTmin in Middle American treefrogs. In a previous study on the latitudinal gradient of species diversity in hylid frogs, Wiens et al. (2006) examined the northern range limits of four hylid clades that have colonized tropical Middle America from South America, but have not reached the temperate zone. They showed that these limits were associated with higher temperature seasonality and lower annual temperature extremes. Our results suggest that the ecophysiological explanation for this pattern is niche conservatism in CTmin in tropical hylid frogs, limiting colonization of the temperate zone from the tropics.
Despite niche conservatism in most hylid clades, Middle American hylids colonized temperate North America twice from the lowland tropics (Smith et al., 2005; Moen et al., 2009). The Acris–Pseudacris clade arrived first, and species in this clade are generally early-spring breeders. For example, Pseudacris crucifer becomes active during winter months and can tolerate freezing temperatures (Schmid, 1982; Dodd, 2013), which is consistent with our CTmin results for species in this clade (Table 1). In contrast, Hyla colonized temperate North America later and comprises late-spring breeders (Dodd, 2013). Species in this clade have less cold tolerance than members of the Acris–Pseudacris clade, but more tolerance than tropical species (Table 1). However, how freezing tolerance relates to CTmin across species remains unclear. In particular, Hyla versicolor and H. chrysocelis can survive freezing temperatures (Schmid, 1982; Costanzo et al., 1992; Storey and Storey, 1992), despite being in the temperate clade with higher CTmin than the Acris–Pseudacris clade. Whether other Hyla species can survive freezing is an intriguing direction for future work.
In contrast to CTmin, biogeographic region did not affect L80 evolution. Our results resemble those of MacLean et al. (2019), who found that cold tolerance was associated with latitude in Drosophila, but aspects of TPCs (e.g. breadth, optimal temperature) were not. A similar contrast between tolerances and performance has also been found in actively thermoregulating taxa (van Berkum, 1988). Though frogs generally do not thermoregulate (Navas, 1996; Navas et al., 2008), they may use seasonal activity as a long-term thermoregulation strategy, waiting for optimal temperatures before becoming active (e.g. to reproduce, locate prey, disperse). But without migration, individuals must tolerate all climatic conditions during the whole year, even if inactive during much of that time (Ludwig et al., 2015). Thus, selection may more strongly favor tolerance to low temperatures upon colonizing the temperate zone than the ability to function at colder temperatures.
Given that Hyla and members of the Acris–Pseudacris clade show largely overlapping distributions (Duellman and Sweet, 1999), why do they differ in CTmin and L80 (Table 1)? The answer might relate to time since colonization and evolutionary ‘starting point’. The majority of Middle American hylids are found in the tropics at high elevations (Duellman, 2001), with lowland species nested within largely highland clades (Smith et al., 2007). Thus, species with physiology fitting highland climates may have made the initial colonization of Middle America, then later expanded to lowlands (Wiens et al., 2006; Moen et al., 2009). The Hyla clade that most recently colonized temperate North America (Fig. 2) shares its most recent common ancestor with the Smilisca–Tlalocohyla clade, the clade with the highest CTmin. This could explain why Hyla generally showed higher CTmin and L80 than members of the Acris–Pseudacris clade: their evolutionary starting point was higher, and they have been adapting to a temperate climate for less time. We note that this ‘time for adaptation’ forms an integral part of the OU model of adaptation (Fig. 1; Butler and King, 2004; Moen et al., 2016). Other approaches (e.g. phylogenetic ANOVA) assume all lineages have been in their current environment for their entire history (Hansen and Orzack, 2005; Hansen, 2014), potentially compromising the study of adaptation.
Lastly, we acknowledge that future work should consider additional species and a higher maximum temperature for characterizing TPCs. While our results clearly favored different CTmin optima for temperate and tropical hylids, the optimum for temperate taxa was below freezing (−1.40°C). This value is warmer than lethal thermal minima for many temperate anurans (Brattstrom, 1968), and thus reasonable, yet it was lower than the CTmin of any species we measured here (Table 1). Moreover, additional simulations suggested that increasing species sampling may increase our ability to distinguish OU3 from OU2 and allow for more precise estimates of OU optima (see Supplementary Materials and Methods, Fig. S2). We also found monotonically rising TPCs in two tropical lowland species (Fig. 4), which means the temperature of their peak performance may be higher than our highest tested temperature (35°C). If true, their L80 values here would be underestimates. Future work should thus consider higher temperatures, particularly for tropical lowland species.
Remaining questions in thermal physiology and biogeography
Both our approach and our results raise several questions for future research on the physiology underlying colonization of the temperate zone. First, are the relevant factors in hylids the same as those important in other organisms, including other families of anurans? Members of two additional families have also colonized temperate North America from the Neotropics: Bufonidae (Mendelson et al., 2011; Pyron, 2014; Portik and Papenfuss, 2015) and Microhylidae (Streicher et al., 2012; Pyron, 2014). These families differ from hylids in ecology and climatic niche (Dodd, 2013; Moen and Wiens, 2017). Thus, studying them could address whether tolerances matter more than locomotor performance (present study; MacLean et al., 2019), or if instead our results are specific to particular aspects of ecology. Moreover, examining additional taxa could determine the importance of time of colonization, as hypothesized above for Hyla.
Second, what role do other aspects of physiology play in colonizing the temperate zone from the tropics? Metabolism has often been compared in temperate and tropical amphibians. These studies have shown that low-temperature metabolism is higher in temperate than in tropical taxa (Brattstrom, 1968; Hutchison et al., 1968; Whitford, 1973; Snyder and Weathers, 1975; Feder, 1978, 1982; Walton, 1993; Navas, 1996). Future evolutionary studies that test adaptation in metabolism may reveal an additional important factor in determining patterns of species richness.
Finally, what role does acclimation play in colonization of the temperate zone? In this study we accounted for acclimation by holding individuals at a constant temperature for 1 week (TPCs) or 2 weeks (CTmin) prior to data collection, following previous studies. Moreover, most studies show no acclimation to cold temperatures in locomotor performance in adult anurans (Putnam and Bennett, 1981; Whitehead et al., 1989; Knowles and Weigl, 1990; Wilson and Franklin, 2000; but see Padilla et al., 2019), particularly in non-aquatic species (Wilson et al., 2000), which we studied here. However, nearly all anuran acclimation studies have been performed on temperate species. A synthesis across ectothermic taxa suggested that tropical organisms may show a higher capacity for physiological acclimation (Seebacher et al., 2015), but comparisons of temperate and tropical amphibians show the opposite pattern in metabolism (Feder, 1978, 1982). Moreover, acclimation effects on CTmax in anurans seem much weaker in tropical species (Brattstrom, 1968; Mahoney and Hutchison, 1969; Christian et al., 1988; Navas et al., 2008), and no work of which we are aware has examined how acclimation affects CTmin in anurans. If acclimation has been important in past colonization of the temperate zone in hylids, we would expect that lineages closely related to successful colonists (e.g. Smilisca closely related to Hyla; Fig. 2) would show high propensity to reduce CTmin after acclimation to lower temperatures. For example, Smilisca baudinii has a broad climatic distribution, from the tropical lowlands (where we sampled) to populations in south Texas, USA. Understanding whether this broad distribution is due to genetic change, plasticity or both could clarify the role of acclimation in colonization of temperate climates.
Prospects and challenges of the Hansen model in comparative physiology
The Hansen model has seen limited use in comparative physiology. Part of the reason is that analyzing the model is less direct than analogous approaches, such as phylogenetic ANOVA. Uncertainties in best practices (Ho and Ané, 2014a; O'Meara and Beaulieu, 2014) lead to further confusion, and statistical properties of the method have been questioned in recent years (e.g. Boettiger et al., 2012; Ho and Ané, 2014a). Thus, we describe here six key hurdles to using the Hansen model, and we outline strategies to explicitly address these issues in an empirical study. We also provide R Markdown tutorials to help readers understand how to use the model and interpret results (Moen et al., 2021a, Appendices S10, S14). Our discussion reveals that many problems are less severe than often characterized in the literature, and we remain optimistic about more widespread use of the model to test adaptation.
(1) Ancestral-state estimation
A key implementation challenge of the Hansen model with discrete environments is specifying environments at internal nodes. Here, we sampled few species from our clade of interest (12 of 197 species; AmphibiaWeb, 2021). Thus, we used previously estimated states based on a much larger sample (Moen et al., 2009) to prevent inaccuracy owing to insufficient sampling (Salisbury and Kim, 2001). However, estimation of internal states for OU models is not trivial (e.g. Moen et al., 2016), and such estimates have been criticized for their uncertainty or inaccuracy (Cunningham et al., 1998; Cunningham, 1999; Losos, 1999). An increasingly adopted solution to overcome these challenges is using Bayesian stochastic character mapping to generate many possible character histories (Huelsenbeck et al., 2003; Bollback, 2006). One can then fit the Hansen model on each character history and integrate results over the uncertainty of internal states (e.g. Price et al., 2015; Grossnickle et al., 2020; Corn et al., 2021). Alternatively, a likelihood-based approach can be used to account for internal-state uncertainty, as outlined by Cressler et al. (2015). However, implementation of this approach is less straightforward than stochastic character mapping. Regardless of the strategy chosen for internal-state estimates, the exponential decay in the influence of adaptation to past environments means that a strong adaptive pull (i.e. high α) will quickly erase the influence of internal node states (Hansen, 1997; Butler and King, 2004). Therefore, when α is high, OU results are likely robust to ancestral-state uncertainty (Moen et al., 2016).
(2) Different statistical designs
Some types of predictor variables (e.g. continuous) and more complicated statistical designs (e.g. ANCOVA, multivariate) will present their own challenges. In particular, multivariate OU models (i.e. multiple traits adapting to the same environments) can require estimation of additional parameters (Bartoszek et al., 2012). It is also unclear whether the implementation of the multivariate, multi-optimum model (Bartoszek et al., 2012) shares some of the same statistical problems of multivariate, single-optimum OU models (Adams and Collyer, 2018). Further model developments, such as those of Clavel and Morlon (2020) and Clavel et al. (2019), may be necessary to ensure optimal statistical performance of OU models.
(3) Data from few species
Studies in comparative physiology are often limited by number of study species, given the difficulty of data collection (Lauder, 1990, 2003; Garland et al., 2005). For example, we collected and digitized high-speed video of 781 jumps at five field locations across two countries to obtain peak jumping performance of 75 individuals at six temperatures. Yet this effort rendered L80 values for just 12 species for our phylogenetic comparative analyses. Given that the simplest version of the Hansen model requires estimating more parameters than equivalent PGLS or ANOVA tests, researchers have right to worry when they have data for few species (Cooper et al., 2016a). However, the effect of sampling is more nuanced than simply comparing the number of parameters to species. For example, α may be hard to estimate regardless of species number (Beaulieu et al., 2012; Ho and Ané, 2013, 2014a; Cressler et al., 2015), but the optima can be estimated well with few species (Ho and Ané, 2013; Cressler et al., 2015), particularly when at least one regime convergently originates two or more times in the phylogeny (Ho and Ané, 2014a). Simulations have shown that effect size, rather than sample size per se, may matter more for estimating optima (Ho and Ané, 2013, 2014a) and for comparing models that only differ in number of optima (Boettiger et al., 2012; Cressler et al., 2015). Our empirical results were consistent with these simulation results: with just 12 species, we found strong evidence for an OU model with multiple optima in CTmin (Table 2) because these optima were distinct (Table 3, Fig. 5). Moreover, parametric bootstrapping showed that comparison of our top OU model (OU2) with the simpler model with the highest support (BM) was statistically robust (Fig. S2), further suggesting that concern about small sample sizes really depends on the particular dataset. More generally, because optima are often the focus of model comparison (e.g. present study; Scales et al., 2009; Collar et al., 2011; Grossnickle et al., 2020), these results are promising for comparative physiology.
(4) Problems with model selection and parameter estimation
One criticism of model selection in OU studies is that even modest measurement error (if ignored in the analysis) can bias AIC-based model comparison to favor a single-optimum OU model over BM (Cooper et al., 2016b). To avoid this problem, most OU implementations accommodate measurement error (Beaulieu et al., 2012; Hansen and Bartoszek, 2012; Ho and Ané, 2014b). Moreover, the multi-optimum Hansen model shows much less error in model selection than comparing single-optimum OU models to BM (Cressler et al., 2015). A second criticism is that AIC-based model selection can be biased to favor more complex OU regime mappings, elevating Type-I error rates (Boettiger et al., 2012). As we have shown (Fig. S2), this is an empirical problem that depends on each individual dataset. Many of our own empirical studies have shown highest statistical support for models of intermediate, rather than highest, complexity (present study; Moen et al., 2016; Moen, 2019). Nonetheless, researchers can conduct parametric bootstrapping analyses to examine how much information exists in their data to distinguish models (Boettiger et al., 2012). A third statistical criticism is that parameter estimates can be imprecise, even with many species. α and σ2 can be hard to estimate separately, as high values of both parameters may be as likely as low values of both (O'Meara and Beaulieu, 2014). One solution is to focus on the estimate of stationary variance, which is a ratio of the two (σ2/2α), rather than the two parameters separately (e.g. Price et al., 2015). Estimating multiple α values is challenging for even very large datasets, so α may be best left constant across regimes when analyzing small datasets (Beaulieu et al., 2012; O'Meara and Beaulieu, 2014). Finally, when selective regimes originate only a single time on a tree, different optima can be impossible to distinguish (Ho and Ané, 2014a), so it is important to ensure at least one regime occurs on different parts of the tree. In summary, while researchers undoubtedly need to be aware of these concerns, many of them can be addressed with careful analysis.
(5) Testing the robustness of results
Currently, several tools are available to directly test the robustness of the results obtained from the Hansen model. First, the R package OUwie allows users to conduct eigenanalysis of the likelihood surface for α and σ2 to verify whether their estimates are stable. Second, both packages ouch and OUwie allow parametric bootstrapping of optimal models to calculate 95% confidence intervals for parameters (Scales et al., 2009; Boettiger et al., 2012). Third, parametric simulations to test statistical power and error rate are conveniently implemented in the R package pmc (Boettiger et al., 2012), which uses ouch to fit OU models. Such simulations are less developed to work with OUwie, so we have written a function that works similar to pmc but simulates data and analyzes models with OUwie. We include this function in Appendix S5 (Moen et al., 2021a) and demonstrate its use in the Supplementary Materials and Methods, where we examine the statistical properties of our dataset with parametric bootstrapping. We also recognize that methods are more frequently used when more resources exist to guide researchers. Thus, we demonstrate with tutorials how to use the three strategies to test for robustness of results (Moen et al., 2021a, Appendices S10, S14). Moreover, we encourage collaboration between researchers collecting empirical data and those more experienced in phylogenetic comparative methods. Such collaboration will lead to development and testing of more robust analyses and additional biological insight (Lauder, 2003; Cooper et al., 2016a; Waldrop and Rader, 2020).
(6) Potential misuse of the OU process
We caution against using the OU process in standard PGLS regression (Cooper et al., 2016b), as such analyses model evolution very differently than described in this paper (Hansen, 2014; Pennell, 2015). Instead of modeling adaptation of phenotypes to adaptive optima, typical PGLS analyses use the OU process to model residual variation around a regression line. This use has the same problems that we described in the Introduction for BM: inherited maladaptation and inability to simultaneously estimate the mean and error structure of a model (Hansen and Orzack, 2005; Hansen, 2014). When testing adaptation of a trait to a continuous environment, we encourage researchers to instead use the approach described by Hansen et al. (2008), which models adaptation as described throughout this paper.
Here, we reviewed the Hansen model to increase its accessibility and application in comparative physiology and biomechanics. In our case study, we examined the thermal physiology associated with biogeographic dispersal from the tropics to the temperate zone in Middle American treefrogs. We found cold tolerance (CTmin) important for explaining the transition, whereas jumping well at low temperatures (L80) was not. Our results suggest that the niche conservatism that has prevented most hylid clades from colonizing the temperature zone relates to intolerance to its seasonally cold temperatures. However, the physiological basis for such conservatism in other ectotherms has not been explicitly tested with phylogenetic analysis. Future work should use OU models of adaptation to examine the importance of cold tolerance, as compared with other physiological factors such as locomotion. Testing more factors, such as metabolism and acclimation, will also enrich our understanding of the origins of species richness patterns that vary over climatic gradients. As data are increasingly available for large-scale comparative studies in physiology and biomechanics (Higham et al., 2021), we anticipate that OU models will aid in testing the role of different traits in driving patterns of physiological and species diversity.
We thank Gen Morinaga, Monique Simon and two anonymous reviewers for helpful comments on earlier versions of the manuscript. For help with fieldwork and data collection in the USA, we thank Connor Adams, Sean Graham, Bryan Juarez, Robert McClure, Morgan Page, Korey Roberts, Madison Stevens, J. D. Willson and Mardi Wisdom. For work in Mexico, we thank Don Felipe Hernández Hernández, Carlos Flores Hernández, Medardo Arreortúa Martínez, Fortunata López, Candido Jacinto, the Finca Juquilita, and the communities of La Esperanza and Pluma Hidalgo, Oaxaca.
Conceptualization: D.S.M., A.R.H.; Methodology: D.S.M., E.C.-G., I.W.C.-S., E.G.-B., A.R.H.; Software: D.S.M.; Validation: D.S.M.; Formal analysis: D.S.M., A.R.H.; Investigation: D.S.M., E.C.-G., I.W.C.-S., E.G.-B., A.R.H.; Resources: D.S.M., E.G.-B.; Data curation: D.S.M.; Writing - original draft: D.S.M., A.R.H.; Writing - review & editing: D.S.M., E.C.-G., I.W.C.-S., E.G.-B., A.R.H.; Visualization: D.S.M.; Supervision: D.S.M.; Project administration: D.S.M.; Funding acquisition: D.S.M.
For funding fieldwork and preparation of the manuscript, we thank Oklahoma State University, the Southwestern Association of Naturalists (Howard McCarley Student Research Award to A.R.H.), the Biology Department at the University of Washington [Washington Research Foundation (WRF-Hall) Fellowship to I.W.C.-S.] and the National Science Foundation (IOS-1942893 and DEB-1655812 to D.S.M.).
All supplementary data are available from the Dryad Digital Repository (Moen et al., 2021a): https://doi.org/10.5061/dryad.t4b8gtj2m. R code and R Markdown tutorials are hosted on Zenodo (https://doi.org/10.5281/zenodo.5165563 and https://doi.org/10.5281/zenodo.5165565) and also available via a link on Dryad.
The authors declare no competing or financial interests.