When two closely related species meet, there is a variety of possible outcomes. They can compete with each other (competitive exclusion); they can change so that they don't compete (character displacement); and/or they can interbreed (hybridization). For each instance of this 'secondary contact', it's incredibly difficult to predict how likely each of these three outcomes is, even between the same two species. This matters because secondary contact is increasing with human influence on the world: we're bringing closely related species together, and we don't know what's going to happen. Don't panic. This is a delightfully interesting question to tangle with, and answering it brings together field work, genetics, genomics, theory and statistics.
Much of the StatsGen Lab's research centres on the question of predictability in evolutionary processes. Can we predict which species will hybridize, or how many hybrids will be in each population? How does environmental variation affect the outcomes of secondary contact? What can genomic data tell us about predicting outcomes? What are our theoretical expectations for the processes we think are occurring, and how can we use the resulting patterns to infer processes in wild systems?
We're a small but mighty group addressing these questions, which can be loosely sorted into three research themes.
If two species meet in more than one place, what do we expect to happen? Do we expect the rates of hybridization to always be the same? No. Firstly, we expect that the number of hybrids can differ among hybrid zones for reasons entirely due to chance (McFarlane et al. 2024). Secondly, we expect the environment to matter to these outcomes, particularly when the secondary contact itself might be caused by human-influenced environmental change. This means that, to understand the underlying processes leading to the patterns we see, we need to understand a lot about environmental variation while thinking hard about differences due to chance. This is delightfully complicated, and a topic many people in the lab are working on, including Jenna on fescue, Sargon on Peromyscus, Amanda on Colias, and Marqus using simulations.
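To make the role of chance concrete, here is a toy sketch in Python. It assumes a neutral Wright-Fisher model, which is our simplification for illustration and not the model used in McFarlane et al. 2024: every replicate zone starts with the same proportion of species-B ancestry, the same population size and no selection, so any spread in the final proportions comes from genetic drift alone. The function name `drift_replicate_zones` and all parameter values are hypothetical.

```python
import random


def drift_replicate_zones(n_zones, pop_size, generations, start_freq=0.5, seed=42):
    """Simulate neutral drift of species-B ancestry in replicate hybrid zones.

    Toy Wright-Fisher model (an illustrative assumption): every zone shares the
    same starting frequency, population size and (absent) selection, so
    differences among zones arise from genetic drift alone.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size  # diploid population: 2N gene copies per generation
    final_freqs = []
    for _ in range(n_zones):
        freq = start_freq
        for _ in range(generations):
            # Binomial sampling: each of the 2N copies in the next generation
            # comes from species B with probability equal to the current frequency.
            count = sum(1 for _ in range(n_copies) if rng.random() < freq)
            freq = count / n_copies
        final_freqs.append(freq)
    return final_freqs


# Hypothetical settings: 8 zones, 100 diploid individuals each, 50 generations.
freqs = drift_replicate_zones(n_zones=8, pop_size=100, generations=50)
print([round(f, 2) for f in freqs])
```

Increasing `pop_size` should narrow the spread among zones on average, since drift is weaker in larger populations; identical processes can still leave replicate zones looking quite different.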
We use a lot of statistics to try to understand the natural world. Typically, our first step in answering a question is to collect data, and then to fit a statistical model to determine whether particular variables (such as the environment, the weather, the year, the species etc.) can explain differences in what we're interested in (such as the number of hybrid individuals in a population). If we see a statistical relationship between these things, awesome: we then use that model to describe the relationship. Great, except there are many pitfalls to these models, including overconfidence (both statistical and emotional).
Both Amandeep and Eryn have been working on this lately. Amandeep asked how well a genome-wide association study (GWAS) done on hybrid deer in Scotland (a project Eryn did as a postdoc with Josephine Pemberton) could be used to predict the phenotypes of deer that weren't used to build the model. Eryn, building on her previous postdoc work on the modelscape project with Alex Buerkle, is having a blast using simulations to ask how well we can do prediction in general (Jahner et al. in review). Watch this space: this is a problem the lab is thinking quite hard about.

Finally, as in any quantitative lab, we're happily testing new methods on old problems whenever we can. This can mean lots of things, including using simulations so that we can see the patterns that occur when we know the underlying processes, and sometimes even suggesting new methods (oldie but a goodie: RepeatABEL lives on; see Martin Johnsson's excellent description of how to install RepeatABEL now that GenABEL has gone offline). For example, Santiago has been using publicly available genotyping-by-sequencing (GBS) data to ask whether different GBS databases can be used together, especially for common animals. Eryn and friend-of-the-lab Liz Mandeville have discussed how to use hierarchical (mixed-effects) models to address stochastic and environmental sources of variation in replicate hybrid zones (paper here). This is a point of constant interest for the lab. We know that all models are wrong, but some are useful (thanks George Box!), and some are more useful than others, so we're always keen to look at our tools (statistical and otherwise) and ask whether we could do better.
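Hierarchical models help with replicate hybrid zones partly through partial pooling: estimates from poorly sampled zones borrow strength from the others. The Python sketch below is a crude empirical-Bayes stand-in for that idea (a normal approximation with method-of-moments variance estimates). It is not a fitted mixed-effects model and not the approach from Eryn and Liz Mandeville's work; the function `partial_pooling` and the hybrid counts are hypothetical.

```python
import statistics


def partial_pooling(hybrids, sampled):
    """Shrink per-zone hybrid proportions toward the pooled proportion.

    Crude empirical-Bayes sketch (normal approximation, method-of-moments
    variance): zones with few sampled individuals are pulled more strongly
    toward the pooled value than well-sampled zones.
    """
    props = [k / n for k, n in zip(hybrids, sampled)]
    pooled = sum(hybrids) / sum(sampled)  # proportion of hybrids across all zones
    # Binomial sampling variance of each zone's proportion, evaluated at the pooled value.
    samp_var = [pooled * (1 - pooled) / n for n in sampled]
    # Between-zone variance left over after subtracting average sampling noise.
    tau2 = max(0.0, statistics.variance(props) - statistics.mean(samp_var))
    estimates = []
    for p, v in zip(props, samp_var):
        # Weight on the zone's own data: close to 1 when its sampling noise is small.
        weight = tau2 / (tau2 + v) if tau2 + v > 0 else 0.0
        estimates.append(weight * p + (1 - weight) * pooled)
    return props, estimates


# Hypothetical hybrid counts and sample sizes for four replicate zones.
hybrids = [1, 45, 12, 3]
sampled = [20, 100, 40, 6]
raw, shrunk = partial_pooling(hybrids, sampled)
for n, r, s in zip(sampled, raw, shrunk):
    print(f"n={n:3d}  raw={r:.2f}  pooled estimate={s:.2f}")
```

Zones with large samples keep estimates close to their observed proportion, while small zones are pulled harder toward the pooled value; if the observed proportions vary no more than sampling noise would predict, every zone collapses to the pooled value.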
Thank you to Yosuf, who wrote the first draft of this research page!