Details of Award
NERC Reference: NE/X009637/1
Generative adversarial networks for demographic inferences of nonmodel species from genomic data
Grant Award
- Principal Investigator:
- Dr M Fumagalli, Queen Mary University of London, Sch of Biological & Behavioural Sciences
- Science Area:
- Atmospheric
- Earth
- Freshwater
- Marine
- Terrestrial
- Overall Classification:
- Unknown
- ENRIs:
- Biodiversity
- Environmental Risks and Hazards
- Global Change
- Natural Resource Management
- Pollution and Waste
- Science Topics:
- Gene flow
- Genetic diversity
- Population Genetics/Evolution
- Artificial Intelligence
- Machine Learning (AI)
- Genomics
- Algorithm Development
- Bioinformatics
- Abstract:
- Understanding the temporal and geographic movement of populations is vital to address key questions in evolutionary and conservation biology. Whilst the generation of high-throughput genomic data has enabled the inference of population genomic parameters at an unprecedented rate, large-scale datasets have also prompted the development of novel computational techniques. In recent years, the predictive power provided by machine learning algorithms, in particular deep learning, has led to breakthrough discoveries in many disciplines. Nevertheless, the application of deep learning in evolutionary genomics is still in its infancy. Deep learning algorithms exhibit several advantages over commonly used inferential approaches in population genomics, as they can handle large datasets with minimal compression and are theoretically universal approximators of arbitrarily complex models. However, the intrinsic statistical uncertainty associated with genomic sequencing data, the lack of natural training datasets, and the computational resources needed have hampered the exploitation of these powerful techniques to generate novel findings in evolutionary biology. These challenges are particularly prominent in the study of nonmodel species, where prior knowledge of key parameters is typically missing. A promising strategy to partly overcome such barriers is given by the recent application of Generative Adversarial Networks (GANs), a branch of deep learning methods, which have been successfully applied to generate artificial genomes and estimate cryptic evolutionary parameters. GANs consist of two deep neural networks which are trained together; at the end of training, the algorithm generates simulations that are indistinguishable from real examples (as in the case of "Deepfake" methods in Artificial Intelligence). Thus, the final simulator provides estimates of model parameters.
In this project, we aim to pilot the design, implementation, and deployment of a novel GAN architecture for population genomic data. As an illustration, we will focus on the inference of demographic parameters, including temporal changes in population size and migration rate, describing the recent evolution of Anopheles mosquito populations among three villages in Burkina Faso. As the first objective, we will adapt a recently proposed GAN architecture for population genomic data to incorporate multiple populations with unequal sizes. As the second objective, we will train the algorithm by integrating simulations with extensive genomic data from Anopheles mosquito populations. We will include a significant technological advance by integrating a model selection step to discriminate among competing evolutionary scenarios. By estimating the migration rate of mosquito populations among villages, we will be able to assist predictions on the spread of resistance mutations and support molecular surveillance and intervention strategies at a local scale. In fact, it is still unclear to what extent resistance mutations can spread across the entire continent, as different studies have led to contrasting findings on the extent of migration between Anopheles populations. Upon completion of this pilot study, we will be able to scale the deep learning algorithm to all available mosquito populations from sub-Saharan Africa and infer gene flow at the continental scale. Additionally, the novel deep learning framework will be applicable to all mutations potentially associated with resistance or other notable phenotypes. It can be further extended to model complex modes of adaptation (e.g. via introgression or polygenic adaptation) and to other species of importance.
- Grant Stage:
- Completed
- Scheme:
- Standard Grant FEC
- Grant Status:
- Closed
- Programme:
- Exploring the frontiers
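
The adversarial scheme outlined in the abstract (a simulator tuned until a discriminator can no longer tell simulated data from observed data) can be illustrated with a toy sketch. This is not the project's architecture. Every name here (`simulate`, `fit_discriminator`, `generator_loss`, `infer_theta`) is hypothetical. A one-parameter Gaussian draw stands in for a population-genetic simulator, a logistic regression refit each round stands in for the deep discriminator network, and an accept-if-better random search stands in for generator training.

```python
import math
import random


def simulate(theta, noise):
    """Hypothetical stand-in for a population-genetic simulator:
    one summary value per draw, centred on the parameter theta."""
    return [theta + z for z in noise]


def sigmoid(u):
    u = max(-30.0, min(30.0, u))  # clamp to keep exp() and log() finite
    return 1.0 / (1.0 + math.exp(-u))


def fit_discriminator(real, fake, steps=30, lr=0.1):
    """Logistic regression D(x) = sigmoid(w * (x - centre) + b),
    fitted by gradient ascent with labels real = 1 and fake = 0."""
    centre = (sum(real) + sum(fake)) / (len(real) + len(fake))
    data = [(x - centre, 1.0) for x in real] + [(x - centre, 0.0) for x in fake]
    w = b = 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = y - sigmoid(w * x + b)
            grad_w += err * x
            grad_b += err
        w += lr * grad_w / len(data)
        b += lr * grad_b / len(data)
    return lambda x: sigmoid(w * (x - centre) + b)


def generator_loss(disc, fake):
    """Low when the discriminator scores simulated values as 'real'."""
    return -sum(math.log(disc(x)) for x in fake) / len(fake)


def infer_theta(real, theta=0.0, rounds=120, step=0.5, decay=0.98, seed=1):
    """Alternate discriminator fitting with parameter proposals."""
    rng = random.Random(seed)
    for _ in range(rounds):
        noise = [rng.gauss(0.0, 1.0) for _ in real]
        current = simulate(theta, noise)
        disc = fit_discriminator(real, current)
        proposal = theta + rng.gauss(0.0, step)
        # Reuse the same noise so that only the parameter differs.
        candidate = simulate(proposal, noise)
        if generator_loss(disc, candidate) < generator_loss(disc, current):
            theta = proposal
        step *= decay  # shrink proposals as the search settles
    return theta


# "Observed" data generated with a parameter value the search never sees.
data_rng = random.Random(0)
observed = simulate(3.0, [data_rng.gauss(0.0, 1.0) for _ in range(100)])
estimate = infer_theta(observed)
print(f"estimated theta: {estimate:.2f}")
```

In this sketch the final value of `theta` plays the role of the parameter estimate, mirroring the abstract's statement that the final simulator provides estimates of model parameters. A realistic application would replace `simulate` with a coalescent or forward simulator producing genotype data, and the logistic regression with a neural network.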
This grant award has a total value of £76,699
FDAB - Financial Details (Award breakdown by headings)
Indirect - Indirect Costs | DA - Investigators | DI - Staff | DA - Estate Costs | DA - Other Directly Allocated |
---|---|---|---|---|
£37,421 | £5,938 | £23,353 | £9,139 | £847 |