UK Biobank is launching the world’s most comprehensive study of the proteins circulating in our bodies, which will transform the study of diseases and their treatments.
This project aspires to measure up to 5,400 proteins in each of 600,000 samples, including those taken from half a million UK Biobank participants and 100,000 second samples taken from these volunteers up to 15 years later.
This will allow researchers to explore a first-of-its-kind database, detailing how changes to an individual’s protein levels over mid-to-late life influence disease. The study will begin by analysing the first 300,000 samples, which will include initial samples from 250,000 UK Biobank volunteers and 50,000 second samples taken at follow-up assessments.
Measuring the abundance of thousands of proteins circulating in the blood enables researchers to investigate their potential role in many types of diseases that occur during mid-to-late life. This emerging research field – known as population proteomics – has demonstrated huge potential for diagnostics and therapeutics.
In October 2023, a pilot project released data on nearly 3,000 circulating proteins from 54,000 UK Biobank participants. The pilot was already the world’s largest study of its kind and led to research identifying over 14,000 links between common genetic variants and altered protein levels, over 80 per cent of which were previously unknown.
The research has already been cited over 400 times, laying the foundations for scientists to better understand how and why diseases develop. So far, studies using the data have led to advances in disease prediction and in the development of future targeted treatments for breast cancer, cardiovascular disease, Parkinson’s disease, and other brain illnesses.
This new study, which aims to increase this unique dataset ten-fold, is being funded by a consortium of 14 leading biopharmaceutical companies, known as the UK Biobank Pharma Proteomics Project.
Professor Sir Rory Collins, principal investigator and chief executive of UK Biobank, said: “For the first time at this scale, researchers will be able to detect the exact causes of diseases by comparing how protein levels change over mid-to-late life in a large group of people.
“Proteomic data has already paved the way for better cancer, autoimmune and dementia diagnostics, and this truly exciting study of proteins will significantly speed up drug discovery, leading to major improvements in public health and care everywhere.”
UK Biobank’s proteomics dataset will allow researchers to examine proteomic and genetic data from half a million people simultaneously. UK Biobank released whole genome sequencing data for its half a million participants in November 2023.
Adding proteomic data will allow researchers to combine these massive datasets, providing a more detailed picture of the biological processes involved in disease progression. This may in turn drive the development of personalised treatments.
It will also enable researchers to examine how and why protein levels change over time. Half a million participants provided UK Biobank with a blood sample when they joined and 100,000 of them provided a second sample up to 15 years later.
Researchers will be able to see how protein levels have changed over mid-to-late life, enhancing understanding of age-related changes in healthy individuals and shedding light on how diseases develop. This will further accelerate research into diagnostic and prognostic markers.
The dataset will also give researchers a unique opportunity to use proteomic data in combination with imaging data, and will open avenues for developing AI models. Already, machine learning tools can predict future disease many years before diagnosis, with the potential to shape early interventions.
The depth and breadth of the proteomic data held within UK Biobank may enable machine learning to accurately subtype diseases, which has the potential to inform what treatments should be given at the point of diagnosis.
Professor Naomi Allen, chief scientist of UK Biobank, said: “Proteomics provides an incredibly detailed snapshot of health. This new frontier of science can unveil how genetics and external factors – like diet, exercise and climate – interact, and will help to pinpoint the key causes of diseases and identify drug targets. It has already led to important scientific discoveries, such as identifying proteins that can help to diagnose disease – including multiple sclerosis – and helping to identify those at higher risk of developing dementia and cancer many years before clinical diagnosis.
“Over 19,000 researchers around the world are using UK Biobank data; adding proteomic data to everything else we hold will enable scientists to make rapid discoveries to help diagnose and treat life-altering diseases.”
It will take about a year to measure the protein levels in 300,000 participant samples. The proteomic data will be made available to UK Biobank-approved researchers in staggered releases from 2026, with the full dataset expected to be added to the UK Biobank Research Analysis Platform by 2027.
During this time, additional funding will be sought to analyse samples from all remaining UK Biobank volunteers.
Dr Chris Whelan, director of neuroscience, data science and digital health at Johnson & Johnson Innovative Medicine and Pharma Proteomics project lead, said: “UK Biobank’s proteomic dataset has the potential to enable more powerful biomarker discovery, more accurate disease prediction, and more successful drug development.
“Analysing samples from two time points in the same volunteer will allow us to examine how protein levels change across hundreds of health and disease states over time, at an unprecedentedly large scale.
“This will represent one of the world’s largest ever biopharmaceutical research collaborations, underlining the growing importance of proteomics as a drug discovery tool. I can’t wait to see how the scientific community will explore these data to pinpoint molecular drivers of disease progression, disease subtypes, and ageing.”
Before the data are made available to UK Biobank-approved researchers, and in keeping with its Access policy, members of this industry consortium will have a short period of exclusive access. Any results gleaned will be returned to UK Biobank, further enhancing a ground-breaking health dataset accessible to approved researchers globally.