November 17, 2021 | Updated: November 18, 2021
Think of where medicine would be without the blood pressure cuff. Without it, understanding the risk of stroke, heart disease and kidney disease would be far more difficult. Blood pressure is what is known as a biomarker: a signal that can help medical researchers understand underlying disease mechanisms and biological responses to specific disorders.
Biomarker identification has direct benefits for early detection and intervention. When it comes to neurological disorders, neuroimaging techniques like magnetic resonance imaging (MRI), positron emission tomography (PET) and computed tomography (CT) scans are frequently used to identify underlying disease phenotypes that indicate the presence of conditions such as Alzheimer’s disease or brain cancer.
Increasingly, researchers have applied deep learning (DL), a family of advanced machine learning techniques within the broader field of artificial intelligence, to the task of identifying biomarkers from neuroimaging results.
DL has one key advantage over traditional machine learning and statistical methods. It does not rely on hand-crafted features that are themselves subject to human limitations (time, resources and research bias). Rather, DL can leverage information from raw data that is processed with minimal human intervention.
Because of this, DL often produces more accurate results in downstream applications. It also opens up a variety of analysis opportunities, one of them being the discovery of new biomarkers. In other words, DL can help build tools that better distinguish patients with a disease from healthy subjects. DL can also reveal potentially useful insights for medical research that were not previously known.
Biomarker discovery is where DL has shown some of its clearest advantages. Here are three leading examples of how DL is forging promising new research in this area.
Imaging the brain
Neuroimaging biomarkers are imaging-derived measurements that can tell us about the likelihood or severity of a certain disease. For instance, the biomarkers related to Alzheimer’s disease (AD) that can be measured by MRI include hippocampal volume and entorhinal cortical thickness. Since DL allows us to use raw data with minimal domain knowledge, multiple research studies have been able to identify additional candidate neuroimaging biomarkers for AD, as well as for other brain diseases, such as Parkinson’s disease, autism, schizophrenia and severe depression. This avenue of research is especially promising for identifying candidate neuroimaging biomarkers for lesser-known brain disorders.
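To make "imaging-derived measurement" concrete, here is a minimal sketch of how a volume such as hippocampal volume could be computed once a region has been segmented. It assumes a binary mask (1 inside the region, 0 outside) and a known voxel size. The function name, the toy mask and the voxel dimensions are all made up for illustration and do not come from any particular study or toolkit.

```python
def region_volume_mm3(mask, voxel_size_mm):
    """Volume of a segmented region: voxel count times the volume of one voxel.

    mask is a nested list [slice][row][column] of 0/1 values (hypothetical
    segmentation output); voxel_size_mm is (dx, dy, dz) in millimetres.
    """
    dx, dy, dz = voxel_size_mm
    n_voxels = sum(value for plane in mask for row in plane for value in row)
    return n_voxels * dx * dy * dz


# Toy 2 x 2 x 3 mask, invented for illustration only.
toy_mask = [
    [[0, 1, 1], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 0]],
]

# Assumed voxel size of 1.0 x 1.0 x 1.2 mm.
volume = region_volume_mm3(toy_mask, (1.0, 1.0, 1.2))
print(f"Region volume: {volume:.1f} mm^3")
```

A measurement like this is a hand-defined biomarker. The DL approaches described here instead start from the raw image and learn which regions and patterns matter.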
Because these studies are purely data-driven, they open up the possibility of discovering many new biomarkers previously unknown to medical experts. I am working with researchers at Arizona State University and Mayo Clinic in Arizona to identify such neuroimaging biomarkers for Post-Traumatic Headache. The results we see not only overlap with the existing literature but also hint at some interesting, purely data-driven discoveries to be explored further. Results will be published by early 2022.
Molecular biomarkers
The study of molecular biomarkers is closely tied to cancer research. The goal is to learn more about the underlying reasons for tumor mutations in cancers of the brain, prostate and lungs. Deep learning has enabled researchers to link these molecular “signatures,” or biomarkers, to morphological patterns in histological data. The major focus is on predicting complicated changes in gene expression relating to mutations. DL techniques, which are effective at finding non-linear patterns in data, have shown great promise here.
Combined data sources
Using only a single modality of data (for example, only neuroimaging or only clinical data) often leads to inconclusive results. It is difficult to understand complex and heterogeneous disease phenotypes with small data sets. Hence, researchers are now focusing on developing “multi-modal” deep learning approaches that can incorporate multiple data sources to train the model. In the case of Alzheimer’s disease, researchers have shown that cerebrospinal fluid, PET, clinical and neuropsychological assessments, along with other biological markers, can be used alongside deep learning techniques to improve biomarker discovery. However, significant challenges remain, both in collecting such large-scale multi-modal data and in developing DL techniques that can make the best use of it.
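One common way to combine modalities is to encode each data source separately, concatenate the resulting representations, and feed them to a shared prediction layer. The sketch below illustrates that idea in plain Python. It is not the architecture of any specific study: the modality names, feature values and weights are invented, and in practice the weights would be learned from data rather than written by hand.

```python
import math


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse_and_predict(modalities, encoders, head_w, head_b):
    """Encode each modality on its own, concatenate, then score with a logistic head."""
    fused = []
    for name in encoders:  # fixed order, so head weights line up with features
        weights, bias = encoders[name]
        fused.extend(relu(linear(modalities[name], weights, bias)))
    return sigmoid(sum(w * f for w, f in zip(head_w, fused)) + head_b)


# Hypothetical, hand-picked weights standing in for trained parameters.
encoders = {
    "imaging": ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    "clinical": ([[1.0, -1.0]], [0.0]),
}
head_w = [0.4, -0.3, 0.9]  # one weight per fused feature (2 imaging + 1 clinical)
head_b = -0.2

# Made-up feature vectors for a single subject.
subject = {"imaging": [0.2, 0.4, 0.6], "clinical": [0.7, 0.1]}
risk = fuse_and_predict(subject, encoders, head_w, head_b)
print(f"Predicted probability: {risk:.3f}")
```

Keeping a separate encoder per modality lets each data source have its own input size and preprocessing, while the shared head is where the model can pick up on interactions between them.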
Growing fast, but challenges remain
Biomarker discovery using data-driven techniques is a fast-growing area of research, with novel approaches being proposed based on problem definition and dataset availability. Despite these developments, challenges remain. Medical datasets are often collected in sufficient quantities by only a few organizations and are not publicly available due to privacy concerns. Hence, access to that data, and the research built on top of it, remains in-house with a limited number of teams. Some public datasets are available, but they are often too small to train a model that outperforms standard benchmarks. However, a few institutions are slowly trying to address this gap, which should help independent researchers and graduate students explore novel ideas. In addition, deep learning algorithms are prone to biased or wrong predictions if the dataset used to train them is not carefully managed.
Therefore, regulations and standards should be established to validate these results using expert knowledge and consistent measures. We are also seeing growing interest in explainable AI and interpretability methods for DL that can explain these “black-box” models and improve human-AI collaboration for new discoveries.
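As one illustration of what an interpretability method can look like, the sketch below implements a simple occlusion test: each input is replaced with a baseline value in turn, and the change in the model's output is recorded. Inputs whose removal changes the prediction most are candidates for closer expert review. The model here is a hypothetical stand-in for a trained network, and the inputs are invented. Real studies typically occlude image regions rather than single numbers.

```python
def occlusion_importance(predict, x, baseline=0.0):
    """Score each input by how much the prediction drops when it is replaced with a baseline."""
    reference = predict(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline
        scores.append(reference - predict(occluded))
    return scores


def toy_model(x):
    # Hypothetical stand-in for a trained "black-box" model.
    return 0.8 * x[0] + 0.1 * x[1] - 0.5 * x[2]


# Made-up input with three features (for example, three brain-region measurements).
scores = occlusion_importance(toy_model, [1.0, 1.0, 1.0])
for index, score in enumerate(scores):
    print(f"feature {index}: change in output = {score:+.2f}")
```

A positive score means the feature pushed the prediction up, and a negative score means it pushed the prediction down. Whether a highlighted feature is a meaningful biomarker still has to be judged by domain experts.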