Did you know that smoking can lead to lung cancer, or that untreated diabetes can cause blindness? Doctors call these causal relationships between conditions. Documenting which diseases directly cause others has long been a major challenge for medical researchers. New international research now offers a systematic way to do it.
In a study published in Bioinformatics, researchers developed an automated method for extracting causal relationships between diseases from scientific literature and created a map showing which conditions lead to others. This knowledge is already improving how scientists calculate genetic risk scores, which predict a person's likelihood of developing specific diseases.
The Domino Effect of Disease
Most people know that Type 2 diabetes can lead to complications. However, the exact sequence—diabetes causing hyperglycemia, which causes microvascular disease, ultimately resulting in diabetic retinopathy—illustrates the domino effect one condition can have. Understanding these chains helps doctors anticipate problems before they develop and potentially intervene earlier.
The research team used sophisticated text mining techniques to scour thousands of medical journal abstracts. They weren’t just looking for diseases that commonly occur together (comorbidities) but specifically for statements asserting that one disease directly causes another. The team identified 8,191 unique causal relationships spanning 1,860 different disease categories.
To validate their findings, they cross-referenced them with real-world patient data from the UK Biobank, a massive database containing health information from over 500,000 participants. They checked whether diseases that supposedly had causal relationships showed statistical connections in actual patients and whether the timing of diagnoses matched expectations (cause preceding effect).
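The timing check described above can be illustrated with a minimal sketch. It assumes each patient record is a simple mapping from disease name to date of first diagnosis; the function name `fraction_cause_first` and the four toy records are invented for illustration and are not UK Biobank data or the study's actual code.

```python
from datetime import date

def fraction_cause_first(records, cause, effect):
    """Among patients diagnosed with both conditions, return the share whose
    `cause` diagnosis came strictly before their `effect` diagnosis.
    Returns None when no patient has both diagnoses."""
    both = [r for r in records if cause in r and effect in r]
    if not both:
        return None
    ordered = sum(1 for r in both if r[cause] < r[effect])
    return ordered / len(both)

# Invented toy records (hypothetical, for illustration only).
records = [
    {"type 2 diabetes": date(2005, 3, 1), "diabetic retinopathy": date(2012, 6, 1)},
    {"type 2 diabetes": date(2010, 1, 15), "diabetic retinopathy": date(2009, 11, 2)},
    {"type 2 diabetes": date(2001, 7, 9)},  # no retinopathy diagnosis, so excluded
    {"type 2 diabetes": date(1999, 5, 20), "diabetic retinopathy": date(2008, 2, 14)},
]

share = fraction_cause_first(records, "type 2 diabetes", "diabetic retinopathy")
print(f"Cause diagnosed first in {share:.0%} of patients with both conditions")
```

A share well above one half is what a true cause-then-effect relationship would be expected to produce, although diagnosis dates are only a rough proxy for when a disease actually began.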
Better Risk Prediction
Researchers then transformed their findings into a mathematical structure called a directed acyclic graph (DAG). This allowed scientists to perform causal inference, a sophisticated form of analysis that goes beyond mere correlation to understand true cause-and-effect relationships.
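To make the idea of a directed acyclic graph concrete, here is a small sketch that stores the diabetes chain from earlier as cause-effect pairs, orders the diseases so that causes come before effects, and lists everything upstream of a given condition. The helper names `build_parents` and `upstream_causes` are hypothetical, and the sketch uses Python's standard `graphlib` module (Python 3.9+) rather than whatever tooling the researchers used.

```python
from graphlib import TopologicalSorter

# Cause -> effect pairs taken from the diabetes example in this article.
causal_pairs = [
    ("type 2 diabetes", "hyperglycemia"),
    ("hyperglycemia", "microvascular disease"),
    ("microvascular disease", "diabetic retinopathy"),
]

def build_parents(pairs):
    """Map every disease to the set of diseases that directly cause it."""
    parents = {}
    for cause, effect in pairs:
        parents.setdefault(cause, set())
        parents.setdefault(effect, set()).add(cause)
    return parents

def upstream_causes(parents, disease):
    """Collect all direct and indirect causes of `disease`."""
    seen, stack = set(), [disease]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

parents = build_parents(causal_pairs)

# static_order() yields causes before their effects and raises
# graphlib.CycleError if the relationships contain a loop (i.e. not a DAG).
order = list(TopologicalSorter(parents).static_order())
print(" -> ".join(order))
print(sorted(upstream_causes(parents, "diabetic retinopathy")))
```

The "acyclic" requirement matters because a loop (A causes B, B causes A) would make it impossible to say which condition sits upstream of the other.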
When the researchers added their disease map to genetic risk scores, which estimate your chances of getting a disease based on your DNA, they found it made predictions more accurate. For example, combining risk scores for related conditions, like heart disease and the problems it can lead to, helped them better predict who might develop heart issues.
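One simple way to picture combining risk scores is a weighted sum: a person's score for a disease plus a scaled contribution from the scores of the diseases that cause it. The sketch below is only an illustration under that assumption; the article does not specify the study's formula, and the function `combined_score`, the edge weight, and the score values are all made up.

```python
def combined_score(target, own_scores, parents, weights):
    """Add to the target's own genetic risk score a weighted contribution
    from each disease that directly causes it (hypothetical linear model)."""
    score = own_scores.get(target, 0.0)
    for cause in parents.get(target, ()):
        score += weights[(cause, target)] * own_scores.get(cause, 0.0)
    return score

# Made-up standardized scores for one hypothetical person.
own_scores = {"coronary heart disease": 0.4, "heart failure": 0.1}

# Direct causes of each disease, as read off the disease map.
parents = {"heart failure": ["coronary heart disease"]}

# Hypothetical weight for how strongly the cause's score carries over.
weights = {("coronary heart disease", "heart failure"): 0.5}

total = combined_score("heart failure", own_scores, parents, weights)
print(f"Combined heart failure score: {total:.2f}")
```

Because a missing score defaults to zero, the same function still returns a value for a condition that has no genetic score of its own, drawing entirely on the diseases upstream of it.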
Untangling the Complex Web
Doctors could use this map of diseases to predict risks for conditions lacking extensive genetic data by analyzing the genetic risks of diseases that cause them. This method also helps untangle a common problem in genetics called pleiotropy, where one gene appears to influence several different conditions that don’t seem connected.
The research team found that many genetic variants previously thought to independently influence multiple diseases actually follow causal chains: the variant affects one disease, which in turn causes another. Knowing where a chain starts could allow more targeted treatments that address the root cause rather than just the symptoms.
This method can automatically analyze thousands of gene-disease combinations, which could change how we understand the links between our genes and different health conditions.
All the data, including the disease dictionary, full network of relationships, and the disease graph, are freely available through GitHub, allowing other researchers to build upon this foundation.
Diseases are intricately intertwined. By mapping the causal connections between conditions, scientists now have a powerful new tool to improve risk prediction, understand disease chain reactions, and potentially create more effective treatments that address the true origins of illnesses rather than waiting until downstream conditions develop.