Graphical Modeling and Analysis for Discovering Hidden Relationships between Diseases and Symptoms: A Data-driven Approach for Differential Diagnosis and Symptom Co-occurrence

Document Type : Original Article

Authors
Department of Industrial Engineering and Management Systems, Amirkabir University of Technology, Tehran, Iran
DOI: 10.48305/him.2026.45792.1346
Abstract

Introduction: Understanding hidden relationships between diseases and symptoms is one of the fundamental challenges in differential diagnosis and comorbidity analysis in medicine. Given the complexity of clinical interactions, identifying shared patterns among diseases can play a crucial role in improving the diagnostic process and clinical decision-making. This study proposes a data-driven, graph-based analytical framework for modeling and uncovering hidden relationships between diseases and symptoms.

Methods: This study used the publicly available Disease–Symptom Prediction Dataset from the Kaggle platform, which consists of 4,920 records covering 41 diseases and 131 unique symptoms. A weighted bipartite disease–symptom graph was constructed, with edge weights computed as a linear combination of occurrence frequency indices and the kappa coefficient. Disease–disease and symptom–symptom unipartite graphs were then extracted from it, and their network structures were analyzed using the Louvain, Greedy Modularity, and Girvan–Newman clustering algorithms.
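As a rough illustration of the graph construction described in the Methods, the following minimal Python sketch weights each disease–symptom edge by a linear combination of a conditional occurrence frequency and Cohen's kappa, then projects the bipartite graph onto diseases. The toy records, the mixing parameter `alpha`, the exact frequency index, and the min-based projection rule are all assumptions made for illustration; the abstract does not specify the authors' actual formulas or parameter values.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records (NOT taken from the Kaggle dataset):
# each record is (disease label, set of reported symptoms).
RECORDS = [
    ("dengue", {"fatigue", "fever", "headache"}),
    ("dengue", {"fatigue", "fever"}),
    ("malaria", {"fever", "chills"}),
    ("malaria", {"fever", "chills", "fatigue"}),
    ("migraine", {"headache"}),
    ("migraine", {"headache", "nausea"}),
]


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length binary (0/1) sequences."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n                # marginal rates of 1s
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)          # chance agreement
    return 0.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


def bipartite_weights(records, alpha=0.5):
    """Return {(disease, symptom): weight} for co-occurring pairs.

    weight = alpha * frequency + (1 - alpha) * kappa, where frequency is
    the share of the disease's records that report the symptom.
    `alpha` is an assumed mixing parameter, not a value from the paper.
    """
    diseases = sorted({d for d, _ in records})
    symptoms = sorted({s for _, ss in records for s in ss})
    weights = {}
    for d in diseases:
        is_d = [int(rd == d) for rd, _ in records]
        n_d = sum(is_d)
        for s in symptoms:
            has_s = [int(s in ss) for _, ss in records]
            co = sum(x and y for x, y in zip(is_d, has_s))
            if co == 0:
                continue  # no edge when the pair never co-occurs
            freq = co / n_d
            weights[(d, s)] = alpha * freq + (1 - alpha) * cohen_kappa(is_d, has_s)
    return weights


def project_diseases(weights):
    """Disease-disease projection: for every shared symptom, add the
    smaller of the two edge weights (an assumed projection rule)."""
    by_symptom = defaultdict(dict)
    for (d, s), w in weights.items():
        by_symptom[s][d] = w
    proj = defaultdict(float)
    for dw in by_symptom.values():
        for d1, d2 in combinations(sorted(dw), 2):
            proj[(d1, d2)] += min(dw[d1], dw[d2])
    return dict(proj)


if __name__ == "__main__":
    w = bipartite_weights(RECORDS)
    for pair, weight in sorted(project_diseases(w).items()):
        print(pair, round(weight, 3))
```

The projected edge list could then be loaded into a graph library for community detection; NetworkX, for example, provides `louvain_communities`, `greedy_modularity_communities`, and `girvan_newman` in its `community` module. A symptom–symptom projection follows the same pattern with the roles of diseases and symptoms swapped.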

Results: The symptom fatigue and the disease dengue fever played key roles in the disease–symptom network. Disease clustering using the Louvain and Greedy Modularity algorithms identified meaningful clusters of related diseases. Symptom clustering with the Louvain algorithm revealed six clinically interpretable clusters, which may support rapid disease identification. Moreover, seemingly unrelated symptoms can be associated with a common disease, while specific symptom clusters can guide the diagnosis of particular diseases.

Conclusion: The findings indicate that graph-based analysis can serve as an effective tool for uncovering hidden relationships between diseases and symptoms and can play an important role in improving differential diagnosis and designing intelligent decision-support systems.

Keywords

Subjects



Articles in Press, Accepted Manuscript
Available Online from 18 May 2026