Kannan V, Swartz F, Kiani NA, Silberberg G, Tsipras G, Gomez-Cabrero D, Alexanderson K, Tegnèr J
Sci Rep 6 (-) 26170 [2016-05-23; online 2016-05-23]
Health care data holds great promise for use in clinical decision support systems. However, frequent near-synonymous diagnoses recorded separately, as well as the sheer magnitude and complexity of the disease data, make it challenging to extract non-trivial conclusions beyond confirmatory associations from such a web of interactions. Here we present a systematic methodology to derive statistically valid conditional development of diseases. To this end, we utilize a cohort of 5,512,469 individuals followed over 13 years at inpatient care, including data on disability pension and cause of death. By introducing a causal information fraction measure and taking advantage of the composite structure in the ICD codes, we extract an effective directed lower-dimensional network representation (100 nodes and 130 edges) of our cohort. Unpacking composite nodes into bipartite graphs reveals, for example, that individuals with behavioral disorders are more likely to subsequently experience prescription drug poisoning episodes, whereas women with leiomyoma are more likely to subsequently experience endometriosis. The conditional disease development represents putative causal relations, indicating possible novel clinical relationships and pathophysiological associations that have not yet been explored.