Weitz P, Wang Y, Kartasalo K, Egevad L, Lindberg J, Grönberg H, Eklund M, Rantalainen M
Bioinformatics 38 (13) 3462-3469 [2022-06-27; online 2022-05-21]
Molecular phenotyping by gene expression profiling is central in contemporary cancer research and in molecular diagnostics but remains resource intense to implement. Changes in gene expression occurring in tumours cause morphological changes in tissue, which can be observed on the microscopic level. The relationship between morphological patterns and some of the molecular phenotypes can be exploited to predict molecular phenotypes from routine haematoxylin and eosin-stained whole slide images (WSIs) using convolutional neural networks (CNNs). In this study, we propose a new, computationally efficient approach to model relationships between morphology and gene expression. We conducted the first transcriptome-wide analysis in prostate cancer, using CNNs to predict bulk RNA-sequencing estimates from WSIs for 370 patients from the TCGA PRAD study. Out of 15 586 protein coding transcripts, 6618 had predicted expression significantly associated with RNA-seq estimates (FDR-adjusted P-value <1×10-4) in a cross-validation and 5419 (81.9%) of these associations were subsequently validated in a held-out test set. We furthermore predicted the prognostic cell-cycle progression score directly from WSIs. These findings suggest that contemporary computer vision models offer an inexpensive and scalable solution for prediction of gene expression phenotypes directly from WSIs, providing opportunity for cost-effective large-scale research studies and molecular diagnostics. A self-contained example is available from http://github.com/phiwei/prostate_coexpression. Model predictions and metrics are available from doi.org/10.5281/zenodo.4739097. Supplementary data are available at Bioinformatics online.
PubMed 35595235
DOI 10.1093/bioinformatics/btac343
Crossref 10.1093/bioinformatics/btac343
pmc: PMC9237721
pii: 6589889