Grisi C, Kartasalo K, Eklund M, Egevad L, van der Laak J, Litjens G
Med Image Anal 105 (-) 103663 [2025-10-00; online 2025-07-07]
Practical deployment of Vision Transformers in computational pathology has largely been constrained by the sheer size of whole-slide images (WSIs). Transformers faced a similar limitation when applied to long documents, and Hierarchical Transformers were introduced to circumvent it. This work explores the capabilities of Hierarchical Vision Transformers for prostate cancer grading in WSIs and presents a novel technique for combining attention scores across the transformers in the hierarchy. Our best-performing model matches state-of-the-art algorithms with a quadratic kappa of 0.916 on the Prostate cANcer graDe Assessment (PANDA) test set. It also generalizes better when evaluated in more diverse clinical settings, achieving a quadratic kappa of 0.877 and outperforming existing solutions. These results demonstrate the robustness and practical applicability of our approach, paving the way for its broader adoption in computational pathology and possibly other medical imaging tasks. Our code is publicly available at https://github.com/computationalpathologygroup/hvit.
PubMed 40644915
DOI 10.1016/j.media.2025.103663
Crossref 10.1016/j.media.2025.103663
pii: S1361-8415(25)00210-5