Predictions of Periventricular Fazekas Scores on MRI: Addressing Label Noise, Interpretability and Uncertainty
Abstract
Assessing white matter hyperintensities (WMH) with the Fazekas score, a scale from 0 to 3 that grades their severity, is essential for understanding the role of WMH in aging, estimating the risk of stroke or cognitive decline, and supporting the diagnosis of small vessel diseases. Predicting Fazekas scores directly with deep learning could provide a scalable alternative to current segmentation-based WMH analysis. However, this direction remains underexplored, probably because of limited model performance and the lack of inherent explainability in classification tasks. This thesis addresses these challenges by developing a multichannel 3D deep learning model that predicts the periventricular Fazekas score and is trained on T1-weighted and FLAIR MR images of cognitively normal subjects. The proposed approach modifies the loss function 1) to address label noise caused by inter-rater variability, which is driven by the diverse phenotypes of WMH, and 2) to incorporate the ordinal nature of the Fazekas score. To this end, several loss functions are investigated and the resulting models compared. Heatmaps produced with layerwise relevance propagation (LRP) visualize the features that drive the model's decisions and thereby increase its explainability. Because inter-rater variability influences model performance, misclassified cases are difficult to interpret. To address this, the thesis introduces a novel analysis that combines prediction uncertainty, estimated with Monte Carlo dropout, with t-SNE visualization. This framework helps to distinguish cases that are misclassified because of class ambiguity from those that are genuinely misclassified. The best model was trained with the clipped ordinal log loss and achieved a Matthews correlation coefficient (MCC) of 0.71 in nested cross-validation, outperforming the models trained with the other loss functions. The heatmaps created with LRP exhibited anatomically meaningful details and outperformed more commonly used techniques. 
Uncertainty mapping revealed that noise-robust and ordinal-aware loss functions improved not only performance but also the quality of the learned feature representation. This thesis demonstrated the feasibility of automated periventricular Fazekas score prediction with explainable deep learning. Heatmaps confirmed spatial plausibility, while uncertainty mapping provided an additional perspective on performance under label noise beyond standard metrics.
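
To illustrate how prediction uncertainty can be estimated with Monte Carlo dropout, the following is a minimal sketch and not the implementation used in the thesis. It assumes a toy two-layer classifier with random weights in place of the 3D model, and it assumes predictive entropy as the uncertainty measure, since the abstract does not state which measure was used. The names `mc_dropout_predict`, `w_hidden`, `w_out`, `n_samples` and `drop_rate` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_dropout_predict(x, w_hidden, w_out, n_samples=100, drop_rate=0.5, seed=0):
    """Average class probabilities over stochastic forward passes.

    x        : list of n_features inputs
    w_hidden : n_features x n_hidden weight matrix (list of rows)
    w_out    : n_hidden x n_classes weight matrix (list of rows)
    Returns (mean class probabilities, predictive entropy).
    """
    rng = random.Random(seed)
    n_classes = len(w_out[0])
    mean_probs = [0.0] * n_classes
    for _ in range(n_samples):
        # Hidden layer with ReLU activation.
        hidden = [max(0.0, sum(xi * w for xi, w in zip(x, col)))
                  for col in zip(*w_hidden)]
        # Dropout stays active at inference: this is what makes each pass differ.
        hidden = [h / (1.0 - drop_rate) if rng.random() >= drop_rate else 0.0
                  for h in hidden]
        logits = [sum(h * w for h, w in zip(hidden, col)) for col in zip(*w_out)]
        for k, p in enumerate(softmax(logits)):
            mean_probs[k] += p / n_samples
    # Predictive entropy of the averaged distribution (assumed uncertainty measure).
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy


# Toy example: 8 input features, 16 hidden units, 4 classes (Fazekas 0 to 3).
toy_rng = random.Random(42)
x = [toy_rng.gauss(0, 1) for _ in range(8)]
w_hidden = [[toy_rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]
w_out = [[toy_rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]

mean_probs, entropy = mc_dropout_predict(x, w_hidden, w_out)
predicted_score = max(range(4), key=lambda k: mean_probs[k])
```

An entropy near 0 means the stochastic passes agree on one score, while a value near log(4) means the averaged prediction is spread across all four scores. In an analysis like the one described above, such per-case values could be overlaid on a t-SNE embedding to separate ambiguous cases from confident errors.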