MMGL: Multi-Scale Multi-View Global-Local Contrastive Learning For Semi-Supervised Cardiac Image Segmentation
Ziyuan Zhao, Jinxuan Hu, Zeng Zeng, Xulei Yang, Peisheng Qian, Bharadwaj Veeravalli, Cuntai Guan
Recently, visual transformers have shown promising results in tasks such as image classification, segmentation, and object detection, yet explaining their decisions remains a challenge. This paper focuses on exploiting self-attention for explanation. We propose a generalized interpretation of transformers, i.e., model-agnostic but class-specific explanations. The main principle lies in the use and weighting of the self-attention maps of a visual transformer. To evaluate it, we use the popular hypothesis that an explanation is good if it correlates with human perception of a visual scene. Thus, the method has been evaluated against Gaze Fixation Density Maps obtained in a psycho-visual experiment on a public database. It has been compared with other popular explainers such as Grad-CAM, LRP, Rollout, and Adaptive Relevance methods. The proposed method outperforms the best baseline by 2% on the standard Pearson Correlation Coefficient (PCC) metric.
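The evaluation described above reduces to a pixel-wise correlation between a model's explanation map and a gaze fixation density map. The snippet below is a minimal sketch of such a PCC computation, assuming both maps have already been resized to the same resolution; the function name, array shapes, and random inputs are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.stats import pearsonr


def pcc_score(explanation_map, gaze_density_map):
    """Pearson Correlation Coefficient between an explanation map and a
    gaze fixation density map of the same spatial size (higher is better)."""
    x = np.asarray(explanation_map, dtype=float).ravel()
    y = np.asarray(gaze_density_map, dtype=float).ravel()
    return pearsonr(x, y)[0]


# Hypothetical example: an attention-based explanation upsampled to the
# image resolution would be compared against the recorded gaze map.
rng = np.random.default_rng(0)
expl = rng.random((224, 224))
gaze = rng.random((224, 224))
print(f"PCC = {pcc_score(expl, gaze):.3f}")
```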