Abstract
Lung cancer is one of the most common malignancies globally, with malignant nodules being an early indicator of the disease. Thus, accurate early diagnosis of lung nodules is imperative. Positron Emission Tomography-Computed Tomography (PET/CT) is a non-invasive imaging technique that provides both anatomical and metabolic information, playing a crucial role in the diagnosis of cancer. Existing deep learning-based multimodal fusion strategies often rely on simple concatenation of features from the two modalities, overlooking the intricate interactions between them. In this study, we propose a multimodal information interaction framework named MI2A for the automated diagnosis of lung nodules using PET/CT imaging. First, the lung parenchymal regions were cropped as regions of interest using a pre-trained U-Net model. Second, higher-order multimodal features from PET/CT scans were extracted and integrated using a custom-designed PET-CT Imaging Encoder (PCIE) module and a Cross-Attention Multimodal Encoder (CAME) module, respectively. Predictions were then generated using multi-path pooling layers and a multi-layer perceptron (MLP) layer. Furthermore, an alignment loss function was designed to minimize the discrepancy between modality features during training. Finally, the proposed model was evaluated on a real-world clinical dataset, achieving accuracy, precision, recall, specificity, and F1 scores of 0.9179, 0.8972, 0.8937, 0.9335, and 0.8954, respectively. In addition, the findings revealed that certain benign lesions, particularly those related to inflammatory or infectious conditions, displayed high metabolic activity, which was the main factor limiting the model's performance. This insight provides a promising direction for future research.
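For readers less familiar with the five reported metrics, the following is a minimal sketch of how they are conventionally computed from binary confusion-matrix counts. The function names and the example counts are hypothetical and are not taken from the study; only the precision, recall, and F1 values in the final check come from the abstract.

```python
def f1_from(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: malignant nodules predicted malignant/benign.
    tn/fp: benign nodules predicted benign/malignant.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": f1_from(precision, recall),
    }


# Illustrative counts only (NOT the study's data).
example = binary_metrics(tp=8, fp=2, tn=9, fn=1)

# Consistency check using the precision and recall reported in the abstract.
reported_f1 = round(f1_from(0.8972, 0.8937), 4)
```

Under these definitions, the reported precision (0.8972) and recall (0.8937) reproduce the reported F1 score of 0.8954 after rounding to four decimal places. Specificity is the metric most directly affected by benign lesions with high metabolic activity, since each such lesion misclassified as malignant counts as a false positive.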