We aim to improve the efficiency of traditional deep learning methods for remote sensing by reducing the reliance on annotated data and minimizing training time. Instead of using large-scale unimodal remote sensing image datasets for pre-training, we propose the use of multimodal data (text-image pairs), which we believe to be more effective. To enhance the model's generalization performance in the remote sensing domain and achieve accurate remote sensing image scene classification, we employ the Feature Adaptive Embedding Module. For this purpose, we introduce a cross-modal comparison learning network that is based on openly accessible generalized datasets. This network is capable of recognizing specific photo scenarios from remote sensing photographs, maximizing the accuracy of classification.
Discussion(0)
No comments yet. Be the first to comment.