Improving Remote Sensing Change Detection Via Locality Induction on Feed-forward Vision Transformer

  • Lhuqita Fazry Universitas Indonesia
  • Mgs M Luthfi Ramadhan Universitas Indonesia
  • Wisnu Jatmiko Universitas Indonesia

Abstract

The main objective of Change Detection (CD) is to extract change information from bi-temporal remote sensing images. Recent CD methods build on the Vision Transformer (ViT) backbone. Although ViT is superior to Convolutional Neural Networks (CNNs) at modeling long-range dependencies, it lacks a locality mechanism, even though locality is a critical property of the pixels that make up natural images, including remote sensing images. This shortcoming leads to segmentation artifacts, such as imperfect boundaries of changed regions in the predicted change map. To address this problem, we propose LocalCD, a novel CD method that introduces a locality mechanism into the Transformer encoder. Specifically, it replaces the Transformer's feed-forward network with an efficient depth-wise convolution placed between two $1 \times 1$ convolutions. LocalCD outperforms ChangeFormer by a significant margin, achieving F1-scores of 0.9548 and 0.9243 on the CDD and LEVIR-CD datasets, respectively.
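
The feed-forward replacement described in the abstract (a depth-wise convolution between two $1 \times 1$ convolutions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not state the kernel size, activation, or hidden width, so the 3 x 3 kernel, the GELU activation, the channels-last layout, and all function and variable names below (`locality_ffn`, `w_in`, `k_dw`, `w_out`) are assumptions made for illustration only.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU (the activation choice is an assumption).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def pointwise_conv(x, w):
    # A 1x1 convolution is a per-pixel linear map over channels.
    # x: (H, W, C_in), w: (C_in, C_out) -> (H, W, C_out)
    return x @ w

def depthwise_conv3x3(x, k):
    # Each channel is filtered by its own 3x3 kernel; zero padding keeps H and W.
    # x: (H, W, C), k: (3, 3, C) -> (H, W, C)
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + h, j:j + w, :] * k[i, j, :]
    return out

def locality_ffn(tokens, h, w, w_in, k_dw, w_out):
    # tokens: (h*w, C) sequence from the attention layer.
    n, c = tokens.shape
    assert n == h * w, "token count must match the spatial grid"
    x = tokens.reshape(h, w, c)            # restore the 2D layout
    x = gelu(pointwise_conv(x, w_in))      # expand channels
    x = gelu(depthwise_conv3x3(x, k_dw))   # mix each pixel with its neighbours
    x = pointwise_conv(x, w_out)           # project back
    return x.reshape(n, -1)                # flatten to a token sequence again

# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
h, w, c, hidden = 4, 4, 8, 32
tokens = rng.standard_normal((h * w, c))
w_in = rng.standard_normal((c, hidden)) * 0.1
k_dw = rng.standard_normal((3, 3, hidden)) * 0.1
w_out = rng.standard_normal((hidden, c)) * 0.1

out = locality_ffn(tokens, h, w, w_in, k_dw, w_out)
print(out.shape)  # (16, 8)
```

In this sketch, only the depth-wise step mixes information across spatial positions, and it does so within a 3 x 3 neighbourhood. A plain feed-forward network, by contrast, processes every token independently, which is the missing locality the abstract refers to.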

Published
2024-02-25
How to Cite
Fazry, L., Mgs M Luthfi Ramadhan, & Jatmiko, W. (2024). Improving Remote Sensing Change Detection Via Locality Induction on Feed-forward Vision Transformer. Jurnal Ilmu Komputer Dan Informasi, 17(1), 37-48. https://doi.org/10.21609/jiki.v17i1.1188