Abstract: This study proposes an adaptive cross-stage lightweight detection model to address high noise, feature loss, and low resolution in sonar image target detection. A lightweight feature extraction network with multi-scale attention is designed to sharpen feature focus, and a focus modulation network replaces the conventional SPPF module to improve localization and recognition of key regions. In the prediction stage, an improved adaptive spatial feature fusion module expands the receptive field and strengthens both physical and semantic representations while keeping the model compact. Experiments on real sonar datasets show that, compared with existing methods, the proposed model uses fewer parameters, achieves about 4.7% higher detection accuracy, and runs over 18% faster. These results indicate that the model strikes a favorable balance among accuracy, efficiency, and complexity, offering an effective approach for future underwater acoustic target detection designs.
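
For readers unfamiliar with adaptive spatial feature fusion, the general idea can be sketched as follows: at every spatial position, feature maps from several pyramid levels (assumed already resized to a common shape) are blended with weights produced by a softmax over per-level scores. This is a minimal pure-Python illustration of that basic mechanism only, not the improved module proposed in this study. The function names `softmax` and `fuse_levels`, the single-channel maps, and the hand-set scores are all assumptions made for illustration; in a trained network the scores would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_levels(features, scores):
    """Blend same-sized single-channel feature maps position by position.

    features: list of L maps, each an H x W nested list of floats
              (assumed already resized to a common resolution).
    scores:   list of L maps of the same shape holding unnormalized
              per-level scores (hypothetical stand-ins for learned ones).
    """
    height, width = len(features[0]), len(features[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Weights at this position are non-negative and sum to 1.
            weights = softmax([level[y][x] for level in scores])
            fused[y][x] = sum(
                w * level[y][x] for w, level in zip(weights, features)
            )
    return fused


# Toy example: three 2 x 2 "pyramid levels" with constant values.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
# Equal scores everywhere except the top-left position, which favors level 3.
scores = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[10.0, 0.0], [0.0, 0.0]],
]
fused = fuse_levels(features, scores)
```

Where the scores are equal, the fused value is the plain average of the levels; where one level's score dominates, the output closely follows that level. This per-position weighting is what lets the fusion adapt to local content.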