Abstract: Underwater acoustic-optical image object detection is a core supporting technology for underwater intelligent operations and unmanned system collaboration, offering irreplaceable value in marine engineering, military reconnaissance, and other fields. The rapid progress of deep learning has opened new pathways for overcoming the technical bottlenecks of underwater object detection, yet the complex underwater environment has caused this field to lag behind its terrestrial and aerial counterparts. To systematically organize the technical landscape and clarify directions for development, this paper presents a comprehensive review of research progress on deep learning-based underwater acoustic-optical image object detection. First, it traces the evolution of object detection algorithms and compares the technical frameworks, advantages, and disadvantages of traditional handcrafted-feature methods, Convolutional Neural Network (CNN)-based methods, and Transformer-based methods. Second, in light of the modal characteristics of underwater detection, it describes the application status and adaptation strategies of deep learning algorithms in object detection for underwater optical images, sonar images, and joint acoustic-optical images. Finally, it analyzes the core bottlenecks facing current technology and outlines future research directions in terms of dataset construction, model optimization, and cross-modal fusion. This review is intended to provide a theoretical reference and practical guidance for the advancement and deployment of underwater object detection technology.