Latest Issue

    Vol. 31, Issue 5, 2026

      Review

    • 李波, 丛润民, 宋巍, 付先平, 董军宇, 杨嘉琛, 陆慧敏, 李华, 庄培显, 郭春乐, 韩向娣
      Vol. 31, Issue 5, Pages: 1285-1287(2026) DOI: 10.11834/jig.2600005
        
      Updated: 2026-05-15
    • Embodied marine perception: a review

      Lu Huimin, Zheng Yuchao, Li Yujie
      Vol. 31, Issue 5, Pages: 1288-1299(2026) DOI: 10.11834/jig.250429
      Abstract: Artificial intelligence (AI) is increasingly permeating marine applications, significantly driving the development of autonomous marine systems. The continuous advancement of AI pushes intelligent agents into progressively complex and challenging physical environments, emphasizing the necessity for effective environmental perception and cognitive understanding within dynamic and uncertain marine contexts. The concept of embodied intelligence provides essential insights for addressing these challenges. Embodied intelligence underscores the importance of ongoing physical interaction between agents and their environments, substantially enhancing perceptual capabilities, environmental understanding, and decision-making through continuous real-time sensing-action loops. Driven by rapid advances in computational capabilities and foundational large-scale models, embodied intelligence has become a prominent research area within terrestrial and aerial domains. Despite notable advancements in these areas, applying embodied intelligence within marine environments introduces fundamentally different and considerably greater challenges due to their inherent complexity and uncertainties. Marine environments pose significant limitations to traditional perception technologies. Rapid attenuation of light severely restricts underwater visibility, substantially complicating optical sensing methods. Constraints on electromagnetic wave propagation limit wireless communication and navigation techniques underwater, making these conventional methods unreliable. Although acoustic sensing provides possibilities for long-range perception, it suffers from considerable limitations, including low bandwidth, high latency, and complex multipath effects. Additionally, underwater intelligent agents operate within stringent energy constraints, severely limiting their capacities for sustained movement, prolonged data collection, and processing.
These constraints make direct transfers of terrestrial or aerial intelligent solutions into marine environments difficult, substantially hindering the development of foundational marine perception capabilities. In response, this work introduces and systematically elaborates on the concept of embodied marine perception, which emphasizes proactive physical interaction by marine intelligent agents. Embodied marine perception integrates adaptive motion and morphological adjustments to actively acquire and fuse multimodal sensory information, encompassing visual, acoustic, haptic/force, fluid, and chemical perceptions. Such integrated multimodal sensing allows marine intelligent agents to construct comprehensive internal world models that are closely tailored to specific environmental contexts and operational tasks. For instance, underwater robots can autonomously execute optical and acoustic mapping missions, detect chemical concentration variations accurately through fluid traversal, and discern object properties through direct physical interaction. This proactive and multimodal perception approach offers effective solutions to address the inherent sparsity, uncertainty, and variability of marine environmental information. This work systematically reviews the state of embodied marine perception research internationally, highlighting key theoretical gaps and technological limitations. It defines critical scientific challenges, including achieving unified environmental representation from incomplete and unstructured sensory data, developing methods for task-oriented active perception and efficient environmental exploration, and facilitating the emergence and adaptive evolution of perception strategies through continuous physical interaction. This study proposes a comprehensive technological framework termed five embodied marine senses, integrating vision, acoustics, haptic/force perception, fluid perception, and chemical perception. 
By synthesizing these multimodal sensory capabilities, marine intelligent agents significantly enhance their environmental modeling accuracy, cognitive understanding, and decision-making abilities, thereby effectively addressing the complexities and uncertainties inherent in marine environments. Taking deep-sea mining operations as a representative scenario, this work elucidates the practical implications and transformative potential of embodied marine perception. Deep-sea mining operations require precise multimodal perception to navigate complex seabed terrains, accurately identify and evaluate mineral resources, and effectively manage operational risks and uncertainties. Embodied marine perception’s active perception paradigm enables underwater robots to dynamically adjust their sensory strategies and operational behaviors on the basis of real-time environmental feedback, thus significantly enhancing operational efficiency, reliability, and safety. Ultimately, this study envisions embodied marine perception evolving toward physics-integrated AI, highlighting the importance of continuous physical interaction for robust, adaptive, and reliable environmental cognition in marine contexts. It advocates for accelerated foundational research, targeted technological breakthroughs, and practical demonstrations in representative marine scenarios to drive future advancements in this critical area of embodied marine perception.  
      Keywords: embodied intelligence; embodied marine perception; multimodal perception; marine robotics; deep-sea mining; artificial intelligence (AI)
      Updated: 2026-05-15
    • Survey of underwater image quality assessment research

      Xiao Fan, Duan Shuai, Wang Yaling, Ma Jiaxu, Cao Jingchao, Liu Yutao, Dong Junyu
      Vol. 31, Issue 5, Pages: 1300-1323(2026) DOI: 10.11834/jig.250516
      Abstract: Underwater imaging has become a crucial sensing modality for marine science, environmental monitoring, offshore engineering inspection, underwater archaeology, and autonomous robotic operations. However, light propagation in water differs from propagation in air due to wavelength-dependent absorption, multiple scattering, and the presence of suspended particles and dissolved matter. Collectively, these factors lead to characteristic degradations including severe color casts, contrast attenuation, veiling light, texture blurring, non-uniform illumination, and signal-dependent noise, thereby limiting visual interpretability and the reliability of subsequent computer vision tasks. Therefore, underwater image quality assessment (UIQA) has emerged as a key research topic, providing quantitative measures of image usability, guiding the optimization of enhancement and restoration algorithms, and serving as an intermediate layer that connects imaging physics, perceptual characteristics, and application-oriented visual systems. Unlike general image quality assessment, UIQA must explicitly consider the physical imaging process in water and the strong coupling between degradation mechanisms and perceptual appearance. Additionally, this assessment must address the multidimensional nature of quality in underwater scenarios, where images that appear visually natural to human observers do not necessarily benefit machine vision algorithms, and vice versa. This divergence has driven the field beyond traditional perceptual-consistency modeling toward more comprehensive frameworks that incorporate physical interpretability and task relevance. This survey presents a systematic review of UIQA from physical, perceptual, and task-oriented perspectives.
First, the physical foundations of underwater optical imaging are reexamined, including radiative transfer-based formulations such as the Jaffe–McGlamery model and its simplified variants, and an analysis is provided of how direct transmission attenuation and backscattered light jointly determine the formation of the recorded image. Based on this framework, typical degradation categories (such as color shift induced by differential spectral absorption, contrast reduction and haze-like veiling caused by scattering, and texture loss related to forward scattering and motion blur) are discussed in terms of their physical causes and perceptual manifestations. This discussion forms the basis for subsequent quality modeling. Commonly used statistical criteria for evaluating UIQA algorithms, including rank-based correlation measures (SRCC, KRCC), linear correlation (PLCC), and error-based indicators such as RMSE, are then summarized, and their roles in characterizing prediction monotonicity, accuracy, and consistency with subjective ratings are clarified. A detailed review of existing UIQA-related databases is provided, highlighting differences in scale, scene diversity, annotation protocols (absolute scoring versus pairwise ranking), and intended usage (assessment of raw underwater images versus evaluation of enhancement results). Particular attention is also given to recent large-scale datasets with ranking-based and multidimensional annotations, which capture complex perceptual preferences more effectively and support data-driven modeling.
From a methodological perspective, existing approaches are categorized into several paradigms: 1) traditional feature-driven methods rely on hand-crafted descriptors derived from color statistics, contrast measures, sharpness indicators, and imaging priors, with representative metrics such as UCIQE, UIQM, and CCF offering clear interpretability and low computational cost but limited capability in modeling nonlinear distortions and algorithm-induced artifacts. 2) Deep learning-based methods learn end-to-end mappings from images to quality scores using convolutional networks, attention mechanisms, Transformer architectures, and, more recently, state space-based models. These methods enable the joint capture of local textures and global context, achieving improved prediction performance and enhanced cross-dataset generalization. 3) Physics-guided deep models incorporate estimated transmission maps, backscatter components, or degradation embeddings into neural architectures to enhance interpretability and align prediction with imaging mechanisms. 4) Task-driven quality assessment redefines image quality in terms of its utility for downstream tasks such as detection and segmentation, using task performance indicators as supervisory signals and revealing systematic discrepancies between human perceptual quality and machine-oriented effectiveness. On this basis, several open challenges are discussed: the high cost and limited scale of reliable subjective annotations; domain shifts across water types, depths, and imaging conditions; the tradeoff between model complexity and interpretability; and the absence of widely accepted frameworks that jointly consider human perception and machine task requirements. 
Future directions are expected to include multidomain dataset construction, highly efficient annotation strategies, physics-informed representation learning, and adaptive evaluation mechanisms that dynamically adjust quality criteria according to the specific application scenario. In this way, a closed loop can be formed among quality assessment, enhancement, and task performance to support reliable underwater visual perception in real-world marine environments. The methods, datasets, and evaluation metrics mentioned are available at https://www.scidb.cn/s/eum6zf and https://github.com/OUC-AI/UIQA.
      Keywords: underwater vision; underwater image quality assessment (UIQA); database; evaluation method; deep learning
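The statistical criteria named in the abstract above (SRCC, KRCC, PLCC, RMSE) can be illustrated with a minimal NumPy sketch. This is not code from the surveyed paper: the function names and the toy scores are assumptions made for illustration, the Kendall variant shown is the simple tau-a (tied pairs contribute zero), and PLCC is computed directly on raw scores, whereas evaluations often fit a nonlinear mapping first.

```python
import numpy as np

def _ranks(x):
    """Return 1-based ranks; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks

def plcc(pred, mos):
    """Pearson linear correlation: prediction accuracy."""
    return float(np.corrcoef(np.asarray(pred, float), np.asarray(mos, float))[0, 1])

def srcc(pred, mos):
    """Spearman rank correlation: Pearson correlation of the ranks (monotonicity)."""
    return plcc(_ranks(pred), _ranks(mos))

def krcc(pred, mos):
    """Kendall tau-a: (concordant - discordant) pairs divided by all pairs."""
    pred = np.asarray(pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    i, j = np.triu_indices(len(pred), k=1)
    signs = np.sign(pred[i] - pred[j]) * np.sign(mos[i] - mos[j])
    return float(signs.sum() / len(i))

def rmse(pred, mos):
    """Root-mean-square error between predictions and subjective scores."""
    pred = np.asarray(pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    return float(np.sqrt(np.mean((pred - mos) ** 2)))

# Hypothetical predicted scores and mean opinion scores for four images.
pred = [0.2, 0.4, 0.5, 0.9]
mos = [1.0, 2.0, 3.5, 3.0]
print(srcc(pred, mos), krcc(pred, mos), plcc(pred, mos), rmse(pred, mos))
```

In this toy case the last two images are ranked in the opposite order by the metric and by the observers, so both rank correlations fall below 1 even though the scores broadly agree.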
      Updated: 2026-05-15
    • A review of underwater image enhancement and restoration technology

      Luo Hu, Wen Jiabao, Li Zhengjian, Chen Desheng, Xi Meng, He Jingyi, Yang Jiachen
      Vol. 31, Issue 5, Pages: 1324-1349(2026) DOI: 10.11834/jig.250479
      Abstract: Underwater image processing has emerged as a rapidly growing research field owing to its broad range of applications in marine biology, geological exploration, underwater archaeology, and military reconnaissance. With the increasing deployment of autonomous underwater vehicles, remotely operated vehicles, and intelligent sensing platforms, the ability to acquire high-quality visual information from challenging aquatic environments has become a critical prerequisite for effective perception, navigation, and decision-making. However, underwater optical imaging remains an inherently ill-posed and complex problem, primarily because light propagation in water is severely affected by absorption, scattering, wavelength-dependent attenuation, and the presence of suspended particles. These effects jointly cause blurring, reduced contrast, severe color casts, and information loss, which significantly deteriorate the visual quality and the utility of underwater imagery for downstream tasks such as detection, recognition, mapping, and semantic understanding. Various enhancement and restoration techniques have been proposed over the past decade to address these degradation factors. Early research efforts were primarily grounded in physical imaging models that aimed to invert the underwater image formation process by incorporating assumptions about water optics, scene depth, or color priors. Representative methods include dark channel prior-based models, wavelength compensation and image dehazing approaches, and polarization-based restoration. These physics-driven techniques provided valuable insights into the fundamental properties of underwater light transmission and served as the theoretical foundation for subsequent developments. However, such approaches often rely on strong assumptions that may not hold in diverse underwater conditions, leading to limited robustness and generalization.
The advent of deep learning has brought a paradigm shift to the underwater image enhancement community. Convolutional neural networks (CNNs) have been widely adopted to learn direct mappings between degraded and enhanced images, leveraging large-scale training data to capture nonlinear relationships beyond handcrafted priors. Several landmark works, such as underwater image enhancement network (WaterNet) and underwater image enhancement CNN (UWCNN), demonstrated that CNN-based models could outperform traditional algorithms in objective metrics and subjective perception. To further address the lack of paired ground truth data—a critical bottleneck in supervised learning—researchers introduced generative adversarial networks (GANs) and domain adaptation strategies. Methods like fast and efficient underwater image enhancement model based on conditional GAN (FUnIE-GAN) and unsupervised generative network to enable real-time color correction of monocular underwater images (WaterGAN) exploited adversarial learning to generate realistic enhanced images without requiring pixel-perfect paired supervision. At the same time, semisupervised and self-supervised frameworks introduced perceptual and cycle-consistency losses to align the enhancement process with human visual preferences and cross-domain distributions. More recently, attention mechanisms and transformer architectures have been integrated into underwater image processing, reflecting the broader trend of vision transformers in computer vision. Transformer-based approaches excel at modeling long-range dependencies and global contextual information, which are particularly valuable in handling nonuniform lighting, scattering, and spatially varying color distortions in underwater imagery. Studies have reported that hybrid CNN-transformer designs or pure transformer-based pipelines can effectively balance local detail preservation with global contrast enhancement, yielding superior results across diverse datasets. 
In parallel, the emergence of diffusion models and frequency-domain learning further expanded the methodological landscape, offering new opportunities for capturing multiscale scattering effects and restoring fine structural details that are typically lost in conventional enhancement methods. Beyond enhancement and restoration, underwater image super-resolution (SR) has gained increasing attention as a complementary research direction. While most enhancement algorithms focus on improving contrast, color fidelity, and dehazing, SR techniques explicitly address resolution limitations caused by imaging sensors and environmental attenuation. By reconstructing high-resolution representations from low-resolution inputs, SR models improve the visibility of small-scale details and textures, which are crucial for scientific analysis and object recognition. Although earlier surveys seldom treated SR as a core category in underwater imaging, recent works have shown its significant potential and practical importance, particularly when combined with enhancement networks in a unified framework. Evaluation of underwater image processing methods is another critical challenge. Standard image quality assessment metrics such as peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are not always reliable in reflecting perceptual quality in underwater conditions. Domain-specific no-reference metrics, including the underwater image quality measure (UIQM) and the underwater color image quality evaluation (UCIQE), have been proposed to address this gap. More recent learning-driven metrics incorporate perceptual similarity networks or subjective human ratings to better correlate with visual realism and task performance. However, establishing universally accepted benchmarks remains an open problem given the subjective nature of visual quality and the variability of underwater environments.
The availability of datasets has played a pivotal role in enabling the development and benchmarking of learning-based methods. Over the past decade, a number of representative datasets, ranging from synthetic datasets like WaterGAN-generated images to real-world collections such as underwater image enhancement benchmark dataset (UIEB), enhancing underwater visual perception (EUVP), UFO-120, and USR-248, have been released. Each dataset emphasizes different aspects, including paired/unpaired structure, subjective annotations, or application-specific degradations, collectively enriching the diversity of evaluation resources. Nonetheless, challenges such as limited scale, diversity, and alignment with real-world operational needs continue to hinder progress, motivating the development of larger, more representative datasets in the coming years.
      Keywords: underwater image; image enhancement; image restoration; super-resolution (SR); imaging model
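To make the idea of inverting the image formation process concrete, the sketch below uses the widely cited simplified formation model I = J·t + B·(1 − t), where J is the clean scene, t the per-channel transmission, and B the backscatter (veiling light). It is an illustration under stated assumptions, not a method from the reviewed paper: the transmission and backscatter values are invented, and they are assumed known here, whereas real restoration methods have to estimate them from priors. PSNR is included as one of the full-reference metrics the abstract mentions.

```python
import numpy as np

def degrade(clean, transmission, backscatter):
    """Simplified formation model: I = J * t + B * (1 - t), per colour channel."""
    return clean * transmission + backscatter * (1.0 - transmission)

def restore(observed, transmission, backscatter, t_min=0.1):
    """Invert the model for J; t is clamped so the division stays stable."""
    t = np.maximum(transmission, t_min)
    restored = (observed - backscatter * (1.0 - t)) / t
    return np.clip(restored, 0.0, 1.0)

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

# Illustrative values only: red is attenuated most, the veil is blue-green.
rng = np.random.default_rng(0)
clean = rng.random((32, 32, 3))
t = np.array([0.3, 0.7, 0.8])
B = np.array([0.05, 0.35, 0.45])

observed = degrade(clean, t, B)
restored = restore(observed, t, B)
print("PSNR of degraded image:", psnr(clean, observed))
print("PSNR of restored image:", psnr(clean, restored))
```

With exact knowledge of t and B the inversion recovers the scene almost perfectly. The practical difficulty described in the abstract is that these quantities are unknown and the priors used to estimate them do not always hold.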
      Updated: 2026-05-15
    • Underwater image segmentation methods: a survey

      Fang Hao, Yu Zongji, Chen Zhiyang, Cong Runmin
      Vol. 31, Issue 5, Pages: 1350-1371(2026) DOI: 10.11834/jig.250481
      Abstract: Underwater image segmentation serves as a foundational technology for numerous marine-related fields, including ocean engineering, underwater robot navigation, marine biodiversity monitoring, underwater resource exploration, and underwater archaeology. Its primary objective is to accurately extract target regions (such as marine organisms, underwater equipment, and seabed terrain) from underwater images that are inherently degraded owing to the unique underwater environment. Unlike terrestrial image segmentation, underwater scene segmentation poses unprecedented challenges, including light attenuation, water scattering, color distortion (predominantly blue or green tones caused by differential wavelength attenuation), complex target–background interactions, varying water depth effects, and the scarcity of high-quality annotated datasets. These challenges have driven the evolution of underwater image segmentation methods from traditional handcrafted feature-based approaches to modern deep learning-driven paradigms, with task granularity advancing from coarse to fine grained. This review comprehensively summarizes the state-of-the-art progress in underwater image segmentation, categorizing existing methods into three core tasks: underwater salient object detection (USOD), underwater image semantic segmentation (UISS), and underwater image instance segmentation (UIIS). USOD is the most fundamental segmentation task, aiming to rapidly locate visually prominent targets (e.g., fish and divers) from complex backgrounds and output binary foreground–background maps. Early USOD methods, including quaternion-based distance Weber descriptors and improved histogram equalization techniques, relied on handcrafted low-level visual features such as color, texture, contrast, and contour to model target-background differences. However, these methods lack robustness in low-contrast and color-distorted underwater scenes.
With the advent of deep learning, convolutional neural networks (CNNs) have become the mainstream given their ability to automatically learn multilevel semantic features. Representative CNN-based models include SVAM-Net, which integrates bottom-up and top-down learning via a dual-branch architecture. To address the limitations of CNNs in capturing global dependencies, scholars have introduced transformer-based methods, such as TC-USOD with a hybrid transformer-convolution architecture and HEHP, which leverage heterogeneous experts and hierarchical perception to achieve state-of-the-art performance on datasets like USOD10K. Recent advancements have also seen the integration of vision foundation models (VFMs) such as the segment anything model (SAM) and diffusion models. Dual-SAM and MAS-SAM adapt SAM to underwater scenes by enhancing marine feature learning and refining fine-grained details, while DiffMSS and FSCDiff utilize conditional diffusion models to mitigate underwater image degradation and improve feature representation. UISS is a more refined task that assigns predefined semantic class labels (e.g., fish, coral, seabed, and water column) to each pixel, enabling structured scene understanding. Traditional UISS methods include threshold-based segmentation, clustering algorithms (e.g., k-means), and classifier-based approaches (e.g., SVM), which are only applicable to simple scenes with distinct interclass differences. Deep learning has revolutionized UISS, with models based on FCN, U-Net, and DeepLab series dominating the field. Key improvements include module optimization (e.g., UISS-Net with auxiliary feature extraction networks and SEA-Net with severity-aware dual branches), data augmentation techniques (e.g., multispatial transformation and CutStitch), and loss function innovations (e.g., combinations of cross-entropy loss, Dice loss, and boundary-aware loss). 
Transformer-based methods such as Swin Transformer and SegFormer have further enhanced performance by capturing long-range dependencies, with CoralSCOP (a SAM-based coral segmentation foundation model) achieving exceptional results on the large-scale CoralMask dataset. These advancements have significantly improved the ability of UISS models to handle complex underwater scenes with multiple categories and overlapping targets. UIIS is an advanced task that distinguishes individual instances within the same semantic class (e.g., different fish in a school), requiring category classification and instance differentiation. Because of its high complexity, UIIS primarily relies on deep learning frameworks adapted from terrestrial instance segmentation. Early UIIS models were based on Mask R-CNN, such as WaterMask, which introduces attention modules and boundary refinement to address underwater image degradation. YOLO series models (e.g., YOLOv9-N and AASNet) have also been applied for efficient instance segmentation. The integration of VFMs has marked a major breakthrough in UIIS: USIS-SAM adapts SAM with underwater domain prompts, MarineInst optimizes mask filtering to reduce oversegmentation, and UWSAM achieves efficient knowledge distillation from large SAM encoders to smaller models. Emerging architectures like Vision Mamba (e.g., UIS-Mamba) and self-supervised pretrained models (e.g., DINOv2-based DiveSeg) have further pushed the boundaries of UIIS performance, with DiveSeg achieving the highest mAP on datasets like UIIS and USIS10K. Meanwhile, various benchmark datasets and evaluation metrics have been proposed to support the development of underwater image segmentation methods. 
For USOD, datasets such as USOD10K (the largest and most widely used), UFO-120, and MAS3K cover diverse underwater scenes and target types, with evaluation metrics including S-measure (structural consistency), F-measure (balance of precision and recall), E-measure (enhanced alignment between prediction and ground truth), and MAE (pixel-level error). UISS datasets like SUIM (the first large-scale semantic segmentation dataset), DeepFish, and CaveSeg provide annotated samples for different application scenarios, with core metrics including mean intersection over union (mIoU), mean accuracy, and average accuracy. UIIS datasets such as UIIS10K and USIS16K (with 158 fine-grained categories) enable comprehensive evaluation of instance segmentation performance, using metrics like mAP, AP50 (loose IoU threshold), and AP75 (strict IoU threshold). Benchmark results show that hybrid architectures (combining CNNs/transformers/VFMs) and domain-specific optimizations (e.g., underwater image enhancement modules and depth-aware fusion) are key to achieving superior performance across datasets. Despite significant progress, underwater image segmentation still faces several critical challenges. The scarcity of high-quality annotated datasets, especially for deep-sea and rare marine species, limits the generalization of deep learning models. Additionally, the variability of underwater environments (e.g., light conditions and water turbidity) leads to poor cross-domain performance.
Future research directions are focused on addressing the following challenges: 1) few-shot and weakly supervised segmentation to reduce reliance on large annotated datasets, leveraging metalearning and transfer learning with underwater physical priors; 2) open-vocabulary segmentation, integrating vision–language models (e.g., CLIP) to handle unseen target categories in real-world applications; 3) referential segmentation, enabling interactive target localization based on text descriptions or other reference information; 4) real-time multimodal segmentation, fusing RGB, depth, polarization, and hyperspectral data to meet the low-latency requirements of underwater robot navigation; and 5) cross-domain generalization, combining domain adaptation techniques and physical models to enhance robustness to environmental variations. In conclusion, underwater image segmentation has evolved significantly from traditional methods to advanced deep learning and foundation model-based approaches, with continuous improvements in accuracy, robustness, and applicability. The comprehensive overview of segmentation tasks, methods, datasets, and evaluation metrics provided in this review offers valuable insights for researchers in the field. Future advancements in low-annotation-cost, high-practicality, multimodal fusion, and cross-domain robust methods will further unlock the potential of underwater image segmentation, enabling efficient and intelligent exploration and utilization of marine resources.  
      Keywords: underwater image segmentation; salient object detection (SOD); semantic segmentation; instance segmentation; deep learning
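Three of the evaluation metrics listed in the abstract above (MAE and F-measure for salient object detection, mean IoU for semantic segmentation) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than any benchmark's official evaluation code: the function names are hypothetical, the fixed threshold of 0.5 is an assumption, and the weight beta² = 0.3 follows a common convention in the salient object detection literature.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    return float(np.mean(np.abs(pred.astype(float) - gt.astype(float))))

def f_measure(pred, gt, threshold=0.5, beta2=0.3):
    """Weighted harmonic mean of precision and recall at a fixed threshold."""
    binary = pred >= threshold
    gt = gt.astype(bool)
    tp = np.logical_and(binary, gt).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    denom = beta2 * precision + recall
    return float((1 + beta2) * precision * recall / denom) if denom > 0 else 0.0

def mean_iou(pred_labels, gt_labels, num_classes):
    """Average per-class intersection over union; absent classes are skipped."""
    ious = []
    for c in range(num_classes):
        p = pred_labels == c
        g = gt_labels == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class appears in neither prediction nor ground truth
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy 2x2 example: the prediction misses one of two foreground pixels.
gt = np.array([[1, 1], [0, 0]])
pred = np.array([[1, 0], [0, 0]])
print(mae(pred, gt), f_measure(pred.astype(float), gt), mean_iou(pred, gt, 2))
```

The instance-level metrics mentioned in the abstract (mAP, AP50, AP75) build on the same IoU computation, applied per predicted instance at the stated thresholds.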
      Updated: 2026-05-15
    • Review of the evaluation and enhancement of underwater sonar image quality

      Lin Jie, Chen Weiling, Xu Xiaoyi, Zhao Tiesong
      Vol. 31, Issue 5, Pages: 1372-1393(2026) DOI: 10.11834/jig.250421
      Abstract: Sonar imaging serves as a foundational technology for deep-sea exploration, resource mapping, and underwater infrastructure inspection. However, inherent physical constraints (e.g., narrow bandwidth and acoustic wavelength limits) coupled with complex underwater propagation effects (including water scattering, multipath interference, and reverberation) inevitably degrade sonar image quality. Such quality degradation commonly manifests as low resolution, intense speckle noise, and weak contrast. These quality issues directly impair visual perception and considerably reduce the accuracy and reliability of downstream tasks, such as target recognition. Therefore, developing effective sonar image quality assessment and enhancement techniques is crucial for ensuring the efficacy of ocean exploration missions. Although sonar image processing research has made much progress in recent years, existing review works primarily focus on underwater optical images or specific application tasks. A systematic review of sonar image quality assessment frameworks and enhancement algorithms, along with an in-depth analysis of their technical evolution, is still lacking. Furthermore, the key future challenges and breakthrough directions in this field have not been clearly articulated. To address these gaps, this study, for the first time, reviews and constructs a collaborative quality assessment-enhancement research framework for sonar images, aiming to provide a systematic and comprehensive survey. Subjective assessment covers the single-stimulus, double-stimulus, and paired-comparison protocols. For objective assessment, centering on the core dimension of reference image availability, we systematically classify the methods and conduct an in-depth analysis of the latest developments in each category.
Full-reference image quality assessment (IQA): We emphasize the analysis of local entropy-structure fusion models (local entropy backed sonar image quality predictor (LESQP) and sonar image quality predictor (SIQP)) tailored for sonar statistical characteristics and reveal how they leverage edge sparsity and patch consistency to overcome gray-level nonuniformity. Concurrently, the gain boundaries of multiscale, frequency-domain, and perceptual weighting strategies in evaluating sonar compression artifacts are pointed out. Reduced-reference IQA: Three representative approaches are distilled: partial-reference sonar image quality predictor (PSIQP) based on information entropy comfort, sonar image utility quality assessment (SIUQA) oriented toward task utility, and task and perception oriented sonar image quality assessment (TPSIQA), a meta-learning fusion framework for small-sample scenarios. TPSIQA has a three-tier framework for bandwidth-limited transmission and integrates application-specific features via selective ensemble learning. No-reference IQA: Centered on attribute consistency theory, a new paradigm for cross-scenario no-reference models is constructed. This theory asserts that high-quality sonar images must maintain statistical consistency across four critical attributes, namely, regional distinctness, geometric integrity, detail preservation, and cleanliness, regardless of scene variation. Furthermore, no-reference contour degradation measurement (NRCDM), unified quality assessment method for sonar imaging and processing (UASIP), and perception-cognition aware sonar assessment (PCASS) models are analyzed. NRCDM quantifies contour degradation via feature ratios and support vector regression (SVR). UASIP builds upon attribute consistency theory, enforcing the four task-critical attributes for cross-scenario generalization.
PCASS evaluates super-resolution (SR) reconstructed images via hierarchical feature fusion, and SRIQA mimics ventral visual pathway processing. With regard to quality enhancement, we trace the evolution of SR from traditional interpolation and frequency-domain methods to deep learning paradigms. Early convolutional neural network (CNN)-based approaches (e.g., ResNet blocks) and generative adversarial networks (e.g., multibranch generators) achieve 4× upscaling but struggle with texture fidelity. Recent breakthroughs integrate physical constraints and hybrid architectures. For example, STDPNet employs a dual-stream CNN-Transformer parallel structure to preserve global structures and local textures, overcoming the limitations of serial fusion. Its hybrid loss function, which combines mean absolute error, spectral loss, and local gradient-aware loss, effectively mitigates over-smoothing while enhancing edge integrity. Blind SR models address unknown degradation processes via dual-cross optimization, and MHGAN utilizes multiheaded adversarial learning to recover high-frequency details under data scarcity. Subsequently, the development trajectory of denoising techniques is reviewed. Early mean/median filters gave way to anisotropic diffusion, adaptive statistical modeling, and sparsity-driven optimization, whereas today’s leading approaches embed anisotropic guidance into the kernel itself or couple dual-stage U-Nets with edge-aware losses. Specifically, dual-stage U-Net suppresses photometric noise while preserving textures, and anisotropic guided filtering introduces adaptive regularization and directional weighting, achieving a 6.3 dB peak signal-to-noise ratio gain and superior edge retention. Optimizes spatial pixel ranking for computational efficiency. Restoration techniques leverage multitask learning. 
For example, deep adaptive phase learning corrects synthetic aperture sonar (SAS) phase errors without iterative optimization, and recurrent neural networks/long short-term memory networks model reverberation dynamics. Furthermore, we investigate the emerging direction of multimodal fusion. RepDNet applies reparameterization to side-scan sonar (SSS) despeckling within a lightweight multibranch CNN, and FLSSNet integrates CNN-Transformer hybrids for cross-modal feature transformation to enhance saliency detection. In the experimental section, systematic performance comparison experiments for sonar image quality assessment algorithms are conducted on three public datasets: the sonar image quality database (SIQD), the super-resolution sonar image database (SRSID), and the multi-scenario sonar image dataset (MSSID). By quantitatively analyzing the performance of the three mainstream categories of methods (i.e., full reference, reduced reference, and no reference) across different distortion scenarios, the experiments reveal substantial scene dependence in algorithm performance. For instance, PCASS exhibits prominent advantages in evaluating reconstructed images but struggles to adapt to other distortion types. SIQP is stable in traditional degradation evaluation but limited in reconstruction tasks. Moreover, validation experiments targeting general-purpose SR and denoising algorithms are performed to assess their applicability and performance boundaries on sonar images. The core value of this review lies in providing a systematic theoretical reference and technical roadmap for the advancement of sonar image processing; revealing algorithm performance bottlenecks and applicability boundaries through experiments; and offering crucial support for researchers to grasp the current situation, understand challenges, and plan future research.  
      Keywords: sonar image (SI); image quality assessment (IQA); image enhancement; super-resolution reconstruction; image denoising
      Updated: 2026-05-15
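
The sonar abstract above quotes a 6.3 dB gain in peak signal-to-noise ratio (PSNR) for anisotropic guided filtering. As a rough aid to reading that figure, the sketch below computes PSNR from the standard definition, PSNR = 10·log10(MAX² / MSE), and converts a dB gain into the equivalent reduction factor in mean squared error. The function names, the 8-bit peak value of 255, and the toy images are assumptions chosen for illustration; they are not taken from the reviewed paper.

```python
import math


def mse(ref, img):
    """Mean squared error between two equally sized 2D images (lists of rows)."""
    total, count = 0.0, 0
    for ref_row, img_row in zip(ref, img):
        for r, v in zip(ref_row, img_row):
            total += (r - v) ** 2
            count += 1
    return total / count


def psnr(ref, img, max_val=255.0):
    """PSNR in dB; max_val=255 assumes 8-bit images (an assumption here)."""
    err = mse(ref, img)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / err)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


# Toy 2x2 images, purely illustrative.
reference = [[0, 0], [0, 0]]
noisy = [[10, 10], [10, 10]]

psnr_noisy = psnr(reference, noisy)   # MSE is 100 for these toy images
ratio = mse_ratio_for_gain(6.3)       # MSE reduction implied by +6.3 dB
```

Under these assumptions, a 6.3 dB gain corresponds to the mean squared error dropping by a factor of roughly 4.3, which is independent of the peak value used.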
    • Wang Yueying, Wu Hao, Qing Yuhao, Zhang Weidong, Shen Liquan, Xu Xin
      Vol. 31, Issue 5, Pages: 1394-1424(2026) DOI: 10.11834/jig.250439
      Progress in deep learning-driven intelligent perception and decision-making technologies for unmanned surface vehicles
      Abstract: Unmanned surface vehicles (USVs) are rapidly emerging as one of the most promising unmanned maritime platforms, offering high autonomy, low operational costs, and long endurance, and they have demonstrated significant potential in missions such as hydrographic surveying, environmental monitoring, port security, search and rescue, and military reconnaissance. Despite these advantages, the ocean remains one of the most challenging and unpredictable operational environments owing to its dynamic and unstructured nature, where sudden illumination changes, wave-induced reflections, fog, rain, strong winds, adverse weather conditions, and limited communication links all introduce severe uncertainties for the perception and decision-making capabilities of USVs. Against this background, this paper presents a comprehensive survey of intelligent perception and decision-making technologies for USVs under complex maritime environments, with a particular emphasis on deep learning, multimodal sensor fusion, and cooperative swarm intelligence. The development history and overall system architecture of USVs are first reviewed, highlighting the evolution of hull platforms, onboard sensors, navigation and control modules, and communication subsystems and clarifying the technological foundations that have shaped their current capabilities. In terms of perception, recent advances are categorized in accordance with algorithmic paradigms such as convolutional neural networks (CNNs), transformer-based architectures, graph neural networks (GNNs), and deep reinforcement learning (DRL). Their roles in object detection, obstacle recognition, semantic scene understanding, sea-state perception, and multitarget tracking are evaluated. 
Multimodal sensing and fusion, including the integration of optical cameras, LiDAR, millimeter-wave radar, sonar, and inertial navigation units, are further analyzed, with an emphasis on feature- and decision-level fusion strategies that enhance robustness under adverse conditions. Representative maritime vision datasets, such as MODD, MaSTr1325, SMD, and SeaDronesSee, are introduced to reveal gaps in data diversity, annotation quality, and generalization capacity. In the decision-making domain, methods that transform perception outputs into safe and efficient navigation actions, ranging from traditional graph search and sampling-based planners to emerging artificial intelligence approaches such as imitation and reinforcement learning, are examined. Particular attention is given to the incorporation of environmental disturbances including wind, waves, and currents into dynamic planning, as well as the integration of international maritime rules such as COLREGs into collision avoidance strategies. Beyond individual vehicles, this survey extends to multi-USV cooperative frameworks that enable distributed task allocation, formation control, and swarm intelligence for collective perception and decision-making in large-scale missions. The analysis reveals that deep learning models have significantly improved perception accuracy in unstructured maritime scenarios compared with traditional rule-based or handcrafted feature methods, with CNNs excelling at feature extraction, transformers showing advantages in capturing global contextual dependencies in water-sky backgrounds, and GNNs and DRL extending perception into temporal and interaction-aware domains. Nonetheless, real-world deployment remains constrained by generalization bottlenecks, as models trained on specific datasets often fail under unseen weather, lighting, or sea conditions. 
Multimodal sensor fusion proves indispensable, given that cameras provide rich semantic information but degrade under poor visibility, radar and LiDAR offer reliable distance estimation but face issues of low resolution or cost, and sonar extends sensing underwater but poses challenges of alignment with surface sensors. While deep learning-based fusion techniques show promise, spatiotemporal misalignment under dynamic vessel motion continues to hinder robust performance. On the decision-making side, reinforcement learning approaches outperform classical planners in dynamic environments, especially for real-time obstacle avoidance and long-horizon trajectory optimization. However, their interpretability limitations and high sample complexity remain problematic for safety-critical applications, motivating research on hybrid frameworks that integrate domain knowledge such as COLREGs rules into learning processes. Multi-USV collaboration is recognized as a frontier for scaling maritime autonomy, with swarms offering distributed sensing, information sharing, and coordinated mission execution, yet unresolved challenges in communication latency, distributed consensus, and scalability restrict practical adoption. Overall, the findings of this survey highlight persistent gaps in dataset availability, computational efficiency on edge hardware, interpretability of deep models, and standardization of benchmarks, all of which restrict the path toward widespread operational deployment. In light of these findings, several future research directions are proposed. 
They include enhancing perception robustness under extreme conditions via domain adaptation and physics-informed learning; advancing cross-modal and domain fusion methods to integrate surface, subsurface, and aerial data; designing lightweight real-time models with embedded safety constraints and explainability for onboard decision-making; exploring distributed collaborative intelligence for scalable swarm autonomy; and establishing standardized open benchmarks for fair evaluation and industrial adoption. By systematically reviewing current progress, challenges, and prospects, this survey provides a structured reference for researchers and technical insights for practitioners, aiming to accelerate the development of intelligent, safe, and resilient USVs capable of autonomous operation in complex and uncertain ocean environments.  
      Keywords: unmanned surface vehicle (USV); complex maritime environment; intelligent perception; deep learning; multimodal sensor fusion; autonomous decision-making; swarm intelligence
      Updated: 2026-05-15
    • Overview and prospects of underwater novel view synthesis

      Yuan Jieyu, Zhao Qianqian, Li Yujun, Zhang Yuanlin, Guo Chunle, Li Chongyi
      Vol. 31, Issue 5, Pages: 1425-1450(2026) DOI: 10.11834/jig.250469
      Abstract: Underwater novel view synthesis (UNVS) is an emerging research area at the intersection of computer vision, computer graphics, and marine science. It aims to reconstruct complete 3D scenes from sparse or limited observations and generate photorealistic images from arbitrary viewpoints. Compared with traditional underwater imaging techniques, UNVS provides geometrically consistent scene reconstructions and enables flexible visualization of marine environments, thereby offering richer spatial information to support applications in marine observation, ecological monitoring, resource exploration, and digital ocean construction. With the rapid development of smart ocean initiatives globally, the demand for high-quality underwater 3D reconstruction and novel view synthesis has increased. Despite the remarkable success of novel view synthesis methods, such as neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS), in terrestrial environments, their direct application underwater remains severely limited. The underwater imaging environment is fundamentally different from natural scenes because of its complex optical propagation effects. Light attenuation is wavelength-dependent, causing severe color distortion and loss of contrast. Forward and backward scattering caused by water molecules and suspended particles further degrade image quality by introducing haze-like effects, blur, and noise. In addition, the refraction at the water-camera interface, the caustics caused by surface waves, and dynamic disturbances (e.g., moving marine organisms) pose additional challenges to stable reconstruction and consistent novel view generation. These unique physical and environmental factors make UNVS a highly challenging problem that requires specialized solutions. 
Despite the growing research interest and emerging progress in UNVS, comprehensive surveys remain scarce, particularly those that systematically analyze recent developments and provide a holistic perspective of the field. To address this gap, this study presents a comprehensive review of the current state of UNVS. First, we revisit the physical principles of underwater imaging and analyze how scattering, absorption, and multipath transmission affect visual data acquisition and subsequent reconstruction quality. Understanding these mechanisms is crucial because they form the theoretical basis for integrating physics-based priors into learning-driven approaches. Second, we provide an overview of representative methodological advances. Physics-inspired methods attempt to explicitly model underwater light propagation and compensate for degradation, and data-driven approaches, particularly those based on NeRF and its variants, leverage deep neural networks to learn radiance fields directly from observations. Recent efforts have further extended 3DGS, which offers order-of-magnitude improvements in rendering efficiency and is particularly attractive for real-time applications. Hybrid approaches that integrate physical modeling, deep learning, and multimodal sensing (e.g., combining optical and acoustic data) have also been discussed because they show promise in addressing the limitations of single-modality methods. By analyzing representative studies from domestic and international communities, we highlight their technical principles, performance metrics, and practical applications. Specific attention is paid to how different approaches address underwater optical effects, dynamic interference, and the restoration of complex scenes (e.g., coral reefs and shipwrecks). Comparative evaluations are provided across several dimensions, including geometric accuracy, visual fidelity, rendering efficiency, and robustness under varying turbidity conditions. 
These comparisons reveal that although remarkable progress has been attained, a trade-off often exists between reconstruction quality and computational cost, and current solutions are still far from achieving the generalization ability required for real-world deployment in diverse underwater environments. On the basis of this review, we identify several open challenges and potential future directions. One key challenge lies in disentangling medium properties from scene geometry, a task that requires improved joint modeling of water volume and object surfaces. Another important research direction is domain adaptation and generalization; models trained under one turbidity condition or geographic region often fail when they are applied elsewhere, highlighting the need for cross-domain robustness. Data scarcity also remains a bottleneck because collecting large-scale, high-quality underwater datasets is expensive and technically demanding. This situation suggests opportunities for leveraging unsupervised, weakly supervised, or simulation-to-real transfer learning strategies. Furthermore, real-time performance is essential for integration with underwater robotics and autonomous underwater vehicles (AUVs), and such integration calls for the development of efficient architectures and hardware-accelerated implementations. Long-term perspectives also point to the integration of multimodal sensing, the incorporation of physical priors into neural rendering frameworks, and the design of scalable systems that can operate in open, dynamic, and large-scale marine environments. In conclusion, UNVS represents a rapidly growing but still challenging area in vision and graphics research. By systematically reviewing the physical foundations, methodological advances, and representative applications, this study aims to provide a comprehensive technical roadmap for researchers and practitioners. 
The insights summarized herein are expected to guide future investigations; foster interdisciplinary collaboration across oceanography, robotics, and artificial intelligence; and ultimately accelerate the transition of UNVS from laboratory prototypes to practical tools that support marine science, ecological protection, and the sustainable development of ocean resources.  
      Keywords: underwater imaging; underwater stereo observation; 3D visual representation; novel view synthesis; underwater visual restoration; volume rendering; physics-based modeling
      Updated: 2026-05-15
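
The UNVS abstract above attributes underwater color distortion to wavelength-dependent attenuation and scattering. A minimal sketch of that effect, assuming the commonly used simplified image formation model I_c = J_c·exp(-β_c·z) + B_c·(1 - exp(-β_c·z)), is given below. The per-channel coefficients and veiling-light values are hypothetical numbers chosen only so that red attenuates fastest; the model also ignores forward scattering, refraction, and caustics, and it is not the specific formulation used in the reviewed paper.

```python
import math


def underwater_pixel(scene_rgb, distance_m, attenuation, veiling_rgb):
    """Apply a simplified per-channel underwater formation model to one pixel.

    scene_rgb   : clear-water colour J_c in [0, 1]
    distance_m  : camera-to-scene distance z in metres
    attenuation : per-channel coefficient beta_c in 1/m (hypothetical values)
    veiling_rgb : backscatter colour B_c reached at infinite distance
    """
    observed = []
    for j, beta, b in zip(scene_rgb, attenuation, veiling_rgb):
        transmission = math.exp(-beta * distance_m)
        # Direct signal fades with distance while backscatter takes over.
        observed.append(j * transmission + b * (1.0 - transmission))
    return observed


# Hypothetical coefficients for (R, G, B): red is absorbed fastest.
ATTENUATION = (0.60, 0.12, 0.08)
VEILING_LIGHT = (0.05, 0.35, 0.45)

white = (1.0, 1.0, 1.0)
near = underwater_pixel(white, 0.0, ATTENUATION, VEILING_LIGHT)
far = underwater_pixel(white, 5.0, ATTENUATION, VEILING_LIGHT)
very_far = underwater_pixel(white, 1000.0, ATTENUATION, VEILING_LIGHT)
```

With these illustrative values, a white target is unchanged at zero distance, shifts toward blue-green at 5 m as the red channel collapses, and converges to the veiling light at very large distances. This is the kind of medium effect that a rendering method would need to separate from scene appearance.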
    • A review of data-driven methods for predicting sea surface temperature

      He Qi, Tao Mengxin, Zhu Zihang, Song Wei, Du Yanling
      Vol. 31, Issue 5, Pages: 1451-1477(2026) DOI: 10.11834/jig.250496
      Abstract: Sea surface temperature (SST) is a key variable that characterizes the thermal state of the ocean and reflects global climate change. Accurate prediction of SST is a fundamental cornerstone for numerous critical applications, including marine weather and climate forecasting, the sustainable assessment of fishery resources, the protection of marine ecosystems, and the socioeconomic development of coastal regions. As a result, SST prediction has become a focal point of research in oceanography and atmospheric science. Currently, SST prediction methodologies are predominantly categorized into two paradigms: numerical and data-driven models. Numerical models, grounded in the fundamental principles of fluid dynamics and thermodynamics, solve complex partial differential equations with specified initial and boundary conditions to generate forecasts. They offer distinct advantages such as clear physical interpretability and the ability to reveal underlying oceanic processes, yet their performance is highly dependent on the accuracy of initial and boundary settings and imposes substantial demands on computational resources. By contrast, data-driven methods have emerged as a core technical pathway to enhance prediction accuracy. By leveraging statistical modeling and machine learning algorithms, they automatically extract spatiotemporal patterns and evolutionary laws of the marine environment from massive volumes of historical and real-time observational data. Requiring minimal prior physical knowledge, these methods excel at capturing intricate nonlinear relationships between oceanic variables, demonstrating remarkable flexibility and efficiency, particularly in short-to-medium-term forecasts (approximately 1–30 days) and regional-scale studies. 
With the rapid advancement of data acquisition technologies, including satellite remote sensing, buoy observations, and reanalysis techniques, the quality (e.g., precision and spatiotemporal continuity) and coverage of SST-related data have been significantly improved, laying a solid foundation for the development of sophisticated data-driven models. This paper presents a comprehensive and systematic review of data-driven SST prediction methods. An integrated research framework is constructed to synthesize state-of-the-art progress in the field. The analysis begins by clarifying the core characteristics and physical mechanisms of numerical and data-driven methods. Through detailed comparative analysis, this study highlights the unique advantages of data-driven approaches. These methods excel at capturing complex nonlinear relationships and adapting to diverse regional characteristics. At the same time, this review acknowledges the complementary value of numerical models. These models remain essential for long-term climate simulations and physical process interpretation. This study then follows a clear technical evolution trajectory. It covers the transition from statistical models and traditional machine learning to shallow neural networks and deep learning. Early research focused on statistical models such as canonical correlation analysis and the autoregressive integrated moving average (ARIMA) model. These models provided foundational contributions to SST anomaly and El Niño-Southern Oscillation (ENSO) prediction. Traditional machine learning models, including support vector regression and random forest, are also analyzed. This review explores their ability to handle nonlinear marine processes through feature engineering and hyperparameter optimization. The evolution continues into shallow neural networks such as the artificial neural network and the adaptive neuro-fuzzy inference system. These models mine potential correlations between various marine environmental factors. 
They distinguish between single- and multi-factor synergetic frameworks to improve accuracy. Currently, deep learning represents the cutting edge of data-driven research. This review covers temporal sequence models such as long short-term memory (LSTM), gated recurrent unit (GRU), and temporal convolutional network (TCN). It also examines spatiotemporal fusion models like ConvLSTM and vision transformers. Special attention is paid to emerging large marine models, including WenHai, FengWu, and XiHe. These models integrate deep oceanographic domain knowledge into their architectures. They have achieved breakthroughs in high-resolution (≈ 3 km) and long-horizon forecasting. Furthermore, they effectively preserve fine-scale oceanic structures such as eddies and fronts. Beyond model architectures, this study conducts an in-depth analysis of multisource datasets. These datasets include in-situ observations from Array for Real-time Geostrophic Oceanography (ARGO) buoys and satellite remote sensing data like NOAA’s Optimum Interpolation Sea Surface Temperature (OISST). This study also incorporates reanalysis data such as ERA5 and CMIP6 model simulations. These sources are compared on the basis of their spatiotemporal resolution and reliability. This review discusses how data quality and scale directly affect model performance. Through case studies in regions like the tropical Pacific and Chinese coastal waters, it evaluates model adaptability. This evaluation provides practical guidance for data-model matching in specific scenarios. For instance, remote sensing data are highly suitable for regional field prediction using spatiotemporal models. By contrast, reanalysis data offer critical support for long-term climate trend analysis. This study further identifies the core challenges facing this research field. The most significant obstacle is the lack of model interpretability. 
Deep learning architectures often function as “black boxes”, which limits their scientific credibility. To address this limitation, this review proposes three promising solutions. First, it advocates for physics-informed machine learning to embed oceanic equations into loss functions. Second, it discusses inherently interpretable architectures like the Kolmogorov-Arnold network. Third, it explores post-hoc explanation methods such as Shapley additive explanations (SHAP) and partial dependence plots (PDP). Large marine models also enhance interpretability by mapping latent variables to known physical modes, including patterns like the Pacific decadal oscillation. Another key challenge is the scale adaptation of multisource data. Datasets often exhibit inconsistencies in spatial resolution and temporal frequency. These inconsistencies lead to feature misalignment and reduced model generalization. This study advocates for a two-stage strategy to solve this issue. The first stage involves preprocessing steps like hierarchical data fusion to unify scales. The second stage emphasizes multiresolution feature learning during training. This approach helps the model exploit hierarchical information and compensate for data loss. Finally, this paper outlines future development directions for the global community. One primary goal is to expand to three-dimensional subsurface temperature structure modeling. This shift is necessary to capture vertical energy exchange processes. This review also emphasizes advancing interannual to decadal prediction capabilities. It suggests further integrating physical constraints to enhance model robustness. Ultimately, data-driven methods will play an increasingly pivotal role in advancing prediction accuracy. This comprehensive system serves as a systematic reference for research and technological innovation. It contributes to international collaboration in oceanography and climate science.  
      Keywords: marine environment; sea surface temperature (SST); data-driven model; deep learning; spatio-temporal feature
      Updated: 2026-05-15
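
The SST abstract above mentions physics-informed machine learning that embeds oceanic equations into loss functions. One way to picture this is a loss that adds a penalty on the residual of a governing equation to the usual data-fitting term. The sketch below assumes a one-dimensional advection-diffusion equation, ∂T/∂t + u·∂T/∂x = κ·∂²T/∂x², as a stand-in for the real ocean equations and evaluates the residual with finite differences on a small grid. The equation choice, the function names, and the weighting scheme are illustrative assumptions, not the method of any reviewed model; a practical system would typically use automatic differentiation inside a training loop.

```python
def data_loss(pred, obs):
    """Mean squared error between predicted and observed fields T[time][x]."""
    total, count = 0.0, 0
    for pred_row, obs_row in zip(pred, obs):
        for p, o in zip(pred_row, obs_row):
            total += (p - o) ** 2
            count += 1
    return total / count


def physics_residual_loss(temp, dt, dx, velocity, diffusivity):
    """Mean squared residual of dT/dt + u*dT/dx - kappa*d2T/dx2 on interior points."""
    total, count = 0.0, 0
    for n in range(len(temp) - 1):
        for i in range(1, len(temp[n]) - 1):
            dT_dt = (temp[n + 1][i] - temp[n][i]) / dt                # forward in time
            dT_dx = (temp[n][i + 1] - temp[n][i - 1]) / (2.0 * dx)    # central in space
            d2T_dx2 = (temp[n][i + 1] - 2.0 * temp[n][i] + temp[n][i - 1]) / dx ** 2
            residual = dT_dt + velocity * dT_dx - diffusivity * d2T_dx2
            total += residual ** 2
            count += 1
    return total / count


def total_loss(pred, obs, dt, dx, velocity, diffusivity, weight=1.0):
    """Data term plus a weighted physics penalty (weight is a hypothetical knob)."""
    physics = physics_residual_loss(pred, dt, dx, velocity, diffusivity)
    return data_loss(pred, obs) + weight * physics


# A uniform, steady field satisfies the equation exactly.
steady = [[20.0] * 5 for _ in range(3)]
steady_physics = physics_residual_loss(steady, 1.0, 1.0, 1.0, 0.1)

# A time-invariant ramp violates it when the flow is nonzero: residual = u * slope.
ramp = [[float(i) for i in range(5)] for _ in range(3)]
ramp_physics = physics_residual_loss(ramp, 1.0, 1.0, 1.0, 0.1)
ramp_total = total_loss(ramp, ramp, 1.0, 1.0, 1.0, 0.1, weight=0.5)
```

In this toy setting the ramp matches its "observations" perfectly, so the data term is zero, yet the total loss stays positive because the field is physically inconsistent with the assumed equation. That is the behavior a physics-informed penalty is meant to provide.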
    • A survey of underwater visual datasets for ocean intelligence

      Li Hua, Li Zhiyuan, Liu Jiawei, Cong Runmin
      Vol. 31, Issue 5, Pages: 1478-1502(2026) DOI: 10.11834/jig.250483
      Abstract: Underwater computer vision has emerged as a critical enabling technology for advancing marine scientific research and ocean engineering applications, attracting considerable attention from academic and industrial communities in recent years. However, compared with terrestrial imaging, underwater imaging presents unique and formidable challenges, including light attenuation, color distortion, scattering effects, and equipment limitations, which together make underwater computer vision tasks significantly more complex and demanding than their land-based counterparts. The construction of large-scale, high-quality, and diverse datasets serves as the fundamental resource for advancing underwater computer vision technologies, with their quality, quantity, and diversity directly influencing the training effectiveness, generalization ability, and overall performance of deep learning models. For a comprehensive understanding of the development trajectory, advantages, and limitations of underwater datasets, this paper presents a systematic survey of representative underwater datasets currently available in the research community. The survey encompasses three major domains of underwater computer vision, namely, underwater visual enhancement, scene understanding, and 3D reconstruction, providing a holistic and detailed view of the dataset landscape in this rapidly evolving field. In the underwater visual enhancement domain, we carefully analyze datasets specifically designed for image and video enhancement, color correction and restoration, and super-resolution reconstruction. These datasets directly address the fundamental and long-standing challenge of improving underwater image quality, which is a crucial prerequisite for enabling reliable performance in subsequent high-level vision tasks. 
For underwater scene understanding, we systematically examine a broad set of representative datasets across multiple core tasks, including object classification, object detection, semantic segmentation, instance segmentation, salient object detection, camouflaged object detection, and long-term object tracking. This comprehensive analysis clearly reveals the gradual progression from basic classification tasks to complex and integrated multimodal understanding challenges. The classification datasets have evolved from image-level classification problems to fine-grained species recognition involving hundreds of distinct categories. Detection datasets have expanded from relatively small-scale image collections to extremely large-scale benchmarks containing millions of bounding box annotations. Segmentation datasets have progressed from coarse pixel-level annotations to precise instance-level masks enriched with multiattribute labeling and contextual information. In the underwater 3D reconstruction domain, we explore datasets related to simultaneous localization and mapping, neural radiance fields, and 3D Gaussian splatting technologies. These emerging datasets reflect the growing research interest in comprehensive 3D scene understanding and reconstruction within underwater environments, explicitly addressing challenges unique to marine settings, such as severely limited visibility and unpredictable lighting. Through an in-depth analysis of existing datasets, this survey highlights and reveals several significant development trends in underwater computer vision dataset construction. First, dataset scales have expanded dramatically from hundreds of images in early collections to current million-scale datasets, demonstrating the research community’s recognition of the critical importance of large-scale data for training robust and generalizable models. 
Second, annotation precision has gradually evolved from coarse-grained image-level labels to fine pixel-level segmentation and multiattribute annotations, enabling sophisticated and reliable algorithmic developments. Third, task complexity has progressed from single-modal classification to multimodal fusion and even general foundation models, reflecting the increasing technical sophistication of modern underwater vision applications. The survey also identifies the adoption of automated annotation techniques as a crucial development in significantly reducing dataset construction costs and human labor requirements. The successful application of advanced models, such as the Segment Anything Model (SAM), across multiple representative underwater datasets clearly demonstrates the potential of using powerful pretrained models for efficient, scalable, and semiautomated annotation generation. Moreover, multimodal data fusion has rapidly emerged as an important trend, with new datasets increasingly combining RGB imagery, sonar data, textual descriptions, and additional sensor information to provide rich and comprehensive scene understanding capabilities. Despite these encouraging advances, our analysis reveals several critical and persistent limitations in existing underwater datasets. Geographic coverage remains heavily biased toward developed regions and easily accessible coastlines, with insufficient representation of tropical deep-sea ecosystems or polar marine environments. Species and scenario diversity is still limited, with most datasets focusing predominantly on common marine organisms and typical water scenes, while coverage of rare, endangered, or cryptic species and extreme environmental conditions remains inadequate. The lack of long-term temporal data severely hampers the ability to monitor dynamic ecosystem changes, as most existing datasets represent short-term snapshots rather than continuous longitudinal observations. 
Annotation standards also vary significantly across datasets, with inconsistent quality control mechanisms negatively affecting interdataset comparability. On the basis of this analysis, we propose several promising future development directions for the underwater computer vision dataset community. The construction of large-scale pretraining datasets will become increasingly important for developing universal underwater foundation models and accelerating advances in self-supervised and weakly supervised learning approaches. Synthetic data generation technologies, supported by generative artificial intelligence techniques, hold great promise for addressing data scarcity challenges, particularly in the representation of rare species. Cross-domain generalization capabilities require the design of enhanced evaluation frameworks to rigorously assess algorithm robustness across different geographic regions, temporal periods, environmental conditions, and equipment configurations. This survey ultimately provides underwater computer vision researchers with comprehensive dataset resource references and development insights, thereby facilitating deep technological advances and broad applications in marine scientific research, ocean resource development, sustainable fisheries, and marine environmental protection. It will contribute to the gradual construction of comprehensive, high-quality, open-access, and standardized underwater dataset ecosystems that will provide robust and enduring data support for advancing human understanding, exploration, and long-term stewardship of fragile marine environments. The link is https://cstr.cn/31253.11.sciencedb.j00240.00173 or https://github.com/Linzy0227/UVD.  
      Keywords: underwater datasets; underwater scene understanding; image enhancement; object recognition; 3D reconstruction
      Updated: 2026-05-15
    • Survey of underwater light field imaging: theory and applications

      Zhuang Peixian, Wang Yihang, Zhang Xinheng, Liu Fei, Tong Junjie, Fu Zhenqi
      Vol. 31, Issue 5, Pages: 1503-1522(2026) DOI: 10.11834/jig.250255
      Survey of underwater light field imaging: theory and applications
      Abstract: Underwater light field imaging, as an emerging cross-disciplinary technology, integrates advanced principles of light field imaging with the specific demands of complex underwater environments, thereby pioneering a new model for aquatic visual perception. Unlike conventional underwater imaging techniques, this technology acquires richer multidimensional visual information from real-world, complex underwater settings, effectively overcoming the limitations of traditional two-dimensional imaging in marine environments. Conventional underwater imaging records a two-dimensional projection of the three-dimensional light field, capturing only light intensity information within a limited angular range while losing the directional characteristics of light rays. This deficiency becomes particularly pronounced in complex underwater environments, where image quality is compromised by multiple factors such as water absorption, scattering, and plankton interference. Through specially designed imaging systems, underwater light field technology simultaneously records the spatial distribution and directional information of light, thereby achieving complete four-dimensional sampling of the underwater light field. This comprehensive light capture capability enables the technology to acquire angular details that cannot be preserved in traditional two-dimensional imaging, providing a rich data foundation for subsequent underwater visual tasks. Although underwater light field imaging poses high-dimensional data challenges, this multidimensional representation improves the understanding of marine scenes and significantly boosts the performance of various underwater vision tasks. Consequently, underwater light field imaging has garnered increasing attention in computer vision and computational photography. 
To this end, this paper provides a comprehensive survey and in-depth exploration of relevant research over the past two decades, structured around a two-dimensional "theory-applications" framework. At the theoretical level, our survey begins with a detailed introduction to the fundamental models and mechanistic developments in underwater light field imaging. The models evolved from the initial seven-dimensional plenoptic function to a simplified five-dimensional function, ultimately culminating in the establishment of the four-dimensional model. This four-dimensional model of the underwater light field preserves essential information, including spatial and angular light data, while simultaneously reducing the complexity of data acquisition and processing. This reduction in complexity has made the practical application of light field imaging technology in underwater environments feasible. Existing theoretical research on underwater light fields can be categorized into three main phases: underwater light field simulation, underwater light field measurement, and underwater light field reconstruction. The theoretical developments and corresponding representative works are systematically summarized, encompassing aspects such as algorithm design, hardware equipment, experimental validation, and application scenarios. In addition, parameter calibration for underwater light fields is a critical component of effective restoration and reconstruction, and existing calibration approaches are primarily classified into two categories: simulation-based methods and iteration-based methods. At the application level, our survey expounds on four major application scenarios and their technological breakthroughs: underwater image enhancement, expansion of underwater imaging distance, underwater image object detection and tracking, and underwater three-dimensional reconstruction. 
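For orientation, the dimensional reduction described above is commonly written as follows in the general light field literature. The symbols are the conventional ones and are an assumption here; they are not taken from the surveyed paper. The final step relies on radiance being constant along a ray, which holds in free space and is only an approximation in an absorbing and scattering medium such as water.

```latex
\begin{align*}
  \text{7D:}\quad & P(x, y, z, \theta, \phi, \lambda, \tau)
    && \text{position, direction, wavelength } \lambda, \text{ time } \tau \\
  \text{5D:}\quad & P(x, y, z, \theta, \phi)
    && \text{static scene, fixed spectral channel} \\
  \text{4D:}\quad & L(u, v, s, t)
    && \text{ray indexed by its intersections with two parallel planes}
\end{align*}
```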
In terms of underwater image enhancement, this technology primarily leverages the spatial-angular information inherent in light fields to address critical challenges such as image blurring and color distortion caused by underwater scattering and refraction. By effectively distinguishing target signals from noise in underwater environments, light-field-based algorithms achieve significantly better enhancement results than conventional single-image enhancement methods. Regarding expansion of underwater imaging distance, the multiview data of the underwater light field enables the optimization of light propagation models, effectively mitigating the limitations imposed by water absorption and scattering on imaging distance. As a result, imaging systems can capture clear and identifiable underwater targets at substantially greater ranges. In underwater image object detection and tracking, the multiview information provided by underwater light field data enhances the robustness and accuracy of detection algorithms, particularly in challenging underwater scenarios involving small target detection against complex backgrounds and rapid motion tracking. Underwater light field imaging also demonstrates unique advantages and has been successfully deployed in practical applications of marine observation. In underwater three-dimensional reconstruction, this technique enables the acquisition of depth information from single exposures without requiring active illumination or specialized scanning equipment. This capability streamlines the underwater 3D measurement workflow, providing novel technical solutions for seabed topographic mapping and underwater archaeological research. 
Finally, our survey analyzes the current challenges of underwater light field imaging, including the highly unpredictable nature of intricate underwater environments, difficulties in developing high-precision sensors or devices, and bottlenecks in efficiently processing massive high-dimensional datasets. Moreover, this survey outlines future developmental directions that will focus on establishing more accurate underwater imaging models, further studying the stability properties of underwater light fields, developing miniaturized and highly robust sensors or devices, and designing more efficient high-dimensional data processing algorithms. Underwater light field imaging has evolved into a multidisciplinary technology that integrates innovations from various fields such as optical engineering, physical modeling, materials design, and artificial intelligence. This convergence of technologies positions underwater light field imaging as a potentially core technology in blue economy sectors such as marine science and underwater engineering. Furthermore, this technique is poised to play a crucial role in national marine development strategies, contributing significantly to the advancement of ocean exploration and utilization capabilities.  
      Keywords: underwater light field; light field imaging; angle details; theoretical level; practical applications
      Updated: 2026-05-15
    • Guo Wei, Hua Xia, Li Denan, Cui Xiaopeng, Deng Lu
      Vol. 31, Issue 5, Pages: 1523-1544(2026) DOI: 10.11834/jig.250368
      Progress of research on polarization image restoration technology for complex underwater scenes
      Abstract: As marine development and research continue to deepen, underwater images find growing applications in marine scientific observation, underwater engineering exploration, and marine biological monitoring. However, because of the coupled absorption and scattering of light by water, combined with turbulence, suspended particles, and biological disturbances, underwater images exhibit multidimensional degradation (e.g., color distortion and blurred details), which severely reduces their visual quality and practical value. Consequently, efficient and reliable underwater image restoration technology has become a research hotspot. Polarization imaging, which only requires an ordinary camera and a polarizer and balances low cost with high reliability, has emerged as one of the most promising technical approaches for image restoration in complex underwater scenes. In terms of technical foundation, water absorbs light with wavelength selectivity (blue light has the least attenuation and red light has the highest attenuation), leading to a blue-green tone in underwater images. Forward scattering by suspended particles blurs target edges, and backscattering causes image haze and distortion. On this basis, underwater polarization imaging physical models (e.g., the Jaffe-McGlamery model modified by Schechner et al. by introducing polarization analysis) are presented, along with an overview of the principles and performance differences of four types of polarization imaging systems: time-division, amplitude-division, aperture-division, and focal-plane-division systems. With regard to technical progress, this study elaborates from three dimensions. First, the polarization difference method aims to suppress backscattering by analyzing the differences in light intensity across various polarization states. 
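The polarization difference idea can be illustrated with a minimal per-pixel sketch in the spirit of the classical two-image formulation. It assumes that the object signal is unpolarized, that backscatter carries all of the measured polarization, and that the backscatter degree of polarization `p_scat` and the backscatter at infinity `b_inf` are already known. The function name, parameters, and the `t_floor` safeguard are illustrative choices, not taken from the reviewed paper.

```python
def polarization_descatter(i_max, i_min, p_scat, b_inf, t_floor=0.05):
    """Recover object radiance at one pixel from two orthogonal polarization readings.

    i_max, i_min: intensities through the polarizer at its brightest and darkest
                  orientations (hypothetical inputs for this sketch).
    p_scat:       assumed degree of polarization of the backscatter, in (0, 1].
    b_inf:        assumed backscatter intensity at infinite distance.
    t_floor:      lower bound on transmission to avoid dividing by ~0.
    """
    total = i_max + i_min
    # Backscatter estimate: the polarized difference rescaled by p_scat.
    backscatter = (i_max - i_min) / p_scat
    backscatter = min(max(backscatter, 0.0), b_inf)
    # Transmission follows from how close the backscatter is to saturation.
    transmission = max(1.0 - backscatter / b_inf, t_floor)
    # Remove the veiling light, then undo the attenuation.
    return (total - backscatter) / transmission
```

In practice `p_scat` and `b_inf` are estimated from a target-free background region, which is the manual step that later variants of the method try to automate.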
Focusing on estimating backscattered light intensity at infinity, selecting optimal orthogonal polarization image pairs, and establishing nonuniform models for dual polarization characteristics, this study examines in detail the evolution of various techniques and how they address issues, such as over-reliance on manual operations and overly idealized assumptions. Second, the physical degradation model-based method achieves image clarity restoration by optimizing the parameters of the imaging physical model and encompasses three technical branches: imaging model optimization, active circularly polarized light illumination, and refined estimation of transmittance. Imaging model optimization enables image restoration by constructing models that precisely characterize the physical mechanisms of underwater imaging. Active circularly polarized light illumination leverages the memory effect of circularly polarized light—specifically, its superior polarization preservation compared with linearly polarized light—to enhance the separation of scattered light from target light. As a core parameter in underwater imaging physical models, transmittance is estimated precisely through refined methods, directly improving the clarity of underwater polarization images. Third, deep learning methods are categorized into model- and data-driven approaches. Model-driven methods integrate underwater imaging physical models with neural networks, utilizing the robust feature extraction and mapping capabilities of deep learning while leveraging prior knowledge from physical models to compensate for data scarcity. Thus, they provide physically plausible and algorithmically advanced solutions for underwater polarization image restoration. Data-driven methods, by contrast, directly learn image restoration mapping through extensive labeled data, enabling an end-to-end transformation from input raw images to restored clear images. 
This study comprehensively reviews the aforementioned methods and provides in-depth discussions of their respective advantages and limitations. It also identifies current technical challenges, including insufficient consideration of forward scattering in imaging models, limited stability and generality of algorithms, difficulty meeting real-time requirements, and gaps between laboratory validation and real-world environments. Future research should focus on optimizing imaging models to accurately characterize physical processes, improving the cross-scene adaptability of algorithms, enhancing efficiency through model lightweighting and hardware acceleration, expanding multiscene polarization datasets, and strengthening validation in real environments. This study systematically reviews the research status and development trends of polarization image restoration technology for complex underwater scenes, offering a comprehensive reference for the advancement of this technology and facilitating its effective implementation in practical fields, such as marine resource development and environmental monitoring.  
      Keywords: polarization imaging; underwater image restoration; polarization difference; physical degradation model; deep learning
      Updated: 2026-05-15

      Image Processing and Coding

    • Zhang Weidong, Gao Xingyun, Zhou Ling, Lu Haoxiang, Zhao Wenyi
      Vol. 31, Issue 5, Pages: 1545-1556(2026) DOI: 10.11834/jig.250484
      PrioLLIE: latent space decomposition guided by illumination-prior constraints for low-light image enhancement
      Abstract: Objective: Low-light image enhancement is a core branch of computer vision, and most existing methods rely on deep learning frameworks, such as convolutional neural networks, vision transformers, or their hybrid architectures, to directly learn an end-to-end mapping from raw low-light images (which often suffer from low signal-to-noise ratio (SNR), uneven brightness distribution, and color information loss) to visually enhanced counterparts. This data-driven paradigm has gained widespread adoption because of its simplicity in network design and remarkable performance on large-scale labeled datasets. However, a critical limitation persists: these methods lack explicit illumination prior constraints tailored for illumination-reflection separation, a fundamental principle derived from Retinex theory that distinguishes between illumination components (responsible for global brightness distribution and environmental light effects) and reflection components (carrying object-specific attributes, such as intrinsic color, texture details, and structural information). Without such constraints, deep networks often fail to disentangle the two components during the enhancement process. For instance, when brightening dark regions to improve visibility, the model may mistakenly amplify inherent noise (e.g., sensor noise in low-light photography or compression artifacts) alongside the true signal, resulting in a grainy or noisy enhanced image. Similarly, the confusion between color information (part of the reflection component) and brightness fluctuations (part of the illumination component) frequently leads to color distortion. Common issues include an unnatural bluish tint in night scenes or an oversaturated warm tone in indoor low-light environments. 
Moreover, in pursuit of overall brightness balance, the network may over-smooth high-frequency texture details (e.g., fine lines in text, subtle patterns on fabrics, or edge contours of small objects), rendering the enhanced image visually blurry and lacking in clarity. These shortcomings become even more pronounced in complex low-light scenarios that are close to real-world applications. For example, in scenes with local extreme darkness, such as shadowed areas under streetlights at night or dimly lit corners in indoor spaces, pure data-driven models tend to apply excessive brightness gains to dark pixels, which amplifies noise and causes local overexposure in adjacent regions with normal brightness. In cases involving color deviation (e.g., low-light images captured with incorrect white balance settings or under colored ambient light such as neon signs), the model, without prior guidance on natural color distribution, may fail to correct the color shift and instead reinforce the distorted color palette. In addition, the poor generalization ability of these models poses a serious challenge: trained predominantly on specific low-light datasets (e.g., only outdoor night scenes or controlled indoor low-light environments), they struggle to adapt to unseen scenarios, such as low-light conditions in tunnels, rainy nights with light scattering, or backlit scenes with strong brightness contrast. This lack of adaptability makes it highly challenging to achieve balanced optimization across multiple key performance metrics, including brightness uniformity, noise suppression, detail preservation, and color fidelity. It also hinders the practical deployment of these methods in real-world systems, such as surveillance cameras, mobile phone photography, or autonomous driving vision modules. 
To address this problem, this study proposes a latent space decomposition method constrained by illumination priors, referred to as prior-illumination-based latent space decomposition for low-light image enhancement (PrioLLIE). Method: Specifically, we design a spatial prior fusion module, which combines the color information of RGB space and the brightness characteristics of the V channel in HSV space. This module extracts and fuses prior information to construct a robust illumination prior. Subsequently, we propose a latent vector extraction module that maps the image to the latent feature space for decoupled modeling of illumination components, with prior features being incorporated during the generation process. Then, we present a Retinex-driven cross-space decomposition module that accurately decomposes illumination components in the latent space while modeling reflection components in RGB space, thereby realizing complementary enhancement of content and illumination modeling. Result: Qualitative and quantitative experiments on supervised datasets (i.e., low-light dataset (LOL) and single image contrast enhancer (SICE)) and an unsupervised dataset (i.e., different images collected from multiple cameras (DICM)) demonstrate that PrioLLIE achieves superior visual performance in terms of color restoration and contrast enhancement. It yields excellent quantitative results across the three datasets, particularly on SICE. PrioLLIE exhibits high performance in various metrics, including peak SNR, structural similarity index measure, learned perceptual image patch similarity, and DeltaE, further verifying the capability of the proposed method. Conclusion: Through the design concept of combining prior knowledge with data-driven learning, PrioLLIE overcomes the limitations of pure data-driven methods. It provides a technical approach for low-light image enhancement that balances physical rationality and visual quality. 
Furthermore, it offers a reference for integrating prior constraints and latent space modeling into other image degradation restoration tasks. Future research can further expand the types of priors (e.g., texture and noise priors) and explore the lightweight deployment of the model in real-time low-light video enhancement scenarios.  
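The Retinex principle that motivates the method, image = reflection x illumination, can be illustrated with a classical per-pixel sketch. This is not the PrioLLIE network: it simply uses the HSV V channel (the maximum of R, G, and B) as a crude illumination estimate and brightens it with a gamma curve. The function names and the `gamma` and `eps` values are assumptions made for illustration.

```python
def retinex_split(rgb, eps=1e-6):
    """Split one RGB pixel (floats in [0, 1]) into illumination and reflection.

    Illumination is approximated by the HSV value channel, max(R, G, B);
    reflection is the pixel divided by that illumination.
    """
    illumination = max(rgb)
    reflection = tuple(c / (illumination + eps) for c in rgb)
    return illumination, reflection


def relight(rgb, gamma=0.5):
    """Brighten a pixel by adjusting only its illumination component."""
    illumination, reflection = retinex_split(rgb)
    new_illumination = illumination ** gamma  # gamma < 1 lifts dark values
    # Reflection (color ratios) is left untouched, so hue is preserved.
    return tuple(r * new_illumination for r in reflection)
```

Because only the illumination term changes, the ratios between channels stay fixed, which is the property that an explicit illumination prior is meant to protect against color shift.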
      Keywords: low-light image enhancement; reflection component; illumination component; prior constraint; latent feature
      Updated: 2026-05-15

      Image Understanding and Computer Vision

    • Gong Zheng, Zhang Jialiang, Gao Feng, Gan Yanhai, Dong Junyu
      Vol. 31, Issue 5, Pages: 1557-1568(2026) DOI: 10.11834/jig.250318
      Wavelet-based multiscale residual aggregation network (WRANet) for Arctic sea ice forecasting
      Abstract: Objective: Arctic sea ice forecasting is a critical and challenging task with profound implications for global climate research, polar ecosystem protection, resource development, and international strategic interests. As a core indicator of sea ice distribution, sea ice concentration (SIC), defined as the fraction of a given ocean area covered by ice, provides essential parameters for studying ice-ocean-atmosphere interactions. Improving the accuracy of SIC prediction is paramount for refining global climate models, understanding the mechanisms of sea ice formation and evolution, and enhancing the early warning capabilities for extreme climate events. However, existing spatiotemporal forecasting models, particularly those based on deep learning, face two significant challenges that limit their performance. First, high-frequency detail information is substantially lost. Mainstream models, particularly those reliant on convolutional neural networks, often use convolution and pooling operations that reduce computational load but inadvertently discard fine-grained structural and textural details. This loss is detrimental to the accurate prediction of intricate sea ice features, such as edges and small ice floes. Second, feature information is insufficiently utilized. Deep networks generate rich feature representations at various intermediate layers that contain crucial information at different semantic levels and scales. Many current models rely solely on the final, most abstract feature map for prediction and fail to leverage these intermediate features, which constrains the model's expressive power and its ability to capture complex dynamics. This study aims to develop a novel deep learning architecture for Arctic sea ice forecasting that overcomes these limitations. 
The primary objective is to design a network that can robustly express spatiotemporal features by preserving multiscale information and can efficiently utilize hierarchical features from all stages of the network to capture complex dependencies and improve overall prediction accuracy. Method: We propose the wavelet-based multiscale residual aggregation network (WRANet), which enhances a standard encoder-decoder framework with three synergistic innovations, to address the key challenges in sea ice prediction. The cornerstone of our approach is the wavelet multiscale feature extraction module, which is designed to combat the loss of fine-grained information. Instead of using standard convolutions alone, this module first employs a discrete wavelet transform to decompose the input reversibly into distinct low-frequency (coarse structure) and high-frequency (fine details) sub-bands. Then, this frequency-separated representation is processed by a specialized multiscale convolution block, which uses parallel branches with varied kernel sizes to capture local and contextual information. After processing, a lightweight pixel-wise attention mechanism dynamically recalibrates the feature map to amplify important spatial regions while suppressing noise before an inverse wavelet transform reconstructs a feature-rich spatial map. This mechanism complements our novel progressive residual aggregation structure, which tackles the problem of insufficient feature utilization in deep networks. In this structure, the output from each feature extraction module is combined not with the previous layer's output but always with the initial feature map from the encoder. This approach ensures a constant flow of foundational information and prevents its decay. 
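The reversibility of the wavelet step can be checked with a one-level, one-dimensional Haar transform, the simplest discrete wavelet. The abstract does not state which wavelet WRANet uses, so Haar is an assumption here, and the sketch operates on a plain list rather than on 2D feature maps.

```python
import math


def haar_dwt(signal):
    """One-level Haar DWT of an even-length sequence.

    Returns (low, high): scaled pairwise sums (coarse structure) and
    scaled pairwise differences (fine detail).
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) * s for a, b in pairs]
    high = [(a - b) * s for a, b in pairs]
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: rebuilds the original samples from both sub-bands."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.extend([(l + h) * s, (l - h) * s])
    return out
```

A flat pair of samples produces a zero high-frequency coefficient, while an edge produces a large one, so nothing is discarded the way it would be with pooling: the inverse transform restores the input up to floating-point error.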
Furthermore, instead of relying on only the final output, the structure aggregates all intermediate feature maps produced throughout the network, thereby creating a comprehensive, multilevel representation that captures a full spectrum of learned dynamics for the final prediction. The entire network is trained end-to-end using the AdamW optimizer. Result: Our model was evaluated on two public SIC datasets, OSI-450-a (global sea ice concentration climate data record, release 3) and AMSR2 (ASI-AMSR2 sea ice concentration). Our training set covered the period from 2000 to 2010, with validation data from 2011 to 2015. We evaluated model performance using four accuracy metrics: root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe efficiency (NSE), and balanced accuracy (BACC), along with giga floating-point operations (GFLOPs) to measure computational complexity. Our proposed WRANet demonstrated superior performance against six state-of-the-art baseline models: ConvLSTM (convolutional LSTM network), PredRNNv2 (predictive recurrent neural network), SimVP (simpler yet better video prediction), TAU (temporal attention unit), WaST (wavelet-driven spatiotemporal predictive learning), and PastNet (physics-assisted spatio-temporal network). Quantitative evaluation shows that on the OSI-450-a benchmark, WRANet achieved the best results across all accuracy metrics, with the RMSE, MAE, NSE, and BACC being 6.44%, 2.02%, 97.03%, and 96.96%, respectively. These results represent a significant improvement over all baselines. For instance, on the OSI-450-a benchmark, WRANet reduced the RMSE by 2.33 percentage points compared with ConvLSTM and by approximately 0.5 percentage points compared with the highly competitive SimVP and TAU models. These accuracy gains were achieved with high computational efficiency, as WRANet required only 118.09 GFLOPs, which keeps it competitive with the other models in computational cost. 
Additionally, our model demonstrated consistent state-of-the-art performance on the AMSR2 dataset, thereby confirming its strong generalization capability and robustness across different data sources. Conclusion: By integrating frequency domain analysis with multiscale feature extraction and efficiently utilizing intermediate network layer features, the proposed WRANet improves the capture of sea ice spatiotemporal features and the modeling of complex spatiotemporal dependencies. The proposed network provides an effective solution for high-precision Arctic sea ice prediction.  
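For readers unfamiliar with the error metrics named above, the following sketch gives their standard textbook definitions on flat sequences of observed and predicted values. It is not the authors' evaluation code, and BACC is omitted because it requires a thresholding rule that the abstract does not specify.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    skill of always predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance
```

Lower is better for RMSE and MAE, whereas higher is better for NSE, which is why the reported results pair small error percentages with an NSE close to 100%.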
      Keywords: Arctic sea ice prediction; spatio-temporal prediction; deep learning; wavelet transform; residual aggregation; sea ice concentration (SIC)
      Updated: 2026-05-15
    • Wang Yingjun, Yang Xiaopeng, Zhou Ling, Lu Haoxiang, Zhao Wenyi, Zhang Weidong
      Vol. 31, Issue 5, Pages: 1569-1582(2026) DOI: 10.11834/jig.250548
      Cross-scale adaptive frequency domain enhancement for maritime vessel detection
      Abstract: Objective: Ship detection plays a crucial role in modern maritime governance, traffic safety assurance, and the precise classification of civilian and military vessels. With the rapid growth of global maritime trade, particularly China's shipping industry, ensuring maritime situational awareness and vessel tracking has become increasingly important. However, ship detection in open-sea environments is a challenging task due to the highly dynamic and complex nature of maritime scenes. Real-world conditions often include frequent inter-ship occlusions, severe weather disturbances, low visibility, and image degradation due to fog, motion blur, or sunlight glare. Additionally, cluttered backgrounds comprising waves, clouds, and reflections further complicate detection. These environmental complexities introduce visual ambiguities, including unclear object boundaries, distorted color information, and the loss of fine structural features, especially for small targets such as fishing boats, patrol craft, or distant vessels. Traditional object detection algorithms, including two-stage models such as Faster R-CNN and early one-stage models such as YOLOv3 and YOLOv4, tend to exhibit low boundary precision, high false-positive rates, and frequent missed detections for small or occluded ships. These limitations hinder accurate maritime monitoring and fail to meet the high standards required in civilian and military applications. Method: To address these challenges, this paper takes YOLO11 as the baseline model and introduces a ship detection method designed for complex maritime environments that integrates cross-scale perception with adaptive frequency-domain feature enhancement. While maintaining computational efficiency, the proposed method introduces two lightweight but effective modules to enhance robustness, localization accuracy, and overall detection performance. 
The first core component is the adaptive frequency-domain feature enhancement module (AFEM), which is designed to alleviate feature degradation in complex maritime scenes. AFEM enables effective separation of global structural information and local texture details by applying a Fourier transform to map spatial features into the frequency domain, thereby enhancing high-frequency edge information that is often weakened by conventional spatial convolutions. A dual-branch gated fusion mechanism is employed to adaptively model global and local frequency components: the global branch captures long-range semantic dependencies to support large-scale ship recognition under low-visibility conditions, while the local branch preserves high-frequency details to improve detection of small and low-saliency targets. Additionally, a linear gated convolutional unit dynamically regulates the contributions of different frequency components, enhancing robustness to noise, fog, motion blur, and uneven illumination. The second key component is the multiscale feature perception (MFP) module, designed to address substantial scale variation and frequent occlusions in maritime scenes. Aiming to accommodate ships of diverse sizes and limited visibility, the MFP module employs convolutional kernels with multiple receptive fields, including standard, asymmetric elongated, and dilated convolutions, enabling joint modeling of fine-grained local features and broader contextual information. These multiscale features are then fused through a channel-wise attention mechanism, which highlights ship-related regions while suppressing background clutter such as sea foam, reflections, and distant coastlines, thereby maintaining discriminative capability across scales. Furthermore, AFEM and MFP are designed with practical deployment in mind. 
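The frequency-domain separation behind AFEM can be illustrated on a one-dimensional signal. The sketch below uses a naive discrete Fourier transform, a hard low-pass mask, and a fixed scalar gate. The actual module works on 2D feature maps with learned gating, so every name and parameter here (`split_bands`, `cutoff`, `gate`) is a simplifying assumption.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * k / n)
                for f in range(n)).real / n
            for k in range(n)]


def split_bands(x, cutoff):
    """Split x into a low-frequency (global) and high-frequency (detail) part."""
    n = len(x)
    spectrum = dft(x)
    # Frequency index f and n - f describe the same physical frequency.
    keep_low = [1.0 if min(f, n - f) <= cutoff else 0.0 for f in range(n)]
    low = idft([c * m for c, m in zip(spectrum, keep_low)])
    high = idft([c * (1.0 - m) for c, m in zip(spectrum, keep_low)])
    return low, high


def gated_fuse(low, high, gate):
    """Recombine the bands; gate > 1 emphasizes edges, gate < 1 smooths."""
    return [l + gate * h for l, h in zip(low, high)]
```

With `gate=1.0` the fusion returns the input unchanged, and a larger gate sharpens isolated peaks such as a small target against a flat background.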
Through parameter sharing and lightweight convolutional structures, the proposed method achieves performance improvements without substantially increasing model complexity, which makes it well suited to maritime surveillance and intelligent perception systems that require high accuracy and real-time efficiency. Result: Extensive experiments on the MVDD (maritime vessel detection dataset) and RTTS (real-world task-driven testing set) datasets demonstrate the effectiveness of the proposed method. Experimental results show the excellent performance of the method in detecting 13 types of ships, with particularly clear advantages in small-target and occluded ship detection. On the MVDD dataset, the model achieves 95.18% mAP50, outperforming state-of-the-art models such as YOLOv5 and D-FINE, and achieving a 1.16% higher recall than YOLOv12s. On the RTTS dataset, the method attains 74.79% mAP50, surpassing YOLOv10s and performing comparably to specialized methods such as AMSP-UOD. The network is highly efficient: it contains only 6.29 million parameters, notably fewer than Faster R-CNN (206.68 M), and runs at 227 frames/s, faster than YOLOv12s and D-FINE. Ablation studies confirm the contributions of AFEM and MFP, revealing performance drops of 1.29% and 0.90% upon removal, respectively. Conclusion: The proposed method balances detection accuracy, robustness, and computational efficiency. This method addresses critical limitations in existing ship detectors, particularly under adverse maritime conditions, by effectively enhancing degraded features and capturing multiscale contextual information. Beyond ocean-based applications, the model demonstrates strong adaptability to terrestrial scenarios with similar visual challenges, such as fog, snow, and poor illumination, revealing its extensive application potential in traffic surveillance, disaster response, and environmental monitoring. 
Owing to its lightweight design and high-speed inference, the system can be deployed on real-time platforms, including unmanned surface vehicles, smart port surveillance, maritime radar systems, and ecological observation devices. Future work will focus on extending the framework to multimodal ship detection by integrating infrared and radar data, thereby improving detection reliability in extreme conditions such as nighttime, storms, and heavy fog.
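The frequency-domain separation that the abstract attributes to AFEM can be illustrated with a minimal sketch. This is not the authors' module: it assumes a 1-D signal, a fixed cutoff in place of learned gating, and hypothetical helper names (`dft`, `idft`, `split_frequencies`). It only shows that a Fourier transform lets a signal be split into a smooth low-frequency part (global structure) and a high-frequency residual (edge detail) that add back up to the input.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split_frequencies(signal, cutoff):
    """Split a real signal into low- and high-frequency parts.

    Bins k and n - k carry the same absolute frequency, so both are kept
    in the low band when min(k, n - k) <= cutoff. The fixed cutoff is an
    assumption made for illustration only.
    """
    n = len(signal)
    spectrum = dft(signal)
    low_spec = [c if min(k, n - k) <= cutoff else 0 for k, c in enumerate(spectrum)]
    high_spec = [c - l for c, l in zip(spectrum, low_spec)]
    low = [v.real for v in idft(low_spec)]
    high = [v.real for v in idft(high_spec)]
    return low, high


# A step edge: flat background followed by a flat object region.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
low, high = split_frequencies(signal, cutoff=1)

# The two bands reconstruct the input, and the mean level stays in the low band.
reconstruction = [l + h for l, h in zip(low, high)]
```

In a learned module the two bands would be processed by separate branches and recombined with data-dependent weights; the fixed mask here stands in for that step.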
      Keywords: frequency domain feature enhancement; Fourier transform; complex background interference; object detection; degradation characteristics; multi-scale feature perception
      Updated: 2026-05-15
    • Li Zhongwei, Guo Haoning, Qi Yanping, Yuan Dekun
      Vol. 31, Issue 5, Pages: 1583-1594(2026) DOI: 10.11834/jig.250352
      Zooplankton classification based on multiscale feature fusion and attention guidance
      Abstract: Objective: Zooplankton are a key component of the marine ecosystem, the basis of the marine food web, and a sensitive indicator of environmental change. Changes in their population composition and abundance reflect water quality, primary productivity, and ecological balance, and are therefore significant for marine ecological assessment and red tide prediction. Traditional manual microscope inspection is time-consuming, labor-intensive, and subjective, and can hardly meet the requirements of large-scale real-time monitoring. With the development of intelligent perception and deep learning, image-based automatic recognition has become an important tool in marine ecological research. However, fine-grained classification of zooplankton still faces challenges, including high similarity between species, significant variation within species, blurred or occluded targets in microscopic images, and considerable scale differences among species. Therefore, this study proposes a Transformer-based zooplankton recognition method that integrates multiscale features and an attention guidance mechanism. The method aims to enhance the network’s ability to perceive and fuse local details and global semantics, thereby improving classification accuracy, model robustness, and interpretability. Method: In this study, a vision Transformer (ViT) is adopted as the backbone network, and two modules, namely, multiscale dilated convolution (MSDC) and dual attention (DA), are introduced to optimize the network structure. The MSDC module extracts local details and global contour information simultaneously through parallel dilated convolutions with different dilation rates and expands the receptive field without significantly increasing the computational cost. The DA mechanism consists of channel attention and spatial attention, which are used to weight feature channels and to guide attention spatially toward salient regions, respectively. 
Meanwhile, an alternate insertion strategy is adopted, in which a DA module is embedded every three layers. This strategy enhances texture discrimination in the shallow layers and semantic focusing and interclass discrimination in the deep layers. The two modules complement each other: the MSDC module provides multiscale perception, while the DA module selectively enhances key features, thus improving the expressive and discriminative power of the features. The overall network is denoted as ViT-MDFA, and the model is trained and validated on four datasets, namely, WHOI-Plankton, ZooScanNet, Kaggle-Plankton, and the self-built Dec-22. Several ablation experiments are designed to verify the proposed method. The effectiveness of the MSDC and DA modules is verified through module independence and synergy experiments; the soundness of the alternate insertion strategy is verified through attention insertion strategy experiments; and the impact of different dilation rate combinations on model performance is examined through dilation rate combination experiments and scale sensitivity analysis. Finally, the role of the attention mechanism in the model's image understanding process is explored through visual analysis. Result: Experimental results show that the classification accuracy and F1-score of the proposed model on the four datasets are higher than those of existing mainstream convolutional and Transformer-based models. On the Dec-22 dataset, the model achieves an accuracy of 97.4% and an F1-score of 96.7%, which are 3.9% and 4% higher than those of ViT-B/16, respectively. Ablation experiments demonstrate that using only the MSDC module improves model accuracy by approximately 1.5%, and using only the DA module improves it by approximately 0.7%. 
The combination of these two modules further enhances the overall performance of the model. The dilation rate sensitivity experiment shows that the [6, 12, 18] combination is the most stable across datasets. In the scale grouping experiment, small targets are classified best at low dilation rates, medium targets are balanced at medium dilation rates, and large targets perform best at high dilation rates, which supports the rationality and generality of the [6, 12, 18] configuration. The visualization results show that the attention maps generated by the DA module focus on the main body of the zooplankton rather than on background impurities, which improves interpretability; the model also converges quickly and with low variance. Conclusion: The Transformer-based zooplankton recognition framework, which integrates multiscale dilated convolution and the DA mechanism, balances local detail extraction with global semantic understanding and significantly improves classification accuracy and generalization in complex backgrounds. The main contributions of this study are as follows: 1) A new model, ViT-MDFA, is proposed. For the first time, MSDC and channel-spatial DA are embedded into a ViT simultaneously to address large scale differences, complex backgrounds, and high interclass similarity in zooplankton images. 2) A lightweight collaborative module is designed. The MSDC module pre-enhances multiscale local features, and the DA module is alternately inserted to strengthen the focus on key regions. The two modules complement each other, enhancing fine-grained discriminative capability while keeping computation efficient. 3) A performance breakthrough is achieved. The model achieves state-of-the-art accuracy on four mainstream datasets, and ablation experiments and visualization results verify the effectiveness of the modules. 
This model provides a plug-and-play solution for edge device deployment and intelligent marine ecological monitoring. Future work will explore an adaptive dilation rate learning mechanism based on gradient optimization to achieve dynamic receptive field adjustment. Model lightweighting techniques and knowledge distillation strategies will be combined to support real-time use in online marine monitoring equipment. Multimodal data (e.g., environmental parameters and time series data) can be introduced to expand the model’s application potential in ecological prediction and biodiversity research.
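Two structural choices in this abstract, the [6, 12, 18] dilation rates and the insertion of a DA module every three layers, can be made concrete with a little arithmetic. The sketch below is illustrative only: the 3×3 kernel size, the 12-layer depth (that of a standard ViT-B), and the reading of "every three layers" as after layers 3, 6, 9, and 12 are assumptions not stated in the abstract, and the function names are hypothetical.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span (in input positions) covered by one dilated convolution kernel.

    A kernel with k taps and dilation d has (d - 1) gaps between adjacent
    taps, so it spans k + (k - 1) * (d - 1) positions.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def da_insertion_layers(depth, every):
    """1-based indices of Transformer layers followed by a DA module."""
    return [layer for layer in range(1, depth + 1) if layer % every == 0]


# Assumed 3x3 kernels with the dilation rates reported in the abstract.
rates = [6, 12, 18]
spans = [effective_kernel_size(3, d) for d in rates]
print(spans)  # [13, 25, 37]

# Assumed 12-layer backbone with a DA module after every third layer.
print(da_insertion_layers(12, 3))  # [3, 6, 9, 12]
```

Under these assumptions the three parallel branches cover 13, 25, and 37 input positions per side with the same number of weights, which is consistent with the abstract's claim that the receptive field grows without a significant increase in computation.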
      Keywords: Zooplankton; fine-grained image classification; Vision Transformer; dilated convolution; channel-spatial dual attention
      Updated: 2026-05-15