Article

Radiation Anomaly Detection of Sub-Band Optical Remote Sensing Images Based on Multiscale Deep Dynamic Fusion and Adaptive Optimization

Land Satellite Remote Sensing Application Center, Ministry of Natural Resources of P.R. China, Beijing 100048, China
Remote Sens. 2024, 16(16), 2953; https://doi.org/10.3390/rs16162953
Submission received: 12 July 2024 / Revised: 1 August 2024 / Accepted: 11 August 2024 / Published: 12 August 2024
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

Abstract

Radiation anomalies in optical remote sensing images frequently occur due to electronic issues within the image sensor or data transmission errors. These radiation anomalies can be categorized into several types, including CCD, StripeNoise, RandomCode1, RandomCode2, ImageMissing, and Tap. To retain as much image data with only minor radiation issues as possible, this paper adopts a self-built radiation anomaly dataset and proposes the FlexVisionNet-YOLO network to detect radiation anomalies more accurately. Firstly, RepViT is used as the backbone network; its vision transformer-style architecture better captures global and local features, and its multiscale feature fusion mechanism efficiently handles targets of different sizes and shapes, enhancing the detection of radiation anomalies. Secondly, a dynamic deep feature fusion network is proposed in the Neck (feature fusion) part, which significantly improves the flexibility and accuracy of feature fusion and thus enhances the detection and classification performance on complex remote sensing images. Finally, Inner-CIoU is used in the Head part for bounding box regression, which markedly improves localization accuracy by finely adjusting the target boundaries, and Slide-Loss is used for the classification loss, which enhances classification robustness by dynamically adjusting the category probabilities and markedly improves classification accuracy, especially on sample-imbalanced datasets. Experimental results show that, compared to YOLOv8, the proposed FlexVisionNet-YOLO method improves precision, recall, mAP0.5, and mAP0.5:0.9 by 3.5%, 7.1%, 4.4%, and 13.6%, respectively. Its effectiveness in detecting radiation anomalies surpasses that of other models.

1. Introduction

Remote sensing technology is mainly characterized by a digital imaging mode, which offers wide coverage, high efficiency, and diverse data in information acquisition. It plays a crucial role in economic and social development and is widely used in many fields, such as the observation of earth ecology, the monitoring of marine environments [1,2,3,4,5], and the assessment of atmospheric pollutants [6]. However, as passively acquired data, optical remote sensing images are easily affected by radiation factors during the imaging process, leading to radiation anomalies. These radiation anomalies not only affect the accuracy of subsequent interpretation but may also lead to wasted resources and decision-making errors. With the increasing volume of remote sensing data, traditional visual inspection methods are no longer efficient or accurate enough for detecting these anomalous regions. Therefore, it is essential to research and apply efficient automated detection methods. Target detection technology meets this need, offering a new solution for detecting radiation anomalies in optical remote sensing images by improving detection efficiency and accuracy.
Target detection is categorized into two-stage and one-stage detection [7]. The two-stage detection model was first proposed by Girshick et al. [8]; it detects targets by searching for regions in the image that may contain objects, with representative algorithms such as Fast R-CNN [9], SPP-net [10], and Faster R-CNN [11]. However, most two-stage detectors are computationally cumbersome and intensive. To solve this problem, single-stage target detectors emerged [12], which process the entire image directly and complete the detection task with a single network. Compared with two-stage methods, their detection speed is greatly improved. In recent years, with the rapid development of artificial intelligence technology, target detection and recognition have found many applications in industry, agriculture, transportation, remote sensing, and other fields. In industry, Zalpour et al. [13] used the Faster R-CNN network to recognize oil storage tanks, aiming to reduce false alarms and improve processing speed; candidate targets were then selected from the region of interest with a circle detection method, features of different types of oil tanks were extracted with CNN and HOG descriptors, and the tanks were classified with a support vector machine. Xu et al. [14] used the YOLOv3 network, adopting DenseNet as the network backbone, increasing the detection scale, and replacing the residual unit with a convolutional layer to identify oil storage tanks. Li et al. [15] achieved high detection accuracy and precision for surface defects in cold-rolled steel strips by improving the YOLO network. Li et al. [16] proposed an improved algorithm based on YOLOv4; Wang et al. [17] proposed a deformable convolutional network (DCNet) for mixed-type defect pattern recognition; and Zhao et al. [18] proposed a Fast R-CNN algorithm with deformable convolution for recognizing small targets and used it for steel defect detection. In agriculture, Zhou et al. [19] used a semi-autonomous multisensor field phenotyping platform to acquire wheat images and applied maximum entropy segmentation and morphological reconstruction techniques to count wheat spikes. Fernandez-Gallego et al. [20] combined filtering and maxima search to effectively improve the recognition accuracy of wheat spikes in the field. Gené-Mola et al. [21] aligned RGB, depth, and NIR intensity images collected in orchards and fed them into the Faster R-CNN network as stacked channels, and the results showed the effectiveness of multisensor information in fruit target detection. Tu et al. [22] detected passion fruits by feeding fused RGB and depth images into an MS-FRCNN network and showed that the heterogeneous images are more resistant to exposure variation than RGB images alone, although detection accuracy remained limited by missing image features. Tu et al. [23] then constructed a Faster R-CNN passion fruit detection network based on RGB color and depth data and validated the effectiveness of depth images for fruit target detection; however, the model's universality and robustness were not as strong as expected. In transportation, Li et al. [24] proposed a YOLO-based vehicle detection algorithm for foggy weather, which enhanced the model's vehicle detection ability under foggy conditions by adding a de-fogging module that recovers more feature information.
However, its performance was relatively average in other scenarios. Li et al. [25] proposed an improved YOLOv8 road defect detection algorithm using the SimAM attention mechanism and GhostConv. Dong et al. [26] proposed PGA-Net, which can detect pixel-level surface defects through pyramidal feature fusion and global contextual attention; Zhang et al. [27] proposed a multilevel attention-based (MLAB) YOLOv3 road detection model for UAV images, with an mAP of up to 68.75%. Su et al. [28] constructed a MOD-YOLO pavement damage detection model based on YOLOX, which compensated for the possible information loss and insufficient receptive field of previous YOLO-series algorithms, and used it to detect cracks in civil infrastructure. In the field of remote sensing, Etten et al. [29] built on the YOLOv2 model and proposed the YOLT method, analyzing several challenges in target detection of remote sensing images and providing corresponding countermeasures. Nie et al. [30] proposed a ship detection and segmentation method based on Mask R-CNN [31], which utilizes a channel self-attention mechanism to adjust the weight of each channel and a spatial attention mechanism to adjust the weight of each pixel, achieving a better portrayal of target features. Li [32] proposed a network for ship detection in remote sensing images in 2019, which obtains positive sample information by rotating a priori multiscale bounding boxes to match the ground-truth boxes and constructs a deep learning model that achieves stable ship detection in complex scenes such as sea clutter backgrounds, closely spaced targets, port docking terminals, and multiscale target changes. An et al. [33] applied deep convolutional neural networks to ship target detection on Gaofen-3 SAR images [34] in 2018; after analyzing the characteristics of the sea clutter distribution, the data were preprocessed, and the detector performance was finally improved by iteration. Ming et al. [35] proposed a critical feature capture network (CFC-Net) to improve detection accuracy by constructing a robust feature representation, refining the preset anchor boxes, and optimizing label assignment. Ming also proposed a dynamic anchor learning (DAL) strategy, which adaptively selects high-quality anchors based on their key-feature-capturing ability to improve detection accuracy. Yang et al. proposed the SCRDet detector [36], which reduces background noise interference in the detection results by constructing a supervised multidimensional attention network and introduces an IoU constant factor into the smooth L1 loss to address the boundary problem of rotated bounding box regression.
In addition, in the field of radiation anomaly detection in optical remote sensing images, although a large number of researchers have made significant progress, some limitations remain. First, many traditional manual methods rely on hand-crafted feature extraction and expert knowledge, which are time-consuming, laborious, and difficult to apply efficiently to large-scale remote sensing data. Second, existing detection methods are insufficiently robust under complex backgrounds and diverse targets and are easily affected by noise and artifacts, leading to a decrease in detection accuracy. Therefore, improving detection accuracy and enhancing the versatility and computational efficiency of the algorithms are still important open issues.
In this paper, a novel multiscale multitopology dynamically fused radiation anomaly detection network, FlexVisionNet-YOLO, is proposed, aiming at solving the current problem of radiation anomaly detection in optical remote sensing images. The main contributions of this paper are as follows:
1. A multiscale multibranch feature extraction network is proposed. This backbone network is able to effectively capture both global and local features by integrating RepViT's vision transformer architecture, improving target detection accuracy and stability. Its multiscale feature fusion mechanism improves the detection of targets of different sizes and shapes, while the optimized self-attention mechanism significantly improves adaptability in handling complex scenes. The experimental results show that these improvements not only enhance detection accuracy but also significantly improve the stability and robustness of the model in complex scenes, further validating the effectiveness and reliability of the method in practical applications.
2. In the Neck part of this paper, Dynamic Deep Feature Fusion Networks (DDFNs) are adopted, using SDI (Spatial Dynamic Integration) to replace the traditional Concat operation, and Dy_Sample (Dynamic Sampling) to replace the common up-sampling operation. SDI improves the flexibility and accuracy of feature fusion by dynamically integrating different spatial features and effectively capturing multiscale information. At the same time, Dy_Sample enhances the ability to capture the details of the important regions and improves the fidelity of the information in the up-sampling process through the adaptive sampling technique. The experimental results show that these improvements enable the DDFN to detect and categorize targets more accurately when processing complex remote sensing images, improving overall detection performance and reliability.
3. In the detection part of this paper, Inner-CIoU is used for border regression (box), and Slide-Loss is used for classification loss (cls). Inner-CIoU (Intersection over Union with Inner Refinement), by fine-tuning the target border, significantly improves the accuracy and consistency of edge localization and reduces the detection frame offset and misdetection phenomenon. Slide-Loss enhances the robustness of classification through the dynamic adjustment of category probability, especially when dealing with sample imbalanced datasets, and significantly improves the classification accuracy. The experimental results demonstrate that these improvements make the model in this paper better than the traditional methods in both edge localization and classification accuracy, enhancing the overall detection performance and reliability.
4. A self-made radiation anomaly dataset is used for model training, and the generalization ability is tested using real image data. The experimental results demonstrate that the network still maintains high performance in real application scenarios.

2. Materials and Methods

2.1. Dataset Introduction

In this study, a self-built radiation anomaly dataset was used, with image data from ZY3-02 [37], GF-1 [38], GF-2 [39], GF-6 [40], and GF-7 [41]. To better detect radiation anomalies in optical remote sensing images, as shown in Figure 1, the three RGB bands of the multispectral images and the panchromatic images are split into independent single-band images for analysis. The radiation anomalies in each band are detected and processed individually, which improves the recognition accuracy for subtle radiation problems. Compared with the traditional composite-image analysis method, this band-splitting strategy captures the radiation differences among the bands more finely and provides more comprehensive and detailed radiation anomaly detection results, thereby improving the accuracy and reliability of remote sensing image processing. In this paper, more than 800 multispectral and panchromatic images were selected and cropped to 640 × 640 pixels, and Labelme 3.16 software was then used to outline the anomalies; the outline was kept within one pixel of the boundary of the selected area. Finally, 8300 of the cropped and annotated images were used for training; the original images are shown in Figure 2. The dataset is divided into six categories, namely CCD, StripeNoise, RandomCode1, RandomCode2, ImageMissing, and Tap, as shown in Figure 2. ImageMissing means that part or all of the image appears as black blocks or banded areas and cannot express the surface information. StripeNoise means that the image presents a regular striped distribution, appearing as stripes with a regular shape. CCD means that the image presents separate stripes, usually with a color difference between the left and right sides of the stripe. Tap means that the whole scene or a local image area continuously shows irregular stripes, so the pixel values of the image cannot express the surface. RandomCode means that the pixel values in some areas of the image are misaligned, so the image cannot express the surface information; in RandomCode1 the stripes are distributed horizontally, whereas in RandomCode2 the stripes are distributed vertically.
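A minimal sketch of the band-splitting and tiling step described above is given below (assuming the scene has already been read into a NumPy array; the array shape, band order, and the non-overlapping 640 × 640 tiling are illustrative assumptions rather than the exact pre-processing pipeline used here):

```python
import numpy as np

def split_bands(ms_image: np.ndarray):
    """Split an H x W x 3 multispectral RGB composite into three single-band
    images so radiation anomalies can be detected in each band independently."""
    return [ms_image[:, :, b] for b in range(ms_image.shape[2])]

def crop_tiles(band: np.ndarray, tile: int = 640):
    """Cut one band into non-overlapping 640 x 640 tiles for annotation and
    training (edge remainders are simply dropped in this sketch)."""
    h, w = band.shape
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(band[y:y + tile, x:x + tile])
    return tiles

# Example: a synthetic 3-band scene stands in for a real GF/ZY image.
scene = np.random.randint(0, 1024, (2048, 2048, 3), dtype=np.uint16)
for i, band in enumerate(split_bands(scene)):
    print(f"band {i}: {len(crop_tiles(band))} tiles of 640x640")
```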

2.2. Infrastructure Network Architecture

In the field of detection, YOLO series models are known for their real-time performance and accuracy. The YOLOv8 model’s architecture is divided into three components: the feature extraction backbone, the feature enhancement module (Neck), and the detection head. Several modifications have been made to adapt to practical application requirements. As illustrated in Figure 3, the feature extraction backbone of YOLOv8 is responsible for extracting fundamental features from images. Unlike previous designs, YOLOv8 incorporates the C2f module, which integrates multilevel features through Cross-Layer Convolutional Fusion (C2f). This integration aids in capturing multiscale and contextual information, thereby potentially improving feature representation and detection accuracy. The structure of the C2f module is designed to capture semantic information and contextual relationships in the image, which may enhance the model’s generalization ability. Additionally, the C2f module aims to improve computational efficiency by simplifying the convolutional process and multilayer stacking. The feature enhancement module (Neck) utilizes cross-stage feature transmission to facilitate the transfer and fusion of feature information at different levels. Furthermore, YOLOv8 introduces an independent task processing module (decoupled head) in the detection head to separate the classification and localization tasks. This separation reduces the mutual interference between tasks, allowing for independent optimization. This design aims to improve the model’s classification and localization accuracy and accelerate the convergence process.
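For illustration, the following PyTorch sketch shows a simplified C2f-style block consistent with the description above: the input is projected, split into two chunks, passed through a chain of bottlenecks whose intermediate outputs are all kept, and the concatenated multilevel features are fused by a final 1 × 1 convolution. The channel counts, activation choice, and number of bottlenecks are illustrative, not the exact YOLOv8 configuration.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.SiLU(),
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.SiLU(),
        )
    def forward(self, x):
        return x + self.block(x)  # residual connection

class C2fLike(nn.Module):
    """Simplified C2f: split features, run n bottlenecks, concatenate everything."""
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        self.hidden = c_out // 2
        self.cv1 = nn.Conv2d(c_in, 2 * self.hidden, 1)
        self.blocks = nn.ModuleList(Bottleneck(self.hidden) for _ in range(n))
        self.cv2 = nn.Conv2d((2 + n) * self.hidden, c_out, 1)
    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))   # two chunks of hidden channels
        for m in self.blocks:
            y.append(m(y[-1]))                  # each bottleneck feeds the next
        return self.cv2(torch.cat(y, dim=1))    # fuse multilevel features

x = torch.randn(1, 64, 80, 80)
print(C2fLike(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```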
Overall, YOLOv8 significantly improves the model’s computational speed, detection accuracy, and robustness by redesigning the C2f module in the feature extraction backbone, the cross-stage feature transfer in the Neck module, and the independent processing module. These optimizations enable YOLOv8 to maintain excellent performance in complex environments, thereby further consolidating its leading position in the field of target detection.

2.3. Overall FlexVisionNet-YOLO Network Structure

To effectively detect various radiation problems in optical remote sensing images, the YOLOv8 model is deeply optimized and improved, and the FlexVisionNet-YOLO network is proposed. As depicted in Figure 4, firstly, a multiscale, multibranch feature extraction network is proposed in the backbone section, with RepViT used as the backbone network [42]. The vision transformer structure of RepViT improves fine-grained feature extraction, detection accuracy, and robustness, and its more lightweight design reduces computational complexity. In addition, RepViT's high adaptability and flexible scalability facilitate the integration of other advanced techniques, ensuring stable detection performance in complex environments. Secondly, a dynamic deep feature fusion network is proposed in the Neck section, which adopts the Dy_Sample module to replace traditional sampling; it can adaptively adjust the sampling strategy through the introduction of static and dynamic factors to extract richer and more accurate features from feature maps of different scales and complexities. This adaptivity helps to improve the robustness and generalization of the model when dealing with diverse inputs. On the other hand, replacing the C2f module with the SDI module enhances the model's information flow and interaction during feature fusion by introducing spatial transformations and feature reorganization. The SDI module not only integrates features from different levels more efficiently but also reduces information loss and retains more spatial details and semantic information. Finally, the inclusion of Slide-Loss in the Detect section improves the model's loss function [43]. By dynamically adjusting the loss calculation, Slide-Loss handles difficult samples more effectively and balances the positive and negative samples in the detection process, improving the accuracy and robustness of the detection results. This improvement is particularly applicable to the various radiation problems in optical remote sensing images, further enhancing the model's detection capability in complex scenes.

2.4. Multiscale Multibranching Feature Extraction Network

Replacing the backbone network of YOLOv8 with RepViT has many significant advantages for the task of detecting radiation anomalies in the sub-bands of optical remote sensing images. As shown in Figure 5, a multiscale multibranch feature extraction network is proposed in which the entire backbone section is replaced with RepViT. This structural reparameterization-based multibranch network architecture drastically improves computational efficiency and model inference speed by simplifying the multibranch structure used in the training phase into a single-branch structure in the inference phase. First and foremost, the design concept of RepViT is to improve the model's feature representation during the training phase through a multibranch network structure and to perform reparameterization in the inference phase to simplify the computational graph. This approach retains the high-capacity characteristics of complex models in the training phase while keeping the computational overhead low in the inference phase. This is especially important for tasks that require processing large-scale, high-resolution remote sensing images, which are usually rich in detail and complex features; the traditional YOLOv8 backbone network may face computational bottlenecks and inefficient feature extraction in this setting. Second, by introducing a multibranch design during the training phase, RepViT enables the network to learn more diverse feature representations from different paths. These diverse feature representations effectively capture minute details and complex patterns in remote sensing images, enhancing the model's feature extraction and generalization capabilities. In the inference stage, structural reparameterization merges the multibranch network into a simple single-branch network, which significantly reduces the model's computation and memory footprint and improves inference efficiency. In contrast, the traditional YOLOv8 backbone network may face several problems when dealing with complex remote sensing images. Firstly, its fixed network structure may not be able to cope with the complex features of high-resolution remote sensing images, which can lead to inadequate feature extraction. Secondly, the traditional network is relatively inefficient computationally and cannot fully utilize hardware resources for efficient image processing. In addition, the traditional YOLOv8 may lack robustness when facing complex environments and variable remote sensing images, as it is susceptible to noise and environmental changes, leading to decreased detection accuracy.
In conclusion, replacing the backbone network of YOLOv8 with RepViT holds significant value, as it effectively enhances the model's feature extraction ability and generalization performance while maintaining efficient computation, which makes it particularly suitable for detecting radiation anomalies in the sub-bands of optical remote sensing images.
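To make the training-time multibranch / inference-time single-branch idea concrete, the sketch below merges a 3 × 3 convolution, a 1 × 1 convolution, and an identity branch into one equivalent 3 × 3 convolution, in the spirit of RepVGG/RepViT-style structural reparameterization (BatchNorm folding is omitted, so this illustrates the principle rather than reproducing the RepViT blocks used in this paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

c = 16
conv3 = nn.Conv2d(c, c, 3, padding=1, bias=True)   # 3x3 branch
conv1 = nn.Conv2d(c, c, 1, bias=True)              # 1x1 branch

# Fold the 1x1 branch and the identity branch into 3x3 kernels.
w = conv3.weight.data.clone()
b = conv3.bias.data.clone()
w += F.pad(conv1.weight.data, [1, 1, 1, 1])        # place the 1x1 kernel at the 3x3 center
b += conv1.bias.data
identity = torch.zeros_like(w)
for i in range(c):
    identity[i, i, 1, 1] = 1.0                      # identity branch as a centered 3x3 kernel
w += identity

merged = nn.Conv2d(c, c, 3, padding=1, bias=True)
merged.weight.data, merged.bias.data = w, b

# The merged single branch reproduces the multibranch output.
x = torch.randn(2, c, 32, 32)
y_multi = conv3(x) + conv1(x) + x
y_single = merged(x)
print(torch.allclose(y_multi, y_single, atol=1e-5))  # should print True
```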

2.5. Dynamic Deep Feature Fusion Networks

As shown in Figure 6, a dynamic deep feature fusion network is proposed to enhance feature fusion and extraction for radiation anomaly images. This includes replacing the original Concat module with SDI (Selective Deconvolution Integration) and adopting Dy_Sample (Dynamic Sampling) to replace the traditional fixed sampling module, significantly improving detection performance and efficiency. First of all, the SDI module realizes efficient fusion and enhancement of multiscale features through selective deconvolution operations. Its multilevel feature mapping better captures and retains low-level feature information in the image, improving the quality of feature fusion, reducing information redundancy, and improving the model's feature expression capability and robustness. Additionally, the SDI module can better capture multiscale information and improve detection performance in complex scenes. While the original Concat module is simple and easy to use, it faces issues of information redundancy, lack of multiscale fusion, and high computational overhead, limiting the model's performance and efficiency. Second, as shown in Figure 6, sampling points can be generated with either a static or a dynamic sampling factor. With the static sampling factor, a linear layer combined with the pixel-shuffle technique and a fixed range factor generates an offset o, which is then summed with the original grid positions ς to obtain the sampling set δ, as shown in Equations (1) and (2). The dynamic sampling factor additionally generates a dynamic range factor with a sigmoid function σ, which is then used to scale the offset o before it is added to the grid positions.
$$o = \mathrm{linear}(\chi) \tag{1}$$
$$\delta = \varsigma + o \tag{2}$$
In addition, the Dy_Sample module overcomes the limitations of the traditional fixed sampling method in complex scenes: by dynamically adjusting the sampling points and strategy, it samples adaptively according to the local features of the image. The common fixed sampling method is prone to under-sampling or over-sampling in changing remote sensing image environments, which affects feature extraction accuracy. When dealing with high-resolution remote sensing images with complex features, the traditional YOLOv8 relies on the Concat module and the fixed sampling module in the Neck section, which offer limited feature extraction capability, a rigid sampling strategy, and low computational efficiency. The Concat module has limited feature fusion and enhancement capability and struggles to adequately capture and retain detailed features. The fixed sampling strategy lacks flexibility and adaptability, which easily leads to under-sampling or over-sampling in complex scenes. Furthermore, the fixed sampling strategy cannot be adjusted across different scenarios or adapted dynamically to the complexity of the image features, which affects the detection effectiveness and efficiency of the overall model. The dynamic deep feature fusion network proposed in the Neck section of this paper preserves detailed features better and improves feature fusion when processing remote sensing images. It offers higher sampling flexibility and accuracy, thus significantly improving detection performance and efficiency. This improvement allows the network to perform better in radiation anomaly detection for remote sensing images, identifying and localizing radiation anomaly regions more accurately and efficiently.
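A minimal PyTorch sketch of the offset-based up-sampling idea behind Dy_Sample is shown below: a 1 × 1 (linear) layer predicts per-pixel offsets, pixel-shuffle expands them to the target resolution, the offsets are added to the base grid positions, and grid_sample resamples the feature map. The scale factor, range factor, and layer shapes are illustrative assumptions, not the exact module used in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DySampleLike(nn.Module):
    """Offset-based 2x up-sampling: predict offsets, add them to a base grid, resample."""
    def __init__(self, channels, scale=2, offset_range=0.25):
        super().__init__()
        self.scale = scale
        self.range = offset_range
        # one (dx, dy) pair per up-sampled position, produced by a 1x1 "linear" layer
        self.offset = nn.Conv2d(channels, 2 * scale * scale, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        s = self.scale
        # offsets pixel-shuffled to the up-sampled resolution, kept small by a fixed range factor
        o = F.pixel_shuffle(self.offset(x), s) * self.range            # [b, 2, s*h, s*w]
        # base sampling grid in normalized [-1, 1] coordinates
        ys = torch.linspace(-1, 1, s * h, device=x.device)
        xs = torch.linspace(-1, 1, s * w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).expand(b, -1, -1, -1)     # [b, s*h, s*w, 2]
        grid = grid + o.permute(0, 2, 3, 1)                            # shift grid by predicted offsets
        return F.grid_sample(x, grid, align_corners=True)

up = DySampleLike(64)
print(up(torch.randn(1, 64, 40, 40)).shape)  # expected: torch.Size([1, 64, 80, 80])
```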

2.6. Inner-CIoU Loss and Slide Loss Function

Although CIoU [44] considers the aspect ratio and center distance when dealing with bounding box regression, its bounding box localization accuracy still needs to be improved in some detection tasks with complex backgrounds and small targets. As shown in Figure 7, Inner-CIoU [45] can more accurately assess the overlap region between the predicted and ground-truth boxes by introducing center consistency and shape consistency, which improves the accuracy of bounding box localization. For tiny radiation anomaly regions against a complex background, Inner-CIoU can provide more accurate localization. To address the weak generalization and slow convergence of the CIoU loss function in detection tasks, Inner-IoU uses an auxiliary bounding box to calculate the loss and accelerate bounding box regression, controlling the size of the auxiliary bounding box with a scale factor. This method overcomes the limitations of existing approaches by using auxiliary bounding boxes of different scales for different datasets and detectors, thereby improving generalization. Equation (3) gives the intersection over union IoU, where B denotes the anchor box and B^gt denotes the ground-truth bounding box. Equation (4) gives the CIoU loss L_CIoU, where ρ is the Euclidean distance, d is the diagonal of the smallest enclosing box, v is the trade-off parameter, and α is the aspect-ratio consistency parameter. In Equation (5), IoU^inner is the intersection over union computed on the auxiliary (inner) boxes, and L_Inner-CIoU is the Inner-CIoU loss.
$$\mathrm{IoU} = \frac{\left| B \cap B^{gt} \right|}{\left| B \cup B^{gt} \right|} \tag{3}$$
$$L_{CIoU} = 1 - \mathrm{IoU} + \frac{\rho^{2}\left(b, b^{gt}\right)}{d^{2}} + \alpha v \tag{4}$$
$$L_{Inner\text{-}CIoU} = L_{CIoU} + \mathrm{IoU} - \mathrm{IoU}^{inner} \tag{5}$$
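As a sketch of the auxiliary-box idea behind Inner-IoU, the snippet below shrinks both the predicted and ground-truth boxes around their centers by a scale ratio and computes the IoU of the auxiliary boxes; the ratio value 0.7 and the box coordinates are illustrative only.

```python
import torch

def iou(box_a, box_b, eps=1e-7):
    """Plain IoU for boxes in (x1, y1, x2, y2) format."""
    x1 = torch.max(box_a[..., 0], box_b[..., 0])
    y1 = torch.max(box_a[..., 1], box_b[..., 1])
    x2 = torch.min(box_a[..., 2], box_b[..., 2])
    y2 = torch.min(box_a[..., 3], box_b[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_a = (box_a[..., 2] - box_a[..., 0]) * (box_a[..., 3] - box_a[..., 1])
    area_b = (box_b[..., 2] - box_b[..., 0]) * (box_b[..., 3] - box_b[..., 1])
    return inter / (area_a + area_b - inter + eps)

def inner_box(box, ratio=0.7):
    """Shrink (ratio < 1) or grow (ratio > 1) a box around its center
    to obtain the auxiliary box used by Inner-IoU."""
    cx = (box[..., 0] + box[..., 2]) / 2
    cy = (box[..., 1] + box[..., 3]) / 2
    w = (box[..., 2] - box[..., 0]) * ratio
    h = (box[..., 3] - box[..., 1]) * ratio
    return torch.stack((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2), dim=-1)

pred = torch.tensor([[10., 10., 50., 60.]])
gt = torch.tensor([[12., 14., 52., 58.]])
iou_plain = iou(pred, gt)
iou_inner = iou(inner_box(pred), inner_box(gt))
# Inner-CIoU loss as in Equation (5): L_CIoU + IoU - IoU_inner
print(iou_plain, iou_inner)
```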
Radiation anomalies are usually rare events in remote sensing image radiation anomaly detection tasks, which leads to a sample imbalance problem in the dataset. Using BCE-Loss causes the model to favor the background (majority class) and ignore the anomalies (minority class), thereby reducing the accuracy and robustness of detection. To solve this problem, this paper adopts Slide-Loss instead of BCE-Loss. Slide-Loss is a loss function designed for the sample imbalance problem, which balances the contributions of positive and negative samples to the loss by introducing a sample weighting mechanism that gives higher weights to minority-class samples. Specifically, Slide-Loss adaptively adjusts the weights according to the distribution of sample categories so that minority-class samples occupy a larger proportion of the loss, improving the model's sensitivity to and ability to recognize the minority class. Compared to BCE-Loss, Slide-Loss handles the sample imbalance problem more effectively and improves the model's performance on minority classes.
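The following sketch illustrates one published formulation of the sliding weighting function (following the YOLO-FaceV2 paper [43]): easy samples keep weight 1, samples just below a threshold μ are boosted, and the boost decays exponentially above μ. The threshold handling and the toy numbers are illustrative, not the exact implementation used in this paper.

```python
import math
import torch
import torch.nn.functional as F

def slide_weight(iou_scores: torch.Tensor, mu: float) -> torch.Tensor:
    """Sliding weighting: weight 1 below mu - 0.1, a constant boost of exp(1 - mu)
    just below mu, and a decaying boost exp(1 - IoU) at or above mu.
    mu is typically taken as the mean IoU of all samples."""
    w = torch.ones_like(iou_scores)
    mid = (iou_scores > mu - 0.1) & (iou_scores < mu)
    high = iou_scores >= mu
    w[mid] = math.exp(1.0 - mu)
    w[high] = torch.exp(1.0 - iou_scores[high])
    return w

# Weight the per-sample BCE classification loss by the sliding factor.
iou_scores = torch.tensor([0.10, 0.45, 0.55, 0.90])   # IoU of each sample with its target
logits = torch.tensor([0.2, 0.8, 1.5, 2.5])           # raw classification scores
targets = torch.tensor([0.0, 1.0, 1.0, 1.0])
bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
loss = (slide_weight(iou_scores, mu=iou_scores.mean().item()) * bce).mean()
print(loss)
```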

3. Experiment and Result Analysis

3.1. Experimental Environment Setting and Evaluation Metrics

For the experimental hardware configuration, an Intel Xeon E5-2650 v4 processor (Broadwell microarchitecture, 12 physical cores, 24 threads, 2.2 GHz base frequency) was used in this study. For graphics, an NVIDIA Titan V was chosen, a high-end GPU based on the Volta architecture with 5120 CUDA cores and 12 GB of HBM2 memory. The operating system is Windows 10 Professional, and the deep learning framework is PyTorch 1.8. In addition, the specific hyperparameter configurations used in this study are detailed in Table 1. Among the hyperparameters, lr0 and lrf are critical. As shown in Table 2, the optimal configuration was determined through experiments with three classic learning rate settings.
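For context on the lr0/lrf pair in Table 1: in Ultralytics-style YOLOv8 training, lr0 is typically the initial learning rate and lrf the final learning-rate fraction, so the learning rate decays from lr0 toward lr0 × lrf over training. A linear-decay sketch with illustrative values (the settings actually used are those in Table 1) is:

```python
def lr_at_epoch(epoch, epochs, lr0=0.01, lrf=0.01):
    """Linear decay from lr0 to lr0 * lrf over the training run
    (illustrative values; see Table 1 for the settings actually used)."""
    return lr0 * ((1 - epoch / epochs) * (1 - lrf) + lrf)

print([round(lr_at_epoch(e, 100), 5) for e in (0, 50, 100)])  # [0.01, 0.00505, 0.0001]
```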
To precisely and realistically estimate the model's performance, the evaluation metrics chosen for this experiment include Precision (P), Recall (R), Average Precision (AP), mean Average Precision (mAP), and the number of floating-point operations (GFLOPs). These indicators are calculated with the following formulas.
$$P = \frac{TP}{TP + FP} \tag{6}$$
$$R = \frac{TP}{TP + FN} \tag{7}$$
$$AP_i = \int_{0}^{1} P(R)\, dR \tag{8}$$
$$mAP = \frac{1}{n} \sum_{i=1}^{n} AP_i \tag{9}$$
TP is the number of positive samples correctly detected as positive, FN is the number of positive samples incorrectly detected as negative, and FP is the number of negative samples incorrectly detected as positive.
AP_i in Equation (8) denotes the average precision of category i, and n in Equation (9) represents the total number of categories; averaging the APs of the n categories gives mAP. This paper uses mAP@0.5 to denote the mean average precision at an Intersection over Union (IoU) threshold of 0.5, and mAP@0.5:0.9 to denote the mean average precision averaged over IoU thresholds ranging from 0.5 to 0.9.
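A minimal sketch of how these metrics can be computed from ranked detections for one category is given below; it uses a simple rectangle-rule integration of the precision-recall curve (without the interpolation envelope used by some implementations), and the toy scores and labels are illustrative.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP for one category: rank detections by confidence, accumulate TP/FP,
    then integrate precision over recall (simple rectangle rule)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp_flags = np.asarray(is_tp, dtype=float)[order]
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1.0 - tp_flags)
    recall = tp / num_gt
    precision = tp / (tp + fp)
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap

# Toy example: five detections for one category, four ground-truth objects.
scores = [0.95, 0.90, 0.80, 0.60, 0.40]
is_tp = [1, 1, 0, 1, 0]   # whether each detection matches a ground truth at IoU >= 0.5
print(round(average_precision(scores, is_tp, num_gt=4), 3))
# mAP is the mean of the per-category APs; mAP@0.5:0.9 additionally averages over IoU thresholds.
```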

3.2. Analysis of Experimental Results of Different Models

In the comparison experiments with different models, as shown in Table 3, SSD (Single Shot MultiBox Detector) performs relatively poorly in terms of precision (0.684), recall (0.662), mAP@0.5 (0.67), and mAP@0.5:0.9 (0.61) due to its limited feature extraction capability. Faster R-CNN significantly improves precision (0.814) and recall (0.784) through its two-stage detection framework, but at the cost of higher computational complexity. YOLOv5 performs well in precision (0.844), mAP@0.5 (0.752), and mAP@0.5:0.9 (0.73) and has an advantage in real-time operation. YOLOv6 maintains the efficiency of YOLOv5 and improves mAP@0.5 (0.76). YOLOv7 further optimizes multiscale feature extraction and performs better in precision (0.864) and mAP@0.5 (0.81). YOLOv8, in turn, performs better in precision (0.947), recall (0.881), and mAP@0.5 (0.765). FlexVisionNet-YOLO, proposed in this paper, comprehensively outperforms the other models in precision (0.982), recall (0.952), mAP@0.5 (0.973), and mAP@0.5:0.9 (0.901). Meanwhile, as shown in Figure 8, the detection results of the different models are visualized. In this comparison, the FlexVisionNet-YOLO proposed in this paper performs well in detecting the CCD problem, with a confidence of 0.90, whereas the traditional YOLOv8 model performs poorly on this task, with incomplete detections and low evaluation indexes. Other models, such as YOLOv5, YOLOv6, and YOLOv7, incorrectly detect CCD problems as Tap problems. In detecting StripeNoise and CCD problems, the model in this paper shows excellent performance in terms of confidence and detection area. In contrast, the YOLOv8 model incorrectly detects the whole image as a StripeNoise problem. Although YOLOv5, YOLOv6, and YOLOv7 perform well on StripeNoise detection, they perform poorly on CCD detection, while the remaining models exhibit inaccurate detections and low evaluation metrics. The FlexVisionNet-YOLO proposed in this paper also performs well in detecting ImageMissing problems. Although the traditional YOLOv8 model detects large-area ImageMissing problems well, it fails on small areas. YOLOv7 and YOLOv6 produce too many detection boxes, with low confidence and false detections, respectively, while the YOLOv5 and Faster R-CNN models are more accurate and the SSD model shows offset detection boxes; none of these issues affect the model in this paper. Because the features of the RandomCode1 problem are very similar to those of the panchromatic image, all methods except the one presented in this paper perform poorly on it. For the RandomCode2 problem, all methods perform well except for the SSD model, which produces multiple detection boxes. In the detection of the Tap problem, owing to the close resemblance between Tap and CCD, the detection boxes produced by the other models are inaccurate, resulting in low detection accuracy, whereas the FlexVisionNet-YOLO model controls the range of the detection boxes very accurately and with high confidence. In summary, the FlexVisionNet-YOLO model proposed in this paper shows excellent performance across the various detection tasks and significantly outperforms traditional methods.

3.3. Analysis of Ablation Experiment Results

This paper proposes a new method for detecting radiation anomalies in remote sensing images, which combines an advanced network architecture with a redesigned loss function, and a series of ablation experiments was conducted to validate its effectiveness. First of all, using RepViT as the backbone network significantly improves the feature extraction capability. RepViT is a lightweight convolutional neural network designed to balance efficiency and performance, and it significantly reduces model inference time, which is crucial for remote sensing image processing tasks with high real-time requirements. As shown in Table 4, the performance changes after replacing the backbone network are observed in the ablation experiments. First, although the precision decreases slightly, the recall improves, which means that the model detects more true positive samples. Second, the mAP (mean average precision) also improves, indicating an overall enhancement of the model's detection performance. The slight decrease in the precision of method 1 occurs because, in the initial stage of feature extraction, RepViT focuses more on global features than on local details, which results in less accurate capture of some fine features; however, this effect is acceptable in terms of overall performance. To compensate for it, the SDI module is introduced, which enhances the feature fusion capability by selectively integrating multilevel features, improving detection accuracy and precision. The Dy_Sample module, in turn, employs a dynamic sampling strategy that adaptively adjusts the sampling method according to the features of the input image, improving the feature expression capability, especially for high-resolution images with complex features. The experimental results demonstrate a significant improvement in the precision, recall, and mAP of the model after the addition of the SDI and Dy_Sample modules. This improvement, particularly in precision, compensates for the shortcomings caused by replacing the backbone network, confirming the effectiveness and applicability of the method presented in this paper. By introducing the Inner-CIoU and Slide-Loss modules, the model achieves a further performance improvement in the detection part: Inner-CIoU provides more accurate bounding box regression, while Slide-Loss effectively handles the sample imbalance problem, thereby improving the model's convergence speed and detection accuracy. The recall improves by 5.3%, which also demonstrates the effectiveness of the classification. As can be seen in Figure 9, the model in this paper shows excellent results in both per-category detection accuracy and bounding box control.
In summary, the ablation experiments (Table 4 and Figure 9) show that the proposed FlexVisionNet-YOLO model achieves significant performance improvements in all aspects. In particular, the model shows higher efficiency and accuracy when processing complex scenes and high-resolution remote sensing images.

3.4. Analysis of Inner-CIoU and Slide-Loss Experiment Results

As shown in Figure 10, the model proposed in this paper outperforms the traditional YOLOv8 model in terms of classification accuracy and boundary control. First, the model in this paper introduces two optimization techniques: Inner-CIoU and Slide-Loss. Inner-CIoU (Intersection over Union with Inner Refinement) significantly improves the accuracy and consistency of bounding box localization by making finer adjustments to the target boundaries and reduces detection box offset and misdetection. Slide-Loss enhances the robustness of classification through the dynamic adjustment of category probability, especially when dealing with sample-imbalanced datasets, and significantly improves the classification accuracy. As can be seen from Figure 10, the FlexVisionNet-YOLO proposed in this paper outperforms the other models in boundary control when dealing with CCD problems. While the YOLOv8 model suffers from misdetections and omissions in StripeNoise detection due to the small number of samples, the model in this paper demonstrates a high degree of detection accuracy with no misdetection. Combining Inner-CIoU's boundary fine-tuning with Slide-Loss's dynamic classification optimization, the model in this paper shows higher detection accuracy and more reliable classification when dealing with complex scenes and variable environments, comprehensively outperforming the YOLOv8 model.

3.5. Quantitative Analysis and Comparison of the Performance of Each Category

A detailed quantitative analysis was conducted on the performance of the YOLOv8 and FlexVisionNet-YOLO models across the different categories. Comparing four metrics (Precision (P), Recall (R), mAP@0.5, and mAP@0.5:0.9) for each model across the six categories (ImageMissing, Tap, RandomCode1, CCD, StripeNoise, RandomCode2), it is evident from Table 5 that the precision (P) of the FlexVisionNet-YOLO model is generally higher than that of the YOLOv8 model across all categories. For example, in the ImageMissing category, the P value of FlexVisionNet-YOLO is 0.993, while that of YOLOv8 is 0.934; in the RandomCode1 category, the P value of FlexVisionNet-YOLO is 0.965, whereas that of YOLOv8 is 0.914. Similarly, the recall (R) of FlexVisionNet-YOLO is higher than that of YOLOv8 across all categories; in the CCD category, the R value of FlexVisionNet-YOLO is 0.967, while that of YOLOv8 is 0.842. In terms of mAP@0.5, FlexVisionNet-YOLO consistently outperforms YOLOv8 in all categories; in the Tap category, the mAP@0.5 value of FlexVisionNet-YOLO is 0.995, compared to 0.975 for YOLOv8. Similarly, for mAP@0.5:0.9, FlexVisionNet-YOLO generally performs better than YOLOv8 in most categories; in the RandomCode1 category, the mAP@0.5:0.9 value of FlexVisionNet-YOLO is 0.834, while that of YOLOv8 is 0.633. To further illustrate this analysis, Figure 11 shows the performance of the two models during training, where the red curve represents the FlexVisionNet-YOLO model and the green curve the YOLOv8 model. It is evident that FlexVisionNet-YOLO consistently outperforms YOLOv8 in precision, recall, mAP@0.5, and mAP@0.5:0.9. This indicates that, while YOLOv8 performs well in certain categories, FlexVisionNet-YOLO demonstrates higher overall precision and recall, especially on the crucial mAP@0.5 and mAP@0.5:0.9 metrics, showcasing its superiority in this detection task.

3.6. Analysis of FlexVisionNet-YOLO Performance in Real Data

In this experiment, the detection results of the model are compared on real images with and without waveband splitting. As shown in Figure 12, the first row (A) shows the detection results after splitting the wavebands, while the second row (B) shows the original images without band splitting. As can be seen from the figure, the detection results after waveband splitting are clearer and more accurate. In the ImageMissing, RandomCode1, CCD, and StripeNoise categories, the model accurately labels the anomalies with high confidence scores (0.97, 0.95, 0.84, and 0.83, respectively). The original images without band splitting are also detected well, for example in the ImageMissing and RandomCode2 categories, with confidence scores of 0.92 and 0.96, respectively. It can be seen that the model maintains high detection accuracy across the different categories whether or not the bands are split. This shows that the model used in this paper has excellent generalization ability and robustness in practical applications and can effectively deal with different types of image data.

4. Discussion

In this study, a FlexVisionNet-YOLO network is proposed for radiation anomaly detection in optical remote sensing images. To verify the model’s detection performance, 100 images were selected from various types of satellite data (ZY-3, GF1, GF2, GF6, GF7) for testing. As shown in Figure 13, the FlexVisionNet-YOLO model shows consistent superiority in the comparison of accuracy and false detection rates on different satellite data. The histograms of accuracy and false detection rates (Figure 13) show that the FlexVisionNet-YOLO model achieved high detection accuracy in all categories and maintained the false detection rate at a low level (~5%). These results demonstrate the robustness and reliability of the FlexVisionNet-YOLO network under different conditions. In Section 3.4, Section 3.5 and Section 3.6, the performance of the model in this paper is further validated on the dataset and real images. The experimental results show that the FlexVisionNet-YOLO model not only achieves high accuracy and a low false detection rate on standard datasets but also exhibits excellent detection performance on real image tests. Especially in the case of high image complexity and diversity, the model in this paper can still maintain high detection accuracy and a low false detection rate, which proves its good generalization ability. Combined with the above analysis, the FlexVisionNet-YOLO network can significantly improve the accuracy and efficiency of radiation anomaly detection in optical remote sensing images. The experimental results fully demonstrate the effectiveness and robustness of the model, which not only performs well in dataset testing but also has significant advantages in real image applications. This provides new methods and ideas for optical remote sensing image processing and analysis, which has important theoretical significance and application value.
Although the FlexVisionNet-YOLO network has achieved significant improvements in detection accuracy and efficiency, some problems remain to be solved. In the current experiments, the samples were preprocessed to eliminate some of the color cast problems. In practical applications, although many color cast problems have little impact on the clear representation of features, they still exist and need to be addressed further. In addition, further optimization of the computational efficiency of the model remains a key direction for future research. Therefore, it is planned to deal with the color cast problem more thoroughly in subsequent research and to further improve the computational performance of the model by optimizing the algorithm structure and using hardware acceleration techniques.

5. Conclusions

In this paper, a novel multiscale multitopology dynamically fused radiation anomaly detection network, FlexVisionNet-YOLO, is proposed to address the current problems of radiation anomaly detection in optical remote sensing images. Radiation anomaly detection is crucial in remote sensing image processing because accurate identification and classification of radiation anomalies not only helps to improve image utilization efficiency but also retains more valuable information. First, a multiscale, multibranch feature extraction network is proposed. This backbone network can effectively capture global and local features by integrating the RepViT vision transformer network, thereby improving the accuracy and stability of target detection. The multiscale feature extraction mechanism enhances the detection of targets of different sizes and shapes, while the optimized self-attention mechanism significantly improves adaptability in handling complex scenes. Second, a Dynamic Deep Feature Fusion Network (DDFN) is proposed in the Neck section, which adopts SDI to replace the traditional Concat operation and Dy_Sample (Dynamic Sampling) to replace the ordinary up-sampling operation. SDI enhances the accuracy and stability of feature fusion by dynamically integrating different spatial features, and Dy_Sample enhances the capture of details in important regions through an adaptive sampling technique, improving the fidelity of information during up-sampling. Finally, in the Detect section, Inner-CIoU is used for bounding box regression (bbox) and Slide-Loss for the classification loss (cls). Inner-CIoU (Intersection over Union with Inner Refinement) significantly improves the accuracy and consistency of bounding box localization by fine-tuning the target boundaries, reducing detection box offset and misdetection. Slide-Loss enhances the robustness of classification through the dynamic adjustment of category probability, especially when dealing with sample-imbalanced datasets, and significantly improves the classification accuracy. The experimental results show that the FlexVisionNet-YOLO proposed in this paper outperforms other existing models in radiation anomaly detection and significantly improves detection accuracy and stability. Compared with YOLOv8, the FlexVisionNet-YOLO method improves precision, recall, mAP0.5, and mAP0.5:0.9 by 3.5%, 7.1%, 4.4%, and 13.6%, respectively. Through testing on datasets and real images, FlexVisionNet-YOLO demonstrates excellent detection performance, especially when dealing with complex scenes and sample-imbalanced data. This shows that the model in this paper is not only theoretically innovative but also of high practical value. In summary, FlexVisionNet-YOLO has significant advantages in improving the accuracy and computational efficiency of radiation anomaly detection in optical remote sensing images and can provide effective support for future remote sensing image processing and analysis.

Author Contributions

Conceptualization, J.C. and H.T.; methodology, J.C. and H.T.; validation, J.C.; formal analysis, J.C. and H.T.; investigation, J.C.; data curation, J.C. and H.T.; writing—original draft preparation, J.C.; writing—review and editing, J.C., H.Z. and X.T. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Demonstration System for High-Resolution Remote Sensing and Mapping Applications (Phase II) (42-Y30B04-9001-19/21) and the Construction of natural resources satellite remote sensing technology system and application demonstration (00000113).

Data Availability Statement

The data used in this study are available from the first or corresponding author.

Acknowledgments

The authors thank the Land Satellite Remote Sensing Application Center, MNR, Beijing, China, as well as the Demonstration System for High-Resolution Remote Sensing and Mapping Applications (Phase II) and the Construction of Natural Resources Satellite Remote Sensing Technology System and Application Demonstration projects.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, H.; Zhou, Q.; Li, Q.; Hu, S.; Shi, T.; Wu, G. Determining switching threshold for NIR-SWIR combined atmospheric correction algorithm of ocean color remote sensing. ISPRS J. Photogramm. Remote Sens. 2019, 153, 59–73. [Google Scholar] [CrossRef]
  2. Xia, G.-S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  3. Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 2020, 241, 111716. [Google Scholar] [CrossRef]
  4. Requena-Mesa, C.; Benson, V.; Reichstein, M.; Runge, J.; Denzler, J. EarthNet2021: A large-scale dataset and challenge for Earth surface forecasting as a guided video prediction task. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1132–1142. [Google Scholar]
  5. Xiong, Z.; Zhang, F.; Wang, Y.; Shi, Y.; Zhu, X.X. Earthnets: Empowering ai in earth observation. arXiv 2022, arXiv:2210.04936. [Google Scholar]
  6. Wang, L.; Bi, J.; Meng, X.; Geng, G.; Huang, K.; Li, J.; Tang, L.; Liu, Y. Satellite-based assessment of the long-term efficacy of PM2.5 pollution control policies across the Taiwan Strait. Remote Sens. Environ. 2020, 251, 112067. [Google Scholar] [CrossRef]
  7. Tayara, H.; Chong, K.T. Object detection in very high-resolution aerial images using one-stage densely connected feature pyramid network. Sensors 2018, 18, 3341. [Google Scholar] [CrossRef]
  8. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 142–158. [Google Scholar] [CrossRef]
  9. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  10. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
Figure 1. Display of sub-band image data.
Figure 2. Raw data presentation.
Figure 3. YOLOv8 basic framework.
Figure 4. FlexVisionNet-YOLO network framework.
Figure 5. Comparison of the multiscale multi-branch feature-extraction backbone network and C2f.
Figure 6. Comparison of the FlexVisionNet-YOLO and YOLOv8 neck networks.
Figure 7. Comparison of results from different models: (a) FlexVisionNet-YOLO, (b) YOLOv8, (c) YOLOv7, (d) YOLOv6, (e) YOLOv5, (f) Faster R-CNN, (g) SSD.
Figure 7. Diagram of the Detect module.
Figure 8. Comparison of results from different models: (a) FlexVisionNet-YOLO, (b) YOLOv8, (c) YOLOv7, (d) YOLOv6, (e) YOLOv5, (f) Faster R-CNN, (g) SSD.
Figure 9. Detection results for each type of radiation anomaly.
Figure 10. Comparison of detection-box and classification results: (A) YOLOv8 results; (B) FlexVisionNet-YOLO results.
Figure 11. Comparison of FlexVisionNet-YOLO and YOLOv8 performance curves.
Figure 12. Detection results on real image data: (A) split-band image data; (B) non-split-band image data.
Figure 13. Statistical results for anomaly-free data.
Table 1. Hyperparameter settings.
Name         Parameter Setting
lr0          0.001
lrf          0.001
Momentum     0.927
Epochs       700
Batch size   16
dfl          0.8
Patience     150
cls          1
Optimizer    SGD
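The article does not reproduce the authors' training script, so the following is only a minimal sketch of how the Table 1 settings could be passed to an Ultralytics-style YOLOv8 training run. The dataset configuration "radiation_anomaly.yaml" and the starting weights are placeholders, and this launches the stock YOLOv8 baseline rather than FlexVisionNet-YOLO itself, which is not part of the public package.

```python
from ultralytics import YOLO

# Hypothetical baseline run using the Table 1 hyperparameters.
# "radiation_anomaly.yaml" is a placeholder dataset config, not a file
# distributed with the paper.
model = YOLO("yolov8n.pt")
model.train(
    data="radiation_anomaly.yaml",
    epochs=700,        # Epochs
    batch=16,          # Batch size
    lr0=0.001,         # initial learning rate
    lrf=0.001,         # final learning rate fraction
    momentum=0.927,    # SGD momentum
    optimizer="SGD",
    patience=150,      # early-stopping patience
    cls=1.0,           # classification loss gain
    dfl=0.8,           # distribution focal loss gain
)
```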
Table 2. Learning rate experiments.
lr0      lrf      P       R       mAP@0.5   mAP@0.5:0.9
0.01     0.01     0.921   0.931   0.942     0.876
0.001    0.005    0.924   0.934   0.936     0.887
0.001    0.001    0.982   0.952   0.973     0.901
Table 3. Comparative experimental results of different models.
Model                P       R       mAP@0.5   mAP@0.5:0.9
SSD                  0.684   0.662   0.67      0.61
Faster R-CNN         0.814   0.784   0.704     0.68
YOLOv5               0.844   0.756   0.752     0.73
YOLOv6               0.832   0.74    0.76      0.72
YOLOv7               0.864   0.762   0.81      0.74
YOLOv8               0.947   0.881   0.929     0.765
FlexVisionNet-YOLO   0.982   0.952   0.973     0.901
Table 4. Results of ablation experiments.
Methods              RepViT  SDI  Dy_Sample  RepViT+SDI  SDI+Dy_Sample  Inner-CIoU+Slide  P      R      mAP@0.5  Time (ms)
YOLOv8 (baseline)    -       -    -          -           -              -                 0.947  0.881  0.929    5.8
Method (1)           ✓       -    -          -           -              -                 0.944  0.901  0.945    2.3
Method (2)           -       ✓    -          -           -              -                 0.95   0.911  0.93     7.6
Method (3)           -       -    ✓          -           -              -                 0.94   0.904  0.934    5.4
Method (4)           -       -    -          ✓           -              -                 0.954  0.912  0.936    5.8
Method (5)           -       -    -          -           ✓              -                 0.944  0.906  0.928    8.3
Method (6)           -       -    -          -           -              ✓                 0.962  0.934  0.941    5.5
FlexVisionNet-YOLO   ✓       ✓    ✓          -           -              ✓                 0.982  0.952  0.973    3.7
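For readers unfamiliar with the Inner-IoU term that appears in Table 4, the sketch below illustrates its basic idea as described in the cited Inner-IoU work: both the ground-truth and predicted boxes are rescaled about their centres by a ratio before the IoU is computed. This is an illustrative, simplified implementation under assumed conventions (axis-aligned (x1, y1, x2, y2) boxes and a single example ratio), not the exact Inner-CIoU loss used in FlexVisionNet-YOLO.

```python
import torch

def inner_iou(box1, box2, ratio=0.75, eps=1e-7):
    """Inner-IoU sketch: shrink (ratio < 1) or enlarge (ratio > 1) both boxes
    about their centres, then compute the IoU of the scaled boxes.
    box1, box2: (N, 4) tensors in (x1, y1, x2, y2) format."""
    # centres and scaled half-sizes of the original boxes
    c1 = (box1[:, :2] + box1[:, 2:]) / 2
    c2 = (box2[:, :2] + box2[:, 2:]) / 2
    half1 = (box1[:, 2:] - box1[:, :2]) * ratio / 2
    half2 = (box2[:, 2:] - box2[:, :2]) * ratio / 2

    # corners of the scaled ("inner") boxes
    lt1, rb1 = c1 - half1, c1 + half1
    lt2, rb2 = c2 - half2, c2 + half2

    # intersection and union of the inner boxes
    inter = (torch.min(rb1, rb2) - torch.max(lt1, lt2)).clamp(min=0).prod(dim=1)
    area1 = (rb1 - lt1).prod(dim=1)
    area2 = (rb2 - lt2).prod(dim=1)
    return inter / (area1 + area2 - inter + eps)
```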
Table 5. Comparative experimental results by category.
Model                Metric        Image-Missing  Tap     RandomCode1  CCD     StripeNoise  RandomCode2
YOLOv8               P             0.934          0.982   0.914        0.956   0.935        0.964
YOLOv8               R             0.8            0.964   0.792        0.842   0.964        0.962
YOLOv8               mAP@0.5       0.897          0.975   0.875        0.87    0.964        0.987
YOLOv8               mAP@0.5:0.9   0.729          0.923   0.633        0.701   0.705        0.902
FlexVisionNet-YOLO   P             0.993          0.997   0.965        0.986   0.97         0.997
FlexVisionNet-YOLO   R             0.836          0.997   0.939        0.967   0.994        0.992
FlexVisionNet-YOLO   mAP@0.5       0.917          0.995   0.965        0.982   0.994        0.994
FlexVisionNet-YOLO   mAP@0.5:0.9   0.875          0.99    0.834        0.816   0.969        0.934