1 Introduction

Compared with computed tomography (CT), chest X-ray (CXR) imaging is not only cheaper but also delivers a lower radiation dose, which reduces harm to patients. COVID-19 is a respiratory disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1, 2] and has spread rapidly around the world. The epidemic has grown even faster with the emergence of severe new variants of the coronavirus and has greatly damaged the global economy and public health. As of May 2021, more than 160 million confirmed cases and 3.32 million deaths had been registered in more than 200 countries and territories. During diagnosis and the assessment of disease progression, radiologists can perform multiple CXR examinations on a patient to accurately evaluate the curative effect, since most patients infected with COVID-19 are diagnosed with pneumonia [3]. However, as the number of confirmed cases increases, checking CXR images becomes a heavy and time-consuming burden for radiologists, and the accuracy of the evaluation is difficult to ensure because annotations of CXR images are often strongly influenced by clinical experience [4].

Recently, deep learning models [5,6,7,8,9,10,11] have achieved significant advances in medical image analysis by training on sufficient labeled data and fine-tuning their millions of parameters [12, 13]. Therefore, it is becoming increasingly important to use deep learning models to analyze CXR images of COVID-19 patients in order to relieve the shortage of medical resources and the overload of doctors. Ouyang et al. [14] used a dual-sampling attention network to detect COVID-19 cases. The PSSPNN model proposed in [15] classifies COVID-19, secondary pulmonary tuberculosis, community-captured pneumonia, and healthy subjects. The DenseNet-OTLS method [16] achieved better performance than state-of-the-art approaches in diagnosing COVID-19. The works in [17, 18] both used convolutional neural networks (CNNs) to segment COVID-19 infection in CT images, and Shi et al. [19] reviewed imaging data acquisition, segmentation, and diagnosis for COVID-19 using artificial intelligence (AI). These works are typical examples of COVID-19 image analysis.

Nevertheless, CXR images contain various types of noise, such as ground-glass opacity, bilateral abnormalities, and interstitial abnormalities. In particular, low-dose CXR images are susceptible to noise, and the resulting complicated and blurry images are likely to interfere with diagnosis by both machines and doctors [20]. Therefore, recovering clearer details in CXR images and improving image quality by denoising is of great significance [21, 22].

Owing to their high practical value, medical image denoising methods [20, 23,24,25,26,27] have been studied extensively for a long time. Mondal et al. [28] and Raj et al. [29] used discrete wavelet techniques [30] for medical image denoising. These methods are simple to compute and run fast, but both perform unsatisfactorily in removing the Gaussian white noise (GWN) that is widespread in medical images. In addition to classic filtering [31,32,33] and transform-domain medical image denoising methods [24, 25, 34], non-local means (NLM) [35, 36] and block-matching and 3D filtering (BM3D) [37, 38], which are based on self-similarity, show promising denoising performance.

Although traditional medical image denoising algorithms can improve the quality of medical images to a certain extent, they usually require manually selected parameters and complex optimization algorithms [39], and they are unable to preserve texture details effectively [20]. Recently, deep learning methods [40,41,42,43,44,45,46], given enough data, have made significant advances in image denoising over traditional hand-crafted methods. The two families differ in several key respects. First, deep learning methods do not require manual parameter tuning or complicated optimization algorithms. Moreover, deep learning methods can handle many different noise types by training on different data. However, the deep learning methods above still have some obvious weaknesses: (1) most of them ignore the connection between shallow layers and deep layers; (2) some of these deep networks fail to extract information from feature maps effectively; and (3) they lack an efficient multi-resolution feature fusion method. Given these weaknesses, in this paper we propose a novel multi-resolution parallel residual CNN (MPR-CNN) for CXR image denoising. The shallow, high-resolution layers carry spatial information, and the deep, low-resolution layers carry semantic information. We use multi-resolution parallel convolution streams to connect the spatial and semantic information. The efficient channel and spatial attention (ECSA) module is proposed to make the network focus more on texture details in CXR images with fewer parameters. Feature maps of multiple resolutions are usually added or concatenated directly; however, both operations provide limited expressive power to the network. Therefore, we design an attention-based adaptive multi-resolution feature fusion (AMFF) method to improve the expressive power of the network.

The main contributions of this work are summarized as follows:

  1. Multi-resolution parallel convolution streams are used to fuse information from high-resolution and low-resolution features and to enhance the robustness of the model.

  2. An ECSA module combining efficient channel and spatial attention is proposed to make the network pay more attention to the texture details of CXR images while reducing the number of parameters.

  3. To improve the representation of the network, an attention-based AMFF method is used, which adaptively fuses multi-resolution features rather than simply concatenating or summing them.

  4. To verify the impact of the MPR-CNN, we design extensive experiments on CXR image classification. The results demonstrate that our network helps detect COVID-19 cases from denoised CXR images.

The remainder of this paper is organized as follows. Section 2 provides a brief survey of related work. Section 3 presents our MPR-CNN and then describes the loss function and optimization. Section 4 reports extensive experiments that evaluate the proposed method. Finally, conclusions and future work are given in Sect. 5.

2 Related work

In this paper, we propose the MPR-CNN model for CXR image denoising. With the rapidly growing number of CXR images of confirmed cases, there is a pressing need to enhance image quality for improved COVID-19 detection. To clarify the composition and the core of the model, we briefly describe representative methods for each of the central problems studied.

2.1 Deep learning methods for image denoising

Deep learning has become a dominant machine learning approach in image processing tasks such as image classification [7], image recognition [47], and image denoising, where it has demonstrated great potential and remarkable performance owing to its flexible and powerful plug-in components [39]. Burger et al. [48] first used a multilayer perceptron (MLP) for image denoising, and their extensive experiments demonstrate that the MLP has similar or even better representation power than the hand-crafted BM3D. In addition, generative adversarial networks (GANs) [45, 46], which are frameworks for estimating generative models, are also good choices for suppressing noise. Generally, such a framework consists of a generative network (G) and a discriminative network (D) that compete with each other according to game theory.

To improve denoising efficiency, CNNs can be treated as modular components into which some classic optimization methods can be inserted to restore the latent clean image, which is effective for processing noisy images. DnCNN [43] and IRCNN [49] both use a fully convolutional network with single-scale features for image denoising. An encoder–decoder approach was used in [50,51,52,53]: the input is first mapped gradually to a low-resolution representation, and a stepwise reverse mapping then restores the original resolution. Although these CNNs have achieved promising results, they still have limitations. Fully convolutional networks do not use any downsampling operations, so their feature maps retain more precise spatial details. However, these networks are less efficient at encoding contextual information because of their limited receptive field. Encoder–decoder methods, on the other hand, gain more contextual information but lose fine spatial details.

2.2 Multi-resolution features fusion

Multi-resolution feature fusion is an important step in improving the denoising of CXR images. Low-level features have higher resolution and contain more positional and detailed information; however, they carry less semantic information and more noise because they have passed through fewer convolutions. In contrast, high-level features carry richer semantic information, but their resolution is very low and their perception of details is unsatisfactory. The purpose of feature fusion is to merge the features extracted from the input into new features that are more expressive than the original ones. Classic feature fusion methods are mainly divided into summation [54, 55] and concatenation [56]. Assuming that the dimensions of the two input features are p and q, the dimension of the output feature Z obtained by concatenation is given by Eq. (1).

$$ {\text{Dim}}(Z) = p + q $$
(1)

The number of channels increases, but the information in each channel remains the same. For summation, assuming that the two input features are x and y, the value of the output feature Z is given by Eq. (2), where \(\lambda\) is a constant.

$$ Z = x + \lambda y $$
(2)

However, both methods provide limited expressive power to the network. Motivated by this limitation, we design the attention-based AMFF method to improve the expressive power of the network.
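The difference between Eqs. (1) and (2) can be illustrated with a short NumPy sketch. The array sizes and the value of \(\lambda\) below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Two hypothetical feature maps of shape (H, W, C); all sizes are illustrative.
H, W = 8, 8
p, q = 48, 48
x = np.random.rand(H, W, p)
y = np.random.rand(H, W, q)

# Concatenation, Eq. (1): the channel counts add up and each channel is copied unchanged.
z_cat = np.concatenate([x, y], axis=-1)

# Summation, Eq. (2): requires p == q and keeps the channel count.
lam = 0.5  # the constant lambda, chosen arbitrarily here
z_sum = x + lam * y

print(z_cat.shape)  # (8, 8, 96)
print(z_sum.shape)  # (8, 8, 48)
```

In both cases the fusion weights are fixed in advance, so neither operation can adapt to the content of the feature maps.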

2.3 Attention mechanism

Recently, many works [57,58,59,60] have used channel attention or spatial attention as an effective module to improve the performance of deep learning models. Hu et al. [57] first proposed the squeeze-and-excitation network (SENet) to model the relationship between channels. The weight of each channel is obtained by squeezing with global average pooling (GAP) followed by fully connected layers. Zhang et al. [60] proposed a residual non-local attention network to address the uneven distribution of information in corrupted images. The method in [59] combines channel and spatial attention to improve the feature extraction ability of networks.

The attention mechanism enables the network to learn where to concentrate and encourages it to focus on the target object. The channel attention mechanism enhances or suppresses different channels for different tasks by modeling the weight of each feature channel. The essence of spatial attention is to locate the target and perform some transformations or obtain weights. These attention mechanisms can improve the expressiveness of the features by establishing dependencies between channels or by applying weighted spatial attention masks. However, these methods still incur a large cost in memory and computational complexity.

3 Proposed method

In this section, we introduce the proposed CXR image denoising network MPR-CNN in detail, which contains the multi-resolution noise extraction block (MNEB), ECSA, and AMFF modules. The ECSA module is designed to make the network focus more on texture details in CXR images and to reduce the number of parameters by using a 1D convolution instead of fully connected layers. The attention-based AMFF module is used for feature fusion, instead of simple concatenation or summation, to improve the expressive power of the network. The MNEB, which contains the ECSA and the AMFF, fuses information from high- and low-resolution features. In addition, the whole network uses residual blocks to reduce the difficulty of network learning. Finally, the MSL1 loss and the cosine annealing strategy [61] are used to train our MPR-CNN. We describe these methods in the following subsections.

3.1 Network architecture

The architecture of the proposed MPR-CNN, which consists of ECSA, AMFF, and MNEB modules, is shown in Fig. 1. Here, “DS” and “US” stand for downsampling and upsampling, respectively. First, the MPR-CNN applies a convolutional layer with a filter size of 1 × 3 × 3 × 48 to extract low-level features from the input X (noisy CXR images). Then, the feature maps pass through several MNEB modules, which are described in Sect. 3.2. The MNEB is the fundamental building block of MPR-CNN. Next, we use another convolutional layer with a filter size of 48 × 3 × 3 × 1 to obtain the desired residual image R(X). Finally, we subtract R(X) from X to obtain the output (denoised CXR images).
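The residual formulation can be sketched in a few lines of NumPy. The function `ideal_residual` is a hypothetical oracle standing in for the trained network output R(X), and the patch size and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

clean = rng.uniform(0.0, 1.0, size=(128, 128))           # stand-in for a clean patch Y
noise = rng.normal(0.0, 25.0 / 255.0, size=clean.shape)  # Gaussian noise, sigma = 25 on the 8-bit scale
noisy = clean + noise                                     # network input X

def ideal_residual(x, y):
    """Hypothetical oracle for R(X): returns the exact residual X - Y."""
    return x - y

residual = ideal_residual(noisy, clean)  # what the network is trained to predict
denoised = noisy - residual              # network output: X - R(X)

print(np.allclose(denoised, clean))      # True for the oracle residual
```

With a learned R(X) the subtraction is the same; only the quality of the predicted residual changes.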

Fig. 1 Network architecture of the proposed MPR-CNN

3.2 Multi-resolution noise extraction block (MNEB)

The architecture of the MNEB is shown in the dotted box at the top of Fig. 1. A full-resolution convolution with a filter size of 48 × 3 × 3 × 48 is used to keep more precise spatial details, while 2 × downsampling with a filter size of 48 × 3 × 3 × 96 and 4 × downsampling with a filter size of 48 × 3 × 3 × 192 are performed on the original features to gain more contextual information. Then, we apply the ECSA module, described in Sect. 3.3, to focus more on texture details in CXR images while reducing the number of parameters. Next, 2 × upsampling with a filter size of 96 × 3 × 3 × 48 and 4 × upsampling with a filter size of 192 × 3 × 3 × 48 are applied to restore the original feature map size. The multi-resolution features are then fused by the AMFF, which is described in Sect. 3.4. Finally, a convolutional layer with a filter size of 48 × 3 × 3 × 48 is applied to extract the residual information from the feature maps again. Like the whole network, the MNEB module uses residual learning to reduce the difficulty of network learning. The multi-resolution parallel convolution streams fuse information from high- and low-resolution features and enhance the robustness of the model.
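As a rough sanity check of the sizes involved, the following sketch lists the feature map shape of each stream and the number of weights in its entry and exit convolutions. It assumes that k × downsampling divides the height and width by k and that the convolutions have no bias terms; neither detail is stated in the text.

```python
def conv_weights(c_in, k, c_out):
    """Number of weights in a k x k convolution with filter size c_in x k x k x c_out."""
    return c_in * k * k * c_out

H, W = 128, 128  # training patch size used in Sect. 4.1

# (name, downsampling factor, channels) of the three parallel streams
streams = [("1x", 1, 48), ("2x", 2, 96), ("4x", 4, 192)]

for name, factor, channels in streams:
    shape = (H // factor, W // factor, channels)
    entry = conv_weights(48, 3, channels)  # 48 -> channels on the way in
    exit_ = conv_weights(channels, 3, 48)  # channels -> 48 on the way out
    print(name, shape, entry, exit_)
```

Under these assumptions each halving of the resolution doubles the channel count, so the number of activations per stream shrinks by a factor of two at each step.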

3.3 Efficient channel and spatial attention (ECSA)

As shown in Fig. 2, the ECSA module consists of channel attention and spatial attention, which make the network focus more on texture details in CXR images while reducing the number of parameters. The channel attention branch is designed to enhance or suppress different channels for CXR image denoising by modeling the weight of each feature channel. Global average pooling (GAP) is applied to squeeze the input feature maps \(M_{C} \in R^{{H \times W \times C}}\) and yield a feature descriptor \(d \in R^{{1 \times 1 \times C}}\). The excitation operator usually passes the descriptor through two fully connected layers for dimension reduction and cross-channel interaction. However, dimension reduction has side effects on the prediction of channel attention. Therefore, we use a 1D convolution with a kernel size of 5 and a padding of 2 to replace the two fully connected layers. This adds very little complexity while providing a significant improvement. Next, sigmoid gating is applied to generate the activations \(\hat{d} \in R^{{1 \times 1 \times C}}\). Finally, the output of the channel attention branch is obtained by multiplying \(M_{C}\) and \(\hat{d}\).
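The channel attention branch can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the kernel holds random stand-ins for learned weights, and the 1D convolution is written as a cross-correlation without bias, as is common in deep learning frameworks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(m_c, kernel):
    """m_c: (H, W, C) feature maps; kernel: 1D weights of odd length (5 in the text)."""
    # Squeeze: global average pooling gives one descriptor value per channel.
    d = m_c.mean(axis=(0, 1))  # shape (C,)
    # Cross-channel interaction: 1D convolution over the channel axis with
    # zero padding of len(kernel) // 2, so the length C is preserved.
    pad = len(kernel) // 2
    d_padded = np.pad(d, pad)
    conv = np.array([np.dot(d_padded[i:i + len(kernel)], kernel)
                     for i in range(d.shape[0])])
    # Sigmoid gating yields one weight in (0, 1) per channel.
    d_hat = sigmoid(conv)
    # Rescale every channel of the input by its weight.
    return m_c * d_hat[None, None, :], d_hat

rng = np.random.default_rng(0)
features = rng.standard_normal((16, 16, 48))
weights = rng.standard_normal(5) * 0.1  # random stand-in for learned weights
out, gate = channel_attention(features, weights)
print(out.shape, gate.shape)
```

The interaction step uses only the five kernel weights regardless of the number of channels, which is where the parameter saving over two fully connected layers comes from.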

Fig. 2 Efficient channel and spatial attention

The spatial attention branch is designed to locate the target and perform some transformations. Given a feature map \(M_{S} \in R^{{H \times W \times C}}\), GAP and global max pooling (GMP) are first applied along the channel dimension, and their outputs are concatenated to generate a feature map \(F_{S} \in R^{{H \times W \times 2}}\). Next, \(F_{S}\) passes through a convolution layer and a sigmoid activation to generate a spatial attention feature map \(\hat{F}_{S} \in R^{{H \times W \times 1}}\). Finally, the output of the spatial attention branch is obtained by multiplying \(M_{S}\) and \(\hat{F}_{S}\).
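A minimal NumPy sketch of the spatial attention branch is given below. The 3 × 3 kernel size of the convolution is an assumption (the text does not state it), the function names are hypothetical, and the kernel values are random stand-ins for learned weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, w):
    """x: (H, W, C_in); w: (k, k, C_in). Zero-padded convolution with one output channel."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros((h, wd))
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(xp[i:i + k, j:j + k, :] * w)
    return out

def spatial_attention(m_s, w):
    """m_s: (H, W, C) feature maps; w: (k, k, 2) kernel of the attention convolution."""
    avg = m_s.mean(axis=-1, keepdims=True)            # average over channels, (H, W, 1)
    mx = m_s.max(axis=-1, keepdims=True)              # maximum over channels, (H, W, 1)
    f_s = np.concatenate([avg, mx], axis=-1)          # (H, W, 2)
    f_hat = sigmoid(conv2d_same(f_s, w))[..., None]   # attention map, (H, W, 1)
    return m_s * f_hat, f_hat                         # broadcast over channels

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 16))
kernel = rng.standard_normal((3, 3, 2)) * 0.1  # random stand-in for learned weights
out, att = spatial_attention(features, kernel)
print(out.shape, att.shape)
```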

In the overall pipeline of the ECSA module, a convolution layer with a kernel size of 3 × 3 is first applied to extract low-level features, followed by PReLU to improve the nonlinearity of the network. After another convolution layer with a kernel size of 3 × 3, the feature maps pass through the channel and spatial attention branches in parallel. Next, we concatenate the feature maps produced by the two branches. Finally, a convolution layer with a kernel size of 3 × 3 is used to extract the residual information from the feature maps again. The ECSA module is also a residual block.

3.4 Adaptive multi-resolution feature fusion (AMFF)

As shown in Fig. 3, we design the attention-based AMFF method, rather than directly adding or concatenating feature maps of multiple resolutions, to improve the expressive power of the network. We first fuse the multi-resolution feature maps by element-wise summation, as shown in Eq. (3), to obtain the feature maps \(M_{\text{in}}\).

$$ M_{{{\text{in}}}} = M_{1} + M_{2} + M_{3} $$
(3)

where \(M_{1}\), \(M_{2}\), and \(M_{3}\) represent the 1 ×, 2 ×, and 4 × feature maps, respectively. Then, \(M_{\text{in}}\) passes through GAP to extract the average information of each channel and obtain a feature descriptor \(D \in R^{{1 \times 1 \times C}}\). Further, we use global depthwise convolution (GDC), in which the number of convolution groups equals the number of channels and the convolution kernel has the same size as the input feature map, to assign each position a learnable weight and obtain a new descriptor \(\hat{D} \in R^{{1 \times 1 \times C}}\). Next, we again use a 1D convolution with a kernel size of 5 and a padding of 2 for cross-channel interaction, which keeps the channel dimension unchanged. Afterward, sigmoid gating is applied to generate three different attention activations \(S_{1} \in R^{{1 \times 1 \times C}}\), \(S_{2} \in R^{{1 \times 1 \times C}}\), and \(S_{3} \in R^{{1 \times 1 \times C}}\). Finally, the output \(M_{\text{out}}\) of the AMFF after recalibration and aggregation is defined in Eq. (4).

$$ M_{{{\text{out}}}} = M_{1} \cdot S_{1} + M_{2} \cdot S_{2} + M_{3} \cdot S_{3} $$
(4)
Fig. 3 Schematic for adaptive multi-resolution feature fusion
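Equations (3) and (4) can be sketched in NumPy as follows. Several details are assumptions made only for illustration: the three feature maps are taken to be already resized to a common shape, the descriptor is computed with a single GDC step (with uniform weights it reduces to GAP), and the three activations are produced by three separate 1D kernels. All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_same(d, kernel):
    """1D cross-correlation over the channel axis with zero padding (length preserved)."""
    pad = len(kernel) // 2
    dp = np.pad(d, pad)
    return np.array([np.dot(dp[i:i + len(kernel)], kernel) for i in range(len(d))])

def amff(m1, m2, m3, gdc_weight, kernels):
    """m1, m2, m3: (H, W, C) maps; gdc_weight: (H, W, C); kernels: three 1D kernels."""
    m_in = m1 + m2 + m3                           # Eq. (3)
    # GDC: one learnable weight per position and channel, reduced to a (C,) descriptor.
    d_hat = (m_in * gdc_weight).sum(axis=(0, 1))
    # One gate per input stream, each of shape (C,).
    s1, s2, s3 = (sigmoid(conv1d_same(d_hat, k)) for k in kernels)
    m_out = m1 * s1 + m2 * s2 + m3 * s3           # Eq. (4), broadcast over H and W
    return m_out, (s1, s2, s3)

rng = np.random.default_rng(0)
H, W, C = 8, 8, 48
m1, m2, m3 = (rng.standard_normal((H, W, C)) for _ in range(3))
gdc_weight = np.full((H, W, C), 1.0 / (H * W))  # uniform weights: GDC equals GAP
kernels = [rng.standard_normal(5) * 0.1 for _ in range(3)]
m_out, gates = amff(m1, m2, m3, gdc_weight, kernels)
print(m_out.shape)
```

If all kernel weights are zero, every gate equals 0.5 and the output is a scaled version of the plain sum in Eq. (3), so the learned gates are what distinguish AMFF from fixed summation.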

3.5 Loss function and optimization

We propose the MSL1 loss to train our MPR-CNN by combining the multi-scale structural similarity (MS-SSIM) [62, 63] loss and the L1 loss. On the one hand, MS-SSIM preserves the contrast in high-frequency regions of CXR images; on the other hand, the L1 loss preserves their color and brightness. SSIM is a measure based on the structural similarity of images, and the general form of the SSIM index between signals x and y is defined in Eq. (5), where l, c, and s represent luminance, contrast, and structure, respectively, and \(\alpha\), \(\beta\), and \(\gamma\) are parameters that weight the three components.

$$ {\text{SSIM}}(x,y) = l(x,y)^{\alpha } \cdot c(x,y)^{\beta } \cdot s(x,y)^{\gamma } $$
(5)

Here, we set \(\alpha = \beta = \gamma = 1\), and the final SSIM index is shown in Eq. (6), where \(\mu _{x}\) and \(\mu _{y}\) are the mean values of signals x and y, \(\sigma _{x}\) and \(\sigma _{y}\) are their standard deviations (so that \(\sigma _{x}^{2}\) and \(\sigma _{y}^{2}\) are the variances), and \(\sigma _{{xy}}\) is the covariance of x and y. \(C_{1}\) and \(C_{2}\) are small positive constants that prevent the denominator of Eq. (6) from being zero.

$$ {\text{SSIM}}(x,y) = \frac{{(2\mu _{x} \mu _{y} + C_{1} )\;(2\sigma _{{xy}} + C_{2} )}}{{(\mu _{x}^{2} + \mu _{y}^{2} + C_{1} )\;(\sigma _{x} ^{2} + \sigma _{y} ^{2} + C_{2} )}} $$
(6)
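Equation (6) can be evaluated directly in NumPy. The sketch below treats the whole image as a single window, whereas practical SSIM implementations average the index over local windows. The constants follow the common convention \(C_{1} = (0.01L)^{2}\) and \(C_{2} = (0.03L)^{2}\) for a data range L, which is an assumption here since the text does not give their values.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Eq. (6) computed over the whole image as one window."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(64, 64))
noisy = image + rng.normal(0.0, 0.1, size=image.shape)

print(ssim_global(image, image))  # identical images give an index of 1
print(ssim_global(image, noisy))  # added noise lowers the index
```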

Furthermore, MS-SSIM evaluates SSIM on images of different resolutions obtained by downsampling, which captures more structural information. The MS_SSIM loss is given in Eq. (7), where X denotes the noisy CXR image and Y the clean CXR image.

$$ {\text{MS}}\_{\text{SSIM}}(R(X),V(X - Y)) = \frac{1}{N}\sum\limits_{{i = 1}}^{N} {\left\{ {1 - {\text{SSIM}}(R(X_{i} ),V(X_{i} - Y_{i} ))} \right\}} $$
(7)

And, Eq. (8) describes the L1 loss.

$$ {\text{L}}1(R(X),V(X - Y)) = \frac{1}{N}\sum\limits_{{i = 1}}^{N} {\left| {R(X_{i} ) - V(X_{i} - Y_{i} )} \right|} $$
(8)

The overall MSL1 loss is given by Eq. (9). It measures the error between the desired residual image \(V(X - Y)\) and the residual \(R(X)\) estimated from the noisy CXR image, where \(\theta\) is a constant set to 0.2 for all experiments on the basis of the ablation study.

$$ {\text{MSL}}1 = (1 - \theta )\;{\text{L}}1 + \theta * {\text{MS}}\_{\text{SSIM}} $$
(9)
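The weighting in Eq. (9) can be sketched for a single image pair as follows. The multi-scale term is deliberately simplified and is an assumption of this sketch: it averages a single-window SSIM over three scales obtained by 2 × 2 average pooling, whereas standard MS-SSIM combines contrast and structure terms across scales with per-scale exponents. The function names and the SSIM constants are likewise illustrative.

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM, Eq. (6); constants assume a data range of 1."""
    mu_x, mu_y = x.mean(), y.mean()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (x.var() + y.var() + c2)
    return num / den

def downsample2(x):
    """2 x 2 average pooling (odd trailing rows or columns are dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ms_ssim_simplified(x, y, scales=3):
    """Simplified stand-in for MS-SSIM: mean SSIM over several scales."""
    values = []
    for _ in range(scales):
        values.append(ssim_global(x, y))
        x, y = downsample2(x), downsample2(y)
    return float(np.mean(values))

def msl1_loss(pred_residual, target_residual, theta=0.2):
    l1 = np.mean(np.abs(pred_residual - target_residual))                  # Eq. (8)
    ms = 1.0 - ms_ssim_simplified(pred_residual, target_residual)          # Eq. (7)
    return (1.0 - theta) * l1 + theta * ms                                  # Eq. (9)

rng = np.random.default_rng(0)
target = rng.normal(0.0, 0.1, size=(64, 64))           # stand-in for V(X - Y)
pred = target + rng.normal(0.0, 0.02, size=target.shape)  # imperfect estimate R(X)
print(msl1_loss(pred, target))
```

With \(\theta = 0.2\) the L1 term carries 80% of the weight, so the structural term acts as a correction on top of a mostly pixel-wise objective.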

We utilize PSNR as the evaluation for our MSL1 loss, which is shown in Table 6 in Sect. 4.4.

In addition, as shown in Eq. (10), the cosine annealing strategy is used as the learning rate schedule, decreasing the learning rate from its initial value of 5e−4 to 5e−6 during training. Here, \(\eta\) is the initial value and T is empirically set to 5.

$$ \eta _{t} = \frac{1}{2}\left( {1 + \cos \left( {\frac{{t\pi }}{T}} \right)} \right)\eta $$
(10)
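The schedule in Eq. (10) can be tabulated with a few lines of Python. As written, Eq. (10) reaches zero at t = T, while the text states a final learning rate of 5e−6; the optional `eta_min` argument below, which follows the usual cosine annealing form with a lower bound, is an assumption about how the two statements might be reconciled.

```python
import math

def cosine_annealing(t, T, eta, eta_min=0.0):
    """Learning rate at step t; with eta_min = 0 this is exactly Eq. (10)."""
    return eta_min + 0.5 * (eta - eta_min) * (1.0 + math.cos(math.pi * t / T))

eta, eta_min, T = 5e-4, 5e-6, 5
for t in range(T + 1):
    as_printed = cosine_annealing(t, T, eta)
    with_floor = cosine_annealing(t, T, eta, eta_min)
    print(t, as_printed, with_floor)
```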

4 Experiments

In this section, we first describe the datasets and then give the implementation details. Next, we compare our MPR-CNN with several state-of-the-art denoising methods. Furthermore, ablation studies are designed to explore the impact of each architectural component and design choice on the final performance. Finally, we verify the impact of the MPR-CNN on CXR image classification.

4.1 Datasets

We evaluate the denoising performance of our MPR-CNN on the COVID-19 radiography database [64], which was collected by several research organizations and consists of 1341 normal CXR images, 1345 CXR images of viral pneumonia, and 219 CXR images of COVID-19. The size of the CXR images is 1024 × 1024. We randomly select 400 normal CXR images, 400 CXR images of viral pneumonia, and 170 CXR images of COVID-19 as training data. We then select 30 images in each category as validation data and 15 images in each category as test data. To speed up training while keeping as much detail as possible in each CXR image, we crop the training data into 128 × 128 patches and, by scaling and rotation, obtain 16 × 10,552 patches (16 is the mini-batch size and 10,552 is the number of iterations in one training epoch) as training labels. The main noise in CXR images is granular noise, which is caused by the receiving device (film) and follows a Gaussian distribution. We therefore add white Gaussian noise (WGN) with standard deviation \(\sigma \in [0, 55]\) to the patches to simulate low-dose noisy CXR images at different noise levels, and use them as the input of MPR-CNN (Fig. 4).

Fig. 4 Denoising results of different methods on normal CXR image with σ = 15. a Original CXR image, b noisy CXR image/24.731 dB, c NLM/38.313 dB, d DnCNN/38.402 dB, e IRCNN/38.432 dB, f FFDNet/38.682 dB, g ESRGAN/38.786 dB, h MPR-CNN/38.784 dB
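The noise simulation described in Sect. 4.1 can be sketched as follows. The sketch assumes that σ is expressed on the 8-bit intensity scale, that the noise level of each patch is drawn uniformly from [0, 55], and that the noisy patch is clipped to the valid range; these details are not stated in the text, and the function name is hypothetical.

```python
import numpy as np

def add_wgn(patch, sigma, rng):
    """Add white Gaussian noise to a float patch on the 8-bit scale [0, 255]."""
    noisy = patch + rng.normal(0.0, sigma, size=patch.shape)
    return np.clip(noisy, 0.0, 255.0)

rng = np.random.default_rng(0)
patch = rng.uniform(0.0, 255.0, size=(128, 128))  # stand-in for a clean 128 x 128 patch
sigma = rng.uniform(0.0, 55.0)                     # blind setting: level drawn from [0, 55]
noisy = add_wgn(patch, sigma, rng)

target_residual = noisy - patch                    # what the network should predict
print(noisy.shape, round(float(sigma), 2))
```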

In addition, we conduct classification experiments to verify the impact of the MPR-CNN on CXR image classification. To balance the classes, we collect another 605 CXR images of COVID-19 from three datasets: (1) the Figure 1 COVID-19 chest X-ray Dataset [65], (2) the COVID-19 Image Data Collection [66], and (3) the ActualMed COVID-19 chest X-ray Dataset [67]. The dataset contains three types of cases: COVID-19, normal, and viral pneumonia. The detailed composition of the classification dataset is shown in Table 1.

Table 1 Component distribution of the classified dataset

4.2 Implementation details

The proposed MPR-CNN is trainable end to end and does not require pre-training of any sub-modules. The model is trained with the Adam optimizer (\(\beta _{1} = 0.9\) and \(\beta _{2} = 0.999\)), an extension of stochastic gradient descent, and the cosine annealing strategy decreases the learning rate from an initial value of 5e-4 to 5e-6 during training. The mini-batch size is set to 16, with 10,552 iterations per training epoch, and we train the model for 30 epochs. We use PyTorch 1.4.0 and Python 3.5 to train and test the proposed MPR-CNN for CXR image denoising on Ubuntu 16.04, on a PC with an Intel Core i7-7800X CPU at 3.50 GHz, 16 GB of RAM, and two Nvidia GeForce RTX 2080 Ti GPUs.

4.3 Comparisons with state-of-the-art denoising methods

In this subsection, we test the denoising performance of our MPR-CNN in terms of both subjective and objective evaluation, comparing it with seven state-of-the-art denoising methods: NLM [35], BM3D [37], DnCNN [43], IRCNN [49], FFDNet [53], SRGAN [45], and ESRGAN [46].

In terms of objective evaluation, we compare the denoising ability at different noise levels and different scaling factors using the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index. PSNR measures the ability of a model to remove noise, while SSIM measures the similarity of two images. The higher these values, the better the corresponding denoising method performs. Table 2 reports the average PSNR (dB) and SSIM of different methods on the test data at noise levels of 15, 25, and 40. The proposed MPR-CNN achieves the best SSIM at noise levels of 15, 25, and 40, although its PSNR is slightly lower than that of ESRGAN when σ = 15. When σ = 25, its PSNR is 0.84 dB, 0.627 dB, and 0.034 dB higher than those of BM3D, DnCNN, and ESRGAN, respectively, and its SSIM is 0.037, 0.020, and 0.011 higher than those of NLM, IRCNN, and SRGAN, respectively. These results show that our MPR-CNN performs well on denoising tasks at different noise levels (Fig. 5).

Table 2 Average PSNR (dB) and SSIM of different methods on test data with different noise levels of 15, 25, and 40
Fig. 5 Denoising results of different methods on CXR image of viral pneumonia with σ = 25. a Original CXR image, b noisy CXR image/20.306 dB, c BM3D/36.482 dB, d DnCNN/36.931 dB, e IRCNN/36.973 dB, f SRGAN/37.154 dB, g FFDNet/37.263 dB, h MPR-CNN/37.577 dB
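The PSNR values quoted in the tables and figure captions can be computed as sketched below, assuming 8-bit images with a data range of 255 (the function name is illustrative). For unclipped Gaussian noise the mean squared error is close to \(\sigma^{2}\), so the noisy input is expected to lie near \(20\log_{10}(255/\sigma)\) dB, that is, roughly 24.6, 20.2, and 16.1 dB for σ = 15, 25, and 40. These estimates are in line with the noisy-image values reported in the captions of Figs. 4 to 6.

```python
import numpy as np

def psnr(reference, estimate, data_range=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

# Expected PSNR of a noisy input when the MSE equals sigma^2.
for sigma in (15, 25, 40):
    print(sigma, round(float(20 * np.log10(255.0 / sigma)), 2))

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 255.0, size=(128, 128))
noisy = clean + rng.normal(0.0, 25.0, size=clean.shape)  # unclipped, sigma = 25
print(psnr(clean, noisy))
```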

Table 3 reports the average PSNR (dB) and SSIM of different methods on the three types of CXR images at a noise level of 30. Our MPR-CNN is superior to the competing methods on each type of CXR image, and it denoises viral pneumonia and COVID-19 images better than normal images.

Table 3 Average PSNR (dB) and SSIM of different methods on three types of CXR images with noise levels of 30

Moreover, to evaluate the proposed model on blind Gaussian denoising, we also add WGN with standard deviation σ = {10, 15, 20, 25, 30, 35, 40, 45} to Fig. 6a, and line charts of the resulting PSNR and SSIM are shown in Fig. 7. The blue solid lines represent the denoising results of our MPR-CNN. The PSNR and SSIM values of our MPR-CNN are higher than those of the competing methods in most cases, although its PSNR is slightly lower than that of ESRGAN when σ < 25. Figure 7 demonstrates that our proposed MPR-CNN is robust for blind CXR image denoising.

Fig. 6 Denoising results of different methods on CXR image of COVID-19 with σ = 40. a Original CXR image, b noisy CXR image/16.732 dB, c NLM/33.566 dB, d DnCNN/34.403 dB, e IRCNN/34.982 dB, f FFDNet/35.192 dB, g ESRGAN/35.626 dB, h MPR-CNN/35.632 dB

Fig. 7 Values of PSNR and SSIM of the denoised results of Fig. 6a using different methods

The average PSNR (dB) and SSIM of different methods on the test data with different scaling factors are shown in Table 4. Here, we set three scaling factors, × 2, × 4, and × 8, for which the corresponding CXR image sizes are 512 × 512, 256 × 256, and 128 × 128. The noise level is still set to 25. According to Table 4, our MPR-CNN denoises CXR images at different scaling factors better than the other methods.

Table 4 Average PSNR (dB) and SSIM of different methods on test data with different scaling factors

For computation time, we select 4 state-of-the-art denoising methods to perform the test for CXR images denoising. The size of the CXR image is set as 128 × 128, 256 × 256, and 512 × 512 as illustrated in Table 5. From that, we can find that the inference time of our MPR-CNN is very competitive in contrast to other popular methods.

Table 5 Computation time of 4 popular denoising methods for the noisy images of sizes 256 × 256, 512 × 512, and 1024 × 1024

4.4 Ablation studies

We design ablation studies to explore the impact of each architectural component and design choice on the final performance. All the ablation experiments use the same test data, to which WGN with standard deviation \(\sigma = 25\) is added to simulate low-dose noisy CXR images. First, we analyze the impact of different loss functions on CXR image denoising in Table 6. The proposed MSL1 loss with \(\theta = 0.2\) gives the best denoising performance among the loss functions, with a PSNR 0.243 dB higher than that of the L1 loss and 0.46 dB higher than that of MS-SSIM. The SSIM also improves. We conclude that the MSL1 loss preserves the contrast in high-frequency regions of CXR images as well as their color and brightness.

Table 6 Average PSNR and SSIM of denoised CXR images by different loss functions

Then, we study the influence of the number of multi-resolution streams in the MNEB on CXR image denoising quality in Table 7. According to Table 7, the MNEB with two resolution streams is better than the one with a single stream, and three resolution streams give the best performance. Therefore, we conclude that increasing the number of streams significantly improves CXR image denoising and that the MNEB is important for improving CXR image quality.

Table 7 Result of different components of multi-resolution streams in MNEB

Finally, in Table 8, we conduct ablation studies on the impact of the proposed ECSA and AMFF on CXR image denoising. From the first three columns, we can see that using the attention-based AMFF for feature fusion, rather than simple concatenation or summation, improves the expressive power of the network: the PSNR is 0.318 dB higher than with summation and 0.108 dB higher than with concatenation. Moreover, Table 8 shows that the ECSA module has a positive effect on our MPR-CNN, increasing the SSIM by 0.019, 0.020, and 0.020, respectively.

Table 8 Influence of individual components of MNEB

Extensive ablation experiments show that the proposed MSL1 loss and the MNEB preserve more detailed information in CXR images, and that the ECSA and AMFF both have a positive influence on the final CXR image quality.

4.5 Verify the impact of the MPR-CNN for CXR images classification

In this subsection, we go beyond the denoising experiments and use the CXR images denoised by MPR-CNN for classification. To evaluate the effectiveness of the MPR-CNN, we use three classic classification networks: ResNet18 [47], VGG19 [68], and DenseNet121 [69] (Fig. 8).

Fig. 8 Comparison of the confusion matrices of different models denoised by MPR-CNN. First row: classification using noisy CXR images. Second row: denoised CXR images by MPR-CNN, then fed into classification models

We use the classification dataset introduced in Sect. 4.1 for CXR image classification. The image size is set to 512 × 512, and WGN with standard deviation \(\sigma = 20\) is added to simulate low-dose noisy CXR images. In Fig. 8, the vertical axis represents the true class and the horizontal axis the predicted class, so the numbers on the diagonal are the correct classifications. After denoising, the number of correctly classified normal cases increases by 5 and 6 with VGG19 and DenseNet121, respectively, and the number of correctly classified viral pneumonia cases increases by 63 and 38 with ResNet18 and VGG19, respectively. In particular, the number of correctly classified COVID-19 cases increases by 11, 17, and 4 with ResNet18, VGG19, and DenseNet121, respectively. Hence, the MPR-CNN clearly has a positive impact on CXR image classification.

Table 9 compares the classification results of different models on images denoised by DnCNN and MPR-CNN. To quantify the performance of the classification networks, we calculate the test accuracy (ACC), sensitivity (SEN), and precision (PRE) for each infection type on the above classification dataset. A higher SEN corresponds to a lower probability of missing positive cases, and a higher PRE corresponds to a lower probability of misdiagnosing negative cases. After denoising by MPR-CNN, the ACCs of ResNet18, VGG19, and DenseNet121 improve by 8.96%, 8.53%, and 8.52%, respectively, while the PREs improve by 7.41%, 7.26%, and 7.37%, respectively. Compared with DnCNN, the SENs improve by 0.56%, 1.13%, and 1.56% with ResNet18, VGG19, and DenseNet121, respectively. Meanwhile, the classification performance on CXR images denoised by MPR-CNN is very close to that on the original images: the ACCs decrease by only 0.57%, 0.30%, and 0.32% with ResNet18, VGG19, and DenseNet121, respectively.

Table 9 Comparison of classification effects between different models denoised by DnCNN and MPR-CNN
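The three metrics can be derived from a confusion matrix laid out as in Fig. 8 (rows are true classes, columns are predicted classes). The sketch below uses made-up counts purely for illustration; they are not results from the paper, and the function name is hypothetical.

```python
def per_class_metrics(cm):
    """cm[i][j]: number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    acc = sum(cm[i][i] for i in range(n)) / total
    sen, pre = [], []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # class i samples predicted as another class
        fp = sum(cm[r][i] for r in range(n)) - tp   # other classes predicted as class i
        sen.append(tp / (tp + fn) if tp + fn else 0.0)
        pre.append(tp / (tp + fp) if tp + fp else 0.0)
    return acc, sen, pre

# Made-up counts for illustration only (order: COVID-19, normal, viral pneumonia).
cm = [[90, 5, 5],
      [10, 80, 10],
      [0, 10, 90]]
acc, sen, pre = per_class_metrics(cm)
print(acc, sen, pre)
```

A missed COVID-19 case lowers the sensitivity of the COVID-19 class, whereas a normal or viral pneumonia case labeled as COVID-19 lowers its precision.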

Furthermore, we conclude that classification models fed with CXR images denoised by MPR-CNN have a lower probability of missing COVID-19 cases, as well as a lower probability of misdiagnosing negative cases.

5 Conclusion

In this paper, we propose a novel MPR-CNN for CXR image denoising, with a specific application to COVID-19, that improves image quality. Multi-resolution parallel convolution streams are used to fuse information from both high- and low-resolution features. The ECSA module makes the network focus more on texture details in CXR images while reducing the number of parameters. The attention-based AMFF method, rather than simple concatenation or summation, is used for feature fusion to improve the expressive power of the network. The MSL1 loss preserves the contrast in high-frequency regions of CXR images as well as their color and brightness. Extensive experiments demonstrate that all the proposed components have a significant impact on CXR image denoising. Compared with competing methods, our MPR-CNN achieves the best performance in both subjective visual evaluation and objective indicators, and it is robust for blind CXR image denoising. Moreover, the experiments show that the proposed MPR-CNN has a positive impact on CXR image classification and on the detection of COVID-19 cases from denoised CXR images. On the whole, the proposed MPR-CNN can provide a clearer and more rigorous diagnostic basis for both radiologists and machines. We will continue to follow the development of COVID-19, and our future work will concentrate on effectively reducing noise artifacts in COVID-19 CXR images with the current method, improving the quality of COVID-19 CXR images so that COVID-19 cases can be classified and detected more accurately from denoised CXR images.