Abstract
Presentation attack detection approaches have achieved great progress on various attack types, while adversarial learning has become a new threat to these approaches. So far, few works are devoted to developing a detection method that is robust to both physical spoofing faces and digital adversarial faces. In this paper, we find that fake face images from printed photos and replayed videos have a different optical characteristic from real ones, and that the adversarial samples generated by various attack methods retain this characteristic. By exploring this characteristic, we propose Spectral Characteristic Presentation Attack Detection (SCPAD), a new approach that detects presentation attacks by reconstructing the color space of input images and that also performs well on adversarial samples. More specifically, a new HSCbb color space is manually constructed by studying the difference in albedo intensity between real faces and fake faces. The difference between real and spoofing faces can then be effectively magnified and modeled by color texture features with a shallow convolutional network. The experimental results show that our proposed method consistently outperforms the state-of-the-art methods on adversarial faces and also achieves competitive performance on fake faces.
1 Introduction
With the development of machine learning, biometric authentication technology is widely applied in different situations, including human-computer interaction (HCI), security and surveillance, traffic control, and public areas such as airports, subway stations, and event centres [1,2,3]. This application has prompted substantial development in the field of computer vision for more than a decade. Biometric authentication, especially face recognition, plays a crucial role in intelligent city construction. Algorithms in this area have achieved extraordinary performance thanks to considerable advancements in neural networks, particularly deep learning (DL). Despite the successes of DL models in face recognition, the security of face recognition algorithms remains an open issue that has attracted considerable research attention.
Because of its low cost, the presentation attack has been a common but serious threat to the security of identity validation systems based on face recognition. Nowadays, it is easy to spoof a biometric system because massive numbers of photos can be collected from the Internet. Usually, there are three popular ways to physically spoof biometric systems, namely, printed photo, replayed video, and 3D mask [31]. Among these three types, the printed photo is obtained from still face images of clients, the replayed video from face videos of clients, and the 3D mask is made with silicone rubber according to the client’s face shape.
Early research on presentation attack detection (PAD) was interactive. To detect fake faces interactively, users were asked to perform specified actions within a limited time (e.g., blink and nod). Such PAD methods depended on precise motion detection and were vulnerable to replay-video attacks, so PAD technology has shifted to static, non-interactive detection, which is easy to develop and requires no user interaction.
Static PAD methods tried to find the difference in the texture or spectrum between real and fake face images [19]. Local Binary Pattern (LBP) descriptors are the most widely used technique in texture-based PAD methods. Boulkenafet et al. [9] used a multi-level LBP descriptor to analyze the color texture of face images. The development of the Convolutional Neural Network (CNN) brought adaptive ways to extract features. Liu et al. [24] designed a CNN to get the depth map for distinguishing real and fake faces.
Later studies attempted to connect face image signals with vital signals (e.g., pulse or heart rate). Hernandez-Ortega et al. [20] utilized pulse cues for video replay attack detection. Skin blood flow analysis was also adopted [36]. Lin et al. [23] used remote photoplethysmography (rPPG) in their patch CNN-based method to help image feature extraction.
For 2D spoof face images, Patel et al. [30] mentioned four main interfering factors that cause image distortion: 1) Spoof medium surface reflection, 2) Color distortion, 3) Moiré Pattern, and 4) Face shape deformation. Wen et al. [40] pointed out that the color distortion of printed attacks is due to the quality of the printer and photo paper, while the color distortion of replay attacks is mainly caused by the fidelity and resolution of the screen. They then proposed a series of feature descriptors to distinguish real and spoof faces. Additionally, both paper and digital screens have different reflective properties than the skin of a face [42]. All in all, current studies tried to find the relationship between the designed features or descriptors, but they failed to physically explain why the descriptors can make sense.
There are still two main shortcomings in current CNN-based PAD methods, even though they achieve much better performance than the classic methods. First, many researchers failed to explain why the features or textures they extracted were useful for PAD. Some literature proposed that an image may be subject to various distortions due to the spoofing medium, camera, and printing [30, 40], but the relationship between the chosen features and the distortions the image suffered is still not clear. Second, this lack of interpretability causes another serious problem: PAD methods are not considered to be robust to potential adversarial samples (faces). Recent work has shown that small, deliberately designed perturbations added to the input images of a classifier can cause the classifier to give wrong labels [14, 29, 41]. When the classifier is trained to distinguish spoofing faces from all faces, adversarial faces may fool the classifier, so that a malicious attacker can pretend to be an authenticated user [6].
To solve this problem, we have to explore the essential difference between spoofing and real faces. Real and spoofing faces can be very distinctive in different chrominance spaces. For instance, Li et al. [25] analyzed the impact of different color spaces on face anti-spoofing and presented a CNN model based on color features, achieving good performance by constructing a learnable space. But the learnable space is not robust to adversarial faces and is not explainable enough. Other chrominance clues therefore deserve to be explored for recognizing spoofing faces as well as adversarial faces. Because of the hemoglobin under real facial skin, the albedo of a real face differs from that of a spoofing face in some bands, i.e., blue-related channels. Thus, it is feasible to construct a new color space (dependent space) consisting of color channels that emphasize the albedo difference. For photo-print and video-replay spoofing faces, the albedo difference cannot be concealed during the presentation. Also, it is difficult for gradient-based adversarial attack methods to generate pixels with certain colors because of the difference between the color gamuts of the digital and real worlds [26].
To study the albedo difference, we first study the albedo curves of the materials from which images are captured, since these curves show the spectral characteristic. Specifically, aiming at the problem of detecting spoofing faces created by printed photos and replayed videos, we focus on the spectral albedo curves of printing papers, digital screens (e.g., a mobile phone’s screen), and human skin. As shown in Fig. 1, a face image in a device is generated through illumination and reflection. The imaging process can be divided into three steps: 1) Illumination, 2) Reflection, and 3) Capture. The process of illumination is influenced by the light source and air, while the process of capture is influenced by the camera. The difference between real and fake faces occurs during reflection because different mediums have different optical reflection characteristics.
Furthermore, we show that the spectral albedo curve of human skin has an obvious difference from the curves of printing papers and electronic device screens in the blue-violet spectral region. Based on the difference between the spectral albedo curves of the different materials that show images, it is possible to construct a new color space called HSCbb to emphasize the difference and distinguish spoofing faces. By abandoning the color channels with weak robustness against variation of illumination, a new spoofing face detection algorithm under the new color space (Spectral Characteristic Presentation Attack Detection, SCPAD) can be more robust to complicated illumination. We also build a shallow backbone to extract features in the HSCbb space. Due to its low linearity and depth, the proposed shallow backbone is robust against adversarial faces.
The main contributions of our work are summarized as follows:
1. We construct a physically grounded color space based on the spectral characteristic of albedo, in which spoofing faces differ from real faces in a specific wavelength band.
2. We propose a new method (SCPAD) that uses a shallow network to extract discriminative features for distinguishing spoofing faces and that is also robust against adversarial attacks.
2 Related work
Many approaches for non-interactive presentation attack detection have been presented in the last decade. These approaches can be divided into (i) handcraft-features-based methods and (ii) deep-learning-based methods.
2.1 Handcraft-features-based presentation attack detection
The handcraft-features-based PAD methods usually aim to design a series of features, and two categories of solutions are illustrated in the following.
(1) Extracting Texture Features: Boulkenafet et al. [8] mentioned the contribution of chrominance information in face anti-spoofing and analyzed how the color texture affects the classification result. Cai et al. [11] used the Gray Level Co-occurrence Matrix (GLCM), LBP, and deep features to solve the problem of illumination change.
(2) Extracting Other Features: Some methods also considered the interframe information. Some motion-based methods exploit impulsive movements of the facial parts in the input videos [4]. Hernandez-Ortega et al. utilized pulse cues from videos [20]. Wang et al. presented a face liveness detection approach based on blood flow analysis [36].
The contribution of the classic methods to PAD should be acknowledged. However, since the classic methods have difficulty extracting features adaptively, handcrafted feature extraction is not universally applicable to some special cases. The performance of the classic methods may become unsatisfactory when the quality of the face images is impaired, which usually happens in complex environments (e.g., light deficiency).
2.2 Deep-learning-based presentation attack detection
To date, face anti-spoofing methods mainly focus on convolutional neural networks (CNNs). Atoum et al. [5] used patch and depth-based CNNs to extract features. Considering the difference between presentation attack faces and real faces, Li et al. [25] learned a color-like space to extract color texture features. Quan et al. [32] considered learning face anti-spoofing from fewer samples through a semi-supervised framework. There are also some works on recognizing spoofing faces based on multi-source data, such as depth images [17, 37, 39].
As a whole, the deep-learning-based methods achieve excellent performance for PAD, especially in complicated environments and with a large amount of training data. But these methods are vulnerable to elaborately designed adversarial samples, which makes their results unstable. The main reason is that the features extracted by a CNN are often not interpretable, which means that no stable relationship between the face image and the features is built.
3 Proposed method
In most relevant works, PAD is considered a classification problem. Thus, mainstream PAD solutions try to extract features that characterize the difference between real and spoofing faces, such as local and global texture, spectrum, histogram, etc. Recent works prefer to extract those features by convolutional neural networks or, to put it differently, by a combination of different convolutional filters. But these works do not successfully explain why the features make sense for the PAD problem. The lack of interpretability of the PAD algorithms may lead to a hidden danger: vulnerability to adversarial attacks. To solve this problem, we concentrate on how face images, especially spoofing ones, are formed.
In this section, we first explain the albedo difference between real and spoofing faces from the view of hemoglobin and then propose a spectral-characteristics-based face anti-spoofing method called Spectral Characteristic Presentation Attack Detection (SCPAD). This SCPAD method builds a new HSCbb color space and recognizes a spoofing face by a shallow convolutional network. The main architecture of our method is presented in Fig. 2.
Fig. 3 The process of light absorption and reflection in skin. The incident light is absorbed by melanin in the epidermis and by hemoglobin in the dermis [45]
3.1 Motivation
The methods mentioned in Section 1 inspired us to consider the following problem: how do vital signals, especially skin blood flow, cause image distortion? Svaasand et al. [35] researched the optical characteristics that determine the albedo of skin with blood flow. Their work reveals the principle of how skin blood causes the color distortion of spoof faces. In addition, current studies [8, 25] on detecting spoofing faces by color feature extraction show the importance of color information for PAD.
Therefore, we can draw two basic conclusions from the above analysis: (1) Color distortion is the main reason for 2D spoof face image distortion. (2) Skin blood plays an important role in the color distortion of 2D spoof face images. Based on these conclusions, the key point of our proposed method is to design a color space that highlights the color distortion in 2D spoof face images. By studying the spectral characteristic in the albedo of face images, we finally build an HSCbb color space to highlight the color distortion in presentation attack images.
3.2 HSCbb color space
In early works using manually extracted features for face anti-spoofing, e.g. [5, 8, 11], the authors trained the classifiers with face samples in RGB color space. As the most used color space for image processing, RGB space includes three color components (i.e., red, green, and blue), and it mainly considers the information difference between luminance and chrominance. But for face anti-spoofing, luminance is often an interference factor, so it is necessary to eliminate its influence. Then, in the following works for face anti-spoofing [5, 8, 25], the face images are transformed into new color spaces, e.g., HSV and YCbCr, to focus on the influence of chrominance.
3.2.1 Spectral characteristic in albedo
Since the color space of the face image influences the performance of face anti-spoofing classifiers, it may be possible to find an effective color space for face anti-spoofing problems, just like CHROM for heart rate detection [7]. Due to the existence of hemoglobin under human facial skin (the process of light absorption and reflection in the skin is shown in Fig. 3 [45]), there should be some specific spectral characteristic in the albedo. The process of light reflection in printed paper is shown in Fig. 4 [21] and the process of luminescence of a digital screen (OLED and LCD screen) is shown in Fig. 5 [27]. The color space must reflect the difference between the real face and the spoofing one and eliminate the impact of luminance. The absorption coefficient versus the wavelength of facial skin was studied. It tells us that the human face has some special spectral characteristics, as revealed in (1) [35]:
where \(\lambda \) indicates the wavelength of the light shining on the skin (the unit is nm).
Fig. 4 The process of light reflection in paper with ink [21]
Fig. 5 The process of luminescence of OLED and LCD screens [27]
Equation (1) [35] is an empirical-analytical approximation of the absorption coefficient of hemoglobin, where the spectral dependence is that of 80% oxygenated blood at the physiological concentration (150 g l\(^{-1}\) hemoglobin or H=0.41), and the wavelength is given in nanometers (nm) [35]. The proof and derivation of (1) can be found in [35] and are not the focus of our work. Based on (1), the albedo coefficient (100% minus absorption coefficient) versus wavelength of hemoglobin is shown in Fig. 6 (the blue curve). The albedo coefficients versus wavelength of printed paper and digital screen are also shown in Fig. 6 (the red and orange curves). The albedo coefficient is calculated as (2) [42].
where \(I_{s}\) is the specular light intensity and I is the incident light intensity. \(c_{a}\) measures the proportion of specularity in the overall measured intensity (i.e., the albedo coefficient), which is invariant to intensity scaling. It is noteworthy that I and \(I_{s}\) are measured in lux with \(I, I_{s}\in [0, \infty )\), while \(c_{a}\) is a dimensionless scalar with \(c_{a}\in [0, 1]\).
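Equation (2) itself is not reproduced in this text. A form consistent with the definitions just given, under the assumption that \(c_{a}\) is the plain ratio of specular to incident intensity (the exact expression in [42] may differ), would be:

```latex
% Assumed form, inferred only from the surrounding definitions of I_s, I and c_a.
c_{a} = \frac{I_{s}}{I}, \qquad I > 0, \quad 0 \le I_{s} \le I
```

Under this assumed form, scaling both \(I_{s}\) and I by the same factor leaves \(c_{a}\) unchanged, which matches the stated invariance to intensity scaling.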
Fig. 6 The albedo coefficient versus the wavelength of three different materials, including hemoglobin (the blue curve), printed paper (the red curve), and digital screen (the orange curve). When the wavelength is between 390 nm and 480 nm, the albedo coefficient curves of the three materials are quite different. The green area represents the difference region between printed paper and the digital screen, while the cyan area represents the difference region between the digital screen and hemoglobin
In Fig. 6, we can see that the albedo coefficient of hemoglobin is close to 0 when the wavelength is higher than 390 nm and lower than 480 nm, and it almost becomes 100% when the wavelength is higher than 510 nm. Besides, the albedo coefficient of printed paper is close to 100% when the wavelength is higher than 420 nm. The albedo coefficient of the digital screen is increasing with the wavelength when the wavelength is between 420 nm and 480 nm, and it becomes beyond 70% when the wavelength is higher than 480 nm.
3.2.2 The discriminative color space
According to the above spectral characteristic in albedo, we can construct a color space to reflect this difference and then distinguish real faces from faces presented by printed paper or replayed video. Among the four common color spaces RGB, HSV, YCbCr, and Lab, there is a total of 12 channels. In the RGB color space, the three primary colors, i.e., red (R), green (G), and blue (B), are defined. In the HSV color space, the hue and saturation dimensions define the chrominance of the image, while the value dimension corresponds to the luminance. The YCbCr space separates the RGB components into luminance (Y), chrominance blue (Cb), and chrominance red (Cr). The Lab space has three channels, among which L represents luminance, while a and b represent two chrominance colors.
Due to the spectral characteristics of hemoglobin, printed paper, and the digital screen, if a channel mainly reflects the blue-violet band of a pixel (wavelengths higher than 390 nm and lower than 480 nm, shown in Fig. 7), it may be a useful channel for constructing an anti-spoofing-effective color space. Among the 12 channels, B, H, S, V, Y, Cb, L, and b reflect components related to the blue-violet band of visible light. To further eliminate the influence of light intensity variance, we abandon B, V, Y, and L when constructing the new color space for face anti-spoofing. Finally, we retain only four channels, i.e., H, S, Cb, and b, to construct the new HSCbb color space for distinguishing spoofing faces. Notably, HSCbb is not a channel-independent color space: its four channels are nonorthogonal.
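The channel selection above can be sketched for a single pixel as follows. This is a minimal illustration and not the paper's exact transformation: it assumes sRGB input scaled to [0, 1], the standard HSV definition from Python's `colorsys`, the full-range BT.601 form of Cb, and CIE Lab with a D65 white point. The function name `rgb_to_hscbb` and the helper names are hypothetical.

```python
import colorsys

def _srgb_to_linear(c):
    # Undo the sRGB transfer curve (assumes the image is sRGB-encoded).
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def _lab_f(t):
    # Piecewise cube-root function used in the standard CIE Lab definition.
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0

def rgb_to_hscbb(r, g, b):
    """Map one RGB pixel (each channel in [0, 1]) to (H, S, Cb, b)."""
    # H and S come from HSV; the luminance-like V channel is discarded.
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    # Cb from YCbCr (full-range BT.601 coefficients, offset 0.5 on a [0, 1] scale).
    cb = 0.5 - 0.168736 * r - 0.331264 * g + 0.5 * b
    # b from Lab: linear RGB -> XYZ (D65) -> b = 200 * (f(Y/Yn) - f(Z/Zn)).
    rl, gl, bl = (_srgb_to_linear(c) for c in (r, g, b))
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    lab_b = 200.0 * (_lab_f(y / 1.0) - _lab_f(z / 1.08883))
    return h, s, cb, lab_b
```

In this sketch the four outputs are on different scales (H, S, and Cb in [0, 1], Lab b roughly in [-128, 127]); any normalization before feeding a network is left open here.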
Considering that the performance of the constructed color space may be impacted by some other environmental factors, we use a \(4 \times 4\) linear mapping layer to make the constructed color space precisely appropriate for spoofing faces in specific environments, which can be regarded as fine-tuning of the constructed color space.
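A \(4 \times 4\) linear mapping over the channel axis can be sketched as below; applied at every pixel, it is equivalent to a 1×1 convolution with four input and four output channels. The identity initialization and the absence of a bias term are assumptions made for illustration, since the text does not specify them, and the function names are hypothetical.

```python
def identity4():
    # Assumed starting point: the mapping initially leaves HSCbb unchanged.
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def map_pixel(weights, pixel):
    # One output channel per row of the 4x4 weight matrix.
    return [sum(w * x for w, x in zip(row, pixel)) for row in weights]

def map_image(weights, image):
    # image is a nested list indexed as image[row][col] -> [H, S, Cb, b].
    return [[map_pixel(weights, px) for px in row] for row in image]
```

During training, the 16 weights would be learned together with the rest of the network, so the mapping can re-mix the four channels for a given capture environment.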
3.2.3 Robustness against adversarial samples
Most adversarial attacks try to modify some pixel values in an image to mislead a classifier or a detector [12, 34]. In the physical world, the change of pixel values can be considered as a kind of modification to the local albedo of an object [44]. Thus, an adversary who wants to create a spoofing face image is essentially trying to change some local albedo. Based on this understanding, the goal of an adversarial perturbation on a spoofing face is to reduce the gap between the albedos of real and spoofing faces at some special wavelengths.
According to current research [18], the adversarial sample is an unexpected result of the convolutional neural network’s over-linearization and over-parametrization. Thus, it is natural to keep the CNN from being overly linear and over-parameterized. In our method, we choose a shallow CNN to reduce the number of parameters to about 2.1 million. Also, the color channels we have chosen contribute to improving the nonlinearity:
Equations (3)-(6) give the transformation from the commonly used color channels (R, G, and B) to our chosen channels (H, S, Cb, and b), respectively. \(f(\cdot )\) indicates the calibration function and is calculated as (7):
We can see that the representations of the H, S, and b color strengths are nonlinear, so the input in the constructed HSCbb color space could help reduce the over-linearization of the CNN to a certain extent. Most gradient-based adversarial attack methods try to find a perturbation \(\Delta \textbf{X}\) to modify the spoofing face (clean sample) \(\textbf{X}\) to a deceptive sample \(\mathbf {X'}\) by utilizing the gradient of the loss function \(\nabla _{\textbf{X} }L(\textbf{X}, y)\). If we express the clean sample \(\textbf{X}\) with a color space (A, B, C) in the form \((\textbf{X} _{A}, \textbf{X} _{B}, \textbf{X} _{C})\), the gradient \(\nabla _{\textbf{X} }L(\textbf{X}, y)\) can be written as \(\nabla _{(\textbf{X} _{A},\textbf{X} _{B},\textbf{X} _{C} ) }L(\textbf{X}, y)\), and \(\Delta \textbf{X}\) can then be written as \(g(\frac{\partial L(\textbf{X} ,y)}{\partial \textbf{X}_{A}}, \frac{\partial L(\textbf{X} ,y)}{\partial \textbf{X}_{B}},\frac{\partial L(\textbf{X} ,y)}{\partial \textbf{X}_{C}})\), where \(g(\cdot )\) indicates some linear combination of \(\frac{\partial L(\textbf{X} ,y)}{\partial \textbf{X}_{A}}\), \(\frac{\partial L(\textbf{X} ,y)}{\partial \textbf{X}_{B}}\) and \(\frac{\partial L(\textbf{X} ,y)}{\partial \textbf{X}_{C}}\), decided by the dense layers. For instance, most current attack methods use the gradient in RGB space. Thus, when a malicious sample is generated from the gradient of the loss function of our model, the partial derivatives \(\frac{\partial L(\textbf{X} ,y)}{\partial \textbf{X}_{H}}\), \(\frac{\partial L(\textbf{X} ,y)}{\partial \textbf{X}_{S}}\), \(\frac{\partial L(\textbf{X} ,y)}{\partial \textbf{X}_{Cb}}\), and \(\frac{\partial L(\textbf{X} ,y)}{\partial \textbf{X}_{b}}\) have more impact on the generation of the adversarial perturbation \(\Delta \textbf{X}\), so it is feasible to enlarge \(\Delta \textbf{X}\) in the constructed HSCbb color space, in which the feature map is extracted.
3.3 Shallow convolutional network
In our work, we build a shallow network as the feature extractor and binary classifier in an end-to-end way. After the input image has been transformed into the proposed color space, the dimension of an input image is \((4, input\_size, input\_size)\), where the size of the image is \((input\_size, input\_size)\) (e.g., (32, 32) for the Replay-Attack dataset [13]). Since the space of the input face image has been designed according to the spectral characteristics of the albedo, the backbone network that extracts the features should focus on the spatial information in a face image. Thus, the backbone network should be shallow and have fewer convolutional filters. Also, a shallow backbone can prevent the classifier from getting much semantic information, which is mainly related to the identity of the face’s owner rather than the image’s vitality. We simplify the architecture of VGG19 [33] by removing several convolutional layers and changing the layout of various layers. This shallow network has only 9 layers, including 5 convolutional layers, 2 pooling layers, and 2 dense layers. All convolutional layers are bundled with the ReLU activation function and employ a \(3\times 3\) filter size. The first three convolutional layers form a group followed by one pooling layer, and the other two convolutional layers form a second group followed by one pooling layer. The padding strategy of the first and fourth convolutional layers is ’same’, while that of the other convolutional layers is ’valid’. The pooling layers are two-dimensional max-pooling layers with a \(2\times 2\) pooling window. After every pooling layer, there is a dropout layer with a dropout rate of 0.25, and after the first dense layer, there is a dropout layer with a dropout rate of 0.5. The dropout layers are used to prevent over-parametrization. The first dense layer decreases the length of the feature vector from 4096 to 512, and the second dense layer outputs a feature vector with 2 elements.
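The layer layout described above can be checked with a small shape trace and a parameter count for the dense layers. This is an illustrative sketch under two assumptions: the pooling stride equals the \(2\times 2\) window, and each dense layer has one bias per output unit. The text does not give the filter counts, so the flattened length is not derived here; the 4096 and 512 figures are taken directly from the description. All names are hypothetical.

```python
# (kind, padding) for the five convolutions and two poolings, in order.
LAYERS = [
    ("conv", "same"), ("conv", "valid"), ("conv", "valid"), ("pool", None),
    ("conv", "same"), ("conv", "valid"), ("pool", None),
]

def trace_spatial_sizes(input_size, kernel=3, window=2):
    """Return the feature-map side length after the input and after each layer."""
    sizes = [input_size]
    for kind, padding in LAYERS:
        size = sizes[-1]
        if kind == "conv":
            # 'same' keeps the size; 'valid' loses kernel - 1 pixels per side pair.
            size = size if padding == "same" else size - kernel + 1
        else:
            # Max pooling with stride equal to the window (assumed).
            size = size // window
        sizes.append(size)
    return sizes

def dense_params(n_in, n_out):
    # Weights plus one bias per output unit (bias assumed).
    return n_in * n_out + n_out
```

With these assumptions, `dense_params(4096, 512)` gives 2,097,664 and `dense_params(512, 2)` gives 1,026, so the first dense layer alone accounts for roughly the 2.1 million parameters quoted for the network.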
The proposed networks are implemented with TensorFlow 1.14. For the optimization solver, Root Mean Square Prop (RMSprop) is adopted with an initial learning rate of 0.0001, a decay value of 0, and rho set to 0.99. Binary cross-entropy is used as the loss function, while accuracy is chosen as the training metric.
The existence of adversarial samples can be regarded as a malicious usage of the network’s over-parametrization, so a method that is useful for preventing over-parametrization is also useful against adversarial samples. It is widely accepted that a network with fewer parameters (which most of the time means a shallow architecture) is more robust against over-parametrization, while a deep neural network with billions of parameters can be vulnerable. The pooling layers reduce the complexity of the network, especially the number of parameters in the dense layers. The experimental results show that our constructed network with the shallow architecture has good robustness against adversarial samples.
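For reference, one RMSprop update with the stated learning rate and rho can be sketched as follows. This is the textbook update rule rather than the TensorFlow implementation; the epsilon value and its placement outside the square root are assumptions, and the function name is hypothetical. With a decay of 0, the learning rate stays constant across steps.

```python
import math

def rmsprop_step(weights, grads, cache, lr=1e-4, rho=0.99, eps=1e-7):
    """Apply one RMSprop update; returns (new_weights, new_cache)."""
    new_weights, new_cache = [], []
    for w, g, c in zip(weights, grads, cache):
        # Exponential moving average of the squared gradient.
        c = rho * c + (1.0 - rho) * g * g
        new_cache.append(c)
        # Scale the step by the running root-mean-square of the gradient.
        new_weights.append(w - lr * g / (math.sqrt(c) + eps))
    return new_weights, new_cache
```

A typical loop would start with `cache = [0.0] * len(weights)` and feed the returned cache back into the next call.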
4 Experiments
In this section, we first introduce the details of the Replay-Attack dataset [13] and the OULU-NPU dataset [10] used in the experiments and verify the necessity of each step of the proposed method. Finally, the presentation attack detection results are compared with state-of-the-art methods to show the effectiveness of the whole algorithm.
4.1 Dataset
In this work, two datasets, i.e., Replay-Attack and OULU-NPU, are used for face anti-spoofing to validate the performance of our proposed method on clean samples (without adversarial perturbation):
- Replay-Attack: The IDIAP Replay-Attack dataset [13] consists of 1300 video clips of real and attack attempts of 50 clients, which are divided into 3 subject-disjoint subsets for training, development, and testing (15, 15, and 20, respectively). The genuine videos are recorded under two different lighting conditions: controlled and adverse. Two types of attacks are created: replay attacks and print attacks. In the replay attacks, high-quality videos and images of the real client are replayed on iPhone 3GS and iPad display devices. For the print attacks, high-quality images were printed on A4 paper and presented in front of the camera. Some face images from the Replay-Attack dataset are shown in Fig. 8.
- OULU-NPU: The OULU-NPU dataset [10] consists of 4950 real access and attack videos from 55 clients. Similar to the Replay-Attack database, all clients are divided into 3 subject-disjoint subsets for training, development, and testing (20, 15, and 20, respectively). These videos were recorded using the front cameras of six mobile devices in three sessions with different illumination conditions and background scenes. Two types of fake faces are created: printed photo and replayed video attacks. The attacks were created using two printers and two display devices. For the replayed video attacks, the original face videos were recorded by 6 different cell phones. Some face images from the OULU-NPU dataset are shown in Fig. 9.
To test the adversarial-defending ability of our proposed method, we have to create an adversarial dataset for face anti-spoofing, because no ready-made adversarial face anti-spoofing dataset exists. To generate our new dataset, we choose three common adversarial attack methods, FGSM [18], BIM [22], and DeepFool [28], to generate the adversarial data from the two datasets above. The three attack methods are commonly used to study the adversarial robustness of convolutional networks. They are gradient-based methods, which means they search for the adversarial perturbation along the gradient of the target model’s loss function. It should be noted that in this paper, the target model is the PAD model, and the target model’s loss function is the PAD model’s loss function. In other words, the adversarial attack methods are used to attack all the PAD models.
For performance evaluation, we follow the overall protocols associated with the two datasets. For the OULU-NPU dataset, we follow protocol 1 to evaluate our proposed method. The results are reported in terms of Equal Error Rate (EER) and Average Classification Error Rate (ACER) on the test set. EER measures the error rate at the operating point where positive and negative samples have an equal error rate, while ACER measures the average error rate of positive and negative samples. Besides, there is a variable epsilon, which denotes the strength of the adversarial perturbation. We choose three different epsilons, 0.5, 0.05, and 0.005, to generate the adversarial test samples.
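The gradient-based attacks named above can be illustrated on a toy model. The sketch below applies FGSM and BIM in their usual untargeted form (a step of size epsilon along the sign of the loss gradient, which increases the loss) to a logistic-regression classifier standing in for a PAD model. The toy model, the [0, 1] input range, and all function and parameter names are assumptions made for illustration; it is not the attack pipeline used in the experiments.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_grad_x(x, y, w, b):
    # Gradient of binary cross-entropy w.r.t. the input of a logistic model.
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(x, y, w, b, eps):
    # Single step of size eps along the sign of the gradient.
    g = loss_grad_x(x, y, w, b)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]

def bim(x, y, w, b, eps, alpha, steps):
    # Repeated small FGSM-style steps, projected back into the eps-ball around x.
    adv = list(x)
    for _ in range(steps):
        g = loss_grad_x(adv, y, w, b)
        adv = [
            clip(clip(ai + alpha * sign(gi), xi - eps, xi + eps), 0.0, 1.0)
            for ai, gi, xi in zip(adv, g, x)
        ]
    return adv
```

Both functions keep every coordinate of the perturbation within epsilon of the clean input; BIM differs only in taking several smaller steps and re-evaluating the gradient each time.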
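The two metrics can be sketched as follows. The sketch assumes that a higher score means "more likely real", that label 1 denotes a real face and label 0 an attack, and that ACER is the mean of the attack and bona fide classification error rates (commonly abbreviated APCER and BPCER); the EER is approximated by scanning the observed scores as thresholds. Function names are hypothetical.

```python
def error_rates(scores, labels, threshold):
    """Return (APCER, BPCER) when samples with score >= threshold are accepted as real."""
    attack_scores = [s for s, y in zip(scores, labels) if y == 0]
    real_scores = [s for s, y in zip(scores, labels) if y == 1]
    # Attacks wrongly accepted as real.
    apcer = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    # Real faces wrongly rejected as attacks.
    bpcer = sum(s < threshold for s in real_scores) / len(real_scores)
    return apcer, bpcer

def acer(scores, labels, threshold):
    apcer, bpcer = error_rates(scores, labels, threshold)
    return (apcer + bpcer) / 2.0

def eer(scores, labels):
    """Approximate EER: the error rate at the threshold where APCER and BPCER are closest."""
    best_gap, best_rate = None, None
    for t in sorted(set(scores)):
        apcer, bpcer = error_rates(scores, labels, t)
        gap = abs(apcer - bpcer)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (apcer + bpcer) / 2.0
    return best_rate
```

In practice the threshold used for ACER is usually fixed on the development set and then applied to the test set.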
4.2 Experimental environment
The hardware used in this experiment consists of rack servers equipped with 1 TB hard drives, 64 GB of memory, and Intel 6th generation processors. One Nvidia GeForce RTX 2080 Ti discrete graphics card with 11 GB video memory is used to accelerate computing.
4.3 Ablation experiment
Table 1 presents the results of using the same shallow convolutional network with different channels and spaces. For the Replay-Attack dataset, the best EER and ACER are obtained in the Cb channel, which is much better than those obtained in other channels. The best EER on the OULU-NPU dataset is obtained in the H channel, while the best ACER on the OULU-NPU dataset is obtained in the b channel. It can be seen that deep features in H, S, Cb, and b channels achieve the best performance compared to other channels. Among the different spaces, the HSCbb space achieves the best performance (Table 2).
4.4 Comparison with existing methods
As shown in Table 3, all experiments are executed to compare other networks’ performance on the OULU-NPU dataset, Replay-Attack dataset, and their adversarial datasets. It can be seen from Table 3 that the best performance is provided by CDCN on the Replay-Attack dataset in clean scenario, while our method obtains the best results on the OULU-NPU dataset. In the adversarial scenario, our method achieves the best performance on both two datasets.
Table 3 also shows some trends of face anti-spoofing methods against an adversary by considering the performance of the state-of-the-art methods. It is worth mentioning that both FeatherNets, LMFD-FAS, and our proposed method are lightweight networks. The experiment shows that the shallow architecture ensures the two lightweight networks’ robustness against adversarial samples. For different face anti-spoofing methods, their performances have different change trends relative to the epsilon by considering the performance of Resnet50 on the Replay-Attack dataset in Table 3. The EER and ACER decrease subtly when epsilon decreases from 0.5 to 0.005. Considering the FeatherNets’ performance on the Replay-Attack dataset, the EER , and ACER decrease sharply when epsilon decrease from 0.5 to 0.05, while their changes are subtle when epsilon decrease from 0.05 to 0.005. For every method mentioned in Table 3, its ACER and EER from the same attack method always decrease with the decrease of the epsilon.
There are also trends across the attack methods. For every method in Table 3 and for a given epsilon, the ACER and EER under the FGSM attack do not exceed those under the BIM attack. This is because the BIM attack searches for the adversarial perturbation by applying the FGSM step iteratively.
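The relationship between the two attacks can be illustrated with a minimal sketch on a toy logistic-regression classifier, which stands in for a PAD network here purely for illustration. FGSM takes one signed-gradient step of size epsilon, while BIM repeats smaller signed-gradient steps and clips the result back into the epsilon-ball around the original input. The model, the step size `alpha`, and all function names are assumptions of this sketch, not the experimental setup.

```python
import math


def loss_and_grad(x, y, w, bias):
    """Cross-entropy loss of a logistic model and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def clip(v, low, high):
    return min(high, max(low, v))


def fgsm(x, y, w, bias, eps):
    """Single signed-gradient step of size eps, kept inside the [0, 1] pixel range."""
    _, grad = loss_and_grad(x, y, w, bias)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, grad)]


def bim(x, y, w, bias, eps, alpha, steps):
    """Iterated FGSM steps of size alpha, projected onto the eps-ball around x."""
    adv = list(x)
    for _ in range(steps):
        _, grad = loss_and_grad(adv, y, w, bias)
        stepped = [ai + alpha * sign(gi) for ai, gi in zip(adv, grad)]
        # Keep each component within eps of the original and inside [0, 1].
        adv = [clip(clip(si, xi - eps, xi + eps), 0.0, 1.0)
               for xi, si in zip(x, stepped)]
    return adv
```

For this linear toy model the gradient sign never changes, so BIM reaches the same point as FGSM once `steps * alpha` is at least `eps`. On a nonlinear network the gradient is recomputed at each intermediate point, which is what allows BIM to find a perturbation at least as damaging as the single FGSM step.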
Table 4 compares the model size and resource usage of the proposed method with those of the state-of-the-art methods. FeatherNets has the fewest parameters and achieves the fastest running speed per sample with the lowest GPU memory usage. Our method achieves competitive results in model size, running time, and memory usage.
5 Conclusion
We proposed a lightweight network architecture, Spectral Characteristic Presentation Attack Detection (SCPAD), which uses a constructed color space to focus on the albedo difference between real and spoofing face images for the task of face anti-spoofing. Furthermore, we evaluated the adversarial robustness of state-of-the-art presentation attack detection (PAD) methods and of our method. The experimental results showed that the proposed method achieved competitive performance in the clean scenarios and was more robust in the adversarial scenarios than other current PAD methods. Among the state-of-the-art methods, networks with a shallow architecture showed better robustness against adversarial samples. Compared with the state-of-the-art PAD methods, the performance of our proposed method in the clean scenario still has room for improvement. In addition, the performance of the lightweight model against spoofing faces made with 3D masks remains to be tested in future work.
References
Adeniyi JK, Adeniyi AE, Oguns YJ, Egbedokun GO, Ajagbe KD, Obuzor PC, Ajagbe SA (2022) Comparison of the performance of machine learning techniques in the prediction of employee. ParadigmPlus 3(3):1–15
Ajagbe SA, Oki OA, Oladipupo MA, Nwanakwaugwu A (2022) Investigating the efficiency of deep learning models in bioinspired object detection. In 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET), pp 1–6. https://doi.org/10.1109/ICECET55527.2022.9872568
Ajagbe SA, Amuda KA, Oladipupo MA, Oluwaseyi FA, Okesola KI (2021) Multi-classification of alzheimer disease on magnetic resonance images (mri) using deep convolutional neural network (dcnn) approaches. Int J Adv Comput Res 11(53):51
Arora G, Tiwari K, Gupta P (2019) Liveness and threat aware selfie face recognition. Selfie Biometrics. Springer, Berlin, pp 197–210
Atoum Y, Liu Y, Jourabloo A, Liu X (2017) Face anti-spoofing using patch and depth-based cnns. In 2017 IEEE international joint conference on biometrics (IJCB), pp 319–328. IEEE
Bisogni C, Cascone L, Dugelay J-L, Pero C (2021) Adversarial attacks through architectures and spectra in face recognition. Pattern Recogn Lett 147:55–62
Bobbia S, Macwan R, Benezeth Y, Mansouri A, Dubois J (2019) Unsupervised skin tissue segmentation for remote photoplethysmography. Pattern Recogn Lett 124(JUN.):82–90
Boulkenafet Z, Komulainen J, Hadid A (2016) Face spoofing detection using colour texture analysis. IEEE Trans Inform Forensics Secur 11(8):1818–1830
Boulkenafet Z, Komulainen J, Li L, Feng X, Hadid A (2017) Oulu-npu: A mobile face presentation attack database with real-world variations. In 2017 12th IEEE International conference on automatic Face & Gesture recognition (FG 2017), pp 612–618 IEEE
Cai G, Su S, Leng C, Wu J, Wu Y, Li S (2019) Cover patches: A general feature extraction strategy for spoofing detection. Concurr Comput Pract Experience 31(23):4641
Chen D, Xu R, Han B (2019) Patch selection denoiser: An effective approach defending against one-pixel attacks. In International conference on neural information processing, pp 286–296. Springer
Chingovska I, Anjos A, Marcel S (2012) On the effectiveness of local binary patterns in face anti-spoofing. In 2012 BIOSIG-proceedings of the International conference of biometrics special interest group (BIOSIG), pp 1–7. IEEE
Dong Y, Fu QA, Yang X, Pang T, Zhu J (2020) Benchmarking adversarial robustness on image classification. In 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Fang M, Damer N, Kirchbuchner F, Kuijper A (2022) Learnable multi-level frequency decomposition and hierarchical attention mechanism for generalized face presentation attack detection. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 3722–3731
Feng H, Hong Z, Yue H, Chen Y, Wang K, Han J, Liu J, Ding E (2020) Learning generalized spoof cues for face anti-spoofing. arXiv preprint arXiv:2005.03922
George A, Marcel S (2021) Cross modal focal loss for rgbd face anti-spoofing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7882–7891
Goodfellow IJ, Shlens J, Szegedy C (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572
Hernandez-Ortega J, Fierrez J, Morales A, Galbally J (2019) Introduction to face presentation attack detection. Handbook of biometric anti-spoofing. Springer, Berlin, pp 187–206
Hernandez-Ortega J, Fierrez J, Morales A, Tome P (2018) Time analysis of pulse-based face anti-spoofing in visible and nir. In Proceedings of the IEEE conference on computer vision and pattern recognition Workshops, pp 544–552
Inoue S, Kotori Y, Takishiro M (2012) Paper gloss analysis by specular reflection point spread function (part i)-measurement method for psf of paper on specular reflection phenomenon. Jpn TAPPI J 66(8):879–886
Kurakin A, Goodfellow IJ, Bengio S (2018) Adversarial examples in the physical world. In Artificial intelligence safety and security, Chapman and Hall/CRC, London, pp 99–112
Lin B, Li X, Yu Z, Zhao G (2019) Face liveness detection by rppg features and contextual patch-based cnn. In Proceedings of the 2019 3rd International conference on biometric engineering and applications, pp 61–68
Liu Y, Jourabloo A, Liu X (2018) Learning deep models for face anti-spoofing: Binary or auxiliary supervision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 389–398
Li L, Xia Z, Jiang X, Roli F, Feng X (2018) Face presentation attack detection in learned color-liked space. arXiv preprint arXiv:1810.13170
Li L, Xia Z, Jiang X, Roli F, Feng X (2020) Compactnet: learning a compact space for face presentation attack detection. Neurocomputing 409
Luo Z, Wu S-T (2015) Oled versus lcd: Who wins. Opt. Photonics News 2015:19–21
Moosavi-Dezfooli S-M, Fawzi A, Frossard P (2016) Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2574–2582
Mygdalis V, Pitas I (2022) Hyperspherical class prototypes for adversarial robustness. Pattern Recog 125:108527
Patel K, Han H, Jain AK (2016) Secure face unlock: Spoof detection on smartphones. IEEE Trans Inform Forensics Secur 11(10):2268–2283
Perdana RN, Ardiyanto I, Nugroho HA (2021) A review on face anti-spoofing. IJITEE (International Journal of Information Technology and Electrical Engineering) 1
Quan R, Wu Y, Yu X, Yang Y (2021) Progressive transfer learning for face anti-spoofing. IEEE Trans Image Process 30:3946–3955. https://doi.org/10.1109/TIP.2021.3066912
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In International conference on learning representations
Su J, Vargas DV, Sakurai K (2019) One pixel attack for fooling deep neural networks. IEEE Trans Evol Comput 23(5):828–841
Svaasand LO, Norvang L, Fiskerstrand E, Stopps E, Berns M, Nelson J (1995) Tissue parameters determining the visual appearance of normal skin and port-wine stains. Lasers Med Sci 10(1):55–65
Wang S-Y, Yang S-H, Chen Y-P, Huang J-W (2017) Face liveness detection based on skin blood flow analysis. Symmetry 9(12):305
Wang Y, Song X, Xu T, Feng Z, Wu X-J (2021) From rgb to depth: domain transfer network for face anti-spoofing. IEEE Trans Inform Forensics Secur 16:4280–4290
Wang C-Y, Lu Y, Yang S-T, Lai S-H (2022) Patchnet: A simple face anti-spoofing framework via fine-grained patch recognition. IEEE/CVF conference on computer vision and pattern recognition (CVPR) 2022:20249–20258
Wang Z, Zhao C, Qin Y, Zhou Q, Qi G, Wan J, Lei Z (2018) Exploiting temporal and depth information for multi-frame face anti-spoofing. arXiv preprint arXiv:1811.05118
Wen D, Han H, Jain AK (2015) Face spoof detection with image distortion analysis. IEEE Trans Inform Forensics Secur 10(4):746–761
Xiong Z, Xu H, Li W, Cai Z (2021) Multi-source adversarial sample attack on autonomous vehicles. IEEE Trans Veh Technol 70(3):2822–2835
Yu H, Ng T-T, Sun Q (2008) Recaptured photo detection using specularity distribution. In 2008 15th IEEE International conference on image processing, pp 3140–3143 IEEE
Yu Z, Zhao C, Wang Z, Qin Y, Su Z, Li X, Zhou F, Zhao G (2020) Searching central difference convolutional networks for face anti-spoofing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5295–5305
Zeng X, Liu C, Wang Y-S, Qiu W, Xie L, Tai Y-W, Tang C-K, Yuille AL (2019) Adversarial attacks beyond the image space. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4302–4311
Zhang X, Feng X, Xia Z (2019) Analysis of factors on bvp signal extraction based on imaging principle. In Proceedings of the 2019 3rd International conference on biometric engineering and applications, pp 48–55
Zhang P, Zou F, Wu Z, Dai N, Mark S, Fu M, Zhao J, Li K (2019) Feathernets: Convolutional neural networks as light as feather for face anti-spoofing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 0–0
Ethics declarations
Conflicts of interest
This work was supported by the National Natural Science Foundation of China (No. 62002199), the Natural Science Foundation of Shandong Province (No. ZR2020QF109), the Key Research and Development Program of Shaanxi (Nos. 2021ZDLGY15-01, 2021ZDLGY09-04, and 2021GY-004), and the Shenzhen Science and Technology Program (No. GJHZ20200731095204013).
About this article
Cite this article
Dang, C., Xia, Z., Dai, J. et al. SCPAD: An approach to explore optical characteristics for robust static presentation attack detection. Multimed Tools Appl 83, 14503–14520 (2024). https://doi.org/10.1007/s11042-023-15870-4
DOI: https://doi.org/10.1007/s11042-023-15870-4