Abstract
The outbreak of coronavirus disease 2019 (COVID-19) occurred at the end of 2019, and it has continued to be a source of misery for millions of people and companies well into 2020. There is a surge of concern among all persons, especially those who wish to resume in-person activities, as the globe recovers from the epidemic and intends to return to a level of normalcy. Wearing a face mask greatly decreases the likelihood of viral transmission and gives a sense of security, according to studies. However, manually monitoring compliance with this regulation at scale is not feasible. The key to this is technology. We present a deep learning-based system that can detect instances of improper use of face masks. A dual-stage convolutional neural network architecture is used in our system to recognize masked and unmasked faces. This will aid in the tracking of safety breaches, the promotion of face mask use, and the maintenance of a safe working environment. In this paper, we propose a variant of a multi-face detection model which has the potential to target and identify a group of people and determine whether they are wearing masks or not.
1 Introduction
COVID-19 is a virulent disease that has spread across the world. The pandemic has caused a major global health crisis that has profoundly affected humanity, the way we perceive reality, and our everyday lives. The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a highly contagious respiratory disease, began in Wuhan in December 2019. Before COVID-19 was declared a global pandemic, 7711 people had been infected and 170 deaths had been reported in China. The coronavirus disease was designated COVID-19 by the World Health Organization (W. H. Organization et al. 2020). According to a World Health Organization (WHO) report (as of July 12, 2020), COVID-19 had infected more than 13,039,853 people and caused more than 571,659 fatalities in over 200 countries across the world, a fatality rate of roughly 4 percent, compared with a death rate of below 1 percent for influenza. Person-to-person transmission of the novel coronavirus causing COVID-19 has been reported, but it appears that transmission can also occur from an asymptomatic carrier showing no coronavirus symptoms. There is, as yet, no clinically approved antiviral drug or vaccine that has been proven effective against COVID-19. The disease has rapidly expanded across the planet, causing large health, economic, environmental, and social problems for the entire human population.
According to the WHO, individuals should wear face masks to reduce the risk of transmission, and a social distance of at least 2 m (Rota et al. 2003) should be maintained between people to prevent person-to-person spread of the disease. Furthermore, several public service establishments allow customers to use their services only if they wear masks and adhere to safe social distancing. As a result, face mask detection and safe social distance monitoring have become important computer vision (Memish et al. 2013) tasks that can help the global community. This work illustrates a technique for curbing the transmission of infection by continuously checking whether people are adhering to safe social practices such as wearing their face masks in public. Deep learning techniques (Liu et al. 2018) have recently demonstrated significant value in object detection. Facial detection research encompasses expression recognition, face tracking, and pose estimation (Khan et al. 2019; Licheng et al. 2019). The objective is to recognize the face in a single photograph. Face detection is a difficult task since faces vary in size, shape, color, and other characteristics.
World Health Organization (WHO) reports suggest that the two primary routes of transmission of the COVID-19 infection are respiratory droplets and physical contact. In this investigation, medical masks were defined as surgical or procedure masks that are flat or pleated (some are shaped like cups) and are affixed to the head with straps. They are tested for balanced high filtration, adequate breathability, and, optionally, fluid penetration resistance. The study examined a set of video streams/images to identify individuals who comply with the government rule of wearing medical masks. This could help the government take suitable action against people who do not comply. In the current situation, everyone has been feeling down and discouraged about the state of the world; a huge number of people are dying every day, and for many of us, there is little (if anything) we can do. To help in any small way possible, we chose to apply computer vision and deep learning to tackle a real problem, in the best case producing a project that can help others. As software engineers, developers, and computer vision/deep learning specialists, we let our skills become our contribution and our refuge. To create this dataset, our approach was:
- Capturing faces in their natural state.
- Writing a custom computer vision Python program to overlay face masks on them, resulting in an artificial (but still useful) dataset. Once facial landmarks are applied to the problem, this approach is much easier than it sounds.
We can use facial landmarks to automatically deduce the location of facial structures such as:
- Eyes
- Eyebrows
- Nose
- Mouth
- Jawline
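The structures listed above can each be localized from a set of detected landmark points. The sketch below assumes dlib's common 68-point landmark ordering (an assumption, since the paper does not specify which landmark model it uses) and computes a bounding box for any one structure:

```python
# Sketch: given 68 facial landmark points, slice out one facial structure
# and compute its bounding box. The index ranges below follow dlib's
# 68-point convention, which is an assumption on our part.
import numpy as np

FACIAL_LANDMARKS = {          # (start inclusive, end exclusive)
    "jawline":  (0, 17),
    "eyebrows": (17, 27),
    "nose":     (27, 36),
    "eyes":     (36, 48),
    "mouth":    (48, 68),
}

def region_bbox(landmarks, region):
    """Return (x_min, y_min, x_max, y_max) for one facial structure."""
    start, end = FACIAL_LANDMARKS[region]
    pts = np.asarray(landmarks[start:end])
    return (pts[:, 0].min(), pts[:, 1].min(),
            pts[:, 0].max(), pts[:, 1].max())
```

Such a bounding box is what lets the dataset-generation step place an artificial mask over the mouth and nose region of an unmasked face.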
To train a custom face mask detector, we broke the project into two distinct phases, each with its own sub-steps.
- Training: load the face mask detection dataset from disk, train a model on this dataset, and then serialize the face mask detector back to disk.
- Deployment: once the face mask detector is trained, load the mask detector, perform face detection, and then classify each face as with-mask or without-mask.
To prevent the spread of infection during the COVID-19 outbreak, almost everyone wears a mask. As a result, conventional facial recognition technology, used for network access control, face-based access control, facial attendance, facial security checks at railway stations, and so on, is frequently inadequate. Consequently, it is vital to improve the recognition performance of existing face recognition technology on masked faces. The majority of today's advanced face recognition systems rely on a huge number of face samples and are based on deep learning. In a real-time scenario, the algorithm of Teboulbi et al. (2021) tracks persons wearing or not wearing masks and enforces social distancing by producing an alarm if there is a violation in the scene or in public locations. This can be combined with existing embedded camera infrastructure to enable these analytics in a variety of verticals as well as in offices and airport terminals/gates.
In this work, we will go over our two-phase COVID-19 face mask detector as well as our computer vision/deep learning pipeline. After that, we will review the dataset we use to develop our own face mask detector. We will then show how to use Keras and TensorFlow to run a Python script on our dataset to train a face mask classifier and review the results. With the COVID-19 face mask detector ready, we will run two more Python programs to detect face masks in images and in live video streams. We will wrap up by examining the results of our face mask detector.
The remainder of the paper is organized as follows: Section II describes the methodology, followed by Section III, which describes assessing CNN confidence by iteratively occluding parts of the input. A MobileNetV2 description is provided in Section IV. Section V describes Grad-CAM. An overview of the model description and formulation is given in Section VI, followed by evaluation metrics in Section VII. Limitations of the face detection system are discussed in Section VIII. Section IX describes how to overcome the limits of facial recognition tools. Section X presents the conclusion, followed by future work in Section XI.
2 Methodology/Approach
To utilize facial imprints to develop an informational collection of facial covers, we started with a picture of an individual who does not wear a facial veil. Following this, face recognition was applied to figure the area of the jumping enclosing the image. When we knew where in the picture the face is, we could extricate the face region of interest (ROI), and from that point, we applied facial milestones, permitting us to restrict the mouth, face, and eyes. To apply covers, we required a picture of a veil (with a straightforward and top-notch picture) and afterward added the cover to the identified face and afterward resized and turned in like manner to put it over the face. This cycle is rehashed for all information of images.Training: This progression included preparing for the picture of appearances with veil and without cover individually with a fitting algorithm.Deployment: Once the models were prepared, we proceeded onward to the stacking cover identifier and perform face identification, at that point for characterization of each face.At the point when an image had been moved, the request happened normally. It was then possible to apply some interpretation ability strategies for neural association understanding. The UI presented two of the going with methods: Grad CAM envisioned how parts of the info picture influence a CNN yield by investigating the enactment maps, and Occlusion Sensitivity imagined how parts of the information picture affect, which is shown in Fig. 1.
3 Assessing CNN confidence by iteratively occluding parts
3.1 Picture classification algorithm from PyTorch
Convolutional neural networks (CNNs) come in several well-designed and pre-built architectures, such as AlexNet, ResNet, Inception, LeNet, and MobileNet. Because of its lightweight, efficient, and adaptable architecture, we chose MobileNetV2 for our use case.
We utilize two related modeling approaches to examine the effectiveness of face mask usage by segments of the general population in reducing SARS-CoV-2 transmission and, as a result, in lowering the effective reproduction number, Re (the average number of new cases caused by a single infectious individual at a given point in the epidemic). The basic model employs a branching process to examine the reduction in transmission caused by the use of face masks, as well as the achievable effectiveness of two control variables in lowering Re for the pathogen. The control variables are the degree to which the general population employs face masks (essentially, the likelihood that a person wears a mask on any given day) and the effectiveness of the mask in reducing transmission (which spans a range of masks, from loosely woven permeable masks (Mohamed et al. 2010; Piccardi 2004) to masks of clinical standard). The goal of this model is to see whether there are threshold ranges in which the two control variables may lower Re to the point where they can be relied on to slow or stop the spread of the epidemic. We simulate the outcomes both for people who wear face masks at all times and for those who begin wearing them shortly after they start to experience symptoms.
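The interplay of the two control variables can be illustrated with a deliberately simplified formula (this is our sketch, not the paper's exact branching-process equations): if a fraction `c` of the population wears a mask that blocks a fraction `e` of both outward emission and inward inhalation, each transmission event is discounted on both ends, giving roughly Re = R0(1 - c·e)².

```python
# Illustrative sketch only: masks discount transmission on both the
# emitting and the receiving side, so coverage c and efficacy e enter twice.
def effective_re(r0, coverage, efficacy):
    reduction = 1.0 - coverage * efficacy
    return r0 * reduction * reduction

# Scan the two control variables for the region where Re drops below 1,
# i.e., where the epidemic can no longer sustain itself.
def controlled(r0=2.5, threshold=1.0):
    grid = [i / 10 for i in range(11)]
    return {(c, e) for c in grid for e in grid
            if effective_re(r0, c, e) < threshold}
```

With R0 = 2.5, for example, 80 percent coverage with 80 percent efficacy already pushes Re below 1 under this toy formula, which is the kind of threshold region the model is designed to map out.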
We modify the fundamental SIR formulation to include free-living SARS-CoV-2 particles delivered by inhalation of droplet inoculum and by contact of facial apertures with fomite inoculum deposited on surfaces (§2b(i)). The model is intended to examine the possible effects of wearing a face mask during periods of lockdown and after the lockdown is lifted. Because of the flexible nature of this modeling framework, a distinction may be drawn between the capacity of face masks to reduce transmission from infected persons (where symptom expression matters) and the protection provided by face masks to susceptible individuals. The latter effect might be beneficial, as the face mask reduces inoculum inhalation. It might also be negative; for example, if there is constant manual adjustment of the face mask, the likelihood of transmission increases. Our aim is to provide a flexible, yet reasonably transparent modeling framework to test hypotheses regarding face mask use in conjunction with other pandemic tactics, as well as scaling from individual behavior to population outcomes. SARS-CoV-2 is an illness new to humanity; thus, the conclusions should be interpreted in this context, considering the gaps in our understanding regarding certain parameters.
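The structure described above, susceptibles infected via free-living inoculum pools rather than by direct contact, can be sketched with a simple Euler integration. This is a hedged illustration with made-up parameter values; it collapses the exposed and asymptomatic classes for brevity and omits the mask factors, so it is a caricature of the paper's Equations (1–7), not a reproduction of them:

```python
# State: (S, I, R, D, F) = susceptible, infectious, removed,
# droplet inoculum pool, fomite inoculum pool. All parameters illustrative.
def step(state, dt=0.1, beta_S=0.5, beta_D=0.4, beta_F=0.1,
         gamma=0.2, decay_D=5.0, decay_F=0.5):
    S, I, R, D, F = state
    N = S + I + R
    infection = (beta_D * D + beta_F * F) * S / N   # uptake from pools
    dS = -infection
    dI = infection - gamma * I
    dR = gamma * I
    dD = beta_S * I - decay_D * D   # droplets: shed fast, decay fast
    dF = beta_S * I - decay_F * F   # fomites: shed fast, decay slowly
    return tuple(v + dt * d for v, d in zip(state, (dS, dI, dR, dD, dF)))

def simulate(days=100, dt=0.1):
    state = (990.0, 10.0, 0.0, 0.0, 0.0)
    for _ in range(int(days / dt)):
        state = step(state, dt)
    return state
```

Note that S + I + R is conserved by construction, while the two inoculum pools equilibrate at different speeds, which is what lets mask parameters act differently on droplet and fomite routes in the full model.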
Conventional object detection: a traditional object recognition model can address the problem of recognizing many masked and unmasked faces in images. In general, object detection involves locating and categorizing objects in images (when multiple objects may be present). Although traditional algorithms depend considerably on feature engineering, Haar cascades (Rafael et al. 2012) and HOG (Dalal and Triggs 2005) have proven useful in such settings. In the age of deep learning, it is possible to design neural networks that avoid these computations and do not require any further feature engineering.
Multi-stage detectors: the detection pipeline is split into several stages in a multi-stage detector. A two-stage detector such as RCNN computes and proposes a list of regions of interest based on selective search. After that, CNN feature vectors are extracted independently from each region. Several region-proposal-network-based algorithms, including Fast RCNN (Girshick 2015) and Faster RCNN (Ren et al. 2015), have achieved higher accuracy and better results than most single-stage detectors.
Single-stage detectors: a one-stage detector performs detection in a single step, directly over a dense sampling of candidate regions. These algorithms skip the region proposal stage used in multi-stage detectors, making them faster in general, but at the cost of some accuracy loss. One of the most famous single-stage algorithms, You Only Look Once (YOLO), was released in 2015 and attained near real-time performance. The single-shot detector (SSD) is another popular object detection technique that produces good results. RetinaNet, built on feature pyramid networks and trained with focal loss, is among the best single-stage detectors. Adding more stages of learned transformations, specifically a module for feedforward connections in deconvolution and a new output module, enables this newer approach and provides a potential path for further detection work (Figs. 2 and 3).
4 MobileNetV2
MobileNetV2 uses depthwise separable convolution as an efficient building component, based on ideas from MobileNetV1 (W. H. Organization et al. 2020). V2, however, adds two additional aspects to the architecture:
- Linear bottlenecks between the layers, and
- Shortcut connections between the bottlenecks. The basic structure is shown below.
The pose angle of the target's face has a major influence on the recognition score. When a face is enrolled in recognition software, several angles are typically used (profile, frontal, and 45 degrees are common). Anything less than a frontal view reduces the algorithm's capacity to generate a face template. The clearer and higher-resolution the image (both the enrolled and the probe image), the higher the score of any future matches.
The weights of each layer in the model are predefined based on the ImageNet dataset. The layer definitions specify the padding, strides, kernel size, input channels, and output channels (Table 1).
4.1 Step 1: data visualization
In the first step, we visualized the total number of images in our collection, which is divided into two classes. There are 690 images in the 'yes' (mask) class and 686 images in the 'no' class.
4.2 Step 2: data augmentation
After that, we expanded our dataset to include a larger number of images for training. We rotated and flipped each image in our dataset during this augmentation process. Following augmentation, we have a total of 2751 images, with 1380 images in the 'yes' category and 1371 images in the 'no' category, as shown in Fig. 4.
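A minimal sketch of this augmentation using numpy only (the actual pipeline presumably uses a richer set of random rotations and flips than this fixed flip-plus-rotation, so treat this as illustrative):

```python
import numpy as np

def augment(images):
    """Return the originals plus one flipped-and-rotated copy of each,
    roughly doubling the dataset as in our case (1376 -> 2751 images)."""
    out = list(images)
    for img in images:
        flipped = np.fliplr(img)      # horizontal flip
        rotated = np.rot90(flipped)   # plus a simple 90-degree rotation
        out.append(rotated)
    return out
```

In practice a framework generator (e.g., Keras's image augmentation utilities) performs the same transformations on the fly during training instead of materializing the copies.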
4.3 Step 3: Splitting the data
We divided our data into two sets: the training set, containing the images on which the CNN model will be trained, and the test set, containing the images on which the model will be tested. We choose split size = 0.8, meaning 80 percent of the images go to the training set while the remaining 20 percent go to the test set. After splitting, we verified that the intended proportion of images had been distributed to both the training set and the test set.
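The split can be sketched with the standard library alone; shuffling before cutting is the one detail that matters, so each class is represented in both sets:

```python
import random

def split_dataset(paths, split=0.8, seed=42):
    """Shuffle, then cut at the split fraction; seed makes it repeatable."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * split)
    return paths[:cut], paths[cut:]

# Our 2751 augmented images yield 2200 training and 551 test images,
# matching the counts reported in Step 5.
train, test = split_dataset(range(2751))
```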
4.4 Step 4: Building the model
Next, we built our sequential CNN model using layers such as Conv2D, MaxPooling2D, Flatten, Dropout, and Dense. In the final dense layer, we use the softmax function to generate a vector representing the probability of each of the two classes. Because there are just two classes, we used the 'adam' optimizer and 'binary cross-entropy' as our loss function. Furthermore, MobileNetV2 may be used as a backbone to improve accuracy.
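The output layer and loss named above can be written out explicitly in numpy to show what the model optimizes (this mirrors, rather than replaces, what Keras computes internally):

```python
import numpy as np

def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    z = np.exp(logits - np.max(logits))   # shift for numerical stability
    return z / z.sum()

def binary_cross_entropy(p_mask, y):
    """y = 1 for 'mask', 0 for 'no mask'; p_mask = predicted P(mask).
    Confident wrong predictions are penalized most heavily."""
    eps = 1e-12                           # avoid log(0)
    return -(y * np.log(p_mask + eps) + (1 - y) * np.log(1 - p_mask + eps))
```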
4.5 Step 5: Pre-training the CNN model
Following the construction of our model, we created the 'train generator' and 'validation generator' in order to feed data to the model in the next step. The training set contains 2200 images and the test set contains 551 images.
4.6 Step 6: Training the CNN model
In this step, we fit the images from the training and test sets to the sequential model built using the Keras library. We trained the model for 30 epochs (iterations). Nonetheless, we can train for a larger number of epochs to obtain more precision, at the risk of over-fitting. Our model achieves an accuracy of 96.19 percent on the training set and 98.86 percent on the test set after 30 epochs, which indicates that it is well-trained and not over-fitted. We added a shortcut connection that takes the input of the first CNN layer and feeds it to the final CNN layer, because this reduces the information loss problem (Fig. 5).
4.7 Step 7: Labeling the information
When the model is built, we assign two labels for our outcomes: '0' denotes 'no mask' and '1' denotes 'mask.' We also use RGB values to define the color of the bounding rectangle: red for 'without-mask' and green for 'with-mask.'
4.8 Step 8: Importing the face detection program
Next, we want to use the model to detect whether we are wearing a face mask via our PC's camera. To do so, we must first implement face detection. For this, we use Haar feature-based cascade classifiers to identify facial features.
OpenCV provides this cascade classifier, trained on a huge number of images, to recognize the frontal face. The .xml file for this purpose should be downloaded and used to detect the face. We have saved the file to our GitHub repository.
4.9 Step 9: Detecting the faces with and without masks
In the final step, we use the OpenCV library to run an infinite loop over our webcam feed, in which we detect faces using the cascade classifier. The code webcam = cv2.VideoCapture(0) accesses the webcam. The model estimates the probability of each of the classes ([without-mask, with-mask]). Based on which probability is higher, the corresponding label is chosen and displayed around our faces. Moreover, we can install the DroidCam application on both mobile and PC to use a phone camera instead, changing the argument from 0 to 1 in webcam = cv2.VideoCapture(1).
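The loop just described can be sketched as follows. OpenCV is imported lazily so the label-selection helper stays usable without a camera, and `model` is assumed to expose a `predict(face_crop)` returning the two class probabilities (real code would first resize and normalize the crop to the network's input shape):

```python
LABELS = {0: ("without-mask", (0, 0, 255)),   # red in BGR
          1: ("with-mask",    (0, 255, 0))}   # green in BGR

def pick_label(probs):
    """probs = [P(without-mask), P(with-mask)] for one detected face."""
    idx = 0 if probs[0] >= probs[1] else 1
    return LABELS[idx]

def run(model, cascade_path="haarcascade_frontalface_default.xml"):
    import cv2  # lazy import: only the live loop needs OpenCV
    detector = cv2.CascadeClassifier(cascade_path)
    webcam = cv2.VideoCapture(0)              # change to 1 for DroidCam
    while True:
        ok, frame = webcam.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 4):
            label, color = pick_label(model.predict(frame[y:y+h, x:x+w]))
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            cv2.putText(frame, label, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
        cv2.imshow("mask detector", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break
    webcam.release()
    cv2.destroyAllWindows()
```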
5 Grad-Cam
Several previous works (Inbaraj and Jeng 2021) have shown that deeper CNN layers capture higher-level visual constructs. Furthermore, convolutional layers retain spatial information that is lost in fully connected layers. For this reason, the final convolutional layers provide the best compromise between high-level semantics and detailed spatial information: their neurons search the image for semantic, class-specific information. Grad-CAM assigns importance values to each neuron for a specific decision of interest using gradient information flowing into the last convolutional layer of the CNN.
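The Grad-CAM weighting step can be shown on plain arrays (computing the gradients themselves requires a framework's autodiff, so they are taken as given here): channel importance is the gradient averaged over space, and the heatmap is the ReLU of the importance-weighted sum of activation maps.

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (H, W, C) arrays from the last conv layer,
    gradients taken w.r.t. the score of the class of interest."""
    weights = gradients.mean(axis=(0, 1))             # alpha_c, one per channel
    cam = np.tensordot(activations, weights, axes=([2], [0]))
    cam = np.maximum(cam, 0)                          # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam                                        # (H, W) heatmap
```

Upsampled to the input resolution, this heatmap highlights the image regions (here, typically the mask area) that drove the classification.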
6 Model description and formulation
Figure 2 shows the model structure. In this section, we explain each symbol present in Equations (1–7) (Richard et al. 2020). Face mask wearers and non-wearers are two distinct populations, both with persons who fall into the following categories: susceptible (S); exposed, i.e., latently infected (E); asymptomatically infectious \(I_A\); symptomatically infectious \(I_S\); and removed (R). The removed class includes people who recovered from infection and those who died. Coming into contact with inoculum generated by persons infected with SARS-CoV-2 can infect susceptible people. We distinguish between inoculum production by infectious individuals, which provides free-living inoculum, and inoculum uptake and infection in susceptible people. The inoculum can be acquired by inhaling transient airborne droplet (D) forms or by coming into contact with a decaying reservoir of inoculum deposited by infected persons in the environment as fomites (F), which can survive for up to 72 h on certain surfaces. Rapid decay of droplet inoculum coexists with a more gradual decay of fomite inoculum (Fig. 2). As a result, there are two sets of transmission rates: \(\beta_A\) and \(\beta_S\) for inoculum production by asymptomatic and symptomatic individuals, respectively, and \(\beta_D\) and \(\beta_F\) for inoculum uptake and infection of susceptible people from droplet and fomite inoculum, respectively. A handful of these parameters are influenced by wearing a face mask (cf. \(m_i\) in Fig. 2). Face masks reduce the amount of droplet inoculum that escapes infectious persons (Richard et al. 2020) by trapping a larger number of droplets behind the mask \((m_A, m_S < 1)\). Face masks also reduce the amount of droplet inoculum breathed in by capturing a larger number of airborne droplets, lowering the uptake transmission rate \(\beta_D\) by a factor \(m_D\) (Fig. 2).
At first, we assume that masks have no effect on the risk of acquiring inoculum from surfaces \((\beta_F)\), i.e., \(m_F = 1\). Nevertheless, the model can capture how wearing a mask might increase the risk of fomite infection \((m_F > 1)\), for example, by exposing the face to more frequent touching while adjusting the mask. We note that additional PPE, for example a full face shield, could act to lessen the danger of fomite infection \((m_F < 1)\). Furthermore, sanitation interventions such as hand-washing may be modeled by reducing the uptake rate of fomite inoculum \((\beta_F)\), and additional cleaning of surfaces or the use of faster self-sanitizing surfaces can be modeled by reducing the life expectancy of fomite inoculum \((\tau_F)\). We will focus on the effects of face masks and lockdown periods in this section. The model is designed and solved as a basic deterministic differential equation model, summarized below for completeness. The model may readily be recast in a stochastic framework with transition probabilities. It is also simple to divide the target country into metapopulations with varying contact rates, such as between urban communities and rural zones or across age-groups in the population, and to geographically segment the population with localized inoculum pools. We use the model to examine, at a high level, how wearing a face mask complements a major control strategy that involves the lockdown of a portion of the population. We may simulate this by assuming that lockdown reduces transmission rates \((\beta_i, i = A, S, D, F)\) by a predetermined proportion, q. Lockdown reduces the amount of inoculum generated by infectious persons in public zones, which reduces the amount of inoculum available in the D and F pools, as well as the amount of time susceptibles spend in contact with that inoculum.
Along these lines, lockdown reduces the overall effective transmission rates by a factor of \(q^2\) in the model, once for reduced inoculum production and once for reduced contact with it.
In Table 2, we compare the accuracy obtained by state-of-the-art models with that of our proposed model. The table shows that our proposed model achieves higher accuracy (Figs. 6 and 7).
7 Evaluation metrics
We can evaluate our machine learning algorithm using a variety of metrics. The confusion matrix is used to assess how well a model performs on test data. True positive, true negative, false positive, and false negative are the four groups in the confusion matrix (Margherita et al. 2020). A false positive means the model predicted the presence of an entity that is in fact absent. A false negative means the model predicted the absence of an entity that is in fact present; in other words, the prediction of the entity's nonexistence was incorrect. A true positive means the model correctly detected the presence of an entity that is actually present. A true negative means the model correctly identified the absence of an entity that is actually absent.
In this case, we select overall accuracy (OA), average accuracy (AA), and kappa accuracy (KA). Overall accuracy tells us how many of the references were correctly mapped out of all of them. The overall accuracy is typically expressed as a percentage, with 100 percent accuracy representing a perfect classification in which all reference sites were correctly classified. The kappa coefficient assesses the degree of agreement between classification and truth values.
Equations 8, 9, and 10 for calculating the kappa accuracy (KA), overall accuracy (OA), and average accuracy (AA) are given as follows (Margherita et al. 2020):
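A numpy sketch of the three metrics, assuming a confusion matrix whose rows are true classes and whose columns are predicted classes (the standard convention; the paper does not state its orientation):

```python
import numpy as np

def overall_accuracy(cm):
    """OA: fraction of all references correctly mapped."""
    return np.trace(cm) / cm.sum()

def average_accuracy(cm):
    """AA: mean of per-class accuracies (recall of each class)."""
    per_class = np.diag(cm) / cm.sum(axis=1)
    return per_class.mean()

def kappa(cm):
    """KA: agreement between classification and truth beyond chance."""
    n = cm.sum()
    po = np.trace(cm) / n                                  # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2    # chance agreement
    return (po - pe) / (1 - pe)
```

For a balanced two-class matrix the OA and AA coincide, while kappa discounts the agreement expected from the class marginals alone.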
8 Limitations of face detection system
1. Poor image quality limits facial recognition's effectiveness: the quality of an image affects how well facial recognition algorithms perform. The visual quality of surveillance video is poor compared to that of a digital camera. Even high-definition video is typically 1080p or 720p (progressive scan) at best, around 2 MP and 0.9 MP respectively, whereas a low-cost digital camera may attain 15 MP. The difference is easy to see.
2. Small image sizes make facial recognition more difficult: when a face detection algorithm finds a face in an image or a still from a video clip, the size of the face relative to the overall picture influences how well the face will be recognized. With a limited picture size and a target far from the camera, the detected face may be only 100 to 200 pixels on a side. Furthermore, scanning a picture for varied face sizes is a processor-intensive activity. Most algorithms let us choose a face-size range to help eliminate false positives in detection and speed up image processing.
3. Different face angles can throw off the reliability of facial recognition: the recognition ranking is heavily influenced by the relative angle of the target's face. Several angles are generally enrolled when a face is used in a recognition program (profile, frontal, and 45 degrees are common). Anything other than a frontal view reduces the algorithm's capacity to generate a template for the face. The more exact and better the resolution of the picture (both enrolled and probe image), the higher the score of any resulting matches.
4. Data processing and storage might limit facial recognition technology: although high-definition video has a lower resolution than digital camera footage, it still takes up a lot of disk space. Because processing every frame of footage is a huge job, only a small percentage (10 to 25 percent) of it is actually run through a recognition system. To minimize total processing time, agencies may employ computer clusters. Adding computers, however, necessitates huge data transfers across a network, which might be constrained by input–output restrictions, further slowing processing performance.
9 How to overcome limits on facial recognition tools
As technology advances, more high-quality cameras will become available. Computer networks will be able to transmit more data, and processors will operate faster. Face recognition algorithms will be better able to identify faces from a photograph and match them against a database of enrolled persons. The mechanisms that defeat current algorithms, such as obscuring parts of the face with sunglasses and masks or altering one's hairstyle, will be handled properly. Changing how photographs are captured is a quick way to overcome a significant number of these barriers. At checkpoints, for example, individuals are expected to line up and pass through a single point. Cameras can then focus on each subject with more precision, resulting in considerably more valuable frontal, higher-resolution probe images. However, widespread use necessitates a greater number of cameras. Emerging biometric applications are promising. They include face recognition as well as gestures, expressions, gait, and vascular patterns, along with iris, retina, palm print, ear print, voice recognition, and scent signatures. A combination of modalities is unmatched in improving a system's capacity to produce outcomes with more confidence. Related efforts focus on increasing the ability to acquire data from a distance where the target is passive and often unaware.
Without a doubt, security issues surround this breakthrough and its application. Finding a balance between public security and people’s protection rights will be a hot topic in the next years, especially as technology progresses.
10 Conclusion
This paper presents an innovative method to enhance the detection of objects on the face, in our case a face mask, in which we perform a fast one-shot scan for masks. It outperforms, or is on par with, other papers of similar design, even when our model is tested on lower-quality live recordings. The model was carefully tested against probable false-positive cases, such as shirts folded over faces and handkerchiefs over the lips, and proved more effective. Instead of a basic image classifier, the training involved a dedicated two-class object detector. The problem with a classifier-on-faces approach is that a face mask, by definition, hides part of the face; if enough of the face is concealed, the face detector cannot locate it, and the mask classifier is never applied. To get around this, we created a two-class object detector with a with-mask class and a without-mask class. This improves on a plain pipeline in two ways, by combining an object detector with a specialized mask class. For one, the object detector is able to detect people wearing masks whom a face detector could not detect because the mask covers too much of the face.
11 Future work
Our model was built from a limited amount of data; since the performance of neural networks improves with the data they are trained on, we will try to incorporate more data and make the model more robust and fault-tolerant. Furthermore, many sources of error, such as lighting, pose, or partial image capture, can affect detection, and we will keep working to increase the technology's accuracy. We are also working to extend this project to check whether a person is wearing a mask correctly, since a mask worn below the nose is largely ineffective. Finally, the face mask detector could serve as a mode of surveillance by applying it to street camera video. This would help enforce the social distancing rules given by the government and could be deployed in public areas such as offices, railway stations, and airports.
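One simple way to flag a mask worn below the nose, sketched here under the assumption that a facial-landmark model supplies a nose-tip coordinate and the detector supplies the mask's bounding box (both inputs are illustrative, not from the paper's pipeline):

```python
def mask_covers_nose(nose_xy, mask_box):
    """Return True if the nose-tip landmark lies inside the mask bounding box.

    nose_xy  -- (x, y) pixel coordinate of the nose tip, from a landmark model
    mask_box -- (x1, y1, x2, y2) box of the detected mask region
    """
    x, y = nose_xy
    x1, y1, x2, y2 = mask_box
    return x1 <= x <= x2 and y1 <= y <= y2

# Mask sitting below the nose: the nose tip falls above the mask box.
print(mask_covers_nose((50, 40), (30, 55, 70, 90)))  # False -> worn incorrectly
print(mask_covers_nose((50, 60), (30, 55, 70, 90)))  # True  -> nose covered
```

A deployed system would likely learn this distinction instead (e.g. a third "mask worn incorrectly" detector class), but a geometric check like this is a cheap first step.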
Data availability
Inquiries about data availability should be directed to the authors.
References
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol. 1, pp 886–893, https://doi.org/10.1109/CVPR.2005.177
Grandini M, Bagli E, Visani G (2020) Metrics for multi-class classification: an overview. arXiv preprint arXiv:2008.05756
Inbaraj XA, Jeng J-H (2021) Mask-GradCAM: object identification and localization of visual presentation for deep convolutional network. 2021 6th international conference on inventive computation technologies (ICICT), pp 1171–1178, https://doi.org/10.1109/ICICT50816.2021.9358569
Jiao L, Zhang F, Liu F, Yang S, Li L, Feng Z, Qu R (2019) A survey of deep learning-based object detection. IEEE Access 7:128837–128868
Khan A, Sohail A, Zahoora U, Qureshi AS (2019) A survey of the recent architectures of deep convolutional neural networks. Artif Intell Rev
Liu L et al (2018) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261–318
Memish ZA, Zumla AI, Al-Hakeem RF, Al-Rabeeah AA, Stephens GM (2013) Family cluster of Middle East respiratory syndrome coronavirus infections. N Engl J Med 368(26):2487–2494
Mohamed SS, Tahir NM, Adnan R (2010) Background modelling and background subtraction performance for object detection. 2010 6th international colloquium on signal processing and its applications, Malacca City, pp 1–6, https://doi.org/10.1109/CSPA.2010.5545291
Negi A, Chauhan P, Kumar K, Rajput RS (2020) Face mask detection classifier and model pruning with keras-surgeon. In: 2020 5th IEEE international conference on recent advances and innovations in engineering (ICRAIE), IEEE. pp 1–6
World Health Organization (2020) Coronavirus disease 2019 (COVID-19): situation report, 96
Padilla R, Costa Filho CFF, Costa MGF (2012) Evaluation of Haar cascade classifiers designed for face detection. World Acad Sci Eng Technol 64:362–365
Piccardi M (2004) Background subtraction techniques: a review. 2004 IEEE international conference on systems, man and cybernetics (IEEE Cat. No.04CH37583), The Hague, vol 4, pp 3099–3104, https://doi.org/10.1109/ICSMC.2004.1400815
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Proc Adv Neural Inf Process Syst, pp 91–99
Rota PA, Oberste MS, Monroe SS, Nix WA, Campagnoli R, Icenogle JP, Penaranda S, Bankamp B, Maher K, Chen M-H et al (2003) Characterization of a novel coronavirus associated with severe acute respiratory syndrome. Science 300(5624):1394–1399
Stutt ROJH, Retkute R, Bradley M, Gilligan CA, Colvin J (2020) A modelling framework to assess the likely effectiveness of facemasks in combination with ‘lock-down’ in managing the COVID-19 pandemic. Proceedings of the Royal Society A
Su X, Gao M, Ren J, Li Y, Dong M, Liu X (2022) Face mask detection and classification via deep transfer learning. Multimed Tools Appl 81(3):4475–4494
Teboulbi S, Messaoud S, Hajjaji MA, Mtibaa A (2021) Real-time implementation of AI-based face mask detection and social distancing measuring system for COVID-19 prevention. Sci Program 2021:8340779. https://doi.org/10.1155/2021/8340779
Xiao J, Wang J, Cao S, Li B (2020) Application of a novel and improved VGG-19 network in the detection of workers wearing masks. J Phys Conf Series 1518(1):012041
Funding
The authors received no funding from any source.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Informed consent
Informed consent was not required for this study.
Banik, D., Rawat, S., Thakur, A. et al. Automatic approach for mask detection: effective for COVID-19. Soft Comput 27, 7513–7523 (2023). https://doi.org/10.1007/s00500-022-07700-w