Abstract
Human skin detection is an essential phase in face detection and face recognition when using color images. Skin detection is very challenging because of the differences in illumination, differences in photos taken using an assortment of cameras with their own characteristics, range of skin colors due to different ethnicities, and other variations. Numerous methods have been used for human skin color detection, including the Gaussian model, rule-based methods, and artificial neural networks. In this article, we introduce a novel technique of using the neural network to enhance the capabilities of skin detection. Several different entities were used as inputs of a neural network, and the pros and cons of different color spaces are discussed. Also, a vector was used as the input to the neural network that contains information from three different color spaces. The comparison of the proposed technique with existing methods in this domain illustrates the effectiveness and accuracy of the proposed approach. Tests were done on two databases, and the results show that the neural network has better precision and accuracy rate, as well as comparable recall and specificity, compared with other methods.
1 Introduction
An active area of research in image processing is face recognition, i.e., the ability to recognize a face and identify the person. Face recognition relies on detecting the face in the image, and the first phase of face detection in color images is segmentation of the image into human skin and non-human skin. This segmentation is challenging because of the differences of illumination between images, use of different cameras in taking the images, different characteristics of various lenses, and range of human skin colors due to ethnicity. One of the most important issues is that some pixels are common between human skin and other entities such as soil, desk surfaces, walls, and other items [4]. These challenges make image processing of skin color detection very difficult, and there is no exact mechanism to distinguish between skin color and non-skin color.
For skin detection, a color space must be chosen. A number of color spaces can be used for skin detection; the most common are RGB, YCbCr, and HSV [12]. Each of these color spaces has its own characteristics. Combining information is a powerful technique in data processing: its most important advantage is that the useful information from the individual vectors is kept while the redundant information is discarded. Owing to these different characteristics, we compared different color spaces and combined the useful information from these three color spaces. Unlike the Gaussian and rule-based methods, whose classifiers were defined using small samples, we created a large dataset containing a wide variety of pixels. The neural network is a strong classifier; however, it has been used in skin detection only a few times.
The neural network has several parameters that can be modified to change the results, e.g., the number of nodes in the hidden layer, the random initial weights, and the thresholds that are assigned to the output of the network. These parameters make the neural network very flexible unlike other methods used for skin detection. The tests have been done on two databases in this field, and the results are reported to show the precision, recall, specificity, and accuracy. Our method outperformed other methods in terms of precision and accuracy, and had comparable results in recall and specificity to other methods.
In Section 2, we will discuss different methods that have been used for skin detection. The Gaussian method and rule-based methods are two types of methods that have been used in skin detection. In Section 3, we will discuss our method, which is skin detection using the neural network.
2 Methods for Skin Color Detection
Three methods – Gaussian, rule-based, and neural networks – are discussed in this section.
2.1 Gaussian Methods
This technique, which was proposed in Reference [14], uses the Gaussian model to find the human skin in an image. The RGB image is transformed to the YCbCr color space. The density function for the Gaussian variable $X = (Cb\ Cr)^T \in \mathbb{R}^2$ is

$$p(X) = \frac{1}{2\pi\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)\right),$$

where $\mu$ is the mean vector and $\Sigma$ is the covariance matrix of $X$.
The parameters were calculated using training images. For each pixel value, the density function is calculated. Only the (Cb, Cr) value is used because the Y component carries illumination information that is not related to skin color. A pixel whose density value exceeds a specified threshold is classified as skin. The final output is a binary image in which the non-skin pixels are shown in black and the human skin pixels in white.
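The thresholded Gaussian model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`fit_gaussian`, `density`, `is_skin`) are hypothetical, the covariance is estimated with the unbiased sample formula, and the threshold is a free parameter that the text does not specify.

```python
import math

def fit_gaussian(samples):
    """Estimate the mean vector and 2x2 covariance from (Cb, Cr) training pairs."""
    n = len(samples)
    m_cb = sum(s[0] for s in samples) / n
    m_cr = sum(s[1] for s in samples) / n
    # Unbiased sample covariance (assumption; the source does not say which estimator).
    s_bb = sum((s[0] - m_cb) ** 2 for s in samples) / (n - 1)
    s_rr = sum((s[1] - m_cr) ** 2 for s in samples) / (n - 1)
    s_br = sum((s[0] - m_cb) * (s[1] - m_cr) for s in samples) / (n - 1)
    return (m_cb, m_cr), ((s_bb, s_br), (s_br, s_rr))

def density(x, mean, cov):
    """Bivariate Gaussian density evaluated at x = (Cb, Cr)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the closed-form 2x2 inverse.
    quad = (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def is_skin(x, mean, cov, threshold):
    """A pixel is labeled skin when its density exceeds the chosen threshold."""
    return density(x, mean, cov) > threshold
```

Applying `is_skin` to every pixel's (Cb, Cr) pair yields the binary skin mask; the threshold would have to be tuned on validation data.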
2.2 Rule-Based Methods
Skin detection based on rule-based methods has been used in several research efforts as the first step in face detection. Chen et al. [5] analyzed the statistics of different colors. They used 100 images for training, consisting of skin and non-skin, to calculate the conditional probability density function of skin and non-skin colors.
After applying Bayesian classification, they determined the rules and constraints for the human skin color segmentation. The rules are
with α= 100, β1= 10, β2= 70, γ1= 24, γ2= 112, σ1= 0 and σ2= 70.
Although this method worked on some images perfectly, the results were not reliable in images with a complex background or uneven illumination.
Kovac et al. [9] introduced these rules for skin segmentation in RGB space, where each pixel that belongs to human skin must satisfy the following relations:
For indoor images:
For images taken in daylight illumination:
Kong and Zhe [8] presented rules that used the information from both HSV and normalized RGB color spaces. They suggested that although in normalized RGB, the effect of intensity has been reduced, it is still sensitive to illumination. Therefore, they also used HSV for skin detection. Each pixel that satisfies these rules is considered to be a human skin pixel:
2.3 Neural Network Methods
The neural network has been used in skin color detection in a number of research projects. Doukim et al. [6] used YCbCr as the color space with a multilayer perceptron (MLP) neural network. They used two types of combining strategies, and several combining rules were applied. A coarse to fine search method was used to find the number of neurons in the hidden layer. The combination of Cb/Cr and Cr features produced the best result.
Seow et al. [11] used the RGB color space with a three-layered neural network. The skin regions were then extracted from the planes and interpolated to obtain an optimum decision boundary and the positive skin samples for the skin classifier. Yang et al. [15] used the YCbCr color space with a back-propagation neural network. They sorted the luminance Y in ascending order and divided the range of Y values into several intervals. Then, the pixels whose luminance belonged to the same interval were collected. In the next step, the covariance and the mean of Cb and Cr were calculated and used to train the back-propagation neural network. Another example of human skin color detection using the neural network can be found in Al-Mohair et al. [3].
3 Methodology
The neural network is a strong tool in learning; therefore, we decided to use the neural network for learning pixel colors so that we can distinguish between face skin pixels and non-face skin pixels. We decided to use information from more than one color space instead of using just the information from one color space. We gathered around 100,000 face pixels and 200,000 non-face pixels from images chosen from the Web. Figure 1 shows some of the samples for human skin, and Figure 2 shows some of the samples for the non-skin. For human skin, we chose samples from different ethnicities.
Choosing images for the non-skin is a rather difficult task because it is an enormous category, i.e., everything that is not human skin is non-skin. We tried to choose images from different categories, especially those that are very similar to human skin color, such as sand or surfaces of desks. We used such things in training the neural network so that it can distinguish them from human skin.
For the implementation, the MLP neural networks were used. Several entities can be used as the input of the neural network, namely, RGB, HSV, and YCbCr (in this case, Y is not used because it has illumination information that is not suitable for skin detection). The number of outputs can be one or two. If there is just one output, then a threshold can be used. For example, if 1 is considered as being human skin and 0 as non-skin, an output of >0.5 indicates that the input pixel belongs to skin, and less than that means the input pixel belongs to non-skin. For two outputs, one output belongs to skin and the other to non-skin. The larger amount identifies the class for the pixel.
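The two output conventions can be written as small decision functions. This is a minimal sketch, not the authors' MATLAB code: the function names are hypothetical, and the handling of ties (an output of exactly 0.5, or two equal outputs) is an assumption because the text does not specify it.

```python
def classify_one_output(y, threshold=0.5):
    """Single output node: values above the threshold mean skin (1 = skin, 0 = non-skin)."""
    # Assumption: a value exactly equal to the threshold is treated as non-skin.
    return "skin" if y > threshold else "non-skin"

def classify_two_outputs(y_skin, y_nonskin):
    """Two output nodes: the larger activation identifies the class."""
    # Assumption: ties are resolved in favor of non-skin.
    return "skin" if y_skin > y_nonskin else "non-skin"
```

With a single output, moving the threshold away from 0.5 is one way to shift the balance between detecting more skin pixels and accepting fewer non-skin pixels.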
Around 50% of the samples were used for training and the rest for validation. Different numbers of neurons were examined in the hidden layer. As mentioned before, some pixel values occur in both the skin and non-skin samples; these pixels were assigned to the skin category. The neural network was trained with different numbers of nodes in the hidden layer, ranging from 2 to 40. The networks that produced better results were chosen for the test images. Figure 3 shows the recognition rate for different numbers of nodes in the hidden layer, when the neural network has only one output and CbCr is used as the input. For most of the networks, 16 or 20 nodes in the hidden layer produced better results than other numbers of nodes.
The color spaces used included RGB, CbCr (eliminating Y because it contains illumination information), and HS (eliminating V because it contains illumination information). Also, a combination of the different color spaces, CbCrRGBHS, was used as the input. Another method that was used is the boosting method. We used three different boosting methods: AND, OR, and VOTING. The outputs of the three different neural networks (RGB, CbCr, and HS) were combined. If a network's output shows that the pixel belongs to human skin, it is counted as 1 and otherwise as 0. In the AND method, all three outputs must indicate human skin for the final decision to be human skin. In the OR method, a single output indicating human skin is enough for the final decision to be human skin. In the VOTING method, at least two of the three outputs must indicate human skin for the final decision to be human skin. For more information about boosting and the mathematical methods that show how boosting can increase the detection rate, refer to References [7, 10]. These references treat boosting as a general machine learning technique; they are not specific to neural networks or skin detection.
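The AND, OR, and VOTING rules reduce to simple operations on the three binary decisions. The sketch below is illustrative only; the function name `combine` and the argument layout are assumptions, not part of the original implementation.

```python
def combine(votes, method):
    """Combine binary skin decisions (1 = skin, 0 = non-skin) from three networks."""
    if method == "AND":
        # Every network must agree that the pixel is skin.
        return int(all(votes))
    if method == "OR":
        # A single network voting for skin is enough.
        return int(any(votes))
    if method == "VOTING":
        # At least two of the three networks must vote for skin.
        return int(sum(votes) >= 2)
    raise ValueError("method must be 'AND', 'OR', or 'VOTING'")
```

For example, with decisions `(1, 0, 1)` from the three networks, AND rejects the pixel while OR and VOTING accept it, which is why AND tends toward higher precision and OR toward higher recall.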
We trained several different neural networks and tested the results on the UCD [13] and VT-AAST [1, 2] databases, using MATLAB (developed by MathWorks) for implementation. The UCD database contains 94 images from different races. The images vary from one person in the image to multiple persons. The VT-AAST database contains 286 images that offer a wide range of difference in illumination and race. Both databases also contain the images after cropping the face skin.
The experimental results are reported as precision, recall, specificity, and accuracy.
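Precision or positive predictive value (PPV):

$$\mathrm{PPV} = \frac{TP}{TP + FP}$$

Sensitivity or true positive rate (TPR), equivalent to hit rate or recall:

$$\mathrm{TPR} = \frac{TP}{P} = \frac{TP}{TP + FN}$$

Specificity (SPC) or true negative rate:

$$\mathrm{SPC} = \frac{TN}{N} = \frac{TN}{FP + TN}$$

Accuracy (ACC):

$$\mathrm{ACC} = \frac{TP + TN}{P + N}$$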
In the skin detection experiments, P is the number of the skin pixels and N is the number of the non-skin pixels. TP is the number of the skin pixels correctly classified as skin pixels. TN is the number of the non-skin pixels correctly classified as non-skin pixels. FP is the number of the non-skin pixels incorrectly classified as skin pixels. FN is the number of the skin pixels incorrectly classified as non-skin pixels.
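The four measures follow directly from the pixel counts defined above. The helper below is a small sketch with a hypothetical name (`skin_metrics`); it assumes all denominators are nonzero and returns fractions rather than percentages.

```python
def skin_metrics(tp, fp, tn, fn):
    """Precision, recall, specificity, and accuracy from pixel counts."""
    p = tp + fn  # number of skin pixels
    n = tn + fp  # number of non-skin pixels
    return {
        "precision": tp / (tp + fp),
        "recall": tp / p,
        "specificity": tn / n,
        "accuracy": (tp + tn) / (p + n),
    }
```

Multiplying each value by 100 gives percentages of the kind reported in the result tables.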
4 Experimental Results
In the first experiment, we used the CbCr components of YCbCr, the HS components of HSV, and RGB as the inputs of the neural network. The Y and V components were excluded because they carry illumination information, which is not suitable for skin detection.
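To make the inputs concrete, the sketch below builds the chroma, RGB, and hue/saturation features for one 8-bit pixel and concatenates them into a seven-element CbCrRGBHS vector. Several details are assumptions that the text does not state: the full-range BT.601 (JPEG-style) conversion is used for Cb and Cr, whereas MATLAB's `rgb2ycbcr` uses a limited-range variant; all features are scaled to roughly [0, 1]; and the ordering of the components is arbitrary. The function names are hypothetical.

```python
import colorsys

def rgb_to_cbcr(r, g, b):
    """Cb and Cr of an 8-bit RGB pixel (full-range BT.601; an assumed convention)."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Clamp to the 8-bit range, as JPEG-style conversions do.
    return min(255.0, max(0.0, cb)), min(255.0, max(0.0, cr))

def cbcr_rgb_hs_vector(r, g, b):
    """Seven features: Cb, Cr, R, G, B, H, S. Y and V are deliberately left out."""
    cb, cr = rgb_to_cbcr(r, g, b)
    h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return [cb / 255.0, cr / 255.0, r / 255.0, g / 255.0, b / 255.0, h, s]
```

Taking only the first two, middle three, or last two entries gives the CbCr, RGB, and HS inputs used by the individual networks.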
The results for CbCr, HS, and RGB on the UCD database are listed in Table 1. CbCr has a better accuracy rate than HS and RGB, whereas HS has the highest recall, i.e., it detects more skin pixels correctly than the other color spaces.
Table 1: Results for CbCr, HS, and RGB on the UCD database.

Input | Precision (%) | Recall (%) | Specificity (%) | Accuracy (%) |
---|---|---|---|---|
CbCr | 67.55 | 46.23 | 92.34 | 80.52 |
HS | 56.66 | 63.06 | 83.37 | 78.17 |
RGB | 81.06 | 26.45 | 97.87 | 79.56 |
In another experiment, we trained a neural network using the full HSV color space, including V, to see the effect that excluding V has on recognizing skin pixels. The results for HS and HSV on the UCD database are listed in Table 2. The accuracy with HS as the input was higher than with HSV; moreover, including V reduced the recall by about 28 percentage points (from 63.06% to 35.28%), which is a very large drop. Thus, removing the V component produced better results.
Table 2: Results for HS and HSV on the UCD database.

Input | Precision (%) | Recall (%) | Specificity (%) | Accuracy (%) |
---|---|---|---|---|
HS | 56.66 | 63.06 | 83.37 | 78.17 |
HSV | 58.72 | 35.28 | 91.45 | 77.05 |
In another experiment, we used the boosting method to increase the performance of the neural networks. We used AND, OR, and VOTING among the three outputs of the neural networks trained with CbCr, HS, and RGB. The results on the UCD database are shown in Table 3. The AND method has the highest precision and specificity of the three methods; however, it has a much lower recall, which means that many skin pixels are classified as non-skin pixels. In terms of recall, the OR method recognizes many more skin pixels correctly than the other two methods; however, it also classifies more non-skin pixels as skin pixels. The VOTING method has the highest accuracy among these three methods.
Table 3: Results of the AND, OR, and VOTING methods on the UCD database.

Method | Precision (%) | Recall (%) | Specificity (%) | Accuracy (%) |
---|---|---|---|---|
AND | 78.40 | 21.45 | 97.96 | 78.35 |
OR | 55.62 | 70.36 | 80.65 | 78.01 |
VOTING | 73.34 | 39.53 | 95.05 | 80.82 |
Also, we generated a vector, CbCrRGBHS, that combines the information of the three color spaces; the results on the UCD database are shown in Table 4. Its accuracy (81.93%) is higher than that of the previous methods.
Table 4: Results for the CbCrRGBHS vector on the UCD database.

Input | Precision (%) | Recall (%) | Specificity (%) | Accuracy (%) |
---|---|---|---|---|
CbCrRGBHS | 77.73 | 51.35 | 95.92 | 81.93 |
Another neural network was designed with the same inputs but a different number of nodes in the output. In this experiment, two nodes were chosen for the output, one for skin and the other for non-skin. The output with the higher value indicates the class to which the pixel belongs. Our results for CbCr, RGB, and HS on the UCD database are listed in Table 5. Comparing Table 5 with Table 1 shows that using two outputs instead of one improved the accuracy for HS and RGB, while the accuracy for CbCr was essentially unchanged (80.44% versus 80.52%). RGB has a higher accuracy than CbCr and HS; however, CbCr detects more human skin correctly than the other two inputs.
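Table 5: Results for CbCr, HS, and RGB on the UCD database with two output nodes.

Input | Precision (%) | Recall (%) | Specificity (%) | Accuracy (%) |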
---|---|---|---|---|
CbCr | 62.16 | 60.54 | 87.30 | 80.44 |
HS | 69.55 | 42.28 | 93.62 | 80.46 |
RGB | 78.28 | 39.52 | 96.20 | 81.93 |
Figure 4 shows the receiver operating characteristic (ROC) graph for the CbCrRGBHS vector with two nodes in the output for the validation set. The results for the CbCrRGBHS vector with two output nodes on the UCD database are listed in Table 6. Compared with the single-output network (Table 4), the accuracy and recall improved (from 81.93% to 82.36% and from 51.35% to 60.25%, respectively); however, the precision decreased (from 77.73% to 71.30%).
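Table 6: Results for the CbCrRGBHS vector on the UCD database with two output nodes.

Input | Precision (%) | Recall (%) | Specificity (%) | Accuracy (%) |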
---|---|---|---|---|
CbCrRGBHS | 71.30 | 60.25 | 93.43 | 82.36 |
Table 7 compares the other methods discussed with our best result on the UCD database. The result obtained with the CbCrRGBHS vector is better than those of the other methods in precision, specificity, and accuracy. Our method accepts far fewer non-skin pixels as skin than the other methods.
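Table 7: Comparison of other methods with our best result on the UCD database.

Method | Precision (%) | Recall (%) | Specificity (%) | Accuracy (%) |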
---|---|---|---|---|
Gaussian | 54.96 | 66.82 | 81.12 | 77.46 |
Chen | 63.75 | 51.13 | 89.98 | 80.02 |
Kovac | 62.51 | 69.09 | 85.71 | 81.45 |
Kong | 37.47 | 14.58 | 91.61 | 71.87 |
CbCrRGBHS | 71.30 | 60.25 | 93.43 | 82.36 |
Note that there is a trade-off between precision and recall. If we want a high recall (detecting more skin pixels correctly), then many non-skin pixels are likely to be detected as human skin, which reduces the precision, and vice versa. Table 3 shows this: the AND and VOTING methods have a high precision but low recall, whereas the OR method has a high recall but low precision.
Also, we applied the designed neural network on the VT-AAST database using CbCrRGBHS as the input. Table 8 shows the results of our method compared with other methods on the VT-AAST database. Our results are better in precision and accuracy, and comparable in recall and specificity to the best results.
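Table 8: Comparison of other methods with our method on the VT-AAST database.

Method | Precision (%) | Recall (%) | Specificity (%) | Accuracy (%) |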
---|---|---|---|---|
Gaussian | 30.00 | 60.37 | 79.84 | 77.40 |
Chen | 31.62 | 54.59 | 83.10 | 79.53 |
Kovac | 31.81 | 65.02 | 80.05 | 78.17 |
Kong | 45.97 | 29.51 | 95.03 | 86.83 |
CbCrRGBHS | 54.26 | 59.07 | 93.36 | 88.56 |
To improve the result, two operations were applied to the output image from the neural network. The first was the filling method, in which the holes in the components are filled. This method is useful and increased the recall: the output image from the neural network may contain undetected regions inside the face or other body components, and these can be recovered by filling. The second was the opening method, which applies erosion followed by dilation. Erosion is a morphological operation in image processing. A small disk (called a structuring element) is defined and moved over the image. The center of the structuring element is placed on each pixel of the binary image that has the value 1. If all the image pixels covered by the structuring element are 1, then the pixel under the center of the structuring element is set to 1 in the final image; otherwise, it is set to 0. In dilation, as in erosion, the center of the structuring element is placed on each pixel that is 1. Any zero pixels covered by the structuring element are then changed to 1 in the final image.
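The filling and opening steps can be sketched in plain Python on a binary mask stored as a list of rows. This is an illustration under stated assumptions, not the MATLAB implementation used in the experiments: a 3 × 3 square structuring element is used, pixels outside the image are treated as background during erosion, holes are defined as background regions not 4-connected to the image border, and all function names are hypothetical.

```python
from collections import deque

OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # 3 x 3 element

def erode(img):
    """Keep a pixel only if every pixel under the 3 x 3 element is 1."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                     for dy, dx in OFFSETS))
             for x in range(w)] for y in range(h)]

def dilate(img):
    """Set to 1 every pixel covered by the element centered on a 1 pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] == 1:
                for dy, dx in OFFSETS:
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        out[y + dy][x + dx] = 1
    return out

def fill_holes(img):
    """Fill background regions that cannot be reached from the image border."""
    h, w = len(img), len(img[0])
    reached = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and img[y][x] == 0:
                reached[y][x] = True
                queue.append((y, x))
    while queue:  # flood fill of the outer background
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 0 and not reached[ny][nx]:
                reached[ny][nx] = True
                queue.append((ny, nx))
    return [[1 if img[y][x] == 1 or not reached[y][x] else 0 for x in range(w)]
            for y in range(h)]

def postprocess(mask):
    """Fill holes first, then apply opening (erosion followed by dilation)."""
    return dilate(erode(fill_holes(mask)))
```

Filling recovers missed pixels inside a detected region, while opening removes isolated detections smaller than the structuring element.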
The structuring element that we used was 3 × 3 pixels; this size produced better results than the other structuring elements that we tried. Tables 9 and 10 show the results after applying the filling and opening methods; the modified method is called CbCrRGBHS+. The results in Tables 9 and 10 show that the recall increased, together with small increases in precision and accuracy. Figure 5 illustrates some of our experimental results on images from the UCD database. These were produced using the CbCrRGBHS vector and two outputs for the neural network. The second image is the output from the neural network, and the third image is the result after applying the morphological operations: we first filled the holes in the image and then applied the opening operation. Figure 6 illustrates some of our results on the VT-AAST database. There are false positives in some of the images because the colors of some objects are very similar to human skin color.
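Table 9: Results on the UCD database after applying the filling and opening methods.

Method | Precision (%) | Recall (%) | Specificity (%) | Accuracy (%) |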
---|---|---|---|---|
CbCrRGBHS | 71.30 | 60.25 | 93.43 | 82.36 |
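CbCrRGBHS+ | 73.43 | 65.54 | 93.08 | 83.45 |

Table 10: Results on the VT-AAST database after applying the filling and opening methods.

Method | Precision (%) | Recall (%) | Specificity (%) | Accuracy (%) |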
---|---|---|---|---|
CbCrRGBHS | 54.26 | 59.07 | 93.36 | 88.56 |
CbCrRGBHS+ | 54.77 | 62.61 | 93.07 | 88.76 |
5 Conclusion
In this article, we used neural networks for detecting human skin in color images, using a variety of color spaces. The results show that the neural network has acceptable performance in detecting human skin in color images. Reaching 100% detection is not possible because there are many pixels in color images that are common between human skin and other things.
One method for improvement may be using many images that do not contain faces, and adding pixels that were classified incorrectly as face skin to the training pixels and then using them in the training. Also, we can train networks with different initial weights, different random sets of skin and non-skin pixels, and different permutations of the images that were presented to the network. Although the networks will have close detection and error rates, the errors will be different from one another. A combination of the networks using the AND, OR, and VOTING methods can also be used.
Bibliography
[1] A. S. Abdallah, A. L. Abbott and M. A. El-Nasr, A new face detection technique using 2D DCT and self-organizing feature map, Proceedings of World Academy of Science, Engineering and Technology 21 (2007), 15–19.
[2] A. S. Abdallah, M. A. El-Nasr and A. C. Abbott, A new colour image database for benchmarking of automatic face detection and human skin segmentation techniques, Proceedings of World Academy of Science, Engineering and Technology 20 (2007), 353–357.
[3] H. Al-Mohair, J. Saleh and S. Suandi, Human skin color detection: a review on neural network perspective, International Journal of Innovative Computing, Information and Control 8 (2012), 8115–8131.
[4] S. Alshehri, Neural networks performance for skin detection, Journal of Emerging Trends in Computing and Information Sciences 3 (2012), 1582–1585.
[5] H. Chen, C. Huang and C. Fu, Hybrid-boost learning for multi-pose face detection and facial expression recognition, Pattern Recognition Society, Elsevier 41 (2008), 1173–1185. doi:10.1016/j.patcog.2007.08.010
[6] C. A. Doukim, J. A. Dargham, A. Chekima and S. Omatu, Combining neural networks for skin detection, Signal & Image Processing: An International Journal (SIPIJ) 1 (2010), 1–11. doi:10.5121/sipij.2010.1201
[7] Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese Society for Artificial Intelligence 14 (1999), 771–780.
[8] W. Kong and S. Zhe, Multi-face detection based on down sampling and modified subtractive clustering for color images, Journal of Zhejiang University 8 (2007), 72–78. doi:10.1631/jzus.2007.A0072
[9] J. Kovac, P. Peer and F. Solina, Human skin color clustering for face detection, EUROCON 2003, Computer as a Tool, The IEEE Region 8, 2 (2003), 144–148.
[10] R. Schapire, Y. Freund, P. Bartlett and W. Lee, Boosting the margin: a new explanation for the effectiveness of voting methods, in: Machine Learning: Proceedings of the Fourteenth International Conference, 1997.
[11] M. Seow, D. Valaparla and V. Asari, Neural network based skin color model for face detection, in: Proceedings of the 32nd Applied Imagery Pattern Recognition Workshop (AIPR’03), pp. 141–145, 2003.
[12] S. Singh, D. S. Chauhan, M. Vatsa and R. Singh, A robust skin color based face detection algorithm, Tamkang Journal of Science and Engineering 6 (2003), 227–234.
[13] UCD database. Available from http://ee.ucd.ie/~prag/. Accessed May 2014.
[14] Y. Wu and X. Ai, Face detection in color images using Adaboost algorithm based on skin color information, in: 2008 Workshop on Knowledge Discovery and Data Mining, pp. 339–342, 2008. doi:10.1109/WKDD.2008.148
[15] G. Yang, H. Li, L. Zhang and Y. Cao, Research on a skin color detection algorithm based on self-adaptive skin color model, in: International Conference on Communications and Intelligence Information Security, pp. 266–270, 2010. doi:10.1109/ICCIIS.2010.67
©2015 by De Gruyter
This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.