Experimental Study on Lip and Smile Detection

This paper presents a lip and smile detection method based on the normalized RGB chromaticity diagram. The method employs the popular Viola-Jones detector to detect the face. To avoid false positives, an eye detector is introduced in the detection stage: only face candidates with detected eyes are accepted as faces. Once the face is detected, the lip region is localized using a simple geometric rule. Further, red-color thresholding based on the normalized RGB chromaticity diagram is proposed to extract the lip. A projection technique is employed to detect the smile state. From the experimental results, the proposed method achieves a lip detection rate of 97% and a smile detection rate of 94%.


Introduction
Recently, applications that employ image processing techniques have increased significantly. Human-computer interfaces are being developed so that computers behave as naturally as humans. Thus, the recognition of human facial features has become a topic of growing attention among researchers. One extensively researched area is lip detection, which plays an important role in recognizing human expressions and activities [1][2][3][4][5][6][7][8]. In [9][10][11], lip detection is used to detect the smile expression, while in [12][13] it is used by audiovisual speech recognition systems.
In [1], spatial fuzzy clustering is proposed for color lip segmentation. In this method, both the distribution of data in feature space and the spatial interactions between neighboring pixels are considered in the clustering process. A hybrid method that integrates several distinct extraction techniques is proposed to detect mouth features [2]. It employs two stages, i.e., coarse mouth detection and fine mouth detection. In the coarse stage, color segmentation based on the HSV color space is employed to determine the location of the lip in the facial image. In the fine stage, curve fitting of the mouth and mouth template matching are used to extract the mouth features: the top of the upper lip, the bottom of the lower lip, and the left and right mouth corners.
A rule-based lip detection technique based on the normalized RGB color space is proposed in [3]. The authors defined a crescent area on the rg plane using a quadratic polynomial discriminant function for detecting lip pixels. In [6], the mouth is localized using an edge projection technique. It first detects the face region and extracts the intensity valleys to detect iris candidates. Based on a pair of iris candidates, the projection method is adopted to detect the mouth. In [4], edge-based detection and segmentation are applied separately; the results of the two methods are then fused to detect the outer lip contour.
The popular Haar-like classifiers are employed in [7][11][13] to detect the lip region. To overcome the problem of false positives, [11] checks the geometric relations between the detected face features; a similar method is adopted in [7]. A smile detection system is proposed in [9]. It first uses the Adaboost algorithm to detect the face region in the first image frame and locate the standard facial feature positions. Optical flow is then used to track the positions of the left and right mouth corners. A smile is detected if the distance between the tracked left and right mouth corners is larger than a threshold. In [11], three sets of feature vectors (the lip lengths, the lip angles, and the mean intensities of the cheek areas) are used by a linear discriminant function to classify a face as smiling or non-smiling. In [2], an artificial neural network is employed to classify the mouth movements as smile, neutral, or sad.
In this paper, we propose a lip color segmentation based on the normalized rg chromaticity diagram. The proposed segmentation extracts the lip region from the face region detected by the Viola-Jones face detector. Eye detection is added to overcome the problem of false face detection. Further, a projection technique is employed to detect the inner area of the lip and thus determine whether the face is smiling or not.
The rest of the paper is organized as follows. Section 1 presents the fundamental theory used in this research. Section 2 discusses the proposed method. Section 3 presents the experimental results. The conclusion is covered in section 4.

Haar-like features use the change in contrast values between adjacent rectangular groups of pixels [7]. Figures 1(a) and 1(b) show the two-rectangle feature, which is calculated as the difference of the sums of the pixels within two rectangular regions. Figure 1(c) shows the three-rectangle feature, which is calculated by subtracting the sum of the pixels within the two outer rectangles from the sum of the pixels in the center rectangle. Figure 1(d) shows the four-rectangle feature, which is calculated as the difference between diagonal pairs of rectangles.

Figure 2. Schematic of cascade classifier [14].
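As an illustration, such rectangle features can be evaluated with only a few array lookups once a summed-area (integral) image is precomputed. The sketch below is a minimal NumPy version, not the paper's MATLAB implementation; the function names are my own.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of all pixels above and to
    the left of (x, y), inclusive (cast to int64 to avoid overflow)."""
    return np.asarray(img, dtype=np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    """Sum of pixels inside a rectangle using 4 integral-image lookups."""
    a = ii[top - 1, left - 1] if top > 0 and left > 0 else 0
    b = ii[top - 1, left + w - 1] if top > 0 else 0
    c = ii[top + h - 1, left - 1] if left > 0 else 0
    d = ii[top + h - 1, left + w - 1]
    return d - b - c + a

def two_rect_feature(ii, top, left, h, w):
    """Haar-like two-rectangle feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, top, left, h, half) - rect_sum(ii, top, left + half, h, half)
```

Three- and four-rectangle features follow the same pattern, each costing a constant number of lookups regardless of rectangle size.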
Rectangle features as described above can be computed rapidly using the integral image [14]. The integral image is an intermediate representation of the image, expressed as

ii(x, y) = ii(x, y − 1) + row_sum(x, y),   row_sum(x, y) = row_sum(x − 1, y) + i(x, y)

where row_sum(x, y) is the cumulative row sum. The Adaboost learning algorithm is used to select a small set of features and train the classifier [15]. Basically, Adaboost is used to boost the performance of a weak/simple classifier. Here the weak classifier is designed to select the single rectangle feature that best separates the positive and negative samples. Mathematically, a weak classifier h_j(x) is defined as

h_j(x) = 1 if p_j f_j(x) < p_j θ_j, and h_j(x) = 0 otherwise,

where x is a 24×24 pixel sub-window, f_j is a feature, θ_j is a threshold, and p_j is a parity that defines whether x should be classified as a positive or a negative sample. The cascade scheme is constructed to increase detection performance while reducing computation time [14]. The idea is to use simpler classifiers, which reject the majority of sub-windows, before more complex classifiers in order to achieve low false positive rates. Figure 2 shows the schematic of the cascade classifier. A positive result from the first classifier triggers the evaluation of a second classifier; a positive result from the second classifier triggers a third classifier, and so on. A negative result at any point leads to the immediate rejection of the sub-window. The cascade scheme attempts to reject as many negatives as possible at the earliest stage possible.

In the normalized RGB color space, the distribution of human skin is a shell-shaped area called the skin locus [16]. Figures 3, 4, and 5 show the skin color loci proposed by [3][17], [18], and [19], respectively. In figure 3, the upper-bound quadratic function of the skin locus is g = −1.3767r² + 1.0743r + 0.1452, while the lower-bound quadratic function is g = −0.776r² + 0.5601r + 0.1766.
In figure 4, the upper-bound quadratic function is g = −5.05r² + 3.71r − 0.32, while the lower-bound quadratic function is g = −0.65r² + 0.05r + 0.36. In figure 5, the boundary of the skin locus is expressed as

line-G: g = r (5)
line-R: g = r − 0.4 (6)
line-B: g = −r + 0.6 (7)
line-up: g = 0.4 (8)
line-c: (g − 0.33)² + (r − 0.33)² = 0.0004 (9)

Figure 5. Skin locus proposed by [19].

Figure 6 shows the overview of the proposed system. It starts with the Viola-Jones face detector [14] to detect the face in the image. The Viola-Jones face detector has three important aspects: a) an image feature called the integral image; b) a learning algorithm based on Adaboost; c) a combination of classifiers in a cascade scheme. The Viola-Jones face detector is a very fast method for detecting faces with a high detection rate. However, the algorithm has a drawback, i.e., it produces false positives. To reduce the false positives, an eye detector is introduced in the second stage. We apply the eye detector to each face candidate obtained by the face detector. If an eye is detected, the face candidate is accepted as a face; otherwise we reject the candidate. It is noted here that, even though the Viola-Jones approach is also applied to detect the eye, since the image region is limited to the face candidate's region only, false eye detection (false positives) is avoided.
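The skin-locus test of figure 3 amounts to converting a pixel to rg chromaticities and checking it against the two quadratic bounds quoted above. A minimal sketch, with illustrative function names of my own:

```python
def to_rg(R, G, B):
    """Map an RGB triple to the normalized rg chromaticity diagram:
    r = R/(R+G+B), g = G/(R+G+B)."""
    s = float(R + G + B)
    if s == 0:
        return 0.0, 0.0
    return R / s, G / s

def in_skin_locus(R, G, B):
    """Check a pixel against the quadratic skin-locus bounds of [3][17]:
    skin iff lower(r) < g < upper(r)."""
    r, g = to_rg(R, G, B)
    upper = -1.3767 * r**2 + 1.0743 * r + 0.1452
    lower = -0.776 * r**2 + 0.5601 * r + 0.1766
    return lower < g < upper
```

For example, a typical skin tone such as RGB (150, 110, 80) falls between the bounds, while a saturated blue or green does not.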

Methodology
Once the face is detected, a simple geometric rule is adopted to localize the lip region: the lip region is taken as the lower part of the face region, with bounds defined empirically. After the lip region is localized, the next step is lip detection using color segmentation based on the normalized RGB chromaticity diagram, as discussed in the following section. Further, smile detection based on the openness of the mouth/lip is applied to the lip area to distinguish smile from non-smile expressions.
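Since the paper's empirical bounds for the lip region are not reproduced here, the sketch below uses placeholder fractions (lower third of the face, horizontally centered) that are purely illustrative, NOT the paper's actual values:

```python
def lip_roi(face_x, face_y, face_w, face_h):
    """Localize a lip search region from a detected face box.
    The fractions below are illustrative placeholders, not the
    paper's (unreproduced) empirical values: the lips are assumed
    to lie in the lower third of the face, horizontally centered."""
    x = face_x + face_w // 4          # skip the outer quarters
    y = face_y + 2 * face_h // 3      # lower third of the face
    return x, y, face_w // 2, face_h // 3
```

In practice the fractions would be tuned on a training set, exactly as the paper tunes its own empirical rule.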
Similar to the skin color locus, a lip detection method based on the rg color space is proposed in [3], where two discriminant functions are defined for extracting lip pixels. However, the lip region described above is defined empirically, and therefore in some cases it may fail to detect the lip, as observed in our experiments.
To overcome this drawback, we propose a new lip detection method that extends our previous research [19,20]. In [20], we utilized the normalized RGB chromaticity diagram for extracting the red color of traffic signs. Since the colors of lips range from dark red to purple [3], we can extract the lip color in a similar way. In the method, we employ the following rule to extract the lip pixels:

If g − r < TR, then assign the pixel as lip (16)

where TR is a threshold. Rather than fixing the threshold TR, we find TR automatically by constructing a new transformed image, called Igr, using the following equation:

Igr = (100 + (100 × g) − (100 × r)) / 200 (17)

Using this transformed image, the optimal threshold TR can be found by employing Otsu's thresholding method. From the previous lip detection stage, a blob image (black and white) is obtained, as shown in figures 7(b) and (d). In this work, we assume that a smile is detected when the mouth is open, i.e., when the teeth appear as shown in figure 7(c). Therefore, our smile detection method searches for the black area inside the lip region by projecting the blob image onto the y-coordinate.
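The g − r thresholding of equations (16)-(17), Otsu's method, and the y-projection smile test can be sketched in NumPy as follows. This is a simplified illustration with my own function names (the paper's implementation is in MATLAB), assuming an RGB-ordered image array:

```python
import numpy as np

def otsu_threshold(img, bins=256):
    """Plain Otsu: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(img.ravel(), bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                     # class-0 probability
    mu = np.cumsum(p * centers)           # cumulative mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * w0 - mu) ** 2 / (w0 * (1.0 - w0))
    sigma_b = np.nan_to_num(sigma_b)
    return edges[np.argmax(sigma_b) + 1]  # upper edge of the best bin

def lip_mask(rgb):
    """Extract lip pixels via the g - r rule (eq. 16) on the transformed
    image Igr (eq. 17), thresholded automatically with Otsu's method."""
    rgb = rgb.astype(np.float64)
    s = rgb.sum(axis=2)
    s[s == 0] = 1.0                       # avoid division by zero
    r = rgb[..., 0] / s
    g = rgb[..., 1] / s
    igr = (100.0 + 100.0 * g - 100.0 * r) / 200.0   # eq. (17), in [0, 1]
    return igr < otsu_threshold(igr)      # low Igr  <=>  g - r < TR

def is_smile(mask, min_gap=2):
    """Project the lip blob onto the y-axis; an interior run of empty
    rows (the teeth / open mouth) inside the lip extent signals a smile."""
    rows = mask.sum(axis=1)               # horizontal projection
    lip_rows = np.flatnonzero(rows > 0)
    if lip_rows.size == 0:
        return False
    inner = rows[lip_rows[0]:lip_rows[-1] + 1]
    gap, best = 0, 0
    for v in inner:                       # longest run of empty rows
        gap = gap + 1 if v == 0 else 0
        best = max(best, gap)
    return best >= min_gap
```

On a synthetic lip blob with a light band between two red bands, the mask keeps only the red pixels and the projection reports an open (smiling) mouth; a contiguous blob reports non-smile.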

Result and Analysis
Our algorithm is implemented using MATLAB. We tested the algorithm on one hundred (100) test images. Table I shows the detection results. From the table, our proposed method achieves the best detection rate, for both lip detection and smile detection. We can also see that the detection rate of M2 is lower than that of M1. This is because some of the lip regions obtained by the Viola-Jones mouth detector are too small (not covering the whole mouth/lip region), so the smile cannot be detected properly. In M3 and M4, the lip segmentation method proposed by [3] cannot properly separate the white color of the teeth from the lip color; therefore the smile detection rates of those methods are very low. Figure 9 shows some of the detection results, where images in the first column are the original images, images in the second column are the localized lip regions with the bounding boxes of the detected lips, and images in the third column are the blob images representing the extracted lips.

Conclusion
A lip and smile detection method based on the normalized RGB chromaticity diagram has been presented. The method works efficiently in extracting the lip from the face image. Further, a simple projection technique is employed to detect the smile or non-smile state by considering the openness of the mouth. In the future, we will extend our approach to detect the smile from other features, not only the openness of the mouth, and also to recognize other human expressions. Further, a real-time implementation will be conducted.